Hey Hubert, I sadly missed the last meeting, which I suspect provided some necessary context here.

I'll try to answer some of these points in a vacuum anyway; I like to live dangerously!

Generally, I'd be rather concerned about needing to reason about the literal encoding at runtime.

Without a way to track in the type system whether a string was produced at runtime or at compile time (including strings produced by constant evaluation), it seems generally impossible to do. And I think we actively do not want such tracking: in addition to the added complexity, we would, I think, rather not have constant evaluation behave differently from the runtime.

In which cases do you think the distinction matters?

I realize that this situation can occur in practice, and that we can diagnose it, but we can't necessarily do more about it in the general case.

> 2. The set of characters considered separators or non-printable characters

Properties of characters are agnostic of their encoding, and I'd be rather opposed to tying them to the locale or the encoding. SPACE is always going to be whitespace, whether in UTF-8 or EUC-KR, so we really only need properties for Unicode code points. That's good, because other character sets are usually defined in terms of glyphs that have no properties attached to them.

> What are people's thoughts on POSIX locale, ICU, or iconv dependencies from C++ standard libraries as the way to support non-Unicode encodings?

If we want to support generalized conversions to and from encodings outside the current set (narrow and wide execution, UTF-N), an implementation relying on some library is going to be unavoidable, but I don't think we can rely on a specific one, so we can't mandate a set of supported encodings or a specific mapping. I have no opinion on whether we should.

On Thu, Jul 13, 2023 at 1:38 AM Hubert Tong via SG16 <sg16@lists.isocpp.org> wrote:

Hi SG 16:

When escaping strings (an operation likely done at runtime), some information about the literal encoding (a property of the compilation environment) is needed.

For "ideal behaviour", it seems to me that the ability to hardcode/capture at compile time/deploy with the runtime is needed for the following:
1. Understanding of the encoding scheme (e.g., valid initial code units, valid continuation code units, etc.)
2. The set of characters considered separators or non-printable characters

It seems to me that (1) is going to need some database of encodings already.

Additionally, I am not sure that the policy chosen for unassigned codepoints should be the same between Unicode and non-Unicode encodings.

Is my analysis reasonable? What are people's thoughts on POSIX locale, ICU, or iconv dependencies from C++ standard libraries as the way to support non-Unicode encodings? Since the specified formatting operation "cannot fail", what is the story when the underlying runtime environment lacks support for the literal encoding (violation of implementation-defined limits due to invalid runtime environment setup)?

The alternative (for non-Unicode encodings) seems to be "handle code units that match the encoding of a member of the basic character set, numeric escape everything else".

-- HT
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16