On Sun, Jul 16, 2023 at 10:29 AM Victor Zverovich <victor.zverovich@gmail.com> wrote:

> Failures and unsupported encodings can always be handled by falling back on overescaping.

Thanks. That is a useful clarification of expectations (I did understand that this is allowed, conformance-wise).

>> Additionally, I am not sure that the policy chosen for unassigned codepoints should be the same between Unicode and non-Unicode encodings.
>
> Could you elaborate?

My statement arose because I was thinking that Unicode can assign codepoints in the future (and other encodings, not so much); however, I failed to account for pre-existing extensions for various non-Unicode encodings having differences in the assigned area of the codespace (and for user-defined cases).

> My understanding of the status quo is that the intent is for unassigned codepoints to be passed through unescaped.
>
> Cheers,
> Victor

On Wed, Jul 12, 2023 at 4:38 PM Hubert Tong via SG16 <sg16@lists.isocpp.org> wrote:
Hi SG 16:

When escaping strings (an operation likely done at runtime), some information about the literal encoding (a property of the compilation environment) is needed.

For "ideal behaviour", it seems to me that the ability to hardcode/capture at compile time/deploy with the runtime is needed for the following:
1. Understanding of the encoding scheme (e.g., valid initial code units, valid continuation code units, etc.)
2. The set of characters considered separators or non-printable characters
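As a rough illustration of the kind of per-encoding knowledge item (1) refers to, here is a minimal sketch for one multibyte encoding. It assumes the commonly described byte ranges for Shift_JIS (vendor variants differ), and the namespace and function names are hypothetical, not taken from any library:

```cpp
#include <cstddef>
#include <string_view>

namespace sjis_sketch {  // hypothetical name, for illustration only

// Single-byte characters: the ASCII-like range and halfwidth katakana.
constexpr bool is_single(unsigned char u) {
    return u <= 0x7F || (u >= 0xA1 && u <= 0xDF);
}

// Valid initial code units of a two-byte character.
constexpr bool is_lead(unsigned char u) {
    return (u >= 0x81 && u <= 0x9F) || (u >= 0xE0 && u <= 0xFC);
}

// Valid continuation code units. Note the overlap with the ASCII range,
// e.g. 0x5C, which is also the value of the backslash.
constexpr bool is_trail(unsigned char u) {
    return (u >= 0x40 && u <= 0x7E) || (u >= 0x80 && u <= 0xFC);
}

// Length in code units of the character at the start of `s`,
// or 0 if `s` is empty or starts with an ill-formed sequence.
constexpr std::size_t char_length(std::string_view s) {
    if (s.empty()) return 0;
    const auto u0 = static_cast<unsigned char>(s[0]);
    if (is_single(u0)) return 1;
    if (is_lead(u0) && s.size() >= 2 &&
        is_trail(static_cast<unsigned char>(s[1]))) {
        return 2;
    }
    return 0;
}

}  // namespace sjis_sketch
```

An escaper that lacks this information cannot tell a trail byte of 0x5C apart from a real backslash, which is why some form of encoding knowledge has to ship with the runtime.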

It seems to me that (1) is going to need some database of encodings already.

Additionally, I am not sure that the policy chosen for unassigned codepoints should be the same between Unicode and non-Unicode encodings.

Is my analysis reasonable? What are people's thoughts on POSIX locale, ICU, or iconv dependencies from C++ standard libraries as the way to support non-Unicode encodings? Since the specified formatting operation "cannot fail", what is the story when the underlying runtime environment lacks support for the literal encoding (violation of implementation-defined limits due to invalid runtime environment setup)?

The alternative (for non-Unicode encodings) seems to be "handle code units that match the encoding of a member of the basic character set, numeric escape everything else".
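A minimal sketch of that alternative, under stated assumptions: `escape_fallback` and `basic_printable` are hypothetical names, the `\x{..}` spelling is borrowed from the escaped-string format, and the scan is stateless, so for a multibyte encoding whose trail units overlap basic character values it would misclassify those units.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Printable members of the basic character set (quote and backslash are
// handled separately). They are spelled as ordinary literals, so the
// comparison happens in the literal encoding, whether ASCII- or EBCDIC-based.
inline constexpr std::string_view basic_printable =
    " abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,'";

// Hypothetical fallback escaper: pass through code units that match a
// printable basic character, numerically escape everything else.
inline std::string escape_fallback(std::string_view in) {
    std::string out = "\"";
    for (char ch : in) {
        if (ch == '"' || ch == '\\') {
            out += '\\';  // keep the result unambiguous as a quoted string
            out += ch;
        } else if (basic_printable.find(ch) != std::string_view::npos) {
            out += ch;
        } else {
            // Control characters and anything outside the basic set,
            // including every unit of a multibyte sequence that is not
            // itself a basic character value.
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\x{%02x}",
                          static_cast<unsigned>(static_cast<unsigned char>(ch)));
            out += buf;
        }
    }
    out += '"';
    return out;
}
```

This never fails and needs no encoding database, at the cost of escaping text that a reader of the actual encoding would consider perfectly printable.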

-- HT
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16