
sg16


QoI for escaped formatting of non-Unicode-encoding strings: deployment overhead versus ideal behaviour

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 12 Jul 2023 19:38:21 -0400
Hi SG 16:

When escaping strings (an operation likely done at runtime), some
information about the literal encoding (a property of the compilation
environment) is needed.

For "ideal behaviour", it seems to me that the following would need to be
hardcoded, captured at compile time, or deployed with the runtime:
1. Understanding of the encoding scheme (e.g., valid initial code units,
valid continuation code units, etc.)
2. The set of characters considered separators or non-printable characters
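
To make item 1 concrete, here is a minimal sketch of what one entry in
such an encoding description might look like for a double-byte scheme.
The types and function names (ByteRange, DbcsScheme, sequence_length) are
hypothetical, and the byte ranges are the ones commonly cited for
Shift_JIS; vendor variants differ, so treat the values as an assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical descriptor types; not taken from any real library.
struct ByteRange { unsigned char lo, hi; };

struct DbcsScheme {
    ByteRange lead[2];   // valid initial code units of a two-unit sequence
    ByteRange trail[2];  // valid continuation code units
};

// Commonly cited Shift_JIS ranges (assumption; variants such as CP932 differ).
constexpr DbcsScheme shift_jis{{{0x81, 0x9F}, {0xE0, 0xFC}},
                               {{0x40, 0x7E}, {0x80, 0xFC}}};

constexpr bool in_any(const ByteRange (&r)[2], unsigned char c) {
    return (c >= r[0].lo && c <= r[0].hi) || (c >= r[1].lo && c <= r[1].hi);
}

// Length in code units of the sequence starting at s[0]; 0 if the sequence
// is truncated or has an invalid continuation unit. For brevity, any unit
// that is not a lead unit is treated as a one-unit character, even if it
// is unassigned in the encoding.
constexpr std::size_t sequence_length(const DbcsScheme& e, std::string_view s) {
    if (s.empty()) return 0;
    const unsigned char c0 = static_cast<unsigned char>(s[0]);
    if (!in_any(e.lead, c0)) return 1;
    if (s.size() < 2) return 0;
    return in_any(e.trail, static_cast<unsigned char>(s[1])) ? 2 : 0;
}
```

Even this much says nothing about item 2; which decoded characters are
separators or non-printable would be a separate table per encoding.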

It seems to me that (1) is going to need some database of encodings already.

Additionally, I am not sure that the policy chosen for unassigned
code points should be the same for Unicode and non-Unicode encodings.

Is my analysis reasonable? What are people's thoughts on POSIX locale, ICU,
or iconv dependencies from C++ standard libraries as the way to support
non-Unicode encodings? Since the specified formatting operation "cannot
fail", what is the story when the underlying runtime environment lacks
support for the literal encoding (violation of implementation-defined
limits due to invalid runtime environment setup)?

The alternative (for non-Unicode encodings) seems to be "handle code units
that match the encoding of a member of the basic character set, numeric
escape everything else".
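
A rough sketch of that alternative, assuming char strings in the ordinary
literal encoding. The names is_basic_printable and escape_fallback are
hypothetical, the \x{...} spelling is borrowed from the escaped-output
style of std::format, and the character list is my reading of the
graphic members of the basic character set. Because the comparisons use
character literals, the compiler supplies the code unit values and no
runtime encoding data is needed.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// True if c equals the code unit of a graphic basic-character-set member
// (or space) in the ordinary literal encoding. '"' and '\\' are handled
// separately by the caller.
constexpr bool is_basic_printable(char c) {
    constexpr std::string_view basic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        " _{}[]#()<>%:;.?*+-/^&|~!=,'";
    return basic.find(c) != std::string_view::npos;
}

// Hypothetical fallback: emit recognised code units as-is, use the usual
// simple escapes where they apply, and numerically escape everything else.
std::string escape_fallback(std::string_view in) {
    std::string out = "\"";
    for (char c : in) {
        switch (c) {
            case '\t': out += "\\t"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            default:
                if (is_basic_printable(c)) {
                    out += c;
                } else {
                    char buf[8];  // enough for "\x{ff}" with 8-bit code units
                    std::snprintf(buf, sizeof buf, "\\x{%x}",
                                  static_cast<unsigned>(
                                      static_cast<unsigned char>(c)));
                    out += buf;
                }
        }
    }
    out += '"';
    return out;
}
```

Note that this works strictly per code unit: in a multibyte encoding, a
continuation unit whose value happens to equal that of a basic character
is emitted as that character rather than escaped.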

-- HT

Received on 2023-07-12 23:38:50