Re: QoI for escaped formatting of non-Unicode-encoding strings: deployment overhead versus ideal behaviour

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 13 Jul 2023 09:59:42 +0200
Hey Hubert, I sadly missed the last meeting, which I'm guessing provides
some necessary context here.
I'll try to answer some of these points in a vacuum anyway; I like to live
dangerously!

Generally, I'd be rather concerned about the need to reason about literal
encoding at runtime.
Doing so seems generally impossible without a way to track, in the type
system, whether a string was produced at runtime or at compile time
(including strings produced by constant evaluation). I think we actively do
not want such tracking: besides the added complexity, we prefer that
constant evaluation not behave differently from runtime evaluation.
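To make the point concrete: once a string reaches a formatter, a value fixed at compile time and one assembled at runtime are indistinguishable. In this minimal sketch, `compile_time_text`, `runtime_text` and `count_non_ascii` are hypothetical names; the last one merely stands in for any formatter-side logic that inspects code units.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Code units fixed at compile time (spelled here with numeric escapes).
constexpr std::string_view compile_time_text = "caf\xC3\xA9";

// The same code units assembled at runtime, standing in for data read from a
// file or socket, so nothing guarantees they are in the literal encoding.
std::string runtime_text() {
    std::string s = "caf";
    s += static_cast<char>(0xC3);
    s += static_cast<char>(0xA9);
    return s;
}

// Hypothetical formatter-side logic: it receives only a std::string_view,
// so it has no way to tell which of the two sources produced its argument.
std::size_t count_non_ascii(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if (static_cast<unsigned char>(c) >= 0x80) ++n;
    }
    return n;
}
```

Both inputs compare equal and yield the same result (2), so any behaviour keyed on the literal encoding would necessarily apply to runtime data as well.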

In which cases do you think the distinction matters?
I realize that the situation can occur in practice, and that we can
diagnose it, but not necessarily do more about it in the general case.

> 2. The set of characters considered separators or non-printable characters

Properties of characters are independent of their encoding, and I'd be
rather opposed to tying them to locale or encoding. SPACE is always going
to be whitespace, whether in UTF-8 or EUC-KR, so we really only need
properties for Unicode code points. That is fortunate, as other character
sets are usually defined in terms of glyphs that have no properties
attached to them.
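A minimal sketch of the "decode to Unicode, then query properties" approach, assuming two hypothetical, deliberately tiny decoders: one for ISO-8859-1 (where each byte value equals its code point) and one covering only two EBCDIC code units (0x40 is SPACE and 0xC1 is 'A' in common EBCDIC code pages such as IBM-1047). The property test `is_separator` is likewise hypothetical and incomplete; a real implementation would consult Unicode Character Database data.

```cpp
#include <optional>

// Hypothetical decoder for ISO-8859-1: every byte value is its own code point.
std::optional<char32_t> decode_latin1(unsigned char u) {
    return char32_t{u};
}

// Hypothetical, deliberately partial EBCDIC decoder covering two code units
// that have these values in common EBCDIC code pages such as IBM-1047.
std::optional<char32_t> decode_ebcdic_subset(unsigned char u) {
    switch (u) {
        case 0x40: return U' ';          // SPACE
        case 0xC1: return U'A';
        default:   return std::nullopt;  // outside this sketch's tiny table
    }
}

// Property query keyed on Unicode code points only. Hypothetical and
// incomplete: it lists a few space/separator characters, not full Unicode data.
bool is_separator(char32_t cp) {
    switch (cp) {
        case U' ':       // U+0020 SPACE
        case U'\u00A0':  // NO-BREAK SPACE
        case U'\u2028':  // LINE SEPARATOR
        case U'\u2029':  // PARAGRAPH SEPARATOR
        case U'\u3000':  // IDEOGRAPHIC SPACE
            return true;
        default:
            return false;
    }
}
```

Whatever the source encoding, SPACE decodes to U+0020 and gets the same answer, so the property data needs to exist only once, keyed on code points.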

> What are people's thoughts on POSIX locale, ICU, or iconv dependencies
> from C++ standard libraries as the way to support non-Unicode encodings?

If we want to support generalized conversions from and to encodings
outside the current set (narrow and wide execution encodings, UTF-N), an
implementation will unavoidably need to use a library. But I don't think we
can rely on a specific one, so we can't mandate a set of supported
encodings or a specific mapping. I have no opinion on whether we should.

On Thu, Jul 13, 2023 at 1:38 AM Hubert Tong via SG16 <sg16_at_[hidden]> wrote:

> Hi SG 16:
>
> When escaping strings (an operation likely done at runtime), some
> information about the literal encoding (a property of the compilation
> environment) is needed.
>
> For "ideal behaviour", it seems to me that the ability to hardcode/capture
> at compile time/deploy with the runtime is needed for the following:
> 1. Understanding of the encoding scheme (e.g., valid initial code units,
> valid continuation code units, etc.)
> 2. The set of characters considered separators or non-printable characters
>
> It seems to me that (1) is going to need some database of encodings
> already.
>
> Additionally, I am not sure that the policy chosen for unassigned
> codepoints should be the same between Unicode and non-Unicode encodings.
>
> Is my analysis reasonable? What are people's thoughts on POSIX locale,
> ICU, or iconv dependencies from C++ standard libraries as the way to
> support non-Unicode encodings? Since the specified formatting operation
> "cannot fail", what is the story when the underlying runtime environment
> lacks support for the literal encoding (violation of implementation-defined
> limits due to invalid runtime environment setup)?
>
> The alternative (for non-Unicode encodings) seems to be "handle code units
> that match the encoding of a member of the basic character set, numeric
> escape everything else".
>
> -- HT
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
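Point (1) in the quoted message, knowing which code units may start a character and which may continue one, is per-encoding data. As a minimal sketch for a single encoding, assuming the plain EUC-KR layout (bytes 0x00 to 0x7F stand alone; a two-byte character has lead and trail bytes both in 0xA1 to 0xFE) and ignoring vendor extensions such as CP949/UHC, a hypothetical `euc_kr_valid_prefix` helper could look like this:

```cpp
#include <cstddef>
#include <string_view>

// Returns how many leading bytes of `s` form complete, well-formed EUC-KR
// characters under the simplified layout described above.
std::size_t euc_kr_valid_prefix(std::string_view s) {
    auto in_range = [](unsigned char u) { return u >= 0xA1 && u <= 0xFE; };
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead <= 0x7F) { ++i; continue; }  // single-byte character
        if (!in_range(lead)) break;           // invalid initial code unit
        if (i + 1 >= s.size()) break;         // truncated two-byte sequence
        if (!in_range(static_cast<unsigned char>(s[i + 1]))) break;  // bad continuation
        i += 2;
    }
    return i;
}
```

An escaping formatter could use such a check to decide where numeric escapes are needed; supporting many encodings means carrying one such rule set per encoding.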

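The fallback in the quoted message (recognize code units that encode members of the basic character set, numerically escape everything else) needs only compile-time knowledge, because basic characters spelled in a literal are already encoded in the ordinary literal encoding. A minimal sketch, with hypothetical names `is_basic_graphic` and `escape_basic_only`; the character list is an assumption to be checked against [lex.charset] for the target standard, and the `\x{...}` form is modelled on what C++23 escaped formatting emits for invalid code units.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Assumed list: graphic characters of the basic character set plus SPACE.
// Spelled as a literal, so the compiler encodes them in the literal encoding.
constexpr bool is_basic_graphic(char c) {
    constexpr std::string_view basic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        " _{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return basic.find(c) != std::string_view::npos;
}

// Hypothetical fallback escaper for a non-Unicode literal encoding: keep code
// units that encode basic characters and numerically escape everything else.
std::string escape_basic_only(std::string_view s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '\t') {
            out += "\\t";
        } else if (c == '\n') {
            out += "\\n";
        } else if (c == '"' || c == '\\') {
            out += '\\';
            out += c;
        } else if (is_basic_graphic(c)) {
            out += c;
        } else {
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\x{%x}",
                          static_cast<unsigned>(static_cast<unsigned char>(c)));
            out += buf;
        }
    }
    out += '"';
    return out;
}
```

One caveat: in a multibyte encoding such as Shift_JIS, a trail byte can share its value with a basic character (0x5C, the backslash in ASCII, is the well-known case), so a byte-wise fallback like this can misclassify such bytes; avoiding that requires the encoding-scheme knowledge from point (1).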
Received on 2023-07-13 07:59:56