https://isocpp.org/files/papers/P1859R0.html is my attempt at disentangling the wording around character sets and encodings. Since the value of a literal is self-evidently fixed at translation time, any interpretation under which that value changes with the current locale does not make sense. I believe the intent of lex.charset/3 was that the locale specified for the compiler is used to produce the values of literals when encoding from the internal representation of characters. I'm asking that this be termed the "{narrow,wide} literal encoding", as opposed to the "dynamic encoding" controlled by the conversion facet of the currently set locale.

The interpretation within the standard itself seems to vary considerably. Fortunately, I believe there is no implementation divergence.
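
To illustrate the distinction between the literal encoding and the dynamic encoding, here is a minimal sketch of my own (it is not taken from P1859R0). The helper names micro_suffix and literal_survives_locale_change are hypothetical, and the sketch assumes the compiler's narrow literal encoding can represent U+00B5. It only shows that the bytes of a string literal are settled during translation and are not re-encoded when the program changes its locale at run time.

```cpp
#include <clocale>
#include <cstring>

// Hypothetical helper: returns a string literal containing U+00B5.
// The compiler encodes it with the narrow literal encoding during
// translation; the resulting bytes are part of the program image.
// Assumption: that encoding can represent U+00B5.
const char* micro_suffix() { return "\u00B5s"; }

// Hypothetical helper: copies the literal's bytes, switches the global
// C locale to the user's preferred one, and compares again.
bool literal_survives_locale_change() {
    char before[8] = {};
    std::strncpy(before, micro_suffix(), sizeof before - 1);

    // setlocale may return a null pointer if the environment's locale is
    // unsupported; either way the literal's bytes are untouched.
    std::setlocale(LC_ALL, "");

    return std::strcmp(before, micro_suffix()) == 0;
}
```

Calling literal_survives_locale_change() should return true regardless of which locale ends up selected. Whether a terminal or a codecvt facet can then display or convert those bytes is a separate question about the dynamic encoding.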

On Mon, Nov 4, 2019 at 9:33 AM Jean-Marc Bourguet <jm@bourguet.org> wrote:

On 04.11.2019 09:45, Tom Honermann wrote:
> On 11/4/19 7:18 AM, Howard Hinnant wrote:
>> On Nov 4, 2019, at 12:27 AM, Tom Honermann <tom@honermann.net> wrote:
>>> I suggest the following wording: (using terminology from P1859R0)
>>>
>>> If Period::type is micro, but the character U+00B5 <del>cannot be
>>> represented in the encoding used</del><ins>lacks representation in
>>> the execution character set</ins> for charT, the unit suffix "us" is
>>> used instead of "μs". <ins>If "μs" is used but the dynamic encoding
>>> lacks representation for U+00B5 and the stream is associated with a
>>> terminal or console, or if the stream is imbued with a std::codecvt
>>> facet that lacks conversion support for the character, then the
>>> result is unspecified.</ins>
>>>
>> I’ve no objection to an issue, but your proposed wording explicitly
>> involves two things I’m strongly against:
>>
>> 1. Now the code has to check the locale, for this precision only.
>>
>> 2. Now the code has different behavior between cout and
>> ostringstream. And the result of ostringstream is very commonly
>> subsequently sent to cout (ostringstream is a common formatting aid).
>>
>> Imo, the proposed wording is much, much worse than the status-quo and
>> I would vote strongly against it.
>
> No, the wording I proposed doesn't check for locale. The execution
> character set is the character set used for string literals and is
> known at compile time; it is not the locale dependent run-time
> character set.

lex.charset/3 states

The values of the members of the execution character sets and the
sets of additional members are locale-specific.

apparently making the execution character sets run-time dependent.

But lex.ccon/2 states

An ordinary character literal that contains a single c-char
representable in the execution character set has type char, with value
equal to the numerical value of the encoding of the c-char in the
execution character set.

apparently making it fixed.

I've not looked at this in more depth to see which interpretation is the
more pervasive.

Yours,

-- Jean-Marc Bourguet
_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode