sg16: Re: [SG16-Unicode] Hidden locale dependency in [time.duration.io]?

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 4 Nov 2019 09:56:13 +0000

On 11/4/19 8:57 AM, Jean-Marc Bourguet wrote:
> On 04.11.2019 09:45, Tom Honermann wrote:
>> On 11/4/19 7:18 AM, Howard Hinnant wrote:
>>> On Nov 4, 2019, at 12:27 AM, Tom Honermann <tom_at_[hidden]> wrote:
>>>> I suggest the following wording: (using terminology from P1859R0)
>>>>
>>>> If Period::type is micro, but the character U+00B5 <del>cannot be
>>>> represented in the encoding used</del><ins>lacks representation in
>>>> the execution character set</ins> for charT, the unit suffix "us"
>>>> is used instead of "µs". <ins>If
>>>> "µs" is used but the dynamic encoding lacks representation for
>>>> U+00B5 and the stream is associated with a terminal or console, or
>>>> if the stream is imbued with a std::codecvt facet that lacks
>>>> conversion support for the character, then the result is
>>>> unspecified.</ins>
>>>>
>>> I’ve no objection to an issue, but your proposed wording explicitly
>>> involves two things I’m strongly against:
>>>
>>> 1. Now the code has to check the locale, for this precision only.
>>>
>>> 2. Now the code has different behavior between cout and
>>> ostringstream. And the result of ostringstream is very commonly
>>> subsequently sent to cout (ostringstream is a common formatting aid).
>>>
>>> Imo, the proposed wording is much, much worse than the status-quo
>>> and I would vote strongly against it.
>>
>> No, the wording I proposed doesn't check for locale. The execution
>> character set is the character set used for string literals and is known
>> at compile time; it is not the locale dependent run-time character set.
>
> lex.charset/3 states
>
> The values of the members of the execution character sets and the
> sets of additional members are locale-specific.
>
> apparently making the execution character sets run-time dependent.

There are two ways to interpret this:

1) The members of the execution character set are dependent on the
locale that the compiler is running with. This is the case for the
Microsoft compiler. For example, when running the Microsoft compiler on
a Windows system configured for the US region, the default execution
character set is Windows-1252. This can be overridden with the
/execution-charset option.

2) The members of the execution character set are dependent on the
locale that the program runs with. This is true if "execution character
set" is read to govern the behavior of character classification and
conversion functions such as std::isalpha and std::tolower. The C++
standard mostly defers to the C standard for these functions, and the C
standard describes their behavior as "locale-specific".
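
To make the two readings concrete, here is a minimal sketch. The names
kSuffix, AlphaResult and classify_in_locale are invented for this
illustration and appear nowhere in the standard or in P1859R0. The
literal's bytes are fixed when the program is compiled, while the
result of std::isalpha depends on the LC_CTYPE locale in effect when
the call is made.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>

// Compile-time side: the compiler chose these bytes (reading 1); they are
// part of the binary before any run-time locale is selected.
constexpr char kSuffix[] = "us";
static_assert(sizeof(kSuffix) == 3, "two code units plus the terminating null");

// Hypothetical result type for this sketch.
enum class AlphaResult { LocaleUnavailable, NotAlpha, Alpha };

// Run-time side (reading 2): classify `byte` with std::isalpha under the
// named locale, then restore the previous LC_CTYPE setting.
AlphaResult classify_in_locale(const char* locale_name, unsigned char byte) {
    assert(locale_name != nullptr);
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    // Copy the name: a later setlocale call may overwrite the returned buffer.
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return AlphaResult::LocaleUnavailable;
    }
    const bool alpha = std::isalpha(byte) != 0;
    std::setlocale(LC_CTYPE, saved.c_str());
    return alpha ? AlphaResult::Alpha : AlphaResult::NotAlpha;
}
```

In the "C" locale the byte 0xE9 is not alphabetic. On a system that
has a Latin-1 locale installed (locale names are platform-specific, so
any particular name is an assumption), the same call may report it as
alphabetic, even though kSuffix has not changed.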

We need to clean up our terminology at some point.

>
> But lex.ccon/2 states
>
> An ordinary character literal that contains a single c-char
> representable in the execution character set has type char, with value
> equal to the numerical value of the encoding of the c-char in the
> execution character set.
>
> apparently making it fixed.
>
> I've not looked at that more in-depth to see which interpretation is
> the more pervasive.

I have looked on several occasions and the best I can say is that we
have wording work to do; this is one of the goals of P1859R0. I *think*
the standard is mostly consistent in using the term "execution
character set" to refer to the character set that governs string
literals (which obviously can't depend on the locale a compiled
program runs with). Where the standard does refer to locale
dependencies, it generally states the dependency explicitly, or
uses language like "extended character set" or, as above, "additional
members".
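
As a rough sketch of what the compile-time reading means for the
[time.duration.io] suffix: the choice between "µs" and "us" can be made
without consulting any locale. The function micro_suffix and its
template parameter are hypothetical, not the library's implementation.
The standard offers no query for whether U+00B5 is representable, so
the flag stands in for knowledge the implementation would have. The
sketch also assumes a compiler whose execution character set can encode
U+00B5, such as the UTF-8 default of GCC and Clang.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only.
// `MicroSignRepresentable` stands in for the implementation's compile-time
// knowledge of whether U+00B5 exists in the execution character set for char.
template <bool MicroSignRepresentable>
std::string micro_suffix() {
    // Both literals are encoded by the compiler; no locale is consulted here.
    std::string suffix =
        MicroSignRepresentable ? std::string("\u00B5s") : std::string("us");
    assert(!suffix.empty() && suffix.back() == 's');
    return suffix;
}
```

With the flag false, micro_suffix<false>() yields "us". With it true,
the number of bytes before the trailing 's' depends on the execution
character set chosen at compile time.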

Tom.

>
> Yours,
>
> -- Jean-Marc Bourguet

Received on 2019-11-04 10:56:19