Re: [SG16-Unicode] Hidden locale dependency in [time.duration.io]?

From: Steve Downey <sdowney_at_[hidden]>
Date: Mon, 4 Nov 2019 09:50:41 +0000
https://isocpp.org/files/papers/P1859R0.html is my attempt at disentangling
the wording around character sets and encodings. Since the values of
literals are self-evidently fixed at translation time, any interpretation
that involves changing the values of a literal based on the current locale
does not make sense. I believe that the intent of lex.charset/3 was to use
the locale specified for the compiler to produce the values of literals
when encoding from the internal representation of characters. I'm asking
that that be termed "{narrow,wide} literal encoding", as opposed to the
"dynamic encoding" controlled by the conversion facet of the currently set
locale.

The interpretation in the standard seems to vary considerably. Fortunately
there is not, I believe, implementation divergence.
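
To make the translation-time point concrete, here is a minimal sketch; the names ordinary_a, micro_suffix_units and literals_survive_locale_change are made up for illustration and are not from P1859R0 or the standard. It assumes the compiler's literal encoding can represent U+00B5 (true for the common UTF-8 default). Both literals are usable in constant expressions, so their values and sizes are already decided when the program is compiled, and a later setlocale call has nothing left to change.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>

// Both values are constant expressions: the compiler has already produced
// them using the literal encoding, without consulting any run-time locale.
constexpr char ordinary_a = 'a';

// Number of code units in the narrow literal for "µs". The exact count
// depends on the literal encoding chosen at compile time, so it is not
// asserted here. Assumption: that encoding can represent U+00B5.
constexpr std::size_t micro_suffix_units = sizeof("\u00B5s") - 1;

static_assert(ordinary_a == 'a', "literal value is fixed at translation time");

// Hypothetical helper: returns true if the literals read the same before
// and after the global C locale is changed.
bool literals_survive_locale_change() {
    const char before = 'a';
    const std::size_t units_before = sizeof("\u00B5s") - 1;

    // "" selects the environment's locale. The call may fail and return a
    // null pointer, which does not affect this check.
    std::setlocale(LC_ALL, "");

    const bool same = (before == 'a')
                      && ('a' == ordinary_a)
                      && (units_before == micro_suffix_units);

    std::setlocale(LC_ALL, "C");  // restore the default "C" locale
    return same;
}
```

Only the "dynamic encoding" side, for example how a stream's locale facets convert those code units on output, can vary at run time.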

On Mon, Nov 4, 2019 at 9:33 AM Jean-Marc Bourguet <jm_at_[hidden]> wrote:

> On 04.11.2019 09:45, Tom Honermann wrote:
> > On 11/4/19 7:18 AM, Howard Hinnant wrote:
> >> On Nov 4, 2019, at 12:27 AM, Tom Honermann <tom_at_[hidden]> wrote:
> >>> I suggest the following wording: (using terminology from P1859R0)
> >>>
> >>> If Period::type is micro, but the character U+00B5 <del>cannot be
> >>> represented in the encoding used</del><ins>lacks representation in
> >>> the execution character set</ins> for charT, the unit suffix "us" is
> >>> used instead of "μs". <ins>If
> >>> "μs" is used but the dynamic encoding lacks representation for U+00B5
> >>> and the stream is associated with a terminal or console, or if the
> >>> stream is imbued with a std::codecvt facet that lacks conversion
> >>> support for the character, then the result is unspecified.</ins>
> >>>
> >> I’ve no objection to an issue, but your proposed wording explicitly
> >> involves two things I’m strongly against:
> >>
> >> 1. Now the code has to check the locale, for this precision only.
> >>
> >> 2. Now the code has different behavior between cout and
> >> ostringstream. And the result of ostringstream is very commonly
> >> subsequently sent to cout (ostringstream is a common formatting aid).
> >>
> >> Imo, the proposed wording is much, much worse than the status-quo and
> >> I would vote strongly against it.
> >
> > No, the wording I proposed doesn't check for locale. The execution
> > character set is the character set used for string literals and is known
> > at compile time; it is not the locale dependent run-time character set.
> lex.charset/3 states
> The values of the members of the execution character sets and the
> sets of additional members are locale-specific.
> apparently making the execution character sets run-time dependent.
> But lex.ccon/2 states
> An ordinary character literal that contains a single c-char
> representable in the execution character set has type char, with value
> equal to the numerical value of the encoding of the c-char in the
> execution character set.
> apparently making it fixed.
> I've not looked at that more in-depth to see which interpretation is the
> more pervasive.
> Yours,
> -- Jean-Marc Bourguet

Received on 2019-11-04 10:50:55