Date: Mon, 4 Nov 2019 09:50:41 +0000
https://isocpp.org/files/papers/P1859R0.html is my attempt at disentangling
the wording around character sets and encodings. Since the values of
literals are self-evidently fixed at translation time, any interpretation
that involves changing the values of a literal based on the current locale
does not make sense. I believe that the intent of lex.charset/3 was to use
the locale specified for the compiler to produce the values of literals
when encoding from the internal representation of characters. I'm asking
that that be termed "{narrow,wide} literal encoding", as opposed to the
"dynamic encoding" controlled by the conversion facet of the currently set
locale.
The interpretation in the standard seems to vary considerably. Fortunately
there is not, I believe, implementation divergence.
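
To illustrate the distinction, here is a minimal sketch. It is not taken from
P1859R0, and the names literal_a and check_literal_is_locale_independent are
hypothetical. It assumes only that std::setlocale changes the run-time locale
and has no effect on values the compiler has already encoded into literals.

```cpp
#include <cassert>
#include <clocale>

// The value of an ordinary character literal is a constant expression,
// so it is fixed when the translation unit is compiled.
constexpr char literal_a = 'a';
static_assert(literal_a == 'a', "literal value is known at translation time");

// Hypothetical helper: switch the global C locale, then re-read the literal.
// Returns the value of 'a' as seen after the locale change.
inline int literal_value_after_setlocale(const char* locale_name) {
    // setlocale returns a null pointer if the named locale is unavailable;
    // either outcome is fine here, since the literal does not depend on it.
    std::setlocale(LC_ALL, locale_name);
    return 'a';
}

// Hypothetical check: the literal has the same value under the "C" locale
// and under the user's preferred locale ("").
inline void check_literal_is_locale_independent() {
    assert(literal_value_after_setlocale("C") == literal_a);
    assert(literal_value_after_setlocale("") == literal_a);
}
```

What can change with the locale is how those bytes are converted when they are
written through a stream or shown on a terminal, which is the "dynamic
encoding" side of the distinction.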
On Mon, Nov 4, 2019 at 9:33 AM Jean-Marc Bourguet <jm_at_[hidden]> wrote:
> On 04.11.2019 09:45, Tom Honermann wrote:
> > On 11/4/19 7:18 AM, Howard Hinnant wrote:
> >> On Nov 4, 2019, at 12:27 AM, Tom Honermann <tom_at_[hidden]> wrote:
> >>> I suggest the following wording: (using terminology from P1859R0)
> >>>
> >>> If Period::type is micro, but the character U+00B5 <del>cannot be
> >>> represented in the encoding used</del><ins>lacks representation in
> >>> the execution character set</ins> for charT, the unit suffix "us" is
> >>> used instead of "μs". <ins>If
> >>> "μs" is used but the dynamic encoding lacks representation for U+00B5
> >>> and the stream is associated with a terminal or console, or if the
> >>> stream is imbued with a std::codecvt facet that lacks conversion
> >>> support for the character, then the result is unspecified.</ins>
> >>>
> >> I’ve no objection to an issue, but your proposed wording explicitly
> >> involves two things I’m strongly against:
> >>
> >> 1. Now the code has to check the locale, for this precision only.
> >>
> >> 2. Now the code has different behavior between cout and
> >> ostringstream. And the result of ostringstream is very commonly
> >> subsequently sent to cout (ostringstream is a common formatting aid).
> >>
> >> Imo, the proposed wording is much, much worse than the status-quo and
> >> I would vote strongly against it.
> >
> > No, the wording I proposed doesn't check for locale. The execution
> > character set is the character set used for string literals and is known
> > at compile time; it is not the locale dependent run-time character set.
>
> lex.charset/3 states
>
> The values of the members of the execution character sets and the
> sets of additional members are locale-specific.
>
> apparently making the execution character sets run-time dependent.
>
> But lex.ccon/2 states
>
> An ordinary character literal that contains a single c-char
> representable in the execution character set has type char, with value
> equal to the numerical value of the encoding of the c-char in the
> execution character set.
>
> apparently making it fixed.
>
> I've not looked at that more in-depth to see which interpretation is the
> more pervasive.
>
> Yours,
>
> -- Jean-Marc Bourguet
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>
Received on 2019-11-04 10:50:55
