Re: CWG 2779: Restrictions on the ordinary literal encoding

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Mon, 31 Jul 2023 19:57:06 +0200
On 31/07/2023 16.00, Corentin Jabot via SG16 wrote:
> I think this is NaD
>
> These two sentences seem clear enough to me
> https://eel.is/c++draft/lex.charset#8.sentence-1
> https://eel.is/c++draft/lex.charset#9.sentence-6

It doesn't exactly say that the code units for the ordinary literal encoding
are of char type; they might be of type (say) char32_t (and are then converted
to char in a lossy manner).

But maybe the proposed resolution can be shortened to something like
"; its code units are of type char / wchar_t."
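
To illustrate the distinction, here is a minimal C++17 sketch (not part of the proposed wording). The static_asserts check only the type of the literals, which is what [lex.charset] is being asked to pin down for the code units themselves. The helper distinct_char_values is a hypothetical name, and it assumes the usual modular conversion from unsigned char to char, so that every code unit value in the unsigned char range keeps a distinct representation in char.

```cpp
#include <climits>
#include <set>
#include <type_traits>

// Ordinary literals have type char and wide literals have type wchar_t,
// regardless of which encoding the implementation picks for them.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);
static_assert(std::is_same_v<decltype("a"), const char (&)[2]>);
static_assert(std::is_same_v<decltype(L"a"), const wchar_t (&)[2]>);

// Hypothetical helper: convert every unsigned char value to char and
// count how many distinct char values result.
inline int distinct_char_values() {
    std::set<char> seen;
    for (int u = 0; u <= UCHAR_MAX; ++u) {
        seen.insert(static_cast<char>(static_cast<unsigned char>(u)));
    }
    return static_cast<int>(seen.size());
}
```

If the count equals UCHAR_MAX + 1, no two code unit values collapse onto the same char value, whether char is signed or unsigned on the platform.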

Jens


> On Mon, Jul 31, 2023 at 3:36 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
>
> The suggested resolution for CWG 2779 (Restrictions on the ordinary literal encoding) <https://cplusplus.github.io/CWG/issues/2779.html> modifies [lex.charset]p8 <https://eel.is/c++draft/lex.charset#8> as follows:
>
> ... The /ordinary literal encoding/ is the implementation-defined encoding applied to an ordinary character or string literal; its code unit values are required to be representable as distinct values of type signed char or unsigned char. The /wide literal encoding/ is the implementation-defined encoding applied to a wide character or string literal; its code unit values are required to be representable as distinct values of type wchar_t.
>
> I am uncertain about the proposed "signed char or unsigned char" wording. I think the intent is to state that each code unit value must have a distinct representation in char regardless of its underlying type. However, the wording could be read as permitting (distinct) values in the range [SCHAR_MIN, UCHAR_MAX]. I think it would be simpler to restrict representation to char with an implicit reliance on [basic.fundamental]p6 <http://eel.is/c++draft/basic.fundamental#6> for mapping code unit values from the signed char and unsigned char ranges to representable values in char.
>
> ... The /ordinary literal encoding/ is the implementation-defined encoding applied to an ordinary character or string literal; its code unit values are required to be representable as distinct values of type char. The /wide literal encoding/ is the implementation-defined encoding applied to a wide character or string literal; its code unit values are required to be representable as distinct values of type wchar_t.
>
> If desired, we could add a note to clarify the relationship to the signed char and unsigned char value ranges.
>
> Tom.
>

Received on 2023-07-31 17:57:13