ISOCPP sg16 List: CWG 2779: Restrictions on the ordinary literal encoding

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 31 Jul 2023 09:36:44 -0400

The suggested resolution for CWG 2779 (Restrictions on the ordinary
literal encoding) <https://cplusplus.github.io/CWG/issues/2779.html>
modifies [lex.charset]p8 <https://eel.is/c++draft/lex.charset#8> as follows:

    ... The /ordinary literal encoding/ is the implementation-defined
    encoding applied to an ordinary character or string literal; its
    code unit values are required to be representable as distinct values
    of type signed char or unsigned char. The /wide literal encoding/ is
    the implementation-defined encoding applied to a wide character or
    string literal; its code unit values are required to be
    representable as distinct values of type wchar_t.

I am uncertain about the proposed "signed char or unsigned char"
wording. I think the intent is to state that each code unit value must
have a distinct representation in char regardless of its underlying
type. However, the wording could be read as permitting (distinct) values
in the range [SCHAR_MIN, UCHAR_MAX]. I think it would be simpler to
restrict representation to char with an implicit reliance on
[basic.fundamental]p6 <http://eel.is/c++draft/basic.fundamental#6> for
mapping code unit values from the signed char and unsigned char ranges
to representable values in char.

    ... The /ordinary literal encoding/ is the implementation-defined
    encoding applied to an ordinary character or string literal; its
    code unit values are required to be representable as distinct values
    of type char. The /wide literal encoding/ is
    the implementation-defined encoding applied to a wide character or
    string literal; its code unit values are required to be
    representable as distinct values of type wchar_t.

If desired, we could add a note to clarify the relationship to the
signed char and unsigned char value ranges.
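
To illustrate that relationship, here is a minimal sketch (not proposed
wording). It assumes C++20 or later, where an out-of-range integral
conversion to a signed type is modular; before C++20 the result was
implementation-defined. The helper names distinct_char_values and
round_trips are hypothetical and exist only for this example. The sketch
checks that every unsigned char value maps to a distinct char value
whether char is signed or unsigned.

```cpp
#include <climits>
#include <set>

// Hypothetical helper: convert every unsigned char value to char and
// count how many distinct char values result.
int distinct_char_values() {
    std::set<char> seen;
    for (int v = 0; v <= UCHAR_MAX; ++v) {
        seen.insert(static_cast<char>(static_cast<unsigned char>(v)));
    }
    return static_cast<int>(seen.size());
}

// Hypothetical helper: check that unsigned char -> char -> unsigned char
// preserves every code unit value (assumes C++20 modular conversion).
bool round_trips() {
    for (int v = 0; v <= UCHAR_MAX; ++v) {
        const unsigned char u = static_cast<unsigned char>(v);
        const char c = static_cast<char>(u);
        if (static_cast<unsigned char>(c) != u) {
            return false;
        }
    }
    return true;
}
```

If both checks hold, a code unit value written in the unsigned char
range (0xE9, say) still has a distinct representation in char when char
is signed; it just compares as a negative value there.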

Tom.

Received on 2023-07-31 13:36:47