
SG16


Subject: Re: Towards a better description of the execution encoding
From: Steve Downey (sdowney_at_[hidden])
Date: 2021-03-01 10:27:05


Could we perhaps use the encoding of the "C" locale to describe how the
"encoding of the execution character set" is meant to be interpreted?
http://eel.is/c++draft/lex.ccon#2
The phrase "execution encoding" isn't currently used in the standard,
although lex.ccon comes close, as does
http://eel.is/c++draft/tab:lex.string.literal

Incidentally, the note in http://eel.is/c++draft/fs.req.general#4:

  [Note 1: Use of an encoded character type implies an associated
  character set and encoding. Since signed char and unsigned char have no
  implied character set and encoding, they are not included as permitted
  types. -- end note]

is contradicted by lex.ccon.

On Mon, Mar 1, 2021 at 10:24 AM Corentin via SG16 <sg16_at_[hidden]>
wrote:

> Hey folks!
> Last meeting we talked about the relation between the literal & execution
> encoding.
>
> I think there is pressure to solve this issue (encoding names, std::print,
> other features).
> In P2297, I suggested that we say the execution character set is a
> superset of the literal character set, such that any character in the
> literal character set results in the same code unit sequence
> whether it is encoded in the literal encoding or execution encoding.
>
> Hubert was concerned this was too restrictive because some EBCDIC and
> ISO 646 variants have code points reserved for "national symbols".
> Even Shift-JIS is not 100% ASCII-compatible (yen sign instead of
> backslash, overline instead of tilde).
>
> I've been thinking about that over the past few days, and I think the
> solution is to put requirements not on the literal character set but on
> the literals themselves.
>
> If the execution encoding is UTF-8, "ABC" is interpreted identically
> whether the literal encoding is ASCII, ISO 646-IT, or Shift-JIS.
>
> However, "C:\\" would be interpreted as "C:\\", "C:ç" and "C:¥",
> respectively.
>
> So we only need to put requirements on the content of individual literals
> rather than on the entire literal character set (which, P1885
> notwithstanding, is not observable during execution anyhow).
>
>
> *A way to word that:*
>
> The execution encoding is the locale-specific encoding used to interpret
> character and NTMBS parameters in character functions, multibyte character
> functions and other locale-specific functions.
>
> If a character literal or string literal used as an argument to a
> character function or locale-specific function does not represent the
> same sequence of abstract characters when interpreted with the literal
> encoding as when interpreted with the execution encoding, the behavior
> is undefined.
>
> I hope that this resolves Hubert's concerns and that we can refine the
> general idea and put it in a paper :)
>
> Have a great week,
> Corentin
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
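To make the "ABC" versus "C:\\" example in the quoted message concrete, here is a minimal C++17 sketch. It assumes only single-byte code units in the range 0x00-0x7F, where UTF-8 and ASCII coincide, so "same as ASCII" stands in for "same as a UTF-8 execution encoding". The names `Encoding`, `decode_7bit` and `same_as_ascii` are hypothetical, and the ISO 646-IT mapping follows the commonly published table for that variant; it is worth checking against ISO/IEC 646 before relying on it. `same_as_ascii` is one possible way to express the per-literal condition from the proposed wording for these three encodings, not a definitive implementation of it.

```cpp
#include <string_view>

// Hypothetical set of encodings from the example. Only 0x00-0x7F is
// modelled; in that range UTF-8 and ASCII agree.
enum class Encoding { ascii, iso646_it, shift_jis };

// Returns the Unicode code point denoted by one 7-bit code unit.
constexpr char32_t decode_7bit(Encoding enc, unsigned char unit) {
    if (enc == Encoding::iso646_it) {
        switch (unit) {  // positions where ISO 646-IT differs from ASCII
            case 0x23: return U'\u00A3';  // POUND SIGN
            case 0x40: return U'\u00A7';  // SECTION SIGN
            case 0x5B: return U'\u00B0';  // DEGREE SIGN
            case 0x5C: return U'\u00E7';  // c WITH CEDILLA
            case 0x5D: return U'\u00E9';  // e WITH ACUTE
            case 0x60: return U'\u00F9';  // u WITH GRAVE
            case 0x7B: return U'\u00E0';  // a WITH GRAVE
            case 0x7C: return U'\u00F2';  // o WITH GRAVE
            case 0x7D: return U'\u00E8';  // e WITH GRAVE
            case 0x7E: return U'\u00EC';  // i WITH GRAVE
            default: break;
        }
    } else if (enc == Encoding::shift_jis) {
        // Single-byte JIS X 0201 Roman characters in Shift-JIS.
        if (unit == 0x5C) return U'\u00A5';  // YEN SIGN
        if (unit == 0x7E) return U'\u203E';  // OVERLINE
    }
    return static_cast<char32_t>(unit);  // identical to ASCII
}

// True if every code unit of `units` denotes the same character under
// `enc` as under ASCII (and therefore under UTF-8).
constexpr bool same_as_ascii(Encoding enc, std::string_view units) {
    for (char c : units) {
        const auto u = static_cast<unsigned char>(c);
        if (u > 0x7F) return false;  // outside the modelled range
        if (decode_7bit(enc, u) != decode_7bit(Encoding::ascii, u)) return false;
    }
    return true;
}
```

With these definitions, `same_as_ascii(Encoding::shift_jis, "ABC")` is true while `same_as_ascii(Encoding::shift_jis, "C:\\")` is false, because code unit 0x5C decodes to U+00A5 YEN SIGN rather than U+005C REVERSE SOLIDUS.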

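A stricter, encoding-agnostic alternative to comparing two specific encodings is to require that a literal use only the ISO 646 invariant set: 7-bit code units outside the twelve positions (# $ @ [ \ ] ^ ` { | } ~) that national variants may redefine. The single-byte differences in Shift-JIS (0x5C and 0x7E) fall within those twelve positions, so such a literal also reads the same there. This is a swapped-in, conservative check rather than the wording proposed in the message, and it says nothing about EBCDIC; the function names below are hypothetical.

```cpp
#include <string_view>

// The twelve code positions ISO 646 leaves to national variants:
// # $ @ [ \ ] ^ ` { | } ~
constexpr bool is_iso646_variant_position(unsigned char unit) {
    switch (unit) {
        case 0x23: case 0x24: case 0x40:
        case 0x5B: case 0x5C: case 0x5D: case 0x5E:
        case 0x60:
        case 0x7B: case 0x7C: case 0x7D: case 0x7E:
            return true;
        default:
            return false;
    }
}

// True if the literal contains only 7-bit code units outside the variant
// positions, so ASCII, UTF-8 and every ISO 646 variant agree on its meaning.
constexpr bool is_iso646_invariant(std::string_view units) {
    for (char c : units) {
        const auto u = static_cast<unsigned char>(c);
        if (u > 0x7F || is_iso646_variant_position(u)) return false;
    }
    return true;
}
```

Here `is_iso646_invariant("ABC")` holds, while `is_iso646_invariant("C:\\")` does not.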


SG16 list run by sg16-owner@lists.isocpp.org