[SG16] Towards a better description of the execution encoding

From: Corentin <corentin.jabot_at_[hidden]>
Date: Mon, 1 Mar 2021 16:24:04 +0100
Hey folks!
At the last meeting we talked about the relationship between the literal and
execution encodings.

I think there is pressure to solve this issue (for encoding names, std::print,
and other features).
In P2297, I suggested that we say the execution character set is a superset
of the literal character set. Under that rule, any character in the literal
character set results in the same code unit sequence whether it is encoded
in the literal encoding or in the execution encoding.

Hubert was concerned this was too restrictive because some EBCDIC and ISO 646
variants have code points reserved for "national symbols".
Even Shift-JIS is not 100% ASCII-compatible (Yen sign instead of backslash,
overline instead of tilde).
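To make the objection concrete, here is a small sketch of the P2297-style rule as a check over a whole literal character set. It uses a deliberately tiny, hypothetical model in which an encoding is just a map from code point to code units. `Encoding` and `execution_is_compatible_superset` are illustrative names only, not proposed library APIs. A partial ISO 646-IT table fails the check against UTF-8, because ç occupies 0x5C, and that is exactly the restriction Hubert raised.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model: an encoding maps each abstract character it can
// represent (keyed by Unicode code point) to its code unit sequence.
using Encoding = std::map<char32_t, std::string>;

// P2297-style rule (whole-set): every character of the literal character set
// must be encoded with exactly the same code units in the execution encoding.
bool execution_is_compatible_superset(const Encoding& literal, const Encoding& execution) {
    for (const auto& [code_point, units] : literal) {
        auto it = execution.find(code_point);
        if (it == execution.end() || it->second != units) {
            return false;  // missing from, or encoded differently in, the execution encoding
        }
    }
    return true;
}
```

Consider an ASCII fragment containing only A and \ checked against a UTF-8 fragment: it passes. An ISO 646-IT fragment containing A and ç fails, even though a literal such as "ABC" would never be affected.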

I've been thinking about this over the past few days, and I think the
solution is to place requirements not on the literal character set but on
the literals themselves.

If the execution encoding is UTF-8, "ABC" is interpreted identically whether
the literal encoding is ASCII, ISO 646-IT, or Shift-JIS.

However, "C:\\" would be interpreted as "C:\\", "C:ç" and "C:¥"
respectively, because the code unit 0x5C is a backslash in ASCII, ç in
ISO 646-IT and ¥ in Shift-JIS.
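The byte-level point can be shown with a sketch that decodes the same three code units (0x43 0x3A 0x5C) under each of the three 7-bit interpretations and produces UTF-8. This is a simplified assumption: only byte 0x5C is modelled as varying, and every other byte is treated as its ASCII value. That is only correct for the invariant positions used here. For Shift-JIS, only its single-byte JIS X 0201 Roman half is modelled. `Charset` and `decode_to_utf8` are hypothetical names. Results are spelled with hex escapes so that the sketch does not itself depend on the literal encoding.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The three 7-bit interpretations discussed above. Only byte 0x5C is modelled
// as varying; every other byte is treated as its ASCII value, which is only
// correct for the invariant positions this demo uses.
enum class Charset { ascii, iso646_it, jis_x0201_roman };

// Return the UTF-8 spelling of one code unit under the given charset.
// Hex escapes keep the results independent of the literal encoding.
std::string decode_byte(unsigned char b, Charset cs) {
    assert(b < 0x80 && "this sketch only models 7-bit code units");
    if (b == 0x5C) {
        switch (cs) {
            case Charset::ascii:           return "\x5C";      // U+005C reverse solidus
            case Charset::iso646_it:       return "\xC3\xA7";  // U+00E7 c with cedilla
            case Charset::jis_x0201_roman: return "\xC2\xA5";  // U+00A5 yen sign
        }
    }
    return std::string(1, static_cast<char>(b));
}

// Decode a whole code unit sequence, producing UTF-8 text.
std::string decode_to_utf8(std::string_view units, Charset cs) {
    std::string out;
    for (char c : units) {
        out += decode_byte(static_cast<unsigned char>(c), cs);
    }
    return out;
}
```

Decoding "\x43\x3A\x5C" yields a backslash, ç and ¥ respectively as the final character, matching the three readings listed above.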

So we only need to put requirements on the content of individual literals
rather than on the entire literal character set (which, P1885
notwithstanding, is not observable during execution anyhow).


*A way to word that:*

The execution encoding is the locale-specific encoding used to interpret
character and NTMBS parameters in character functions, multibyte character
functions and other locale-specific functions.
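As a concrete reference point for "locale-specific encoding": the multibyte functions in `<cwchar>` interpret their input according to the LC_CTYPE category of the current C locale. The sketch below counts the characters in an NTMBS with `std::mbrlen`. Its answer can change when `std::setlocale` changes LC_CTYPE, even though the bytes stay the same. The function name `count_chars_in_current_locale` is hypothetical.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Count the characters in an NTMBS as interpreted by the encoding of the
// current C locale's LC_CTYPE category (the execution encoding, in the
// terminology of this message). Returns static_cast<std::size_t>(-1) if the
// sequence is invalid or ends in the middle of a multibyte character.
std::size_t count_chars_in_current_locale(const char* s) {
    assert(s != nullptr);
    std::mbstate_t state{};                   // initial conversion state
    std::size_t remaining = std::strlen(s);
    std::size_t count = 0;
    while (remaining > 0) {
        std::size_t n = std::mbrlen(s, remaining, &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2)) {
            return static_cast<std::size_t>(-1);  // invalid or incomplete sequence
        }
        if (n == 0) break;                    // reached a null character
        s += n;
        remaining -= n;
        ++count;
    }
    return count;
}
```

In the "C" locale, "ABC" counts as 3 characters. In a UTF-8 locale (for example "en_US.UTF-8", if one is installed), the two code units "\xC3\xA7" count as one character. In a single-byte locale such as one using ISO 8859-1, they count as two.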

If character literals and string literals used as arguments to character
functions and locale-specific functions do not represent the same sequence
of abstract characters whether they are interpreted with the literal
encoding or the execution encoding, the behavior is undefined.
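Here is a minimal sketch of what this per-literal condition checks, under two simplifying assumptions. First, literals contain only 7-bit code units. Second, the execution encoding is UTF-8, which decodes such units exactly as ASCII does. `SevenBitEncoding` and `literal_is_consistent` are hypothetical names used only for illustration. A `false` result corresponds to the case the wording above makes undefined.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

// Hypothetical toy model of a 7-bit single-byte encoding: each byte decodes to
// its ASCII code point, except at the "national" positions in `overrides`.
struct SevenBitEncoding {
    std::map<unsigned char, char32_t> overrides;

    // Decode code units to abstract characters (Unicode code points).
    std::u32string decode(std::string_view units) const {
        std::u32string out;
        for (char c : units) {
            auto b = static_cast<unsigned char>(c);
            assert(b < 0x80 && "this sketch only models 7-bit code units");
            auto it = overrides.find(b);
            out += (it != overrides.end()) ? it->second : static_cast<char32_t>(b);
        }
        return out;
    }
};

// Per-literal condition: the literal's code units must denote the same
// abstract characters under the literal encoding and the execution encoding.
bool literal_is_consistent(std::string_view literal_units,
                           const SevenBitEncoding& literal_enc,
                           const SevenBitEncoding& execution_enc) {
    return literal_enc.decode(literal_units) == execution_enc.decode(literal_units);
}
```

Take ISO 646-IT or JIS X 0201 Roman as the literal encoding and UTF-8 as the execution encoding. The code units of "ABC" pass, while 0x43 0x3A 0x5C fail. No requirement is placed on characters that no literal actually uses.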

I hope that this resolves Hubert's concerns and that we can refine the
general idea and put it in a paper :)

Have a great week,
Corentin

Received on 2021-03-01 09:24:18