On Sat, Jan 30, 2021 at 5:39 AM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

On Wed, Jan 27, 2021 at 3:57 AM Corentin via SG16 <sg16@lists.isocpp.org> wrote:

Hello,

Very quick reminder, using C++20 terminology

We have:

- The basic source character set, which, while of limited use in the core language, is used quite a bit in the library as a proxy for "displayable characters available in all encodings"; removing it would therefore be slightly more involved.

- The execution character set(s) which describe actual character sets used during evaluation and are therefore necessary.

- The basic execution character set, which is a superset of the basic source character set and a subset of all execution character sets.

It is strictly the basic source character set + alert + backspace + carriage return + the null character.

Nowhere is it used in the library.

It is not used in the core language either, except of course that we need to prescribe that the null character is encoded as 0 and that digits are encoded sequentially.
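To make those two guarantees concrete, here is a minimal sketch (assuming C++14 or later) that checks them at compile time. The helper names `digits_are_sequential` and `digit_value` are made up for illustration and appear nowhere in the standard.

```cpp
#include <cassert>

// Hypothetical helper: true if each decimal digit after '0' has a value
// one greater than the previous digit.
constexpr bool digits_are_sequential() {
    const char digits[] = "0123456789";
    for (int i = 1; i < 10; ++i) {
        if (digits[i] != digits[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

// The null character and the null wide character both have the value 0.
static_assert('\0' == 0, "null character must have value 0");
static_assert(L'\0' == 0, "null wide character must have value 0");

// Digits must be encoded sequentially in the execution character set.
static_assert(digits_are_sequential(), "digits must be contiguous");

// Hypothetical helper: the usual digit-to-number idiom, which is only
// portable because of the sequential-digits requirement.
constexpr int digit_value(char c) {
    return c - '0';
}
```

The `c - '0'` idiom is the typical piece of user code that depends on the digit requirement, which is why that requirement has to survive any rewording.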

While alert, backspace, and carriage return are mentioned in escape sequences, if a hypothetical encoding lacked these characters, there would be no further ill effect on the behavior of the standard.

The main change on top of the C++20 wording would be as follows:

The ~~basic~~ execution character set and the ~~basic~~ execution wide-character set shall each contain all the members of the basic source character set, ~~plus control characters representing alert, backspace, and carriage return,~~ plus a null character (respectively, null wide character), whose value is 0. For each ~~basic~~ execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets,

You missed a "basic" as applied to "execution character set" here.

the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.

Any reason why we should not do this?

Because the above does not update [intro.memory] and leaves a dangling reference to the meaning of "basic execution character set".

Are you talking about 3.35 [defns.multibyte]?

> sequence of one or more bytes representing a member of the extended character set of either the source or the execution environment

[Note 1: The extended character set is a superset of the basic character set ([lex.charset]). — end note]

If so, sorry, I missed that. Yes, that would need rewriting. Good catch, thanks!

Also, the above wording is currently meant to say (in part) that the characters required as members of the basic execution character sets, when encoded within a "narrow" possibly-multibyte string in any execution coded character set supported by the implementation, are single bytes whose value as read via a glvalue of type `char` is positive. The proposal seems to leave the relevant sentence in a sad state.
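As a rough illustration of the property described here, the following sketch (assuming C++14 or later) spells out the 96 members of the C++20 basic source character set in a narrow string literal and checks that each one occupies a single `char` with a positive value. The names `basic_chars` and `all_positive_single_bytes` are hypothetical, and a compile-time check only exercises the literal encoding the compiler happens to use, not every execution encoding the implementation supports at run time.

```cpp
#include <cassert>

// Hypothetical table: the 96 members of the C++20 basic source character set
// (26 + 26 letters, 10 digits, 29 punctuators, 5 whitespace characters).
constexpr char basic_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    " \t\v\f\n";

// If every member is encoded as a single byte, the array holds exactly
// 96 elements plus the terminating null character.
static_assert(sizeof(basic_chars) == 97,
              "each basic character must be a single byte");

// Hypothetical helper: true if every member, read as a char, is positive.
constexpr bool all_positive_single_bytes() {
    for (unsigned i = 0; i + 1 < sizeof(basic_chars); ++i) {
        if (basic_chars[i] <= 0) {
            return false;
        }
    }
    return true;
}

static_assert(all_positive_single_bytes(),
              "basic characters must have positive char values");
```

The terminating null character is deliberately excluded from the loop, since it is the one required member whose value is 0 rather than positive.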

I don't think my proposed change (which I meant to be more illustrative) alters the current meaning significantly. If it does, I am not seeing it.

Are you saying that this could benefit from a more extensive rewrite? If so, I think I'd agree with that.

Maybe listing all the requirements more explicitly?

------

The execution character set and wide execution character set are implementation-defined character encodings such that: