On Wed, Jan 27, 2021 at 9:59 AM Peter Brett <pbrett@cadence.com> wrote:

Hi Corentin,


This certainly seems like a possible simplification to me. Out of interest, did you manage to find out *why* the concept of the basic execution character set was added to the standard in the first place?

Alas, I didn't, but it goes back to at least C89.




From: SG16 <sg16-bounces@lists.isocpp.org> On Behalf Of Corentin via SG16
Sent: 27 January 2021 08:57
To: SG16 <sg16@lists.isocpp.org>
Cc: Corentin <corentin.jabot@gmail.com>
Subject: [SG16] Is the concept of basic execution character sets useful?





Very quick reminder, using C++20 terminology

We have:


- The basic source character set, which, while of limited use in the core language, is used quite a bit in the library as a proxy for "displayable characters available in all encodings"; its removal would therefore be slightly more involved.


- The execution character set(s), which describe the actual character sets used during evaluation and are therefore necessary.


- The basic execution character set, which is a superset of the basic source character set and a subset of all execution character sets.


It is strictly the basic source character set + alert + backspace + carriage return + the null character.


Nowhere is it used in the library.

It is not used in the core language either, except of course that we need to prescribe that the null character is encoded as 0 and that the decimal digits are encoded sequentially.
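To make those two remaining requirements concrete, here is a minimal sketch of how they can be checked at compile time. The helper names (digits_are_sequential, digit_value) are mine, purely for illustration, and the sketch assumes a C++14 or later compiler.

```cpp
#include <cassert>

// Hypothetical helper: true if each decimal digit after '0' has a value
// one greater than the previous digit.
constexpr bool digits_are_sequential() {
    constexpr char digits[] = "0123456789";
    for (int i = 1; i < 10; ++i) {
        if (digits[i] != digits[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

// The two guarantees the core language actually relies on.
static_assert('\0' == 0, "the null character must have the value 0");
static_assert(digits_are_sequential(), "decimal digits must be contiguous");

// Hypothetical helper: the usual digit-to-integer idiom, which is only
// portable because of the sequential-digits guarantee.
constexpr int digit_value(char c) {
    return c - '0';
}
```

Nothing in this sketch needs the term "basic execution character set"; it only needs the two value guarantees stated directly.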


While alert, backspace, and carriage return are mentioned in escape sequences, if a theoretical encoding lacked these characters, there would be no further ill effect on the behavior of the standard.


The main change on top of the C++20 wording would be as follows:


The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
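As a rough illustration of what the "non-negative and distinct" requirement in that paragraph amounts to for the narrow character set, the sketch below lists the 96 members of the C++20 basic source character set plus alert, backspace, carriage return, and the terminating null character, and checks them at compile time. The names basic_execution_members and all_distinct_and_non_negative are hypothetical, and the sketch assumes C++14 or later; it tests one implementation's ordinary literal encoding, nothing more.

```cpp
#include <cstddef>

// Hypothetical table: 96 basic source characters, then alert, backspace,
// carriage return; the string literal's terminator supplies the null character.
constexpr char basic_execution_members[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    " \t\v\f\n"
    "\a\b\r";

// 99 listed characters plus the terminating null character.
static_assert(sizeof(basic_execution_members) == 100,
              "expected 100 members including the null character");

// Hypothetical helper: true if every member has a non-negative value and
// no two members share a value.
constexpr bool all_distinct_and_non_negative() {
    constexpr std::size_t n = sizeof(basic_execution_members);
    for (std::size_t i = 0; i < n; ++i) {
        if (basic_execution_members[i] < 0) {
            return false;
        }
        for (std::size_t j = i + 1; j < n; ++j) {
            if (basic_execution_members[i] == basic_execution_members[j]) {
                return false;
            }
        }
    }
    return true;
}

static_assert(all_distinct_and_non_negative(),
              "members must be non-negative and distinct");
```

If an encoding lacked alert, backspace, or carriage return, only the last line of the table would be affected, which is in line with the observation that the three control characters matter only for escape sequences.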


Any reason why we should not do this?


(As always, I'm interested in having a simple model with no unnecessary terminology because, as we have observed these past few months, unnecessary terminology tends to hinder our collective understanding.)