Re: [SG16] Is the concept of basic execution character sets useful?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 28 Jan 2021 21:12:56 +0100
const char* a = "\U0001F4BB";

is also valid C++ source, yet it can be ill-formed on many platforms, for
example when compiling for Shift JIS with GCC.
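Both examples in this exchange compile only because of properties of the chosen execution (literal) encoding, not because of anything the standard pins down. Below is a minimal sketch that makes this concrete. It assumes a UTF-8 execution encoding, which is GCC's default (GCC selects a different one with `-fexec-charset`). Under an encoding with no bell, which is the hypothetical raised in the thread, or with no mapping for U+1F4BB (such as Shift JIS), these checks would fail or the source would not compile. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstring>

// ASSUMPTION: the execution (literal) encoding is UTF-8, GCC's default.
// Under another encoding these values differ, or the literal may be
// rejected at compile time.
constexpr char bell = '\a';
constexpr char laptop[] = "\U0001F4BB";  // U+1F4BB PERSONAL COMPUTER

// In ASCII, and therefore in UTF-8, alert is the code unit 0x07.
bool bell_is_ascii() { return bell == 0x07; }

// U+1F4BB takes four UTF-8 code units: F0 9F 92 BB.
bool laptop_is_utf8() {
    const unsigned char expected[] = {0xF0, 0x9F, 0x92, 0xBB};
    return sizeof laptop == sizeof expected + 1 &&  // +1 for the null character
           std::memcmp(laptop, expected, sizeof expected) == 0;
}

// Aborts (when NDEBUG is not defined) if the UTF-8 assumption does not hold.
void check_utf8_assumption() {
    assert(bell_is_ascii());
    assert(laptop_is_utf8());
}
```

On a UTF-8 target `check_utf8_assumption()` returns normally. The standard requires only that alert exists in the basic execution character set; it says nothing about its value, and nothing at all about whether U+1F4BB is representable.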

On Thu, Jan 28, 2021 at 8:51 PM Steve Downey <sdowney_at_[hidden]> wrote:

> Valid C++ source such as the following couldn't be processed on that target:
> char c = '\a';
>
> On Wed, Jan 27, 2021 at 11:54 AM Corentin <corentin.jabot_at_[hidden]>
> wrote:
>
>>
>>
>> On Wed, Jan 27, 2021 at 5:50 PM Steve Downey <sdowney_at_[hidden]> wrote:
>>
>>> The basic execution character set is the basic source character set
>>> plus the mandatory control codes, and it is what you need to express the
>>> mandatory C characters in the execution space without discussing what
>>> the encoding actually is. When character sets varied much more widely in
>>> which characters they included, it was far more important for figuring
>>> out how to port the language.
>>>
>>
>> If, for example, a theoretical encoding had no bell character, would
>> that break C++ in any way?
>>
>>
>>>
>>> On Wed, Jan 27, 2021 at 4:04 AM Corentin via SG16 <sg16_at_[hidden]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jan 27, 2021 at 9:59 AM Peter Brett <pbrett_at_[hidden]> wrote:
>>>>
>>>>> Hi Corentin,
>>>>>
>>>>>
>>>>>
>>>>> This certainly seems like a possible simplification to me. Out of
>>>>> interest, did you manage to find out **why** the concept of the basic
>>>>> execution character set was added to the standard in the first place?
>>>>>
>>>>
>>>> Alas, I didn't, but it goes back at least to C89.
>>>>
>>>>
>>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>
>>>>> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of* Corentin
>>>>> via SG16
>>>>> *Sent:* 27 January 2021 08:57
>>>>> *To:* SG16 <sg16_at_[hidden]>
>>>>> *Cc:* Corentin <corentin.jabot_at_[hidden]>
>>>>> *Subject:* [SG16] Is the concept of basic execution character sets
>>>>> useful?
>>>>>
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>>
>>>>> A very quick reminder, using C++20 terminology:
>>>>>
>>>>> We have:
>>>>>
>>>>>
>>>>>
>>>>> - The basic source character set, which, while of limited use in the
>>>>> core language, is used quite a bit in the library as a proxy for
>>>>> "displayable characters available in all encodings"; removing it would
>>>>> therefore be slightly more involved.
>>>>>
>>>>>
>>>>>
>>>>> - The execution character set(s), which describe the actual character
>>>>> sets used during evaluation and are therefore necessary.
>>>>>
>>>>>
>>>>>
>>>>> - The basic execution character set, which is a superset of the basic
>>>>> source character set and a subset of all execution character sets.
>>>>>
>>>>>
>>>>>
>>>>> It's strictly the basic source character set + alert + backspace +
>>>>> carriage return + the null character.
>>>>>
>>>>>
>>>>>
>>>>> Nowhere is it used in the library.
>>>>>
>>>>> It is not used in the core language either, except of course that we
>>>>> need to prescribe that the null character is encoded as 0 and that the
>>>>> decimal digits are encoded sequentially.
>>>>>
>>>>>
>>>>>
>>>>> While alert, backspace, and carriage return are mentioned in escape
>>>>> sequences, if a theoretical encoding lacked these characters, there
>>>>> would be no further ill effect on the behavior of the standard.
>>>>>
>>>>>
>>>>>
>>>>> The main change on top of the C++20 wording would be as follows:
>>>>>
>>>>>
>>>>>
>>>>> The basic execution character set and the basic execution
>>>>> wide-character set shall each contain all the members of the basic source
>>>>> character set, plus control characters representing alert, backspace,
>>>>> and carriage return, plus a null character (respectively, null wide
>>>>> character), whose value is 0. For each basic execution character set,
>>>>> the values of the members shall be non-negative and distinct from one
>>>>> another. In both the source and execution basic character sets, the value
>>>>> of each character after 0 in the above list of decimal digits shall be one
>>>>> greater than the value of the previous. The execution character
>>>>> set and the execution wide-character set are implementation-defined
>>>>> supersets of the basic execution character set and the basic execution
>>>>> wide-character set, respectively. The values of the members of the
>>>>> execution character sets and the sets of additional members are
>>>>> locale-specific.
>>>>>
>>>>>
>>>>>
>>>>> Any reason why we should not do this?
>>>>>
>>>>>
>>>>>
>>>>> (As always, I'm interested in having a simple model with no
>>>>> unnecessary terminology, as, as observed these past few months, such
>>>>> terminology tends to hinder our collective understanding.)
>>>>>
>>>>>
>>>>>
>>>>> Corentin
>>>>>
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>>
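
The thread identifies only two guarantees that the core language still needs: a null character whose value is 0, and decimal digits encoded consecutively. Portable character-handling code depends on exactly these. Below is a minimal C++ sketch. All function names are illustrative; they do not come from the thread or the standard library.

```cpp
#include <cassert>
#include <cstddef>

// Portable under any conforming execution encoding: '0'..'9' must have
// consecutive, increasing values, so subtraction yields the digit's value.
int digit_value(char c) {
    assert(c >= '0' && c <= '9');  // precondition: c is a decimal digit
    return c - '0';
}

// Also portable: the null character has value 0, which is what makes
// null-terminated strings work. (Hypothetical helper, equivalent to strlen.)
std::size_t c_string_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// NOT portable: nothing requires 'a'..'z' to be contiguous (EBCDIC has gaps
// after 'i' and after 'r'), so this is only correct for every lowercase
// letter in ASCII-compatible encodings.
int letter_index_ascii_only(char c) {
    return c - 'a';
}
```

`digit_value` and `c_string_length` behave the same under every conforming execution encoding, including EBCDIC. `letter_index_ascii_only` is included only as a contrasting case, because the standard makes no comparable guarantee for letters.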

Received on 2021-01-28 14:13:10