sg16: Re: [SG16] Handling of non-basic characters in early translation phases

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sat, 20 Jun 2020 15:26:45 -0400

On Sat, Jun 20, 2020 at 9:48 AM Corentin Jabot via SG16 <
sg16_at_[hidden]> wrote:

>
>
> On Sat, 20 Jun 2020 at 15:16, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>>
>> Again, they don't have semantics from a Unicode viewpoint (which is fine),
>> but in a larger system context, they sure have semantics (otherwise
>> they wouldn't have a reason to exist in the first place). How much of
>> those semantics is known to the compiler is a separate question.
>>
>> > A compiler could flag C0/C1 UCN escape sequences in literals, if they
>> > wanted to.
>> > And again I'm trying to be pragmatic here. The work IBM is doing to get
>> > clang to support EBCDIC is converting that EBCDIC to UTF-8.
>>
Until that compiler is relatively complete and substantial adoption of it
occurs, the amount of feedback available from users would be minimal.

>
>> Maybe that's because it's the only option under the status quo of C++,
>> which needs to tunnel everything through UCNs.
>>
>
> I think it's more about the cost/benefits of supporting that use case.
> I really would like to know from IBM people if and how much they are
> actually concerned about this point as it is driving many decisions.
>
Standardization is not meant to produce short-term decisions. The ability
to have the choice of differentiating between an EBCDIC control character
physically present in the source and a UCN physically present in the
source, even if they are the same C1 control according to the CDRA mapping,
should not be prevented by the process of standardization. The choice is a
point of design that the implementer should be able to make in consultation
with their users.

A design that requires funnelling through UCNs would mean that there are no
characters that cannot appear in a "Unicode" string; however, the user
intent may be that the EBCDIC control characters (if physically present)
are not okay in that context (whereas the UCN would be).
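
To make the distinction concrete, here is a minimal sketch of how an
implementation might keep that choice open. It is not taken from any real
compiler: the names Origin, SourceChar, allowed_in_unicode_literal and
literal_is_acceptable are hypothetical, and the policy shown (reject a C1
control that is physically present, accept the same control written as a
UCN) is only one of the options an implementer could pick with their users.

```cpp
#include <vector>

// Hypothetical provenance tag: how a character entered the source text.
enum class Origin {
    Physical,  // present as a source character (e.g. an EBCDIC control byte)
    Ucn        // spelled as a universal-character-name such as \u0085
};

// A character after mapping to Unicode, with its provenance kept alongside.
struct SourceChar {
    char32_t code_point;
    Origin origin;
};

// The C1 control block is U+0080 through U+009F.
constexpr bool is_c1_control(char32_t c) {
    return c >= 0x80 && c <= 0x9F;
}

// Assumed policy for a "Unicode" string literal: a C1 control is accepted
// only when the user wrote it as a UCN. A different implementation could
// choose a different rule; the point is that provenance is still available.
constexpr bool allowed_in_unicode_literal(SourceChar ch) {
    return !(is_c1_control(ch.code_point) && ch.origin == Origin::Physical);
}

// Checks every character of a literal body against the policy above.
bool literal_is_acceptable(const std::vector<SourceChar>& body) {
    for (const SourceChar& ch : body) {
        if (!allowed_in_unicode_literal(ch)) {
            return false;
        }
    }
    return true;
}
```

If everything were funnelled through UCNs before this check, the origin
field would be the same for both spellings and the two cases could no
longer be told apart.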

Received on 2020-06-20 14:30:16