Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-06-14 01:59:25
On 11/06/2020 00.06, Hubert Tong wrote:
> On Wed, Jun 10, 2020 at 5:39 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 10/06/2020 23.23, Hubert Tong via SG16 wrote:
> > I agree with Corentin's point that the strict use of abstract characters introduces problems where a coded character set contains multiple values for a single abstract character/contains characters that are canonically the same but assigned different values.
> I have a hard time imagining such a thing. Can you give an example?
> Yes, U+FA9A as described in https://en.wikipedia.org/wiki/Han_unification has this situation with U+6F22.
> These characters are distinct as members of a coded character set, but as abstract characters, I do not believe we can easily say the same.
I would expect these to be two different abstract characters in the C++ sense.
Roughly, anything you can distinguish in the source should be a different
"abstract character", if only for the benefit of raw string literals.
I don't think we should entertain any notion of "same character" in C++,
beyond value comparisons in the execution encoding and "identity" as
needed for "same identifier".
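
To illustrate the value-comparison view with the example from above, here is a
minimal sketch. It assumes universal-character-names in UTF-32 literals; the
names kUnified, kCompat, same_value and same_sequence are made up for this
illustration and come from neither the standard nor any proposal. Unicode
treats U+FA9A as canonically equivalent to U+6F22, but a plain comparison of
values sees two different things:

```cpp
#include <cassert>
#include <string>

// U+6F22 and U+FA9A are canonically equivalent in Unicode,
// yet they are distinct members of the coded character set.
constexpr char32_t kUnified = U'\u6F22';
constexpr char32_t kCompat  = U'\uFA9A';

// Hypothetical helper: "same character" means nothing more than
// "same value", with no normalization step.
constexpr bool same_value(char32_t a, char32_t b) { return a == b; }

// Checked at compile time: the two values differ.
static_assert(!same_value(kUnified, kCompat),
              "canonically equivalent, but different values");

// Hypothetical helper: the same rule applied to whole sequences.
// std::u32string compares element by element, again without normalization.
inline bool same_sequence(const std::u32string& a, const std::u32string& b) {
    return a == b;
}
```

A caller would see same_sequence(U"\u6F22", U"\uFA9A") return false, which is
the behavior argued for here: no notion of "same character" beyond comparing
values.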
For example, if some hypothetical input format differentiates red and
green letters that are otherwise "the same", I'd still expect a red A
to be a different abstract character than a green A. (Ok, that doesn't
work for the basic source character set, but should work for anything
If that means the term "character" or "abstract character" is too loaded
to be used here, so be it. (The terminology space is already fairly
crowded due to Unicode, so it's hard to find unused phrases that give the
In general, I'm still hoping that a compiler in an EBCDIC-only world
can fit seamlessly in our future model.