sg16: Re: [SG16] Agreeing with Corentin's point re: problem with strict use of abstract characters

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 14 Jun 2020 11:03:16 +0200

On Sun, 14 Jun 2020 at 08:59, Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 11/06/2020 00.06, Hubert Tong wrote:
> > On Wed, Jun 10, 2020 at 5:39 PM Jens Maurer <Jens.Maurer_at_[hidden]>
> > wrote:
> >
> > On 10/06/2020 23.23, Hubert Tong via SG16 wrote:
> > > I agree with Corentin's point that the strict use of abstract
> > > characters introduces problems where a coded character set contains
> > > multiple values for a single abstract character/contains characters that
> > > are canonically the same but assigned different values.
> >
> > I have a hard time imagining such a thing. Can you give an example?
> >
> > Yes, U+FA9A as described in
> > https://en.wikipedia.org/wiki/Han_unification has this situation with
> > U+6F22.
> > These characters are distinct as members of a coded character set, but
> > as abstract characters, I do not believe we can easily say the same.
>
> I would expect these to be two different abstract characters in the C++
> sense.
> Roughly, anything you can distinguish in the source should be a different
> "abstract character", if only for the benefit of raw string literals.
>

"abstract character" is a notion for humans and to talk about mapping from
one set to another.
After phase 1, C++ deals with code points such that two sequences of code
points are identical if and only if they have the same values.
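
To make the U+FA9A / U+6F22 example from earlier in the thread concrete, here
is a minimal sketch. It is written in Python rather than C++ only because the
Python standard library ships the Unicode normalization data; the helper names
`same_code_points` and `canonically_equivalent` are hypothetical and not part
of any proposal. It contrasts value-wise comparison of code point sequences
with Unicode canonical equivalence.

```python
import unicodedata

COMPAT = "\uFA9A"   # CJK COMPATIBILITY IDEOGRAPH-FA9A
UNIFIED = "\u6F22"  # CJK UNIFIED IDEOGRAPH-6F22


def same_code_points(a: str, b: str) -> bool:
    """Value-wise comparison: equal only if every code point matches."""
    return [ord(c) for c in a] == [ord(c) for c in b]


def canonically_equivalent(a: str, b: str) -> bool:
    """Unicode canonical equivalence, checked by comparing NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(same_code_points(COMPAT, UNIFIED))
    print(canonically_equivalent(COMPAT, UNIFIED))
```

Under the model described above, only the first comparison would matter after
phase 1: the two sequences hold different values, so they are different.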

> I don't think we should entertain any notion of "same character" in C++,
> beyond value comparisons in the execution encoding and "identity" as
> needed for "same identifier".
>

We need such a notion in or before phase 1, but I think we reached the
consensus that otherwise we shouldn't and wouldn't have one.

>
> For example, if some hypothetical input format differentiates red and
> green letters that are otherwise "the same", I'd still expect a red A
> to be a different abstract character than a green A. (Ok, that doesn't
> work for the basic source character set, but should work for anything
> beyond that.)
>

It doesn't work, as there isn't any culture on earth that makes that
distinction, so there exists no universal-character-name to express it.
It is best left to people of letters to decide whether colors carry meaning
(and they sometimes do: https://en.wikipedia.org/wiki/Ersu_Shaba_script).

> If that means the term "character" or "abstract character" is too loaded
> to be used here, so be it. (The terminology space is already fairly
> crowded due to Unicode, so it's hard to find unused phrases that give the
> right connotation.)
>

The terminology used by Unicode people isn't Unicode specific. In
particular, "abstract character" is meaningful independently of
any computer system.

> In general, I'm still hoping that a compiler in an EBCDIC-only world
> can fit seamlessly in our future model.
>

Yes, that is definitely a primary goal :)
In general we shouldn't restrict the set of possible source character sets
or execution character sets more than they are currently.
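
As a small illustration of why the abstract character is a useful notion when
mapping between character sets, the sketch below (Python, standard library
only) encodes the same letter in ASCII and in one EBCDIC code page. The choice
of `cp037` is an assumption made for the example; an EBCDIC-only
implementation could use a different code page.

```python
LETTER = "A"  # one abstract character, LATIN CAPITAL LETTER A


def encoded_values(text: str, encodings: tuple) -> dict:
    """Map each encoding name to the bytes it assigns to `text`."""
    return {enc: text.encode(enc) for enc in encodings}


# "ascii" and "cp037" (an EBCDIC code page) are both standard Python codecs.
values = encoded_values(LETTER, ("ascii", "cp037"))

if __name__ == "__main__":
    for name, raw in values.items():
        print(name, raw.hex())
```

The abstract character is the same in both cases; only the value assigned to
it by each coded character set differs.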

>
> Jens
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2020-06-14 04:06:37