Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-06-14 16:21:48
On Sun, 14 Jun 2020 at 23:12, Hubert Tong <hubert.reinterpretcast_at_[hidden]> wrote:
> On Sun, Jun 14, 2020 at 2:48 PM Corentin Jabot <corentinjabot_at_[hidden]> wrote:
>> On Sun, 14 Jun 2020 at 20:03, Hubert Tong <
>> hubert.reinterpretcast_at_[hidden]> wrote:
>>> On Sun, Jun 14, 2020 at 5:03 AM Corentin Jabot <corentinjabot_at_[hidden]> wrote:
>>>> On Sun, 14 Jun 2020 at 08:59, Jens Maurer via SG16 <
>>>> sg16_at_[hidden]> wrote:
>>>>> I don't think we should entertain any notion of "same character" in C++,
>>>>> beyond value comparisons in the execution encoding and "identity" as
>>>>> needed for "same identifier".
>>>> We need to in/before phase 1, but I think we reached the consensus that
>>>> we otherwise
>>>> shouldn't and wouldn't
>>> To be clear, we need to make sure we are on the same page with respect
>>> to the meta (notion of) notion of "same character":
>>> By "character", do we mean an "abstract character" or a "coded
>> abstract character in phase 1 (to get rid of "abstract character" in
>> phase 1, we would have to assume that we have encoded text already - I
>> think that would be a reasonable assumption)
> I think our definition of the members of the "basic source character set"
> would still be in terms of abstract characters. The input, I believe, needs
> to be considered encoded text in order to encapsulate all of the perceived
> relevant differences between characters.
>>> I think that the relationships between terms represent an ideal that is
>>> not met in practice. "Abstract character" is a meaningful notion; however,
>>> the ideal that coded character sets are a bijective function between values
>>> in a codespace and abstract characters has not been clearly attained.
>> Coded character sets encode a set of abstract characters (Unicode has
>> noncharacters).
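To make the noncharacter remark concrete, here is a minimal sketch, assuming the Unicode definition of noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes). The function name is hypothetical. It only illustrates that some values in the codespace are never assigned to an abstract character, so the mapping from code points to abstract characters is not total.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for every plane n (0 through 16).
    return (cp & 0xFFFE) == 0xFFFE


# Count every noncharacter in the codespace.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
print(noncharacter_count)
```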
> I believe the U+00C5/U+212B situation points out why we have a problem
> when trying to handle abstract characters. At the lower technical level, the
> coded character set has them as different abstract characters. At a higher
> level, they are considered the same. If we deal in coded characters, we
> would not need to handle the "philosophical questions".
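The two levels can be illustrated with a short sketch. It uses Python's standard `unicodedata` module only as a stand-in for "the higher level"; nothing here is proposed for C++, and the helper names are hypothetical. At the coded-character level the two code points are distinct; under canonical normalization (NFC) they compare equal.

```python
import unicodedata

A_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212B"  # ANGSTROM SIGN


def same_coded_character(a: str, b: str) -> bool:
    # Lower level: compare code point values, nothing else.
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    # Higher level: compare after canonical normalization (NFC).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_coded_character(A_RING, ANGSTROM))
print(same_after_nfc(A_RING, ANGSTROM))
```

Dealing only in coded characters corresponds to the first function, which needs no Unicode data and no decision about which characters are "the same".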
To be very down to earth, I think we want to ensure a 1-1 mapping from
Unicode (which would map each code point independently) in a bijective
and identity preserving fashion,
while in the general case implementations should be allowed to map one
abstract character (or N source coded characters/codepoints) to multiple
>> Some abstract characters do not exist in any coded character set. There
>> are abstract characters not yet represented in computers that cannot be
>> handled by a C++ implementation
> With a "wetware" implementation, the formality of defining a coded
> character set is not a requirement for the coded character set to "exist".