Date: Sun, 14 Jun 2020 23:21:48 +0200
On Sun, 14 Jun 2020 at 23:12, Hubert Tong <hubert.reinterpretcast_at_[hidden]>
wrote:
> On Sun, Jun 14, 2020 at 2:48 PM Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>
>>
>> On Sun, 14 Jun 2020 at 20:03, Hubert Tong <
>> hubert.reinterpretcast_at_[hidden]> wrote:
>>
>>> On Sun, Jun 14, 2020 at 5:03 AM Corentin Jabot <corentinjabot_at_[hidden]>
>>> wrote:
>>>
>>>>
>>>> On Sun, 14 Jun 2020 at 08:59, Jens Maurer via SG16 <
>>>> sg16_at_[hidden]> wrote:
>>>>
>>>> I don't think we should entertain any notion of "same character" in C++,
>>>>> beyond value comparisons in the execution encoding and "identity" as
>>>>> needed for "same identifier".
>>>>>
>>>>
>>>> We need to in/before phase 1, but I think we reached the consensus that
>>>> we otherwise
>>>> shouldn't and wouldn't
>>>>
>>> To be clear, we need to make sure we are on the same page with respect
>>> to the meta (notion of) notion of "same character":
>>> By "character", do we mean an "abstract character" or a "coded
>>> character"?
>>>
>>
>> Abstract character in phase 1 (to get rid of "abstract character" in
>> phase 1, we would have to assume that we have encoded text already; I
>> think that would be a reasonable assumption).
>>
> I think our definition of the members of the "basic source character set"
> would still be in terms of abstract characters. The input, I believe, needs
> to be considered encoded text in order to encapsulate all of the perceived
> relevant differences between characters.
>
>
>>
>>
>>>
>>> I think that the relationships between terms represent an ideal that is
>>> not met in practice. "Abstract character" is a meaningful notion; however,
>>> the ideal that coded character sets are a bijective function between values
>>> in a codespace and abstract characters has not been clearly attained.
>>>
>>
>> Coded character sets encode a set of abstract characters (Unicode has
>> noncharacters).
>>
> I believe the U+00C5/U+212B situation points out why we have a problem
> when trying to handle abstract characters. At the lower technical level, the
> coded character set has them as different abstract characters. At a higher
> level, they are considered the same. If we deal in coded characters, we
> would not need to handle the "philosophical questions".
>
To be very down to earth, I think we want to ensure a 1-1 mapping from
Unicode (which would map each code point independently) in a bijective
and identity-preserving fashion,
while in the general case implementations should be allowed to map one
abstract character (or N source coded characters/code points) to multiple
UCNs.
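
As a minimal sketch of the U+00C5/U+212B case quoted above (Python's
standard unicodedata module is used purely for illustration; nothing here
is proposed wording or an implementation requirement): compared as coded
characters the two are distinct, and they only become "the same" once a
normalization form is applied.

```python
import unicodedata

a_ring = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212B"  # ANGSTROM SIGN

# Compared as coded characters (code points), the two are different.
same_coded = a_ring == angstrom

# Under canonical normalization (NFC), both become U+00C5.
same_nfc = (unicodedata.normalize("NFC", a_ring)
            == unicodedata.normalize("NFC", angstrom))

print(same_coded, same_nfc)
```

A mapping that handles each code point independently keeps the two
distinct, which is the identity-preserving behaviour described above.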
>
>
>>
>> Some abstract characters do not exist in any coded character set. There
>> are abstract characters not yet represented in computers that cannot be
>> handled by a C++ implementation
>>
> With a "wetware" implementation, the formality of defining a coded
> character set is not a requirement for the coded character set to "exist".
>
Received on 2020-06-14 16:25:12