On Sun, 14 Jun 2020 at 23:12, Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

On Sun, Jun 14, 2020 at 2:48 PM Corentin Jabot <corentinjabot@gmail.com> wrote:

On Sun, 14 Jun 2020 at 20:03, Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
On Sun, Jun 14, 2020 at 5:03 AM Corentin Jabot <corentinjabot@gmail.com> wrote:

On Sun, 14 Jun 2020 at 08:59, Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:

I don't think we should entertain any notion of "same character" in C++,
beyond value comparisons in the execution encoding and "identity" as
needed for "same identifier".

We need to in/before phase 1, but I think we reached consensus that otherwise we
shouldn't and wouldn't.
To be clear, we need to make sure we are on the same page with respect to the meta-notion of "same character":
By "character", do we mean an "abstract character" or a "coded character"?

Abstract character in phase 1 (to get rid of "abstract character" in phase 1, we would have to assume that we already have encoded text; I think that would be a reasonable assumption).
I think our definition of the members of the "basic source character set" would still be in terms of abstract characters. The input, I believe, needs to be considered encoded text in order to encapsulate all of the perceived relevant differences between characters.

I think that the relationships between terms represent an ideal that is not met in practice. "Abstract character" is a meaningful notion; however, the ideal that coded character sets are a bijective function between values in a codespace and abstract characters has not been clearly attained.

Coded character sets encode a set of abstract characters (Unicode has noncharacters).
I believe the U+00C5/U+212B situation points out why we have a problem when trying to handle abstract characters. At the lower technical level, the coded character set has them as different abstract characters. At a higher level, they are considered the same. If we deal in coded characters, we would not need to handle the "philosophical questions".
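
To make the U+00C5/U+212B point concrete, here is a minimal sketch in Python (chosen only because its standard library ships Unicode normalization; nothing here is proposed wording or an implementation strategy). The helper names `same_coded_character` and `canonically_equivalent` are made up for illustration. The first compares code points, the second compares canonical decompositions.

```python
import unicodedata

A_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212B"  # ANGSTROM SIGN


def same_coded_character(a: str, b: str) -> bool:
    """Coded-character view: equal only if the code points are equal."""
    return a == b


def canonically_equivalent(a: str, b: str) -> bool:
    """Higher-level view: equal if the canonical decompositions (NFD) match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # Two distinct code points at the coded-character level...
    print(same_coded_character(A_RING, ANGSTROM))
    # ...that Unicode nevertheless treats as canonically equivalent.
    print(canonically_equivalent(A_RING, ANGSTROM))
```

Dealing only in coded characters corresponds to the first function: no normalization data is consulted, so the question of whether the two are "really" the same character never comes up.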

To be very down to earth, I think we want to ensure a 1-1 mapping from Unicode (which would map each code point independently) in a bijective and identity-preserving fashion,

while in the general case implementations should be allowed to map one abstract character (or N source coded characters/code points) to multiple UCNs.
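
As a rough illustration of the Unicode case described above, the following Python sketch maps each code point independently to a `\UXXXXXXXX`-style universal-character-name and back. The function names `to_ucns` and `from_ucns` are hypothetical, and this is only a sketch under the assumption that the source is already a sequence of Unicode code points; it does not model the general case where one source character may map to several UCNs.

```python
def to_ucns(text: str) -> str:
    """Map every code point, independently of its neighbours, to one UCN."""
    return "".join(f"\\U{ord(ch):08X}" for ch in text)


def from_ucns(ucns: str) -> str:
    """Inverse mapping: each 10-character \\UXXXXXXXX chunk is one code point."""
    width = 10  # len("\\U") + 8 hex digits
    if len(ucns) % width != 0:
        raise ValueError("input is not a sequence of \\UXXXXXXXX names")
    chars = []
    for i in range(0, len(ucns), width):
        chunk = ucns[i:i + width]
        if not chunk.startswith("\\U"):
            raise ValueError(f"malformed UCN: {chunk!r}")
        chars.append(chr(int(chunk[2:], 16)))
    return "".join(chars)


if __name__ == "__main__":
    # A + COMBINING RING ABOVE, A WITH RING ABOVE, ANGSTROM SIGN
    source = "A\u030A\u00C5\u212B"
    encoded = to_ucns(source)
    print(encoded)
    # Round-tripping restores exactly the original code points.
    print(from_ucns(encoded) == source)
```

Because no normalization or composition happens, distinct code point sequences always yield distinct UCN sequences, which is the identity-preserving property being asked for.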

Some abstract characters do not exist in any coded character set. There are abstract characters not yet represented in computers that cannot be handled by a C++ implementation.
With a "wetware" implementation, the formality of defining a coded character set is not a requirement for the coded character set to "exist".