SG16

Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-06-15 02:59:17


On 14/06/2020 22.57, Corentin Jabot wrote:
>
>
> On Sun, 14 Jun 2020 at 22:45, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 14/06/2020 22.19, Corentin Jabot wrote:
> >
> >
> > On Sun, 14 Jun 2020 at 21:55, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> >     No, each code point in a sequence (given Unicode input) is a separate abstract character
> >     in my view (after combining surrogate pairs, of course).
> >
> >
> > For example, diacritics, when preceded by a letter, are not considered abstract characters of their own.
>
> "Abstract character" is defined in https://www.unicode.org/glossary/ as follows:
>
> "A unit of information used for the organization, control, or representation of textual data."
> (ISO 10646 does not appear to have a definition in its clause 3.)
>
> I'm not seeing a conflict between that definition and my view that a diacritic,
> preceded by a letter, can be viewed as two different abstract characters.
> I agree that the alternate viewpoint "single abstract character" is not
> in conflict with the definition, either.
>
> What is your statement "are not considered abstract characters of their own"
> (which seems to leave little room for alternatives) based on?
>
>
> Right, the glossary is very much incomplete.
>
> The definition is given in Unicode 13, section 3.4 ( http://www.unicode.org/versions/Unicode13.0.0/ch03.pdf )

Hm... ISO 10646:2017 doesn't even contain a definition of
"abstract character" in its clause 3.

> Abstract character: A unit of information used for the organization, control, or representation of textual data.
> • When representing data, the nature of that data is generally symbolic as
> opposed to some other kind of data (for example, aural or visual). Examples of
> such symbolic data include letters, ideographs, digits, punctuation, technical
> symbols, and dingbats.
> • An abstract character has no concrete form and should not be confused with a
> glyph.
> • An abstract character does not necessarily correspond to what a user thinks of
> as a “character” and should not be confused with a grapheme.
> • The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
> • Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.
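
To make the two readings concrete, here is a minimal sketch of the "one abstract character per code point, after combining surrogate pairs" view. It is an illustration only, not taken from either standard: the names decode_utf16 and count_code_points are hypothetical, and the code assumes well-formed UTF-16 input with no error handling.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: combine UTF-16 surrogate pairs into code points.
// Assumes well-formed input; an unpaired surrogate is passed through as-is.
std::u32string decode_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        // A high surrogate followed by another code unit forms one code point.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size()) {
            char32_t low = in[++i];
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical helper: the number of code points, which is the number of
// abstract characters under the "one per code point" reading.
std::size_t count_code_points(std::u16string_view in) {
    return decode_utf16(in).size();
}
```

Under this reading, u"e\u0301" (a letter followed by COMBINING ACUTE ACCENT) counts as two, while the precomposed u"\u00E9" counts as one, although both render as the same accented letter. The alternate "single abstract character" reading would need grapheme-style segmentation, which this sketch does not attempt.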

From the standpoint of normative wording, it's a bit unfortunate that
the "terms and definitions" section of ISO 10646 does not properly
define the term as it appears to be commonly understood.

I thought ISO 10646 would be a direct representation of Unicode something,
but apparently, it isn't.

Also, "what a user thinks" is more than sloppy wording for a standard.

Jens

