Subject: Re: Agreeing with Corentin's point re: problem with strict use of abstract characters
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-06-15 02:59:17
On 14/06/2020 22.57, Corentin Jabot wrote:
> On Sun, 14 Jun 2020 at 22:45, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 14/06/2020 22.19, Corentin Jabot wrote:
> > On Sun, 14 Jun 2020 at 21:55, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >   No, each code point in a sequence (given Unicode input) is a separate abstract character
> >   in my view (after combining surrogate pairs, of course).
> > For example, diacritics, when preceded by a letter, are not considered abstract characters of their own.
> "Abstract character" is defined in https://www.unicode.org/glossary/ as follows:
> "A unit of information used for the organization, control, or representation of textual data."
> (ISO 10646 does not appear to have a definition in its clause 3.)
> I'm not seeing a conflict between that definition and my view that a diacritic,
> preceded by a letter, can be viewed as two different abstract characters.
> I agree that the alternate viewpoint "single abstract character" is not
> in conflict with the definition, either.
> What is your statement "are not considered abstract characters of their own"
> (which seems to leave little room for alternatives) based on?
> Right, the glossary is very much incomplete.
> The definition is given in Unicode 13, section 3.4 ( http://www.unicode.org/versions/Unicode13.0.0/ch03.pdf )
Hm... ISO 10646:2017 doesn't even contain a definition of
"abstract character" in its clause 3.
> Abstract character: A unit of information used for the organization, control, or representation of textual data.
> • When representing data, the nature of that data is generally symbolic as
> opposed to some other kind of data (for example, aural or visual). Examples of
> such symbolic data include letters, ideographs, digits, punctuation, technical
> symbols, and dingbats.
> • An abstract character has no concrete form and should not be confused with a glyph.
> • An abstract character does not necessarily correspond to what a user thinks of
> as a “character” and should not be confused with a grapheme.
> • The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
> • Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.
From the standpoint of normative wording, it's a bit unfortunate that
the "terms and definitions" section of ISO 10646 does not properly
define the term as it appears to be commonly understood.
I thought ISO 10646 would be a direct representation of Unicode something,
but apparently, it isn't.
Also, "what a user thinks" is more than sloppy wording for a standard.
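To make the two viewpoints above concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the example string (a letter followed by a combining acute accent) and the helper name code_points are made up for this message and do not come from either standard. It shows that the same user-visible text can be one code point or a two-code-point combining character sequence, and that a surrogate pair in UTF-16 corresponds to a single code point.

```python
import unicodedata

# Hypothetical example data: the same text in two encoded forms.
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE


def code_points(text):
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The decomposed form is a combining character sequence of two code points;
# the precomposed form is a single code point.
decomposed_cps = code_points(decomposed)
precomposed_cps = code_points(precomposed)

# The second code point of the decomposed form is a combining mark
# (its canonical combining class is non-zero).
is_combining = unicodedata.combining(decomposed[1]) > 0

# Canonical normalization maps between the two forms.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

# A code point outside the BMP takes two UTF-16 code units (a surrogate pair)
# but is still one code point.
astral = "\U0001F600"
utf16_units = len(astral.encode("utf-16-be")) // 2

print(decomposed_cps, precomposed_cps, is_combining, utf16_units)
```

Whether the decomposed form is "two abstract characters" or "one abstract character represented by a sequence" is exactly the question under discussion; the code only shows the code point level, which is not in dispute.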
SG16 list run by email@example.com