Re: [SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 9 Sep 2019 16:25:44 -0400

On 9/9/19 4:04 PM, Corentin Jabot wrote:
>
>
> On Mon, 9 Sep 2019 at 21:44, Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
>
> On 9/9/19 2:48 PM, Corentin Jabot wrote:
>> Character Repertoire. The collection of characters included in a
>> character set.
>> Character Set. A collection of elements used to represent textual
>> information.
>> Coded Character Set. A character set in which each character is
>> assigned a numeric code point. Frequently abbreviated as
>> character set, charset, or code set; the acronym CCS is also used.
>> Abstract Character. A unit of information used for the
>> organization, control, or representation of textual data.
> Where did the above terms come from?
>
>
> Sorry, I should quote my sources
> https://unicode.org/glossary/
>
>>
>> I will admit I am confused. It's either Character Set or
>> Character Repertoire
>
> I suppose the above definitions could be read such that a
> character set may include members that cannot exist in any
> character repertoire. For example, escape characters or other
> not-really-a-character things like variation selectors.
>
> That does make sense

Another interpretation is that a character set might contain only 'A'
U+0041 { LATIN CAPITAL LETTER A } and ' ́' U+0301 { COMBINING ACUTE
ACCENT }, but its character repertoire contains 'A' and 'Á' because both
can be represented using the elements of the character set.
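
A minimal sketch of this interpretation, assuming Python's standard
unicodedata module and treating NFC composition as a stand-in for "can be
represented using the elements of the character set" (that equivalence is
an assumption made for illustration, not a definition from the Unicode
standard; the variable names are hypothetical):

```python
import unicodedata

# The two-element "character set" from the example above.
character_set = {"\u0041", "\u0301"}  # 'A', COMBINING ACUTE ACCENT

# A sequence built only from elements of that set.
sequence = "\u0041\u0301"
assert all(ch in character_set for ch in sequence)

# NFC composes the pair into the single precomposed code point U+00C1.
composed = unicodedata.normalize("NFC", sequence)

print(len(sequence))               # 2
print(len(composed))               # 1
print(unicodedata.name(composed))  # LATIN CAPITAL LETTER A WITH ACUTE

# U+00C1 is not an element of the set, yet the character it denotes is
# representable with the set's elements, so under this reading it would
# belong to the character repertoire.
print(composed in character_set)   # False
```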

Tom.

> Tom.
>
>>
>>
>>
>> On Mon, 9 Sep 2019 at 20:37, Zach Laine
>> <whatwasthataddress_at_[hidden]
>> <mailto:whatwasthataddress_at_[hidden]>> wrote:
>>
>> On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann
>> <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>
>> On 9/8/19 12:02 PM, Steve Downey wrote:
>>> Character repertoire sounds good, and I will eventually
>>> learn to spell it. Character set is
>>> definitely terminology from the pre-Unicode times, and
>>> unfortunately tends to merge the repertoire and
>>> encoding,
>>> https://www.iana.org/assignments/character-sets/character-sets.xhtml
>>
>> I think I was a little overzealous earlier in stating
>> that Unicode uses "character repertoire" as I described.
>> I looked again and don't find that term formally defined
>> in the standard. However, "repertoire" is used throughout
>> the standard in ways that I believe are consistent with
>> my description. I wasn't able to find an alternative
>> formal term.
>>
>> I fully endorse overzealousness as applied to Unicode discussions.
>>
>> The way I've been thinking about it is that a "character
>> repertoire" describes a set of /abstract characters/ (a
>> formal Unicode term) and a "character set" describes a
>> set of /encoded characters/ (a formal Unicode term) that
>> associate each /abstract character/ member of a
>> "character repertoire" with a /code point/ (a formal
>> Unicode term) within a /codespace/ (a formal Unicode
>> term). See sections 2.4 and 3.4 of Unicode 12 and uses
>> of the word "repertoire" within those chapters. The
>> Unicode standard does use the term "character set", but I
>> didn't find a formal definition.
>>
>> I think I follow, except that I don't see whether there is a
>> distinction between "character repertoire" and "abstract
>> characters". Is there? I'm asking because if there is not,
>> I'd prefer to standardize the formally described term, which
>> sounds like it is "abstract characters".
>> Zach
>>
>


Received on 2019-09-09 22:25:47