Re: [SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 9 Sep 2019 15:44:07 -0400
On 9/9/19 2:48 PM, Corentin Jabot wrote:
> Character Repertoire. The collection of characters included in a
> character set.
> Character Set. A collection of elements used to represent textual
> information.
> Coded Character Set. A character set in which each character is
> assigned a numeric code point. Frequently abbreviated as character
> set, charset, or code set; the acronym CCS is also used.
> Abstract Character. A unit of information used for the organization,
> control, or representation of textual data.
Where did the above terms come from?
> I will admit I am confused. It's either Character Set or Character
> Repertoire.

I suppose the above definitions could be read such that a character set
may include members that cannot exist in any character repertoire. For
example, escape characters or other not-really-a-character things like
variation selectors.


> On Mon, 9 Sep 2019 at 20:37, Zach Laine <whatwasthataddress_at_[hidden]
> <mailto:whatwasthataddress_at_[hidden]>> wrote:
> On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> On 9/8/19 12:02 PM, Steve Downey wrote:
>> Character repertoire sounds good, and I will eventually learn
>> to spell it. Character set is definitely terminology from the
>> pre-unicode times, and unfortunately tends to merge the
>> repertoire and encoding,
>> https://www.iana.org/assignments/character-sets/character-sets.xhtml
> I think I was a little overzealous earlier in stating that
> Unicode uses "character repertoire" as I described. I looked
> again and don't find that term formally defined in the
> standard. However, "repertoire" is used throughout the
> standard in ways that I believe are consistent with my
> description. I wasn't able to find an alternative formal term.
> I fully endorse overzealousness as applied to Unicode discussions.
> The way I've been thinking about it is that a "character
> repertoire" describes a set of /abstract characters/ (a formal
> Unicode term) and a "character set" describes a set of
> /encoded characters/ (a formal Unicode term) that associate
> each /abstract character/ member of a "character repertoire"
> with a /code point/ (a formal Unicode term) within a
> /codespace/ (a formal Unicode term). See sections 2.4 and 3.4
> of Unicode 12 and uses of the word "repertoire" within those
> chapters. The Unicode standard does use the term "character
> set", but I didn't find a formal definition.
> I think I follow, except that I don't see whether there is a
> distinction between "character repertoire" and "abstract
> characters". Is there? I'm asking because if there is not, I'd
> prefer to standardize the formally described term, which sounds
> like it is "abstract characters".
> Zach
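
As a rough illustration of the distinction discussed in the quoted text, the sketch below models a "character repertoire" as a bare set of abstract characters and a "character set" as an assignment of code points, within a codespace, to the members of such a repertoire. The class name, field names, and sample data are all hypothetical and chosen only for this example; they are not terms from the Unicode Standard or from D1859R0. The code point values are the ones ASCII and EBCDIC code page 037 use for these three characters.

```python
from dataclasses import dataclass

# All names here are hypothetical illustrations, not Unicode or C++ standard terms.


@dataclass
class CodedCharacterSet:
    """Associates each abstract character of a repertoire with a code point."""
    name: str
    codespace: range   # the range of values available as code points
    assignments: dict  # abstract character name -> code point

    def __post_init__(self):
        code_points = list(self.assignments.values())
        # Every assigned code point must lie inside the codespace.
        for cp in code_points:
            if cp not in self.codespace:
                raise ValueError(f"{cp:#x} is outside the codespace of {self.name}")
        # No two abstract characters may share a code point.
        if len(set(code_points)) != len(code_points):
            raise ValueError(f"duplicate code point in {self.name}")

    def repertoire(self):
        """The character repertoire: the abstract characters without their code points."""
        return frozenset(self.assignments)

    def code_point(self, abstract_character):
        """The code point assigned to one abstract character."""
        return self.assignments[abstract_character]


# Sample assignments; values match ASCII and EBCDIC code page 037 respectively.
ascii_sample = CodedCharacterSet("ASCII sample", range(0x80), {
    "LATIN CAPITAL LETTER A": 0x41,
    "LATIN SMALL LETTER A": 0x61,
    "DIGIT ZERO": 0x30,
})
ebcdic_sample = CodedCharacterSet("EBCDIC sample", range(0x100), {
    "LATIN CAPITAL LETTER A": 0xC1,
    "LATIN SMALL LETTER A": 0x81,
    "DIGIT ZERO": 0xF0,
})

# Same repertoire, different coded character sets.
print(ascii_sample.repertoire() == ebcdic_sample.repertoire())
print(hex(ascii_sample.code_point("DIGIT ZERO")),
      hex(ebcdic_sample.code_point("DIGIT ZERO")))
```

Under this reading, two character sets can share one repertoire while disagreeing on every code point, which is the sense in which "character set" merges repertoire and encoding while "character repertoire" does not.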

Received on 2019-09-09 21:44:12