Re: [SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

From: Zach Laine <whatwasthataddress_at_[hidden]>
Date: Mon, 9 Sep 2019 11:37:24 -0700
On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 9/8/19 12:02 PM, Steve Downey wrote:
>> Character repertoire sounds good, and I will eventually learn to spell it.
>> Character set is definitely terminology from the pre-Unicode times, and
>> unfortunately tends to merge the repertoire and the encoding (see
>> https://www.iana.org/assignments/character-sets/character-sets.xhtml).
>
> I think I was a little overzealous earlier in stating that Unicode uses
> "character repertoire" as I described. I looked again and don't find that
> term formally defined in the standard. However, "repertoire" is used
> throughout the standard in ways that I believe are consistent with my
> description. I wasn't able to find an alternative formal term.
I fully endorse overzealousness as applied to Unicode discussions.

> The way I've been thinking about it is that a "character repertoire"
> describes a set of *abstract characters* (a formal Unicode term) and a
> "character set" describes a set of *encoded characters* (a formal Unicode
> term) that associate each *abstract character* member of a "character
> repertoire" with a *code point* (a formal Unicode term) within a
> *codespace* (a formal Unicode term). See sections 2.4 and 3.4 of Unicode
> 12 and uses of the word "repertoire" within those chapters. The Unicode
> standard does use the term "character set", but I didn't find a formal
> definition.
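The relationship Tom describes can be pictured as a small data model. The C++ below is only an illustrative sketch under stated assumptions: the type names (AbstractCharacter, CharacterRepertoire, CodePoint, Codespace, CharacterSet) are hypothetical and come neither from the Unicode standard nor from D1859R0, and abstract characters are stood in for by Unicode-style character names, because real abstract characters carry no intrinsic identifier. In this model a repertoire is only a set of abstract characters, while a character set pairs each of them with a code point inside a codespace.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical model of the terms discussed in this thread; these names
// are illustrative only, not taken from Unicode or from D1859R0.

// Stand-in for an abstract character. A Unicode-style name is used purely
// as a label (assumption: real abstract characters have no such identifier).
using AbstractCharacter = std::string;

// A character repertoire: a set of abstract characters, no numbers attached.
using CharacterRepertoire = std::set<AbstractCharacter>;

// A code point: an integer value within some codespace.
using CodePoint = std::uint32_t;

// A codespace: an inclusive range of integers usable as code points.
struct Codespace {
    CodePoint first;
    CodePoint last;
    bool contains(CodePoint cp) const { return first <= cp && cp <= last; }
};

// A character set as described above: a set of encoded characters, each
// associating one abstract character with one code point in the codespace.
class CharacterSet {
public:
    explicit CharacterSet(Codespace cs) : codespace_(cs) {
        assert(cs.first <= cs.last);  // an empty range is not a codespace
    }

    // Adds an encoded character. Fails if the code point is outside the
    // codespace, or if the abstract character or code point is already used.
    bool encode(const AbstractCharacter& ch, CodePoint cp) {
        if (!codespace_.contains(cp)) return false;
        for (const auto& entry : encoded_) {
            if (entry.second == cp) return false;
        }
        return encoded_.emplace(ch, cp).second;
    }

    // Looks up the code point assigned to an abstract character, if any.
    std::optional<CodePoint> code_point_of(const AbstractCharacter& ch) const {
        auto it = encoded_.find(ch);
        if (it == encoded_.end()) return std::nullopt;
        return it->second;
    }

    // The repertoire covered by this character set: the abstract
    // characters alone, with the code point assignments stripped away.
    CharacterRepertoire repertoire() const {
        CharacterRepertoire r;
        for (const auto& entry : encoded_) r.insert(entry.first);
        return r;
    }

private:
    Codespace codespace_;
    std::map<AbstractCharacter, CodePoint> encoded_;
};
```

For example, a CharacterSet constructed over Unicode's codespace, 0x0 through 0x10FFFF, could encode "EURO SIGN" at 0x20AC; calling repertoire() then yields the abstract characters without their code points, which is the distinction between the two terms as Tom uses them.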
I think I follow, except that I'm not sure whether there is a distinction
between "character repertoire" and "abstract characters". Is there? I'm
asking because if there is not, I'd prefer to standardize the formally
defined term, which it sounds like is "abstract characters".


Received on 2019-09-09 20:37:36