Re: [SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 9 Sep 2019 22:04:59 +0200
On Mon, 9 Sep 2019 at 21:44, Tom Honermann <tom_at_[hidden]> wrote:

> On 9/9/19 2:48 PM, Corentin Jabot wrote:
> Character Repertoire. The collection of characters included in a character
> set.
> Character Set. A collection of elements used to represent textual
> information.
> Coded Character Set. A character set in which each character is assigned a
> numeric code point. Frequently abbreviated as character set, charset, or
> code set; the acronym CCS is also used.
> Abstract Character. A unit of information used for the organization,
> control, or representation of textual data.
> Where did the above terms come from?

Sorry, I should have quoted my sources.

> I will admit I am confused. It's either Character Set or Character
> Repertoire.
> I suppose the above definitions could be read such that a character set
> may include members that cannot exist in any character repertoire. For
> example, escape characters or other not-really-a-character things like
> variation selectors.
That does make sense.

> Tom.
> On Mon, 9 Sep 2019 at 20:37, Zach Laine <whatwasthataddress_at_[hidden]>
> wrote:
>> On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann <tom_at_[hidden]> wrote:
>>> On 9/8/19 12:02 PM, Steve Downey wrote:
>>> Character repertoire sounds good, and I will eventually learn to spell
>>> it. Character set is definitely terminology from the pre-Unicode times, and
>>> unfortunately tends to merge the repertoire and encoding,
>>> https://www.iana.org/assignments/character-sets/character-sets.xhtml
>>> I think I was a little overzealous earlier in stating that Unicode uses
>>> "character repertoire" as I described. I looked again and don't find that
>>> term formally defined in the standard. However, "repertoire" is used
>>> throughout the standard in ways that I believe are consistent with my
>>> description. I wasn't able to find an alternative formal term.
>> I fully endorse overzealousness as applied to Unicode discussions.
>>> The way I've been thinking about it is that a "character repertoire"
>>> describes a set of *abstract characters* (a formal Unicode term) and a
>>> "character set" describes a set of *encoded characters* (a formal
>>> Unicode term) that associate each *abstract character* member of a
>>> "character repertoire" with a *code point* (a formal Unicode term)
>>> within a *codespace* (a formal Unicode term). See sections 2.4 and 3.4
>>> of Unicode 12 and uses of the word "repertoire" within those chapters. The
>>> Unicode standard does use the term "character set", but I didn't find a
>>> formal definition.
>> I think I follow, except that I don't see whether there is a distinction
>> between "character repertoire" and "abstract characters". Is there? I'm
>> asking because if there is not, I'd prefer to standardize the formally
>> described term, which sounds like it is "abstract characters".
>> Zach
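
For illustration only, the distinction drawn in the quoted messages above could be modelled roughly as follows. This is a minimal sketch and not part of D1859R0 or the Unicode Standard: every type and function name (AbstractCharacter, CharacterRepertoire, CodedCharacterSet, make_ascii_sample) is hypothetical, abstract characters are identified by descriptive names, and the one-to-one mapping between abstract characters and code points is a simplifying assumption.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>

// All names below are hypothetical and chosen for this sketch only.

// An abstract character, identified here by a descriptive name.
using AbstractCharacter = std::string;

// A character repertoire: a plain collection of abstract characters,
// with no numbers attached.
using CharacterRepertoire = std::set<AbstractCharacter>;

// A code point: a value within a codespace.
using CodePoint = std::uint32_t;

// A coded character set: associates each abstract character of a
// repertoire with a code point inside the codespace [first, last].
class CodedCharacterSet {
public:
    CodedCharacterSet(CodePoint first, CodePoint last)
        : first_(first), last_(last) {}

    // Returns false if the code point lies outside the codespace or if
    // either side of the pair is already assigned (assumed one-to-one).
    bool assign(const AbstractCharacter& ch, CodePoint cp) {
        if (cp < first_ || cp > last_) return false;
        if (by_char_.count(ch) != 0 || by_code_.count(cp) != 0) return false;
        by_char_.emplace(ch, cp);
        by_code_.emplace(cp, ch);
        return true;
    }

    // Looks up the code point of an abstract character, if it is encoded.
    std::optional<CodePoint> code_point_of(const AbstractCharacter& ch) const {
        auto it = by_char_.find(ch);
        if (it == by_char_.end()) return std::nullopt;
        return it->second;
    }

    // The repertoire is what remains once the code points are dropped.
    CharacterRepertoire repertoire() const {
        CharacterRepertoire result;
        for (const auto& entry : by_char_) result.insert(entry.first);
        return result;
    }

private:
    CodePoint first_;
    CodePoint last_;
    std::map<AbstractCharacter, CodePoint> by_char_;
    std::map<CodePoint, AbstractCharacter> by_code_;
};

// Builds a two-character sample using the ASCII code points for 'A' and 'a'
// within the 7-bit codespace 0x00..0x7F.
inline CodedCharacterSet make_ascii_sample() {
    CodedCharacterSet ccs(0x00, 0x7F);
    ccs.assign("LATIN CAPITAL LETTER A", 0x41);
    ccs.assign("LATIN SMALL LETTER A", 0x61);
    return ccs;
}
```

In this model, two coded character sets can share the same repertoire() while assigning different code points, which is one way to read the repertoire versus character set distinction discussed above.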

Received on 2019-09-09 22:05:13