Re: [SG16-Unicode] Abstract and notes for D1859R0: Standard terminology for execution character set encodings

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Mon, 9 Sep 2019 20:48:03 +0200
Character Repertoire. The collection of characters included in a character
set.
Character Set. A collection of elements used to represent textual
information.
Coded Character Set. A character set in which each character is assigned a
numeric code point. Frequently abbreviated as character set, charset, or
code set; the acronym CCS is also used.
Abstract Character. A unit of information used for the organization,
control, or representation of textual data.

I will admit I am confused. It's either Character Set or Character
Repertoire.
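
One way to read the glossary definitions above, as a rough sketch only: a
repertoire is a bare collection of characters, and a coded character set
assigns a number to each member. The variable names are hypothetical, and
using Python strings as stand-ins for abstract characters is itself an
assumption (Python strings are already Unicode).

```python
# Hypothetical illustration; the names below are not taken from the Unicode Standard.

# A character repertoire: a collection of characters with no numbers attached.
repertoire = {"A", "é", "€"}

# A coded character set: each character in the repertoire is assigned a code point.
# Here the assignments are Unicode's, obtained with ord().
coded_character_set = {ch: ord(ch) for ch in repertoire}

# The same repertoire can be given different numbers by another coded character set.
# ISO/IEC 8859-15 assigns each of these characters a single-byte value.
latin9 = {ch: ch.encode("iso-8859-15")[0] for ch in repertoire}

for ch in sorted(repertoire):
    print(ch, hex(coded_character_set[ch]), hex(latin9[ch]))
```

Under this reading the repertoire is what the two mappings have in common,
and the numeric assignments are what distinguishes them.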



On Mon, 9 Sep 2019 at 20:37, Zach Laine <whatwasthataddress_at_[hidden]>
wrote:

> On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 9/8/19 12:02 PM, Steve Downey wrote:
>>
>> Character repertoire sounds good, and I will eventually learn to spell
>> it. Character set is definitely terminology from the pre-Unicode times, and
>> unfortunately tends to merge the repertoire and encoding:
>> https://www.iana.org/assignments/character-sets/character-sets.xhtml
>>
>> I think I was a little overzealous earlier in stating that Unicode uses
>> "character repertoire" as I described. I looked again and don't find that
>> term formally defined in the standard. However, "repertoire" is used
>> throughout the standard in ways that I believe are consistent with my
>> description. I wasn't able to find an alternative formal term.
>>
> I fully endorse overzealousness as applied to Unicode discussions.
>
>> The way I've been thinking about it is that a "character repertoire"
>> describes a set of *abstract characters* (a formal Unicode term) and a
>> "character set" describes a set of *encoded characters* (a formal
>> Unicode term) that associate each *abstract character* member of a
>> "character repertoire" with a *code point* (a formal Unicode term)
>> within a *codespace* (a formal Unicode term). See sections 2.4 and 3.4
>> of Unicode 12 and uses of the word "repertoire" within those chapters. The
>> Unicode standard does use the term "character set", but I didn't find a
>> formal definition.
>>
> I think I follow, except that I don't see whether there is a distinction
> between "character repertoire" and "abstract characters". Is there? I'm
> asking because if there is not, I'd prefer to standardize the formally
> described term, which sounds like it is "abstract characters".
>
> Zach
>

Received on 2019-09-09 20:48:16