On 9/9/19 4:04 PM, Corentin Jabot wrote:


On Mon, 9 Sep 2019 at 21:44, Tom Honermann <tom@honermann.net> wrote:
On 9/9/19 2:48 PM, Corentin Jabot wrote:
Character Repertoire. The collection of characters included in a character set.
Character Set. A collection of elements used to represent textual information
Coded Character Set. A character set in which each character is assigned a numeric code point. Frequently abbreviated as character set, charset, or code set; the acronym CCS is also used.
Abstract Character. A unit of information used for the organization, control, or representation of textual data.
Where did the above terms come from?

Sorry, I should quote my sources
https://unicode.org/glossary/ 

I will admit I am confused. It's either Character Set or Character Repertoire.

I suppose the above definitions could be read such that a character set may include members that cannot exist in any character repertoire.  For example, escape characters or other not-really-a-character things like variation selectors.
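A small Python sketch can make these "not-really-a-character" members concrete. It uses only the standard `unicodedata` module. U+001B and U+FE0F were picked purely for illustration and do not come from the glossary.

```python
import unicodedata

# U+001B ESCAPE is a control code (general category Cc); the Unicode
# Character Database gives it no character name, only the label <control>.
esc = "\x1b"
print(unicodedata.category(esc))    # Cc
print(unicodedata.name(esc, None))  # None

# U+FE0F VARIATION SELECTOR-16 is a nonspacing mark (Mn) whose only job
# is to select a presentation for the character before it.
vs16 = "\ufe0f"
print(unicodedata.name(vs16))       # VARIATION SELECTOR-16
print(unicodedata.category(vs16))   # Mn
```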

That does make sense

Another interpretation is that a character set might contain only 'A' U+0041 { LATIN CAPITAL LETTER A } and ' ́' U+0301 { COMBINING ACUTE ACCENT }, but its character repertoire contains 'A' and 'Á' because both can be represented using the elements of the character set.
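Python's `unicodedata` can illustrate this second interpretation. A set holding only U+0041 and U+0301 can still spell 'Á', because that two-element sequence is canonically equivalent to the precomposed U+00C1. The `char_set` variable is a hypothetical stand-in for such a character set. It is not a construct defined by Unicode.

```python
import unicodedata

# Hypothetical two-element "character set".
char_set = {"\u0041", "\u0301"}  # LATIN CAPITAL LETTER A, COMBINING ACUTE ACCENT

# A + COMBINING ACUTE ACCENT composes (NFC) to U+00C1
# LATIN CAPITAL LETTER A WITH ACUTE.
decomposed = "\u0041\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))          # 2 1
print(unicodedata.name(composed))              # LATIN CAPITAL LETTER A WITH ACUTE
print(composed in char_set)                    # False: 'Á' is not an element...
print(all(c in char_set for c in decomposed))  # True: ...but it can be spelled
```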

Tom.





On Mon, 9 Sep 2019 at 20:37, Zach Laine <whatwasthataddress@gmail.com> wrote:
On Sun, Sep 8, 2019 at 8:16 PM Tom Honermann <tom@honermann.net> wrote:
On 9/8/19 12:02 PM, Steve Downey wrote:
Character repertoire sounds good, and I will eventually learn to spell it. Character set is definitely terminology from the pre-Unicode era, and it unfortunately tends to conflate the repertoire and the encoding; see https://www.iana.org/assignments/character-sets/character-sets.xhtml
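Python's codecs show how a single IANA-style charset name fixes both a repertoire and a byte encoding. `encodable` is a hypothetical helper written for this sketch. It reports whether a string lies within the repertoire that the named codec can represent.

```python
def encodable(s: str, charset: str) -> bool:
    """Return True if the codec named `charset` can represent all of `s`."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
euro = "\u20ac"     # EURO SIGN

# Same abstract character, different bytes depending on the charset.
print(e_acute.encode("iso-8859-1"))  # b'\xe9'
print(e_acute.encode("utf-8"))       # b'\xc3\xa9'

# EURO SIGN is outside the ISO-8859-1 repertoire but inside Unicode's.
print(encodable(euro, "iso-8859-1"))  # False
print(encodable(euro, "utf-8"))       # True
```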

I think I was a little overzealous earlier in stating that Unicode uses "character repertoire" as I described.  I looked again and didn't find that term formally defined in the standard.  However, "repertoire" is used throughout the standard in ways that I believe are consistent with my description.  I wasn't able to find an alternative formal term.

I fully endorse overzealousness as applied to Unicode discussions.

The way I've been thinking about it is that a "character repertoire" describes a set of abstract characters (a formal Unicode term) and a "character set" describes a set of encoded characters (a formal Unicode term) that associate each abstract character member of a "character repertoire" with a code point (a formal Unicode term) within a codespace (a formal Unicode term).  See sections 2.4 and 3.4 of Unicode 12 and uses of the word "repertoire" within those chapters.  The Unicode standard does use the term "character set", but I didn't find a formal definition.
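A few lines of Python can model this distinction. The sketch assumes that Unicode character names can stand in for abstract characters. `repertoire`, `CODESPACE`, and `coded_character_set` are illustrative variable names, not Unicode-defined APIs.

```python
import unicodedata

# Character repertoire: abstract characters only, with no numbers attached.
repertoire = {"LATIN CAPITAL LETTER A", "COMBINING ACUTE ACCENT", "EURO SIGN"}

# Unicode codespace: the integers 0 through 0x10FFFF.
CODESPACE = range(0x110000)

# Coded character set: each abstract character paired with a code point
# in the codespace. Each pair corresponds to an "encoded character".
coded_character_set = {name: ord(unicodedata.lookup(name)) for name in repertoire}

assert all(cp in CODESPACE for cp in coded_character_set.values())
for name, cp in sorted(coded_character_set.items(), key=lambda kv: kv[1]):
    print(f"U+{cp:04X} {name}")
# U+0041 LATIN CAPITAL LETTER A
# U+0301 COMBINING ACUTE ACCENT
# U+20AC EURO SIGN
```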

I think I follow, except that I don't see whether there is a distinction between "character repertoire" and "abstract characters".  Is there?  I'm asking because if there is not, I'd prefer to standardize the formally defined term, which it sounds like is "abstract characters".
 
Zach