sg16: Re: [SG16] Bike shedding for Christmas: P1885 Naming Text Encodings

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 28 Dec 2019 16:21:32 -0500

On 12/27/19 6:28 AM, Corentin Jabot via SG16 wrote:
> Hello
>
> In P1885, I introduce the name "text_encoding" for the class
> representing the name of a text encoding.
> I wonder whether that might conflict or interfere with actual
> encoder/decoder classes and would like your opinion.
>
> Here are a few possible names:
> * charset (IANA nomenclature, POSIX)
> * text_codec (Qt)
> * text_encoding
> * text_encoding_name (encoding is used by posix / python /
>
> Unicode nomenclature would favor encoding (Unicode is a charset of
> which UTF-8 and UTF-16 are both encodings).

I suggest text_encoding_id. I'd like to preserve text_encoding for a
tag type (or concept) that can be used at compile time to specify a
(compile-time) encoding as in a template parameter to std::text.
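
To illustrate the split, here is a minimal C++17 sketch in which the
compile-time encoding is a tag type passed as a template parameter, which
leaves the runtime identifier free to be called text_encoding_id. The tag
types (utf8_encoding, utf16_encoding) and basic_text are hypothetical
stand-ins, not the std::text design; the mib members are the IANA MIBenum
values for UTF-8 (106) and UTF-16 (1015).

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical compile-time encoding tags. Each names the code unit type used
// to store text and the IANA MIBenum a runtime identifier could be built from.
struct utf8_encoding {
    using code_unit = char;  // would be char8_t in C++20
    static constexpr int mib = 106;
};
struct utf16_encoding {
    using code_unit = char16_t;
    static constexpr int mib = 1015;
};

// Hypothetical stand-in for a std::text-like class template whose encoding is
// fixed at compile time by the tag type.
template <typename Encoding>
class basic_text {
public:
    using encoding_type = Encoding;
    using code_unit = typename Encoding::code_unit;

    explicit basic_text(std::basic_string<code_unit> units)
        : units_(std::move(units)) {}

    const std::basic_string<code_unit>& code_units() const noexcept { return units_; }

private:
    std::basic_string<code_unit> units_;  // assumed to be valid for Encoding
};

// Different encodings yield distinct, non-interchangeable types.
static_assert(!std::is_same_v<basic_text<utf8_encoding>, basic_text<utf16_encoding>>);
```

Because the tag only exists at compile time, it never competes with a runtime
value type for the same name.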

Tangent 1: the proposed text_encoding is not extensible, at least not in
a very meaningful way. I suggest we do one of the following:

1. Remove the text_encoding(const char*) constructor. It doesn't allow
    setting the MIB ID, so it is unsatisfactory at present.
2. Allow first class extension by, for example, reserving the full
    range of IANA MIB values, defining a "private use" range of values,
    and modifying the text_encoding(const char*) constructor to also
    accept a MIB value (and perhaps make the name parameter optional
    such that, if specified, it would override the internally known name
    for the provided MIB value and if not specified, name() would return
    a suitable default).
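
Option 2 might look something like the following C++17 sketch. The class
name follows the text_encoding_id suggestion above; the private-use range
starting at 0x40000000 is an arbitrary placeholder (neither IANA nor P1885
defines one), and the built-in name table is a tiny subset of the IANA
registry. Values 1 ("other") and 2 ("unknown") follow the IANA Charset MIB.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical sketch of option 2: an encoding identifier keyed by an IANA
// MIBenum, with an optional name that overrides the built-in one.
class text_encoding_id {
public:
    static constexpr std::int32_t other = 1;    // IANA "other"
    static constexpr std::int32_t unknown = 2;  // IANA "unknown"
    // Hypothetical private-use range: every value from here up is reserved
    // for implementation- or user-defined encodings.
    static constexpr std::int32_t private_use_first = 0x40000000;

    // An empty name means "not specified": name() then falls back to the
    // built-in name for the MIB. The name is not copied, so it must outlive
    // this object (string literals are fine).
    constexpr explicit text_encoding_id(std::int32_t mib, std::string_view name = {})
        : mib_(mib), name_(name.empty() ? default_name(mib) : name) {}

    constexpr std::int32_t mib() const noexcept { return mib_; }
    constexpr std::string_view name() const noexcept { return name_; }
    constexpr bool is_private_use() const noexcept { return mib_ >= private_use_first; }

private:
    // Tiny subset of the IANA registry; unknown MIBs get an empty name.
    static constexpr std::string_view default_name(std::int32_t mib) noexcept {
        switch (mib) {
            case 3:    return "US-ASCII";
            case 106:  return "UTF-8";
            case 1015: return "UTF-16";
            default:   return {};
        }
    }

    std::int32_t mib_;
    std::string_view name_;
};
```

With this shape, text_encoding_id{106} reports "UTF-8", text_encoding_id{106,
"utf8"} keeps the MIB but overrides the name, and a custom encoding can take a
value from the private-use range. Storing a std::string_view keeps the sketch
constexpr at the cost of a lifetime requirement; copying the name into a
fixed-size buffer would avoid that.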

>
> If text_encoding remains the name of that class, encoder/decoder can
> be used for the classes doing the actual conversions.
>
>
> I will further rename "system" to "environment" to be more generic and
> aligned with POSIX.

Is text_encoding::system() intended to be equivalent to
text_encoding::for_locale(std::locale{})? (I think the answer is, and
should be, no; e.g., on Windows, this would query GetACP()).

> (For our purposes, user, environment and system are synonyms
> intended to mean "the encoding assumed and expected by whatever
> launched our program".)
> Environment has the added benefit that it implies neither user nor
> system, which makes it friendlier to embedded platforms.

Since locale settings are generally determined by environment
(variables), use of the term "environment" may be confusing. I prefer
system.
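
To make the system() versus for_locale(std::locale{}) distinction concrete,
here is a minimal sketch. It assumes "system" means the ANSI code page on
Windows (GetACP()) and, on POSIX, the codeset of the locale requested by
LC_ALL/LC_CTYPE/LANG, obtained with newlocale() and nl_langinfo_l() so the
global locale is left untouched. The function names and the "cp<N>" spelling
of Windows code pages are hypothetical.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cassert>
#include <langinfo.h>
#include <locale.h>
#endif

#ifdef _WIN32
// Hypothetical: the "system" encoding is the active ANSI code page,
// reported here as "cp<N>" (e.g. "cp1252").
inline std::string system_encoding_name() {
    return "cp" + std::to_string(GetACP());
}
#else
// Codeset of a named locale ("" means "from the environment"), looked up via a
// temporary locale object so the global C locale is never modified.
// Returns an empty string if the locale cannot be created.
inline std::string codeset_of(const char* locale_name) {
    assert(locale_name != nullptr);
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0) return {};
    std::string codeset = nl_langinfo_l(CODESET, loc);
    freelocale(loc);
    return codeset;
}

// Hypothetical "system" encoding: what LC_ALL / LC_CTYPE / LANG request.
inline std::string system_encoding_name() { return codeset_of(""); }

// Codeset of the current global C locale. This is "C" at startup (until the
// program calls setlocale), which is what a query based on the default
// global locale would see.
inline std::string global_locale_encoding_name() { return nl_langinfo(CODESET); }
#endif
```

At startup both the C and C++ global locales are "C", so on glibc
global_locale_encoding_name() reports "ANSI_X3.4-1968" even when LANG asks for
UTF-8, while system_encoding_name() reports what the environment asked for.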

Tangent 2: I don't recall if we discussed this in Belfast, but the paper
identifies three sets of encodings to expose (literals, system,
locale). A fourth would be terminal/console encoding. This encoding can
be easily queried on Windows, but not on Linux/UNIX (though terminal
encoding rarely differs from locale there, so it would be reasonable to
just return the system encoding).
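
A console query along those lines could be sketched as follows. It assumes
GetConsoleOutputCP() is the right source on Windows (GetConsoleCP() would be
the input side) and, elsewhere, simply defers to a caller-supplied
system-encoding query, as suggested above. console_encoding_name and the
"cp<N>" naming are hypothetical.

```cpp
#include <string>
#include <utility>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical query for the terminal/console encoding. On Windows the console
// output code page can be asked for directly; elsewhere there is no portable
// query, so defer to the caller-supplied system-encoding query on the
// assumption that the terminal matches the locale.
template <typename SystemEncodingQuery>
std::string console_encoding_name(SystemEncodingQuery&& system_encoding_name) {
#ifdef _WIN32
    // A return value of 0 is treated as failure (e.g. no console attached).
    if (UINT cp = GetConsoleOutputCP(); cp != 0) {
        return "cp" + std::to_string(cp);
    }
#endif
    return std::forward<SystemEncodingQuery>(system_encoding_name)();
}
```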

Tom.

>
> Thanks for your input,
>
> Corentin

Received on 2019-12-28 15:24:02