Re: [SG16] Bike shedding for Christmas: P1885 Naming Text Encodings

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 29 Dec 2019 00:38:00 +0100
On Sun, Dec 29, 2019, 00:32 Corentin Jabot <corentinjabot_at_[hidden]> wrote:

>
>
> On Sat, Dec 28, 2019, 22:21 Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 12/27/19 6:28 AM, Corentin Jabot via SG16 wrote:
>>
>> Hello
>>
>> In P1885, I introduce the name "text_encoding" for the class
>> representing the name of a text encoding.
>> I wonder whether that might conflict or interfere with actual
>> encoding/decoder classes and would like your opinion.
>>
>> Here are a few possible names:
>> * Charset (IANA nomenclature, posix)
>> * text_codec (Qt)
>> * text_encoding
>> * text_encoding_name (encoding is used by posix / python /
>>
>> Unicode nomenclature would favor encoding (Unicode is a charset of which
>> UTF-8 and UTF-16 are both encodings)
>>
>> I suggest text_encoding_id. I'd like to preserve text_encoding for a
>> tag type (or concept) that can be used at compile time to specify a
>> (compile-time) encoding as in a template parameter to std::text.
>>
>> Tangent 1: the proposed text_encoding is not extensible, at least not in
>> a very meaningful way. I suggest we do one of the following:
>>
>> 1. Remove the text_encoding(const char*) constructor. It doesn't
>> allow setting the MIB ID, so is unsatisfactory at present.
>> 2. Allow first class extension by, for example, reserving the full
>> range of IANA MIB values, defining a "private use" range of values, and
>> modifying the text_encoding(const char*) constructor to also accept a
>> MIB value (and perhaps make the name parameter optional such that, if
>> specified, it would override the internally known name for the provided MIB
>> value and if not specified, name() would return a suitable default).
>>
>>
>
> It is extensible in multiple ways:
> - implementations can provide their own aliases for existing MIBs
> - the "other" MIB + a custom name can be used for a non-registered encoding.
> Hence the existence of both "unknown" and "other".
>
> I will not support custom MIBs, as that is not in line with the RFC: the MIB
> is a way to standardize names. Encodings have names; the MIB is very close to
> an implementation detail. For that same reason, the name cannot be optional.
> The name passed as a parameter _always_ takes precedence over the IANA name,
> so it can round-trip to iconv or similar APIs.
>

To rephrase that:

Different names may map to the same MIB, but two encodings with the same
name have to compare equal.
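
To illustrate those comparison semantics, here is a minimal sketch. It is not
the wording of P1885: encoding_id, mib_id, lookup_mib, same_name and the tiny
alias table are hypothetical names made up for this example, and ASCII
case-insensitive name matching is an assumption. Only the numeric values
(other = 1, unknown = 2, US-ASCII = 3, UTF-8 = 106) come from the IANA registry.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical subset of IANA MIBenum values; 'other' marks unregistered encodings.
enum class mib_id { other = 1, unknown = 2, ascii = 3, utf8 = 106 };

// ASCII case-insensitive name comparison (an assumption of this sketch).
inline bool same_name(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Illustrative alias table: registered names plus an implementation alias ("utf8").
inline mib_id lookup_mib(std::string_view name) {
    struct entry { std::string_view name; mib_id mib; };
    static constexpr entry table[] = {
        {"UTF-8", mib_id::utf8},     {"utf8", mib_id::utf8},
        {"US-ASCII", mib_id::ascii}, {"ANSI_X3.4-1968", mib_id::ascii},
    };
    for (const auto& e : table)
        if (same_name(e.name, name)) return e.mib;
    return mib_id::other;  // unregistered: identified by its name only
}

class encoding_id {
public:
    // The name is mandatory and is stored as given, so it can be handed
    // back unchanged to iconv or a similar API.
    explicit encoding_id(std::string_view name)
        : name_(name), mib_(lookup_mib(name)) {}

    const std::string& name() const { return name_; }
    mib_id mib() const { return mib_; }

    // Registered encodings compare by MIB (different names, same encoding);
    // if either side is unregistered, fall back to comparing names.
    friend bool operator==(const encoding_id& a, const encoding_id& b) {
        if (a.mib_ != mib_id::other && b.mib_ != mib_id::other)
            return a.mib_ == b.mib_;
        return same_name(a.name_, b.name_);
    }
    friend bool operator!=(const encoding_id& a, const encoding_id& b) {
        return !(a == b);
    }

private:
    std::string name_;
    mib_id mib_;
};
```

With this sketch, encoding_id("utf8") and encoding_id("UTF-8") compare equal
while each keeps the name it was constructed with, and two objects built from
the same unregistered name compare equal even though both carry the "other"
MIB.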


>>
>>
>> if text_encoding remains the name of that class, encoder/decoder can be
>> used for the class doing the actual conversions.
>>
>>
>> I will further rename "system" to "environment" to be more generic and
>> aligned with POSIX.
>>
>> Is text_encoding::system() intended to be equivalent to
>> text_encoding::for_locale(std::locale{})? (I think the answer is, and
>> should be, no; e.g., on Windows, this would query GetACP()).
>>
>
>
> It would query GetACP(), which is equivalent to querying the user ("")
> locale at the start of the program.
>
>> (user, environment and system are, for our purposes, synonyms and are
>> intended to mean "the encoding assumed and expected by whatever launched
>> our program").
>> Environment has the added benefit that it implies neither user nor system,
>> which makes it more friendly to embedded platforms.
>>
>> Since locale settings are generally determined by environment
>> (variables), use of the term "environment" may be confusing. I prefer
>> system.
>>
>> Tangent 2: I don't recall if we discussed this in Belfast, but the paper
>> identifies three sets of encodings to expose (literals, system, locale). A
>> fourth would be terminal/console encoding. This encoding can be easily
>> queried on Windows, but not on Linux/UNIX (though terminal encoding rarely
>> differs from locale there, so it would be reasonable to just return the
>> system encoding).
>>
> I considered that a few days ago. I am not aware of it being a thing on
> platforms other than Windows (I am probably wrong), and I don't believe we
> can come up with a nice API to query it in the short term.
>
>> Tom.
>>
>>
>> Thanks for your input,
>>
>> Corentin
>>
>>
>>
>>
>>
>>

Received on 2019-12-28 17:40:42