Subject: Re: Bike shedding for Christmas: P1885 Naming Text Encodings
From: Thiago Macieira (thiago_at_[hidden])
Date: 2020-01-04 15:20:05

On Saturday, 4 January 2020 12:32:00 -03 Corentin Jabot wrote:
> Comparison.... basically the only use (2 text encoding with the same mib are
> identical, although the primary iana name might do), and you can compare a
> MiB to a text_encoding directly. I still think it's useful to have a method
> that gives you the mib when it is known so you can interface with a few
> libraries that use it. But beyond that I don't think we should focus on
> MiB.

Sure, but then we'd have something like:

enum class mib : uint32_t {
    // names match
    // https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
    other = 1,
    unknown = 2,
    csASCII = 3,
    csISOLatin1 = 4,
    csUTF8 = 106,
    csUTF16BE = 1013,
    csUTF16LE = 1014,
    csUTF16 = 1015,
    csUTF32 = 1017,
    csUTF32BE = 1018,
    csUTF32LE = 1019
};

However, a more powerful way to compare would be a text_encoding_id class
that can be compared both to a mib and to another text_encoding_id. It would
be able to tell whether two unlisted (and possibly unregistered) encodings are
the same, whereas a mib comparison can fail in that case. This
text_encoding_id class can have a mib() accessor that returns a mib number,
but it may return mib::other. Hence, direct mib comparison should be
discouraged in favour of text_encoding_id.
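
As a rough sketch only of how such a class could behave: the enum below is a
small subset of the one above, and same_name(), mib_value() and the
ASCII case-insensitive name matching are assumptions made for illustration,
not anything taken from P1885. The accessor is called mib_value() here simply
to avoid clashing with the enum's name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// Subset of the enum from the message, enough for the illustration.
enum class mib : std::uint32_t { other = 1, csASCII = 3, csUTF8 = 106 };

// Hypothetical helper: ASCII-only, case-insensitive name comparison.
inline bool same_name(std::string_view a, std::string_view b) {
    auto lower = [](char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    };
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (lower(a[i]) != lower(b[i])) return false;
    return true;
}

class text_encoding_id {
public:
    // Registered encoding, identified by its MIB number.
    explicit text_encoding_id(mib m) : mib_(m) {}
    // Unlisted or unregistered encoding, identified by name only.
    explicit text_encoding_id(std::string name)
        : mib_(mib::other), name_(std::move(name)) {}

    // May return mib::other, which says nothing about the actual encoding.
    mib mib_value() const { return mib_; }

    // Two ids match if their MIBs match; for mib::other the names decide.
    friend bool operator==(const text_encoding_id& a, const text_encoding_id& b) {
        if (a.mib_ != b.mib_) return false;
        return a.mib_ != mib::other || same_name(a.name_, b.name_);
    }

    // Comparing against mib::other never matches: it identifies nothing.
    friend bool operator==(const text_encoding_id& a, mib m) {
        return m != mib::other && a.mib_ == m;
    }

private:
    mib mib_;
    std::string name_;
};

inline void example_usage() {
    text_encoding_id utf8(mib::csUTF8);
    text_encoding_id a("x-vendor-charset"), b("X-Vendor-Charset");
    assert(utf8 == mib::csUTF8);  // registered encodings compare by MIB
    assert(a == b);               // unregistered ones compare by name
    assert(!(a == utf8));
}
```

Note that a.mib_value() == b.mib_value() would also hold for two different
unregistered encodings, which is why comparing the ids themselves is the safer
operation.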

> Otherwise I agree, an implementation needs to only provide the encoding it
> cares about. (important for embedded platforms where the encoding never
> changes)

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

SG16 list run by herb.sutter at gmail.com