sg16: Re: [SG16] Bike shedding for Christmas: P1885 Naming Text Encodings

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 5 Jan 2020 13:12:35 +0100

On Sun, 5 Jan 2020 at 12:40, Thiago Macieira <thiago_at_[hidden]> wrote:

> On Saturday, 4 January 2020 19:55:21 -03 Corentin Jabot wrote:
> > On Sat, 4 Jan 2020 at 22:20, Thiago Macieira <thiago_at_[hidden]> wrote:
> > > enum class mib : uint32_t {
> > > // names match
> > > // https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
> > > other = 1,
> > > unknown = 2,
> > > csASCII = 3,
> > > csISOLatin1 = 4,
> > > csUTF8 = 106,
> > > csUTF16BE = 1013,
> > > csUTF16LE = 1014,
> > > csUTF16 = 1015,
> > > csUTF32BE = 1017,
> > > csUTF32LE = 1018,
> > > csUTF32 = 1019
> > > };
> > >
> > > However, a more powerful way for comparison would be to have a
> > > text_encoding_id class that can compare to mib and to itself. It would be
> > > able to tell whether two unlisted (and possibly unregistered) encodings are
> > > the same, whereas mib can possibly fail. This text_encoding_id class can
> > > have a mib() accessor that returns a mib number, but may return mib::other.
> > > Hence, direct mib comparison should be discouraged in favour of
> > > text_encoding_id.
> >
> > This is *exactly* what is proposed.
> > They compare equal if:
> > * they have the same mib
> > * they have the other mib and their names compare equal (under the
> > comparison algorithm ignoring case, dash and a few other things)
>
> Maybe with the same effect, but my specification would be that the
> text_encoding_id has an internal representation that is looked up or
> calculated on creation and that's what's compared, not the MIB or text
> name.
> That implies that an implementation is not required to compare equal any
> ID it doesn't know about. The only required ones are the names and aliases as
> currently defined by IANA of the mandatory character sets as listed above.
>
> That has the side-effect that when
> cs1.mib() == mib::unknown
> cs2.mib() == mib::unknown
> then cs1 == cs2, regardless of how cs1 and cs2 were created. That means on
> some implementations, text_encoding_id("WTF-8") ==
> text_encoding_id("SJIS").
>

The way I have done it:
If a name is not known to the implementation, the mib will be `other`
rather than `unknown`.
A text_encoding with the unknown mib has no name (two unknown
text_encodings compare equal).

If you construct a text encoding with an unregistered name, or a name
otherwise not known to the implementation, it will have the "other" mib,
i.e.:

text_encoding_id id("Rubbish");
id.mib() == text_encoding_id::other;
id.name() == "Rubbish"

My interpretation of the RFC is that the mib unknown is meant to mean
"we cannot tell you what the mib is" (maybe you asked for the console
encoding and there is no console on this device or no API to query it),
rather than "this is not registered" (which is what other is used for).
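
To make the other/unknown distinction concrete, here is a minimal sketch of the semantics described above. It is not the proposal's wording or reference implementation: the names text_encoding_sketch, mib_id and loose_key are hypothetical, only two encodings are "known", and the loose comparison simply drops non-alphanumeric characters and ignores case, which is an assumption standing in for the real name-matching algorithm.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical subset of the IANA MIB values quoted earlier in the thread.
enum class mib_id : std::uint32_t { other = 1, unknown = 2, csASCII = 3, csUTF8 = 106 };

// Assumed loose comparison key: keep only alphanumerics, lower-cased.
inline std::string loose_key(std::string_view name) {
    std::string key;
    for (unsigned char c : name) {
        if (std::isalnum(c)) key.push_back(static_cast<char>(std::tolower(c)));
    }
    return key;
}

class text_encoding_sketch {
public:
    // Default state: "we cannot tell you what the mib is"; no name.
    text_encoding_sketch() = default;

    // A name the implementation does not know yields mib `other`, and the
    // name is kept so that two `other` encodings can still be told apart.
    explicit text_encoding_sketch(std::string_view name)
        : mib_(lookup(name)), name_(name) {}

    mib_id mib() const { return mib_; }
    const std::string& name() const { return name_; }

    friend bool operator==(const text_encoding_sketch& a,
                           const text_encoding_sketch& b) {
        if (a.mib_ != b.mib_) return false;
        if (a.mib_ == mib_id::other) return loose_key(a.name_) == loose_key(b.name_);
        return true;  // same known mib, or both unknown
    }

private:
    // Toy lookup table; a real implementation would cover the IANA registry.
    static mib_id lookup(std::string_view name) {
        const std::string key = loose_key(name);
        if (key == "utf8") return mib_id::csUTF8;
        if (key == "ascii" || key == "usascii") return mib_id::csASCII;
        return mib_id::other;
    }

    mib_id mib_ = mib_id::unknown;
    std::string name_;
};

// Usage: unknown names get `other` and never collapse into one another.
inline void usage_example() {
    text_encoding_sketch id("Rubbish");
    assert(id.mib() == mib_id::other);
    assert(id.name() == "Rubbish");
    assert(!(text_encoding_sketch("WTF-8") == text_encoding_sketch("SJIS")));
}
```

With this shape, text_encoding_id("WTF-8") == text_encoding_id("SJIS") is false, because both have the other mib and their names differ.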

> That implies that an implementation is not required to compare equal any
> ID it doesn't know about.

Currently, it cannot even be constructed with a mib it doesn't know about.
Either it knows about the encoding or the mib will be other. There is no
scenario in which you can construct a text_encoding with a mib that it
doesn't know about.
This is why I do not want to have a constructor taking an arbitrary mib.
If that is actually useful, I'd rather have a static method which returns
either an optional<text_encoding> or a text_encoding with mib unknown when
the mib is not known to the implementation.
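
For illustration, the optional-returning variant of such a static method could look roughly like the sketch below. Everything here is hypothetical: the names checked_encoding_sketch, known_mib and from_mib are mine, not part of the paper, and the table of known mibs is reduced to two entries.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical subset of MIB values the implementation knows about.
enum class known_mib : std::uint32_t { csASCII = 3, csUTF8 = 106 };

class checked_encoding_sketch {
public:
    // Hypothetical factory: succeeds only for a mib present in the table,
    // so no object ever carries a mib the implementation does not know.
    static std::optional<checked_encoding_sketch> from_mib(std::uint32_t value) {
        for (known_mib candidate : table) {
            if (static_cast<std::uint32_t>(candidate) == value)
                return checked_encoding_sketch(candidate);
        }
        return std::nullopt;
    }

    known_mib mib() const { return mib_; }

private:
    // Private on purpose: there is no public constructor taking a raw mib.
    explicit checked_encoding_sketch(known_mib m) : mib_(m) {}

    static constexpr std::array<known_mib, 2> table{known_mib::csASCII,
                                                    known_mib::csUTF8};
    known_mib mib_;
};

// Usage: the caller has to handle the "not known" case explicitly.
inline void from_mib_example() {
    auto utf8 = checked_encoding_sketch::from_mib(106);
    assert(utf8.has_value() && utf8->mib() == known_mib::csUTF8);
    assert(!checked_encoding_sketch::from_mib(9999).has_value());
}
```

The alternative mentioned above, returning a text_encoding with mib unknown instead of an empty optional, would keep the same invariant but move the check from has_value() to a mib comparison.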

> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products

Received on 2020-01-05 06:15:17