On Sun, 5 Jan 2020 at 12:40, Thiago Macieira <thiago@macieira.org> wrote:
On Saturday, 4 January 2020 19:55:21 -03 Corentin Jabot wrote:
> On Sat, 4 Jan 2020 at 22:20, Thiago Macieira <thiago@macieira.org> wrote:
> > enum class mib : uint32_t {
> >     // names match
> >     // https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
> >     other = 1,
> >     unknown = 2,
> >     csASCII = 3,
> >     csISOLatin1 = 4,
> >     csUTF8 = 106,
> >     csUTF16BE = 1013,
> >     csUTF16LE = 1014,
> >     csUTF16 = 1015,
> >     csUTF32BE = 1017,
> >     csUTF32LE = 1018,
> >     csUTF32 = 1019
> > };
> >
> > However, a more powerful way to compare would be to have a
> > text_encoding_id class that can compare to mib and to itself. It would be
> > able to tell whether two unlisted (and possibly unregistered) encodings are
> > the same, whereas mib can possibly fail. This text_encoding_id class can
> > have a mib() accessor that returns a mib number, but may return mib::other.
> > Hence, direct mib comparison should be discouraged in favour of
> > text_encoding_id.
>
> This is *exactly* what is proposed.
> They compare equal if:
>  * they have the same mib
>  * they have the other mib and their names compare equal (under the
> comparison algorithm ignoring case, dashes and a few other things)

Maybe with the same effect, but my specification would be that the
text_encoding_id has an internal representation that is looked up or
calculated on creation and that's what's compared, not the MIB or text name.
That implies that an implementation is not required to compare equal any ID it
doesn't know about. The only required ones are the names and aliases as
currently defined by IANA of the mandatory character sets as listed above.

That has the side-effect that when
        cs1.mib() == mib::unknown
        cs2.mib() == mib::unknown
then cs1 == cs2, regardless of how cs1 and cs2 were created. That means on
some implementations, text_encoding_id("WTF-8") == text_encoding_id("SJIS").
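
A minimal sketch of that side effect, assuming a hypothetical text_encoding_id whose only state is a key computed at construction. The tiny lookup table, the get_mib() accessor name (standing in for mib()) and the use of mib::unknown as the fallback key are assumptions made for illustration, not part of any proposal text:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Subset of the IANA MIB values quoted earlier in the thread.
enum class mib : std::uint32_t { other = 1, unknown = 2, csASCII = 3, csUTF8 = 106 };

// Hypothetical class: the internal representation is looked up once, on creation.
class text_encoding_id {
public:
    explicit text_encoding_id(std::string_view name) : key_(lookup(name)) {}

    mib get_mib() const { return key_; }

    // Only the internal key is compared, never the original name.
    friend bool operator==(text_encoding_id a, text_encoding_id b) {
        return a.key_ == b.key_;
    }

private:
    // Assumed table: this toy implementation knows just two character sets.
    static mib lookup(std::string_view name) {
        if (name == "UTF-8" || name == "csUTF8") return mib::csUTF8;
        if (name == "US-ASCII" || name == "csASCII") return mib::csASCII;
        return mib::unknown;  // anything else collapses to one key
    }

    mib key_;
};

inline void example_internal_key() {
    // Both names are unknown to this toy implementation, so they compare equal.
    assert(text_encoding_id("WTF-8") == text_encoding_id("SJIS"));
    assert(text_encoding_id("UTF-8") == text_encoding_id("csUTF8"));
}
```

Because the name is discarded after lookup, nothing is left to distinguish two encodings that the implementation does not recognise.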

The way I have done it:
If a name is not known to the implementation, the mib will be `other` rather than unknown.
A text_encoding with the unknown mib has no name (if two text_encodings are unknown, they compare equal).

If you construct a text encoding with an unregistered name, or a name otherwise not known to the implementation, it will have the "other" mib,
i.e.:

text_encoding_id id("Rubbish");
id.mib() == text_encoding_id::other;
id.name() == "Rubbish";

My interpretation of the RFC is that the mib unknown is meant to mean "we cannot tell you what the mib is" (maybe you asked for the console encoding and there is no console on this device, or no API to query it),
rather than "this is not registered" (which is what other is used for).
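
A rough sketch of those rules, under stated assumptions: the class below is a simplified stand-in, get_mib() replaces the mib() accessor, the implementation "knows" only UTF-8, and the loose comparison ignores case, '-' and '_' (the exact set of ignored characters is an assumption):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

enum class mib { other = 1, unknown = 2, csUTF8 = 106 };

// Assumed normalisation: lower-case everything and drop '-' and '_'.
inline std::string normalize(std::string_view s) {
    std::string out;
    for (unsigned char c : s) {
        if (c == '-' || c == '_') continue;
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

inline bool names_match(std::string_view a, std::string_view b) {
    return normalize(a) == normalize(b);
}

class text_encoding {
public:
    // Default state: unknown mib, no name.
    text_encoding() = default;

    // A name the implementation does not know yields mib::other, name kept.
    explicit text_encoding(std::string_view name)
        : mib_(names_match(name, "UTF-8") ? mib::csUTF8 : mib::other),
          name_(name) {}

    mib get_mib() const { return mib_; }
    std::string_view name() const { return name_; }

    friend bool operator==(const text_encoding& a, const text_encoding& b) {
        if (a.mib_ != b.mib_) return false;
        // Two "other" encodings are equal only if their names match loosely.
        if (a.mib_ == mib::other) return names_match(a.name_, b.name_);
        return true;  // same registered mib, or both unknown
    }

private:
    mib mib_ = mib::unknown;
    std::string name_;
};

inline void example_other_vs_unknown() {
    text_encoding id("Rubbish");
    assert(id.get_mib() == mib::other);
    assert(id.name() == "Rubbish");
    assert(!(text_encoding("WTF-8") == text_encoding("SJIS")));
}
```

With the name retained for "other" encodings, two unrecognised names no longer collapse into the same value, while two default-constructed (unknown) objects still compare equal.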

> That implies that an implementation is not required to compare equal any ID it doesn't know about.

Currently, it cannot even be constructed with a mib it doesn't know about.
Either it knows about the encoding, or the mib will be other. There is no scenario in which you can construct a text_encoding with a mib that the implementation doesn't know about.
This is why I do not want to have a constructor taking an arbitrary mib.
If that is actually useful, I'd rather have a static method that returns either an optional<text_encoding>, or a text_encoding with mib unknown when the mib is not known to the implementation.
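
For illustration, the optional-returning variant could look roughly like this. The name from_mib, the get_mib() accessor and the two-entry set of known encodings are hypothetical, chosen only to keep the sketch small:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class mib : std::uint32_t { other = 1, unknown = 2, csASCII = 3, csUTF8 = 106 };

class text_encoding {
public:
    // Hypothetical factory: the only way to go from a mib to an object.
    // Returns an empty optional when the implementation does not know the mib.
    static std::optional<text_encoding> from_mib(mib m) {
        switch (m) {
        case mib::csASCII:
        case mib::csUTF8:
            return text_encoding(m);
        default:
            return std::nullopt;
        }
    }

    mib get_mib() const { return mib_; }

private:
    // Private on purpose: no public constructor takes an arbitrary mib.
    explicit text_encoding(mib m) : mib_(m) {}

    mib mib_;
};

inline void example_from_mib() {
    assert(text_encoding::from_mib(mib::csUTF8).has_value());
    assert(!text_encoding::from_mib(static_cast<mib>(9999)).has_value());
}
```

Keeping the mib constructor private preserves the invariant that every text_encoding object carries a mib the implementation actually knows.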

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products