Date: Sun, 05 Jan 2020 08:40:03 -0300
On Saturday, 4 January 2020 19:55:21 -03 Corentin Jabot wrote:
> On Sat, 4 Jan 2020 at 22:20, Thiago Macieira <thiago_at_[hidden]> wrote:
> > enum class mib : uint32_t {
> > // names match
> > // https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
> > other = 1,
> > unknown = 2,
> > csASCII = 3,
> > csISOLatin1 = 4,
> > csUTF8 = 106,
> > csUTF16BE = 1013,
> > csUTF16LE = 1014,
> > csUTF16 = 1015,
> > csUTF32BE = 1017,
> > csUTF32LE = 1018,
> > csUTF32 = 1019
> > };
> >
> > However, a more powerful way for comparison would be to have a
> > text_encoding_id class that can compare to mib and to itself. It would be
> > able to tell whether two unlisted (and possibly unregistered) encodings
> > are the same, whereas mib can possibly fail. This text_encoding_id class
> > can have a mib() accessor that returns a mib number, but may return
> > mib::other. Hence, direct mib comparison should be discouraged in favour
> > of text_encoding_id.
>
> This is *exactly* what is proposed.
> They compare equal if:
> * they have the same mib
> * they have the other mib and their names compare equal (under the
> comparison algorithm ignoring case, dashes and a few other things)
Maybe with the same effect, but my specification would be that the
text_encoding_id has an internal representation that is looked up or
calculated on creation and that's what's compared, not the MIB or text name.
That implies that an implementation is not required to compare equal any ID it
doesn't know about. The only required ones are the names and aliases as
currently defined by IANA of the mandatory character sets as listed above.
That has the side effect that when
    cs1.mib() == mib::unknown
    cs2.mib() == mib::unknown
then cs1 == cs2, regardless of how cs1 and cs2 were created. That means on
some implementations, text_encoding_id("WTF-8") == text_encoding_id("SJIS").
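
To make that consequence concrete, here is a minimal sketch of what such an
implementation could look like if it knows only the mandatory character sets.
Everything beyond the mib enum is an assumption for illustration: the
normalize() helper, the lookup table (a handful of names, not the full IANA
alias list) and the choice to store the mib value itself as the internal
representation. The name matching is a simplification that lower-cases and
drops every non-alphanumeric character; it is not the exact algorithm under
discussion.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <string_view>

enum class mib : std::uint32_t {
    other = 1, unknown = 2, csASCII = 3, csISOLatin1 = 4, csUTF8 = 106,
    csUTF16BE = 1013, csUTF16LE = 1014, csUTF16 = 1015,
    csUTF32BE = 1017, csUTF32LE = 1018, csUTF32 = 1019
};

// Hypothetical helper: simplified name normalisation (lower-case,
// alphanumerics only), so "UTF-8", "utf_8" and "utf8" all become "utf8".
inline std::string normalize(std::string_view name) {
    std::string out;
    for (unsigned char c : name)
        if (std::isalnum(c))
            out.push_back(static_cast<char>(std::tolower(c)));
    return out;
}

// Hypothetical sketch of the class described above, not a proposed API.
class text_encoding_id {
public:
    // The internal representation is computed once, on creation.
    explicit text_encoding_id(std::string_view name) : rep_(lookup(name)) {}

    // "::mib" is spelled out because the accessor shares the enum's name.
    ::mib mib() const { return rep_; }

    // Only the internal representation is compared, never the text name.
    friend bool operator==(const text_encoding_id& a, const text_encoding_id& b) {
        return a.rep_ == b.rep_;
    }
    friend bool operator!=(const text_encoding_id& a, const text_encoding_id& b) {
        return !(a == b);
    }
    friend bool operator==(const text_encoding_id& a, ::mib m) {
        return a.rep_ == m;
    }

private:
    static ::mib lookup(std::string_view name) {
        struct entry { const char* name; ::mib id; };
        // Illustrative subset of names, stored already normalised.
        static const entry table[] = {
            {"usascii", ::mib::csASCII},     {"ascii", ::mib::csASCII},
            {"iso88591", ::mib::csISOLatin1}, {"latin1", ::mib::csISOLatin1},
            {"utf8", ::mib::csUTF8},
            {"utf16be", ::mib::csUTF16BE},   {"utf16le", ::mib::csUTF16LE},
            {"utf16", ::mib::csUTF16},
            {"utf32be", ::mib::csUTF32BE},   {"utf32le", ::mib::csUTF32LE},
            {"utf32", ::mib::csUTF32},
        };
        const std::string key = normalize(name);
        for (const entry& e : table)
            if (key == e.name)
                return e.id;
        // Anything this implementation does not know collapses to one value.
        return ::mib::unknown;
    }

    ::mib rep_;
};
```

With this sketch, text_encoding_id("UTF-8") == text_encoding_id("utf8") holds
because both resolve to csUTF8, and text_encoding_id("WTF-8") ==
text_encoding_id("SJIS") also holds because neither name is in the table. A
richer implementation could store something finer-grained than the mib value
and tell those two apart.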
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products
Received on 2020-01-05 05:42:35