Re: [SG16] Bike shedding for Christmas: P1885 Naming Text Encodings

From: Thiago Macieira <thiago_at_[hidden]>
Date: Sun, 05 Jan 2020 09:28:17 -0300
On Sunday, 5 January 2020 09:12:35 -03 Corentin Jabot wrote:
> The way I have done it:
> If a name is not known to the implementation, the mib will
> be `other` rather than unknown.
> A text_encoding with an unknown mib has no name (if two text_encodings are
> unknown, they compare equal).

Sorry, I disagree. If the implementation doesn't know this encoding, then by
definition it's "unknown". "other" should only be used for encodings it knows
about but which are not registered with IANA.

> If you construct a text encoding with an unregistered name, or one otherwise
> not known to the implementation, it will have the "other" mib,
> i.e.:
>
> text_encoding_id id("Rubbish");
> id.mib() == text_encoding_id::other;
> id.name() == "Rubbish"

I wouldn't do that. And if mib() == unknown, name() is not required to return
the original string or anything that was deduced from it.

I really do not see the point in text_encoding_id being able to handle
encodings the implementation doesn't know about. It's never going to be able
to encode to them or decode from them. It won't know what the official name
should be to write a Content-Type MIME header line.

> My interpretation of the RFC is that the mib unknown is meant to mean
> "we cannot tell you what the mib is" (maybe you asked for the console
> encoding and there is no console on this device or no API to query it),
> rather than "this is not registered" (which is what other is used for).

Right. "other" is used when the implementation knows the encoding but not its
MIB (because it isn't registered).

Note also the value 0 is left unused and could be used to stand for "invalid".
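
A minimal sketch of the name-based construction argued for above, under stated assumptions: the lookup table, the made-up "x-vendor-private" entry and the accessor name get_mib() (spelled mib() in this thread) are invented for illustration and are not taken from P1885. The numeric values are the ones in the IANA registry.

```cpp
#include <cassert>
#include <string_view>

// Values as assigned in the IANA character-set registry.
enum class mib : int {
    other = 1,       // known to the implementation, but not registered
    unknown = 2,     // the implementation cannot identify the encoding
    csASCII = 3,
    csISOLatin1 = 4,
    csUTF8 = 106,
};

class text_encoding_id {
public:
    explicit text_encoding_id(std::string_view requested) noexcept {
        struct entry { std::string_view name; mib id; };
        // Hypothetical list of what this implementation knows about.
        static constexpr entry known[] = {
            {"US-ASCII", mib::csASCII},
            {"ISO-8859-1", mib::csISOLatin1},
            {"UTF-8", mib::csUTF8},
            {"x-vendor-private", mib::other},  // made-up unregistered encoding
        };
        for (const entry& e : known) {
            if (e.name == requested) {
                mib_ = e.id;
                name_ = e.name;
                return;
            }
        }
        // Not found: mib_ stays unknown and the requested string is dropped.
    }

    mib get_mib() const noexcept { return mib_; }
    std::string_view name() const noexcept { return name_; }

private:
    mib mib_ = mib::unknown;
    std::string_view name_{};  // empty when the encoding is unknown
};
```

With this sketch, text_encoding_id("Rubbish") reports mib::unknown and an empty name, while "x-vendor-private" reports mib::other and keeps its name.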

> > That implies that an implementation is not required to compare equal any
> > ID it doesn't know about.
>
> Currently, it can not even be constructed with a mib it doesn't know about.
> Either: It knows about the encoding or the mib will be other. There is no
> scenario in which you can construct a text_encoding with a mib which it
> doesn't know about.
> This is why I do not want to have a constructor taking an arbitrary mib.
> If that is actually useful, I'd rather have a static method which either
> returns optional<text_encoding> or a text_encoding with mib unknown in case
> the mib is not known to the implementation.

I'm not talking about creating from a MIB, but instead about creating from a
text name.

And I don't see how you can prohibit an arbitrary MIB. If you allow a
constructor taking mib::csUTF8, then you allow the same constructor taking
mib(5). The only thing you can do is say that's implementation-defined.

text_encoding_id(mib(5)).mib() could be mib(5) or mib::unknown.
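
A minimal sketch of why an arbitrary MIB cannot be rejected at compile time, and of one possible implementation-defined outcome. It assumes mib is a scoped enumeration; the helper implementation_knows() and the accessor name get_mib() (spelled mib() above) are hypothetical, and this pretend implementation only supports ASCII and UTF-8.

```cpp
#include <cassert>

// Values as assigned in the IANA character-set registry. There is
// deliberately no enumerator with value 5 here.
enum class mib : int {
    other = 1,
    unknown = 2,
    csASCII = 3,
    csUTF8 = 106,
};

// Hypothetical: the set of encodings this implementation supports.
constexpr bool implementation_knows(mib m) noexcept {
    switch (m) {
    case mib::csASCII:
    case mib::csUTF8:
        return true;
    default:
        return false;
    }
}

class text_encoding_id {
public:
    // Any value of the underlying type converts to the enum, so this
    // constructor accepts mib(5) just as it accepts mib::csUTF8.
    constexpr explicit text_encoding_id(mib m) noexcept
        : mib_(implementation_knows(m) ? m : mib::unknown) {}

    constexpr mib get_mib() const noexcept { return mib_; }

private:
    mib mib_;
};

// One possible outcome: values outside the supported set collapse to unknown.
static_assert(text_encoding_id(mib(5)).get_mib() == mib::unknown);
static_assert(text_encoding_id(mib::csUTF8).get_mib() == mib::csUTF8);
```

An implementation that does support the encoding registered as 5 could return mib(5) from the same call, which is why the result can only be specified as implementation-defined.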

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2020-01-05 06:30:49