SG16

Subject: Re: Bike shedding for Christmas: P1885 Naming Text Encodings
From: Thiago Macieira (thiago_at_[hidden])
Date: 2020-01-06 06:24:08


On Sunday, 5 January 2020 09:56:29 -03 Corentin Jabot wrote:
> > Sorry, I disagree. If the implementation doesn't know this encoding, then
> > by definition it's "unknown". "other" should only be used for encodings it
> > knows about but which are not registered with IANA.
>
> I am not sure I see the value in that.
> It would mean the implementation needs to maintain a list of non-registered
> encodings it knows about (which my implementation doesn't do).
> And then we have three states: unknown, other, invalid. I am not sure
> differentiating unknown and invalid is pertinent.

Yes, of course. Which is how I described it: upon creation, it sets an
internal handle to the description, which gets used to retrieve the official
name, the aliases, and the MIB number. Moreover, that internal handle can be
used by the encoder and decoder to get the mapping tables or conversion
routines. And if the implementation knows about an unregistered codec, then
such an internal handle exists, and it must have a number for the MIB field.
That number can only be mib::other.
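
The handle-based design described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the P1885 API or any shipping implementation: the `mib` enumerators reuse the IANA Charset MIB values (other = 1, unknown = 2, UTF-8 = 106), while `encoding_descriptor`, `known_encodings`, `label_equal` and the codec name `x-example-legacy` are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string_view>

// MIB numbers as in the IANA Charset MIB (RFC 3808): other = 1, unknown = 2,
// followed by the MIBenum values of registered charsets.
enum class mib : int {
    other = 1,
    unknown = 2,
    us_ascii = 3,
    iso_8859_1 = 4,
    utf8 = 106,
};

// Hypothetical internal description of a codec the implementation carries.
struct encoding_descriptor {
    std::string_view name;                    // official (preferred) name
    std::array<std::string_view, 3> aliases;  // unused slots stay empty
    mib number;
};

// Hypothetical table; a real implementation would also reach the mapping
// tables or conversion routines through the same entry.
inline const encoding_descriptor known_encodings[] = {
    {"US-ASCII", {"ASCII", "us", "csASCII"}, mib::us_ascii},
    {"ISO-8859-1", {"latin1", "l1", "csISOLatin1"}, mib::iso_8859_1},
    {"UTF-8", {"csUTF8"}, mib::utf8},
    // Hypothetical codec known to the implementation but not registered with IANA.
    {"x-example-legacy", {}, mib::other},
};

// Charset labels are compared case-insensitively (ASCII only).
bool label_equal(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

class text_encoding_id {
public:
    // On creation, resolve the label once to an internal handle.
    explicit text_encoding_id(std::string_view label) {
        for (const auto& d : known_encodings) {
            if (label_equal(label, d.name)) { desc_ = &d; return; }
            for (std::string_view alias : d.aliases) {
                if (!alias.empty() && label_equal(label, alias)) { desc_ = &d; return; }
            }
        }
        // No match: desc_ stays null, so there is nothing to encode or decode with.
    }

    bool known() const { return desc_ != nullptr; }
    mib number() const { return desc_ ? desc_->number : mib::unknown; }
    std::string_view name() const { return desc_ ? desc_->name : std::string_view{}; }

private:
    const encoding_descriptor* desc_ = nullptr;  // the internal handle
};
```

In this sketch `text_encoding_id("latin1").name()` yields "ISO-8859-1", the unregistered codec reports `mib::other`, and a label missing from the table reports `mib::unknown` with no handle behind it.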

The point is I don't see the value in handling names the implementation
doesn't have an encoding for. An encoding ID is going to be used either
literally, in a context like the Content-Type line of a MIME header, or to
encode or decode content. So what use is there for an encoding ID that can't
be reliably used for either?
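
To make the first of those uses concrete, here is a hypothetical helper that turns a label into the value of a MIME Content-Type header. The table and the names `known_labels` and `content_type` are invented for illustration; the only point is that the helper needs an official name, so a label it has no entry for leaves it nothing correct to write.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical mapping from labels the implementation understands to the
// official IANA name that belongs in a MIME header.
struct label_entry {
    std::string_view label;
    std::string_view official;
};

constexpr label_entry known_labels[] = {
    {"UTF-8", "UTF-8"},
    {"csUTF8", "UTF-8"},
    {"ISO-8859-1", "ISO-8859-1"},
    {"latin1", "ISO-8859-1"},
};

// Builds a Content-Type value such as "text/plain; charset=UTF-8".
// Matching is case-sensitive here only to keep the sketch short.
std::optional<std::string> content_type(std::string_view media_type,
                                        std::string_view label) {
    for (const auto& entry : known_labels) {
        if (entry.label == label) {
            return std::string(media_type) + "; charset=" + std::string(entry.official);
        }
    }
    // Unknown label: there is no official name to write.
    return std::nullopt;
}
```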

But as I write this, I realise that the Standard Library is different. Its
text_encoding_id must be designed so it can be used with other libraries,
which may contain more codecs than the Standard Library itself carries. Which
means the text_encoding_id should be able to handle an arbitrary codec name.

And yet that is nonsense. It can't convert a codec name to its MIB number
unless that is in a table somewhere the implementation has access to. So by
definition, the text_encoding_id is limited to the codecs the Standard Library
knows about. Other libraries should deploy their own text_encoding_id
equivalents.

> > I really do not see the point in text_encoding_id being able to handle
> > encodings the implementation doesn't know about. It's never going to be
> > able to encode to them or decode from them. It won't know what the official
> > name should be to write a Content-Type MIME header line.
>
> There is right now no relation whatsoever between my proposal and
> encoding/decoding facilities.
> It is _just_ a name.

And I'm claiming such a proposal in isolation is not useful. It should be tied
to further functionality that gives the name a purpose.

This whole thread is the result of having to figure out all the possible uses
under the Sun (and under other stars), current or future, instead of just
focusing on what people need to do. I understand the Standard Library needs to
foresee uses in 30 years' time, as the cost of deprecation is high.

But maybe this indicates the functionality should not be in the Standard
Library in the first place. The Standard Library has no need to deprecate ICU
and shouldn't need to support arbitrary codecs and codec names, just as I
don't think it should go into 2D or 3D graphics, JSON processing, or machine
learning.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

SG16 list run by herb.sutter at gmail.com