SG16

Subject: Re: Bike shedding for Christmas: P1885 Naming Text Encodings
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-01-06 07:09:26


On Mon, 6 Jan 2020 at 13:24, Thiago Macieira <thiago_at_[hidden]> wrote:

> On Sunday, 5 January 2020 09:56:29 -03 Corentin Jabot wrote:
> > > Sorry, I disagree. If the implementation doesn't know this encoding, then
> > > by definition it's "unknown". "other" should only be used for encodings it
> > > knows about but which are not registered with IANA.
> >
> > I am not sure I see the value in that?
> > It would mean the implementation needs to maintain a list of non-registered
> > encodings it knows about (which my implementation doesn't do).
> > And then we have 3 states: unknown, other, invalid. I am not sure
> > differentiating unknown and invalid is pertinent?
>
> Yes, of course. Which is how I described it: upon creation, it sets an
> internal handle to the description, which gets used to retrieve the official
> name, the aliases, and the MIB number. Moreover, that internal handle can be
> used by the encoder and decoder to get the mapping tables or conversion
> routines. And if the implementation knows about an unregistered codec, then
> such internal handle exists and it must have a number for the MIB field. It
> can only be mib::other.
>
> The point is I don't see the value in handling names the implementation
> doesn't have an encoding for. An encoding ID is going to be used either to be
> used literally, in a context like the Content-Type line of a MIME header or
> it's going to be used to encode or decode content. So what use is there for
> an encoding ID that can't be reliably used for either?
>
> But as I write this, I realise that the Standard Library is different. Its
> text_encoding_id must be designed so it can be used with other libraries,
> which may contain more codecs than the Standard Library itself carries. Which
> means the text_encoding_id should be able to handle an arbitrary codec name.
>
> And yet that is nonsense. It can't convert a codec name to its MIB number
> unless that is in a table somewhere the implementation has access to. So by
> definition, the text_encoding_id is limited to the codecs the Standard
> Library knows about. Other libraries should deploy their own
> text_encoding_id equivalents.
>

That is a good point, for which I think the solution might be to force
hosted implementations to always provide the entire table (which is really
not that big)?

>
> > > I really do not see the point in text_encoding_id being able to handle
> > > encodings the implementation doesn't know about. It's never going to be
> > > able to encode to them or decode from them. It won't know what the
> > > official name should be to write a Content-Type MIME header line.
> >
> > There is right now no relation whatsoever between my proposal and
> > encoding/decoding facilities.
> > It is _just_ a name.
>
> And I'm claiming such a proposal in isolation is not useful. It should be
> tied to further functionality that gives the name a purpose.
>
> This whole thread is the result of having to figure out all the possible
> uses under the Sun (and under other stars), current or future, instead of
> just focusing on what people need to do. I understand the Standard Library
> needs to foresee uses in 30 years time as the cost of deprecation is high.
>
> But maybe this indicates the functionality should not be in the Standard
> Library in the first place. The Standard Library has no need to deprecate
> ICU and shouldn't need to have support arbitrary codecs and codec names.
> Just as I don't think it should go into 2D or 3D graphics, JSON processing,
> or machine learning.
>

I think at some point we lost track of what the proposal is about:
It's about answering:
- What is the execution character encoding (which only the implementation
can do)
- What is the environment encoding (which the implementation can do better)

And that information should be consistent across platforms when possible
(for interaction with libraries such as Qt, ICU, iconv); everything else is
secondary.
This means an implementation will provide information about the encodings
relevant to the platform.
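
For context, here is a rough sketch of how the environment encoding can be
queried today on a POSIX system, using setlocale and nl_langinfo(CODESET).
This is not part of the proposal. The helper name environment_encoding_name
is made up for illustration, and the string it returns is spelled however
the platform's C library chooses, which is exactly the inconsistency
described above.

```cpp
#include <cassert>
#include <clocale>
#include <string>

#include <langinfo.h>  // POSIX header, not part of ISO C++

// Hypothetical helper: returns the codeset name of the locale selected by
// the environment (LANG, LC_ALL, LC_CTYPE). It temporarily changes the
// global C locale, so it must not race with other users of the locale.
inline std::string environment_encoding_name() {
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    // "" asks the C library to use the locale named by the environment.
    // If that fails, the previous locale simply stays in effect.
    std::setlocale(LC_CTYPE, "");

    // CODESET yields the platform's own spelling of the encoding name.
    const char* codeset = nl_langinfo(CODESET);
    assert(codeset != nullptr);
    const std::string name = codeset;

    std::setlocale(LC_CTYPE, saved.c_str());
    return name;
}
```

The result is only a name in the platform's vocabulary; mapping it to
something other libraries agree on is the part that needs a shared table.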

Now, an encoding id is 3 things:
- A name
- A MIB when applicable
- Aliases when applicable

And the name used to construct the object is used to look up the extra
optional information.
I think the only reason to differentiate "unknown" and "other" in the way
you suggest is if we need to support aliases for unregistered encodings.
Is that the case?
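
To make the "3 things" concrete, here is a minimal C++17 sketch of an id
that always keeps the name it was built from and only has a MIB and aliases
when the name is found in a table. Everything in it is hypothetical:
encoding_id, known_encoding and same_name are illustrative names, not the
interface from P1885. The table is a tiny excerpt of the IANA registry, and
name matching is plain ASCII case-insensitive comparison, which is a
simplification. The sketch deliberately does not decide between "unknown"
and "other"; a name that is not in the table just has no MIB.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical table entry: primary name, MIB number, aliases.
struct known_encoding {
    std::string_view name;
    int mib;
    std::vector<std::string_view> aliases;
};

// Illustrative excerpt only; a real table would carry the whole registry.
inline const std::vector<known_encoding>& known_encodings() {
    static const std::vector<known_encoding> table = {
        {"US-ASCII", 3, {"ANSI_X3.4-1968", "us", "csASCII"}},
        {"ISO_8859-1:1987", 4, {"ISO-8859-1", "latin1", "csISOLatin1"}},
        {"UTF-8", 106, {"csUTF8"}},
    };
    return table;
}

// Simplified matching: ASCII case-insensitive, nothing else is ignored.
inline bool same_name(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

class encoding_id {
public:
    // The name is always stored; the table lookup is optional extra data.
    explicit encoding_id(std::string_view name) : name_(name) {
        for (const auto& e : known_encodings()) {
            const bool match =
                same_name(e.name, name) ||
                std::any_of(e.aliases.begin(), e.aliases.end(),
                            [&](std::string_view a) { return same_name(a, name); });
            if (match) { entry_ = &e; break; }
        }
    }

    const std::string& name() const { return name_; }

    // Empty when the name is not in the table.
    std::optional<int> mib() const {
        return entry_ ? std::optional<int>(entry_->mib) : std::nullopt;
    }

    std::vector<std::string_view> aliases() const {
        return entry_ ? entry_->aliases : std::vector<std::string_view>{};
    }

    // Ids naming the same table entry are equal; otherwise compare names.
    friend bool operator==(const encoding_id& a, const encoding_id& b) {
        if (a.entry_ || b.entry_) return a.entry_ == b.entry_;
        return same_name(a.name_, b.name_);
    }

private:
    std::string name_;
    const known_encoding* entry_ = nullptr;
};

// Usage: a registered alias resolves to a MIB, a private name does not.
inline void demo() {
    const encoding_id utf8("utf-8");
    assert(utf8.mib() == 106);
    const encoding_id custom("x-my-private-codec");
    assert(!custom.mib());
    assert(custom.name() == "x-my-private-codec");
}
```

In this sketch, aliases for unregistered encodings would need extra table
entries without an IANA number, which is the only case where a separate
"other" value would carry information.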

I don't see why "this is an encoding that exists on my platform" and "I can
convert to/from that encoding" need to be coupled in that way, even if there
is some overlap.

> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products



SG16 list run by herb.sutter at gmail.com