Re: [SG16] Bike shedding for Christmas: P1885 Naming Text Encodings

From: Thiago Macieira <thiago_at_[hidden]>
Date: Mon, 06 Jan 2020 10:27:01 -0300

On Monday, 6 January 2020 10:09:26 -03 Corentin Jabot wrote:
> > And yet that is nonsense. It can't convert a codec name to its MIB number
> > unless that is in a table somewhere the implementation has access to. So by
> > definition, the text_encoding_id is limited to the codecs the Standard
> > Library knows about. Other libraries should deploy their own
> > text_encoding_id equivalents.
>
> That is a good point, for which I think the solution might be to force
> hosted implementations to always provide the entire table (which is really
> not that big)?

It's not, but even then you have the problem that the table in the vendor's
implementation may be out of date compared to what the application expects.
And are vendors allowed to extend the table with other names, such as WTF-8?

Like I said, if all you want is the table, you can get the table. I'll
write an XSLT script for you to generate the table.

> I think at some point we lost track of what the proposal is about:
> It's about answering:
> - What is the execution character encoding (which only the implementation
> can do)
> - What is the environment encoding (which the implementation can do better)

Ok, good points. If we restrict text_encoding_id to those, then
text_encoding_id has no need to support the full table or unknown codecs. By
definition, it supports only what the implementation supports.

> And have that information be consistent across platforms when possible (for
> interaction with libraries such as Qt, ICU, iconv); everything else is
> secondary.
> Which means an implementation will provide information about the encodings
> relevant to the platform.
>
> Now, an encoding id is 3 things:
> - A name,
> - A MIB when applicable
> - Aliases when applicable

Agreed, though implementations should be wary that the alias list might be
empty. Portable applications should rely on the MIB and on the official name.
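
To make the three-part shape concrete, here is a minimal sketch, not anything
from P1885: encoding_entry, known_encodings and find_encoding are hypothetical
names, the table is abbreviated, and the WTF-8 row is an assumed vendor
extension with no MIB and no aliases. The MIB numbers and aliases for the
other two rows are taken from the IANA character sets registry.

```cpp
#include <array>
#include <optional>
#include <string_view>

// Hypothetical sketch: an encoding id as a name, an optional MIB and aliases.
struct encoding_entry {
    std::string_view name;                    // preferred name
    std::optional<int> mib;                   // IANA MIBenum, if registered
    std::array<std::string_view, 2> aliases;  // unused slots stay empty
};

// Deliberately tiny: an implementation can only look up what it ships.
inline constexpr std::array<encoding_entry, 4> known_encodings{{
    {"US-ASCII", 3, {"ANSI_X3.4-1968", "csASCII"}},
    {"ISO-8859-1", 4, {"latin1", "csISOLatin1"}},
    {"UTF-8", 106, {"csUTF8", ""}},
    {"WTF-8", std::nullopt, {}},  // assumed vendor extension, not registered
}};

// Case-sensitive lookup by preferred name or alias. Real code would most
// likely compare case-insensitively.
constexpr const encoding_entry* find_encoding(std::string_view label) {
    for (const auto& entry : known_encodings) {
        if (entry.name == label) return &entry;
        for (std::string_view alias : entry.aliases) {
            if (!alias.empty() && alias == label) return &entry;
        }
    }
    return nullptr;  // not in this implementation's table
}
```

A portable caller would compare find_encoding(label)->mib or ->name and treat
the alias list as a convenience that may well be empty.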

> And the name used to construct the object is used to look up the extra
> optional information.
> I think the only reason to differentiate "unknown" and "other" in the way
> you suggest is if we need to support aliases for non-registered encodings.
> Is that the case?

I think the implementation should strive to never return "unknown", except in
case of an internal failure to determine what the encoding is. As a matter of
quality, implementations should be designed not to do that.

And yet providing a list of well-known MIBs is useful in and of itself. In
that case, mib::unknown is a valid and well-known value.
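
One possible reading of that split, as a rough sketch under my own
assumptions: the enum and classify() below are hypothetical, with only the
numeric values (other = 1, unknown = 2, US-ASCII = 3, UTF-8 = 106) taken from
the IANA registry. "unknown" is kept for a failed detection, "other" for a
detected encoding that has no registered MIB.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum; only the numeric values come from the IANA registry.
enum class mib : int { other = 1, unknown = 2, ascii = 3, utf8 = 106 };

// `detected` is empty when the implementation failed to determine the
// encoding at all, which is the only case that yields mib::unknown here.
constexpr mib classify(std::optional<std::string_view> detected) {
    if (!detected) return mib::unknown;
    if (*detected == "US-ASCII") return mib::ascii;
    if (*detected == "UTF-8") return mib::utf8;
    return mib::other;  // detected, but no registered MIB in this table
}
```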

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2020-01-06 07:29:35