Re: [SG16] P1885: Naming text encodings: Curation and provenance of aliases

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 8 Sep 2021 14:17:41 -0400
On Wed, Sep 8, 2021 at 1:08 PM Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Wed, Sep 8, 2021 at 6:50 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 08/09/2021 18.08, Hubert Tong via SG16 wrote:
>> > As it is, I think it is worthwhile to revisit whether the generality
>> > of the implementation-defined behaviour is advisable. It seems that,
>> > as the paper evolved, at least one implementation-injected alias was
>> > meant to be the "preferred name" on the system returned or recognized
>> > by various APIs (e.g., iconv_open). Even that is problematic: There is
>> > a tendency in converter applications to treat a de facto "reigning"
>> > extension as being what is meant when the non-extended standard is
>> > requested. In highly architected environments, the csShiftJIS and
>> > csWindows31J "problem" that is present in ICU would manifest as there
>> > being only one API-recognized "preferred name". The present design
>> > intent of P1885 in having non-overlapping sets of aliases is in
>> > conflict with the desire to associate the "preferred name" as an alias
>> > in such situations.
>>
>> You seem to be saying that the preferred name for both csShiftJIS and
>> csWindows31J is supposed to be "Shift-JIS" (or so), but an alias is
>> supposed to be globally unique under P1885.
>>
>
> Yes, aliases and primary names are globally unique in P1885.
> As such, an implementation that would return Windows31J for "Shift-JIS"
> would not be valid.
> Note that in a few cases, it is preferable to know both the name and the
> platform to derive an exact transcoding table.
>
> The implementation-defined aliases permission does not allow for
> duplication.
> It exists in case implementers want to add data from other sources, like
> POSIX locale files or whatever GetCPInfoExA would return on Windows, to the
> extent that it would not introduce duplication.
>

I believe that the "gotcha" (no duplication allowed) considerably waters
down the value of being able to add data from other sources.


>
> I think Hubert's point is that Windows users may not get the result they
> expect by constructing a text encoding from "Shift-JIS" if what they want
> is actually "Windows-31J", given that most users will refer to
> "Windows-31J" as "Shift JIS" colloquially. Unfortunately, I am not sure
> that any amount of API design can make that scenario less confusing.
>

Insisting that "Shift-JIS" is ambiguous and making the user disambiguate
from a selection of choices is one possible direction for resolving this
case; however, I believe a less ambitious API design could also make
similar scenarios less confusing: namely, one in which the set of
registered character sets and their associated properties is strictly a
representation of the IANA character set registry.
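
To make the "strictly the registry" direction concrete, here is a minimal
sketch of a lookup table in which every alias names exactly one registered
character set. The class and function names are hypothetical and are not the
P1885 interface; the entries are the IANA names and MIBenum values for
Shift_JIS (17) and Windows-31J (2024). Names are compared exactly here, which
is a simplification of the looser matching a real implementation would use.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical registry (not the P1885 API): alias -> IANA MIBenum.
class AliasRegistry {
public:
    // Registers an alias. Returns false, leaving the table unchanged, if the
    // alias already names a different character set (no duplication).
    bool add(const std::string& alias, int mib) {
        auto [it, inserted] = aliases_.try_emplace(alias, mib);
        return inserted || it->second == mib;
    }

    // Exact-match lookup; an empty result means "not in the registry".
    std::optional<int> lookup(const std::string& alias) const {
        auto it = aliases_.find(alias);
        if (it == aliases_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, int> aliases_;
};

// A small subset of the IANA character set registry.
inline AliasRegistry make_iana_subset() {
    AliasRegistry r;
    r.add("Shift_JIS", 17);
    r.add("MS_Kanji", 17);
    r.add("csShiftJIS", 17);
    r.add("Windows-31J", 2024);
    r.add("csWindows31J", 2024);
    return r;
}
```

With such a table, an implementation cannot also attach "Shift_JIS" to
Windows-31J: add("Shift_JIS", 2024) is rejected, and what a given name means
does not depend on which platform converter happens to be the "preferred" one.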


>
>
>
>>
>> Jens
>>
>>

Received on 2021-09-08 13:18:10