SG16

Subject: Re: Feedback re: P1885R5: Naming Text Encodings
From: Hubert Tong (hubert.reinterpretcast_at_[hidden])
Date: 2021-08-23 09:38:23


My apologies again for not sending this earlier.

On Mon, Jul 26, 2021 at 11:48 AM Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Mon, Jul 26, 2021 at 5:00 PM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
>> Apologies for sending this so close to the start of the telecon.
>>
>> The paper refers to RFC 3808 which established the "IANA Charset MIB" and
>> specifically says:
>>
>>> However, [rfc3808] is from 2004 and has not been updated.
>>
>>
>> The paper should at least make some reference to the location where the
>> "IANA Charset MIB" is maintained:
>> https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib. As of
>> the document date indicated in P1885R5, the MIB module was last updated in
>> January of the current year (2021). It seems this location was mentioned in
>> passing on the SG16 reflector but with no emphasis on the significance.
>>
>
> Good point, I will update the paper :)
>

Thanks.

>
>>
>> Also, I find the choice of naming the accessor that produces the MIBenum
>> value from the IANA Charset Registry "mib" to be unfortunate. It seems
>> there are unambiguous precedent cases in libraries (including those for
>> other programming languages) using "mib" to refer to MIBenum values, so
>> this does not rise to the level of an objection. I see no reason for the
>> paper to use the term this way in prose though. I do note that RFCs do seem
>> to use the term not only to refer to MIBs but also to MIB modules (but a
>> MIBenum value is neither a MIB nor a MIB module).
>>
>
> I am open to better name suggestions!
>

Does `mib_enum()` make sense in place of `mib`? If it helps, the "enum"
part of the name comes from the "application domain" and not "from C++".
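
To make the suggestion concrete, here is a minimal sketch of what the renamed accessor could look like. The class name `encoding_sketch` and its constructor are hypothetical stand-ins, not the paper's interface; the only values used are the registry's MIBenum entries 3 (US-ASCII) and 106 (UTF-8).

```cpp
#include <cstdint>

// Hypothetical stand-in for the paper's class; not the proposed interface.
class encoding_sketch {
public:
    // Stores a MIBenum value taken from the IANA Character Sets registry.
    constexpr explicit encoding_sketch(std::int_least32_t value) noexcept
        : value_(value) {}

    // Named after the registry's "MIBenum" column: the result is a MIBenum
    // value, which is neither a MIB nor a MIB module.
    constexpr std::int_least32_t mib_enum() const noexcept { return value_; }

private:
    std::int_least32_t value_;
};

// MIBenum values as listed in the registry.
constexpr encoding_sketch us_ascii{3};
constexpr encoding_sketch utf8{106};

static_assert(utf8.mib_enum() == 106);
```

With this spelling, a call site such as `utf8.mib_enum()` reads as "the MIBenum of this encoding".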

>
>
>> Regarding the naming of the enumerator values, I am not fond of excess
>> "invention" here. There are names (beginning with "cs") in the reference
>> documents. Using those names (including the "cs" prefix) makes even the
>> "csUnicode" case "merely following established practice".
>>
>
> Unicode is very problematic and goes back to a time when UCS-2 and Unicode
> were equivalent terms. I removed the cs prefix because C++ has enum classes,
> making it unnecessary.
>

I am going to fall back on the "application domain" argument again here.
Since we are deferring the actual meaning of the encodings to "outside
experts", the "cs" prefix serves to reinforce that we mean "whatever the
external group is doing (including ambiguities or operating environment
particularities)".
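
As a rough illustration of enumerators that simply reuse the registry's names, consider the sketch below. The enumeration name `registry_charset` and the helper `to_mibenum` are hypothetical; the enumerator spellings and values are the "cs" aliases and MIBenum numbers from the IANA Character Sets registry, and only a handful are shown.

```cpp
#include <cstdint>

// Hypothetical enumeration; each enumerator is spelled exactly like the
// registry's "cs" alias so that no new names are invented.
enum class registry_charset : std::int_least32_t {
    csASCII     = 3,     // US-ASCII
    csISOLatin1 = 4,     // ISO_8859-1:1987
    csUTF8      = 106,   // UTF-8
    csUnicode   = 1000,  // ISO-10646-UCS-2 (the registry's alias, not "Unicode" at large)
    csUTF16     = 1015,  // UTF-16
};

// Hypothetical helper: recover the numeric MIBenum value of an enumerator.
constexpr std::int_least32_t to_mibenum(registry_charset c) noexcept {
    return static_cast<std::int_least32_t>(c);
}

static_assert(to_mibenum(registry_charset::csUnicode) == 1000);
```

Here `registry_charset::csUnicode` means whatever the registry says entry 1000 means, ambiguities included.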

>
>
>> Regarding the underlying type of the enumeration: The corresponding
>> definition in RFC 3808 uses ASN.1 INTEGER (which does not have a length
>> limit).
>>
>
> Using a specific length ensures forward compatibility - it's basically an
> ABI concern - I think there was justification for that in the paper;
> otherwise we can explain it better
>

Oh, okay. Yes. I guess it is not common to hit a 16-bit int nowadays, but I
agree that 32 bits is better (and likely enough).
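
A small sketch of the point about width, assuming the enumeration fixes its underlying type to `std::int_least32_t`. The names `wide_charset` and `from_mibenum` are hypothetical, and 70000 is a made-up value used only because it does not fit in 16 bits; it is not a registry entry.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical enumeration with a fixed underlying type, so its size does
// not depend on the width of int on a given platform.
enum class wide_charset : std::int_least32_t { csASCII = 3, csUTF8 = 106 };

static_assert(std::is_same_v<std::underlying_type_t<wide_charset>,
                             std::int_least32_t>);

// Hypothetical helper: convert a registry integer to the enumeration,
// rejecting values that the underlying type cannot represent.
constexpr std::optional<wide_charset> from_mibenum(long long value) noexcept {
    using limits = std::numeric_limits<std::int_least32_t>;
    if (value < limits::min() || value > limits::max()) {
        return std::nullopt;
    }
    return static_cast<wide_charset>(value);
}
```

A value such as 70000 would overflow a 16-bit int but converts cleanly here, while something beyond 32 bits is rejected rather than silently truncated.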

>
>
>> Regarding the "environment" functions, I think the wording needs more
>> work to address cases where the value of the LANG environment variable is
>> changed.
>>
>
> Do you have specific suggestions? Thanks!
>

Not yet, but the note is clear enough about the intent.

>
>
> Thanks for the feedback Hubert!
>



SG16 list run by sg16-owner@lists.isocpp.org