Re: [SG16] Feedback re: P1885R5: Naming Text Encodings

From: Corentin <corentin.jabot_at_[hidden]>
Date: Mon, 26 Jul 2021 17:48:48 +0200
On Mon, Jul 26, 2021 at 5:00 PM Hubert Tong <
hubert.reinterpretcast_at_[hidden]> wrote:

> Apologies for sending this so close to the start of the telecon.
>
> The paper refers to RFC 3808 which established the "IANA Charset MIB" and
> specifically says:
>
>> However, [rfc3808] is from 2004 and has not been updated.
>
>
> The paper should at least make some reference to the location where the
> "IANA Charset MIB" is maintained:
> https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib. As of
> the document date indicated in P1885R5, the MIB module was last updated in
> January of the current year (2021). It seems this location was mentioned in
> passing on the SG16 reflector but with no emphasis on the significance.
>

Good point, I will update the paper :)


>
> Also, I find the choice of naming the accessor that produces the MIBenum
> value from the IANA Charset Registry "mib" to be unfortunate. It seems
> there are unambiguous precedent cases in libraries (including those for
> other programming languages) using "mib" to refer to MIBenum values, so
> this does not rise to the level of an objection. I see no reason for the
> paper to use the term this way in prose though. I do note that RFCs do seem
> to use the term not only to refer to MIBs but also to MIB modules (but a
> MIBenum value is neither a MIB nor a MIB module).
>

I am open to better name suggestions!


> Regarding the naming of the enumerator values, I am not fond of excess
> "invention" here. There are names (beginning with "cs") in the reference
> documents. Using those names (including the "cs" prefix) makes even the
> "csUnicode" case "merely following established practice".
>

The "Unicode" name is very problematic: it goes back to a time when UCS-2 and
Unicode were equivalent terms. I removed the "cs" prefix because C++ has enum
classes, which make it unnecessary.
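
To illustrate the scoping argument, here is a minimal sketch. The names
`sketch`, `legacy_id`, `encoding_id`, and `mib_value` are hypothetical and are
not taken from P1885R5. The numeric values are the MIBenum values of the
corresponding IANA registry entries. An unscoped enumeration puts its
enumerators into the enclosing scope, which is where a prefix such as "cs"
helps; a scoped enumeration already qualifies every enumerator.

```cpp
#include <cassert>

namespace sketch {

// Unscoped enumeration: the enumerators are visible directly in the
// enclosing namespace, so a prefix helps avoid name clashes.
enum legacy_id {
    csASCII = 3,
    csUTF8 = 106
};

// Scoped enumeration (hypothetical, not the paper's declaration): every
// use is already qualified with the enumeration name, so no prefix is needed.
enum class encoding_id {
    ASCII = 3,       // registry alias csASCII
    ISOLatin1 = 4,   // registry alias csISOLatin1
    UTF8 = 106,      // registry alias csUTF8
    Unicode = 1000   // registry alias csUnicode, the ISO-10646-UCS-2 entry
};

// Scoped enumerations do not convert implicitly to integers, so getting
// the MIBenum value back requires an explicit cast.
constexpr int mib_value(encoding_id e) {
    return static_cast<int>(e);
}

static_assert(mib_value(encoding_id::UTF8) == 106, "UTF-8 has MIBenum 106");

}  // namespace sketch
```

Note that the sketch only shows why the prefix is redundant for scoping; it
says nothing about whether keeping the registry spelling is preferable as
established practice.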


> Regarding the underlying type of the enumeration: The corresponding
> definition in RFC 3808 uses ASN.1 INTEGER (which does not have a length
> limit).
>

Using a specific length ensures forward compatibility. It's basically an
ABI concern. I think there was justification for that in the paper;
otherwise we can explain it better.
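
As a rough sketch of the ABI point, assuming `std::int_least32_t` as the fixed
underlying type (the names `abi_sketch`, `encoding_id`, `from_mib`, and
`to_mib` are hypothetical): with a fixed underlying type, the size of the
enumeration does not depend on which enumerators are listed, and any value of
the underlying type can be stored, including values that have no enumerator.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace abi_sketch {

// Hypothetical enumeration with an explicitly fixed underlying type.
enum class encoding_id : std::int_least32_t {
    UTF8 = 106
};

// The representation is pinned regardless of the enumerator list, so adding
// enumerators later does not change the size or layout of the type.
static_assert(std::is_same<std::underlying_type<encoding_id>::type,
                           std::int_least32_t>::value,
              "underlying type is fixed");
static_assert(sizeof(encoding_id) == sizeof(std::int_least32_t),
              "size follows the underlying type");

constexpr encoding_id from_mib(std::int_least32_t v) {
    return static_cast<encoding_id>(v);
}

constexpr std::int_least32_t to_mib(encoding_id e) {
    return static_cast<std::int_least32_t>(e);
}

// 3000 is an arbitrary value with no enumerator above; because the
// underlying type is fixed, the conversion is well defined and round-trips.
static_assert(to_mib(from_mib(3000)) == 3000, "unlisted values round-trip");

}  // namespace abi_sketch
```

This does not address the concern that ASN.1 INTEGER has no length limit; it
only shows what fixing the width buys on the C++ side.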


> Regarding the "environment" functions, I think the wording needs more work
> to address cases where the value of the LANG environment variable is
> changed.
>

Do you have specific suggestions? Thanks!


Thanks for the feedback Hubert!

Received on 2021-07-26 10:49:04