Subject: Re: Feedback re: P1885R5: Naming Text Encodings
From: Corentin (corentin.jabot_at_[hidden])
Date: 2021-07-26 10:48:48

On Mon, Jul 26, 2021 at 5:00 PM Hubert Tong <
hubert.reinterpretcast_at_[hidden]> wrote:

> Apologies for sending this so close to the start of the telecon.
> The paper refers to RFC 3808 which established the "IANA Charset MIB" and
> specifically says:
>> However, [rfc3808] is from 2004 and has not been updated.
> The paper should at least make some reference to the location where the
> "IANA Charset MIB" is maintained:
> https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib. As of
> the document date indicated in P1885R5, the MIB module was last updated in
> January of the current year (2021). It seems this location was mentioned in
> passing on the SG16 reflector but with no emphasis on the significance.

Good point, I will update the paper :)

> Also, I find the choice of naming the accessor that produces the MIBenum
> value from the IANA Charset Registry "mib" to be unfortunate. It seems
> there are unambiguous precedent cases in libraries (including those for
> other programming languages) using "mib" to refer to MIBenum values, so
> this does not rise to the level of an objection. I see no reason for the
> paper to use the term this way in prose though. I do note that RFCs do seem
> to use the term not only to refer to MIBs but also to MIB modules (but a
> MIBenum value is neither a MIB nor a MIB module).

I am open to better name suggestions!

> Regarding the naming of the enumerator values, I am not fond of excess
> "invention" here. There are names (beginning with "cs") in the reference
> documents. Using those names (including the "cs" prefix) makes even the
> "csUnicode" case "merely following established practice".

The name "Unicode" is very problematic: it goes back to a time when UCS-2
and Unicode were equivalent terms. I removed the "cs" prefix because C++ has
enum classes, which make it unnecessary.
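
To illustrate the point about the prefix, here is a minimal sketch. The type
and function names (legacy_charset, charset_id, to_mibenum) are hypothetical
and are not the declarations from the paper; only the numeric values 3
(US-ASCII) and 106 (UTF-8) come from the IANA registry.

```cpp
// Illustrative names only; these are not the declarations proposed in P1885.

// Unscoped enumeration: enumerators are visible in the enclosing scope, so a
// prefix such as "cs" is what keeps them from clashing with other names.
enum legacy_charset { csASCII = 3, csUTF8 = 106 };

// Scoped enumeration: the type name qualifies every enumerator, so the
// prefix carries no extra information.
enum class charset_id { ASCII = 3, UTF8 = 106 };

// Converts a scoped enumerator back to its numeric MIBenum value.
constexpr int to_mibenum(charset_id id) { return static_cast<int>(id); }
```

With the scoped form, user code writes charset_id::UTF8, which is no more
ambiguous than csUTF8.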

> Regarding the underlying type of the enumeration: The corresponding
> definition in RFC 3808 uses ASN.1 INTEGER (which does not have a length
> limit).

Using a fixed-width underlying type ensures forward compatibility; it is
basically an ABI concern. I think the paper already gives a justification
for that. If not, we can explain it better.
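
A rough sketch of what I mean, under stated assumptions: the names mib_id and
from_mibenum are hypothetical, and the choice of std::int_least32_t is used
here for illustration rather than quoted from the wording.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical enumeration. Spelling out the underlying type pins its size
// and representation, whatever enumerators the registry gains later.
enum class mib_id : std::int_least32_t { ASCII = 3, UTF8 = 106, UTF16 = 1015 };

static_assert(
    std::is_same<std::underlying_type<mib_id>::type, std::int_least32_t>::value,
    "underlying type is the one spelled in the declaration");

// With a fixed underlying type, every value of that type is a valid value of
// the enumeration, so a MIBenum registered after the library shipped can
// still be represented.
constexpr mib_id from_mibenum(std::int_least32_t value) {
    return static_cast<mib_id>(value);
}
```

Code built against an older enumerator list can therefore still pass around
a newer value without the type changing size.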

> Regarding the "environment" functions, I think the wording needs more work
> to address cases where the value of the LANG environment variable is
> changed.

Do you have specific suggestions? Thanks!

Thanks for the feedback Hubert!

SG16 list run by sg16-owner@lists.isocpp.org