SG16

Subject: Re: LWG issue: Time formatters should not be locale sensitive by default
From: Tom Honermann (tom_at_[hidden])
Date: 2021-05-01 13:22:16


Thank you, Corentin. This is very useful.

For reference, here are the polls we took:

Poll: LWG3547 raises a valid design defect in [time.format] in C++20.

SF F N A SA
7 2 2 0 0

Attendance: 11

Consensus: Strong consensus that this is a design defect.

Poll: The proposed LWG3547 resolution as written should be applied to
C++23.

SF F N A SA
0 4 2 4 1

Attendance: 11

Consensus: No consensus for the resolution

SA motivation: Migrating things embedded in a string literal is very
difficult. There are options to deal with this in an additive way.
Needless break in backwards compatibility.

Speaking for myself, my position in that 2nd poll was weak and I could have been influenced in either direction. I suspect that is true for some others as well. I therefore recommend attributing little weight to that poll, especially given new information.

It sounds like LEWG may take up this issue on Monday. We’ll see whether there is a need for SG16 to revisit.

Tom.

> On Apr 28, 2021, at 7:55 PM, Corentin via SG16 <sg16_at_[hidden]> wrote:
>
> I wanted to address the "locale-independent" specifications.
> There are *none* in the POSIX strftime spec.
>
> https://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
> The strftime() function shall place bytes into the array pointed to by s as controlled by the string pointed to by format. The format is a character string, beginning and ending in its initial shift state, if any. The format string consists of zero or more conversion specifications and ordinary characters. A conversion specification consists of a '%' character, possibly followed by an E or O modifier, and a terminating conversion specifier character that determines the conversion specification's behavior. All ordinary characters (including the terminating null byte) are copied unchanged into the array. If copying takes place between objects that overlap, the behavior is undefined. No more than maxsize bytes are placed into the array. Each conversion specifier is replaced by appropriate characters as described in the following list. The appropriate characters are determined using the LC_TIME category of the current locale and by the values of zero or more members of the broken-down time structure pointed to by timeptr, as specified in brackets in the description. If any of the specified values are outside the normal range, the characters stored are unspecified.
>
> The %O modifiers are an opt-in to the locale's alternative numeral system.
> You might want to have dates with Arabic numerals and names in Hindi, for example.
>
> So "1 AM" can be either "१ पूर्वाह्न" or "1 पूर्वाह्न", depending on whether you want to use the Devanagari numerals or not.
>
>
> See also
> https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
> alt_digits
> Define alternative symbols for digits, corresponding to the %O modified conversion specification. [...] The %O modifier shall indicate that the string corresponding to the value specified via the conversion specification shall be used instead of the value.
>
>
> The way this is handled by CLDR (and therefore by most programming languages) is that the desired numbering system is attached to the locale name or is provided as part of supplementary options.
>
>
> Here is an example using javascript (tested locally with node)
> <image.png>
>
> Notice that:
> The numeral system and the formatting are separate concerns.
> Most locales default to Latin digits (but not Arabic in this example); I am not exactly sure why.
> Few programming languages offer a per-specifier choice of numbering system; these things are not usually mixed.
>
> Now whether the %O specifier of POSIX makes sense or not is an interesting question, but I wanted to point out that the %O specifiers are no more or less locale-dependent than the other specifiers.
>
> {:L%u} formats a weekday number using the locale's primary numeral system
> {:L%Ou} formats a weekday number using the locale's alternative numeral system
>
> What if you pass the C locale?
> Well, the C locale's primary numeral system is Arabic numerals; it does not have an alternative numeral system.
>
> In all cases, it does what it says it does.
>
> Sorry I didn't catch that concern during the meeting.
> I hope you will reconsider the second poll as we clearly missed some pretty critical information!
>
>
>
> PS:
> You will notice that this brings more questions than it answers.
> What if the global locale uses a non-Arabic numeral system? What is the default numeral system? Why is there a primary and an alternative? What if you need a third?
> Why does time formatting care about that when none of the other locale facilities seem to?
>
> But this is clearly out of scope of this issue!
>
>
> More references:
>
> http://cldr.unicode.org/translation/-core-data/numbering-systems
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat
> https://lh.2xlibre.net/values/alt_digits/
> https://unicode-org.github.io/icu/userguide/locale/
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16


