
Re: [SG16] LWG issue: Time formatters should not be locale sensitive by default

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 29 Apr 2021 01:55:48 +0200
I wanted to address the "locale-independent" specifiers.
There are *none* in the POSIX strftime spec.

https://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
The strftime() function shall place bytes into the array pointed to by s as
controlled by the string pointed to by format. The format is a character
string, beginning and ending in its initial shift state, if any. The format
string consists of zero or more conversion specifications and ordinary
characters. A conversion specification consists of a '%' character,
possibly followed by an E or O modifier, and a terminating conversion
specifier character that determines the conversion specification's
behavior. All ordinary characters (including the terminating null byte) are
copied unchanged into the array. If copying takes place between objects
that overlap, the behavior is undefined. No more than maxsize bytes are
placed into the array. Each conversion specifier is replaced by appropriate
characters as described in the following list. The appropriate characters
are determined using the LC_TIME category of the current locale and by the
values of zero or more members of the broken-down time structure pointed to
by timeptr, as specified in brackets in the description. If any of the
specified values are outside the normal range, the characters stored are
unspecified.

The %O modifiers are an opt-in to the locale's alternative numeral system.
You might want to have dates with Arabic numerals and names in Hindi, for
example.

So "1 AM" can be either "१ पूर्वाह्न" or "1 पूर्वाह्न", depending on
whether you want to use the Devanagari numerals or not.


See also
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
*alt_digits*: Define alternative symbols for digits, corresponding to
the %O modified conversion specification. [...] The %O modifier shall
indicate that the string corresponding to the value specified via the
conversion specification shall be used instead of the value.

The way this is handled by CLDR (and therefore by most programming
languages) is that the desired numbering system is attached to the locale
name, or is provided as part of supplementary options.


Here is an example using JavaScript (tested locally with Node):
[image: image.png]

Notice that

   - The numeral system and the formatting are separate concerns
   - Most locales default to Latin digits (but not Arabic in this
   example); I am not exactly sure why
   - Few programming languages offer a per-specifier choice of numbering
   system; these things are not usually mixed.


Now whether the %O specifiers of POSIX make sense or not is an interesting
question, but I wanted to point out that they are no more or less
locale-dependent than the other specifiers.

{:L%u} formats a weekday number using the locale's primary numeral system
{:L%Ou} formats a weekday number using the locale's alternative numeral
system

What if you pass the C locale?
Well, the C locale's primary numeral system is Arabic numerals, and it does
not have an alternative numeral system.
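
To make the C locale case concrete, here is a minimal sketch using plain
strftime rather than std::format, since strftime is where these specifiers
come from. The helper name weekday_number and the fixed date are my own
choices for illustration; nothing here is taken from the LWG issue, and it
assumes a POSIX-like C library.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper (illustration only): formats the weekday of a fixed
// date, Thursday 2021-04-29, with strftime under the named LC_TIME locale.
// Returns an empty string when the locale is not available on this system.
// Note: setlocale changes global state, so this is not thread-safe.
std::string weekday_number(const char* locale_name, const char* fmt) {
    assert(locale_name != nullptr && fmt != nullptr);

    // strftime consults LC_TIME of the global C locale; remember the
    // current setting so it can be restored afterwards.
    const char* current = std::setlocale(LC_TIME, nullptr);
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_TIME, locale_name) == nullptr) {
        return std::string();
    }

    std::tm tm{};
    tm.tm_year = 2021 - 1900;
    tm.tm_mon = 3;    // April (months are zero-based)
    tm.tm_mday = 29;
    tm.tm_wday = 4;   // Thursday (0 = Sunday); the field %u is derived from

    char buf[64];
    const std::size_t n = std::strftime(buf, sizeof buf, fmt, &tm);

    std::setlocale(LC_TIME, saved.c_str());
    return std::string(buf, n);
}
```

With the C locale, weekday_number("C", "%u") and weekday_number("C", "%Ou")
both return "4", because the O modifier is ignored there. Under a locale
whose LC_TIME defines alt_digits, the %Ou result may use the alternative
digits instead.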

In all cases, it does what it says it does.

Sorry I didn't catch that concern during the meeting.
*I hope you will reconsider the second poll, as we clearly missed some
pretty critical information!*



PS:
You will notice that this raises more questions than it answers.
What if the global locale uses a non-Arabic numeral system? What is the
default numeral system? Why is there a primary and an alternative? What if
you need a third?
Why does time formatting care about that when none of the other locale
facilities seem to?

But this is clearly out of scope of this issue!


More references

http://cldr.unicode.org/translation/-core-data/numbering-systems
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat
https://lh.2xlibre.net/values/alt_digits/
https://unicode-org.github.io/icu/userguide/locale/

Received on 2021-04-28 18:56:02