sg16

Re: [SG16] LWG issue: Time formatters should not be locale sensitive by default

From: Charlie Barto <Charles.Barto_at_[hidden]>
Date: Thu, 29 Apr 2021 02:51:57 +0000
If we were to adopt the “spirit” of the resolution presented today, we’d probably end up with four different “locale settings” for some specifiers (really three, since the alternate modes for the C locale are the same as the non-alternate modes).

I think that’s actually fine, as long as we don’t break users’ code.


While POSIX strftime is impossible to use in a locale-insensitive manner (since locales are global and another thread could change the current one), ours can be used that way, since you can always just pass locale::classic() to format.

I like Jens’ idea of fixing it with a new syntax.

Charlie.

From: Corentin <corentin.jabot_at_[hidden]mail.com>
Sent: Wednesday, April 28, 2021 4:56 PM
To: Victor Zverovich <victor.zverovich_at_[hidden]>; Charlie Barto <Charles.Barto_at_[hidden]>
Cc: SG16 <sg16_at_[hidden]ocpp.org>
Subject: Re: [SG16] LWG issue: Time formatters should not be locale sensitive by default

I wanted to address the "locale-independent" specifications.
There are *none* in the POSIX strftime spec.

https://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
The strftime() function shall place bytes into the array pointed to by s as controlled by the string pointed to by format. The format is a character string, beginning and ending in its initial shift state, if any. The format string consists of zero or more conversion specifications and ordinary characters. A conversion specification consists of a '%' character, possibly followed by an E or O modifier, and a terminating conversion specifier character that determines the conversion specification's behavior. All ordinary characters (including the terminating null byte) are copied unchanged into the array. If copying takes place between objects that overlap, the behavior is undefined. No more than maxsize bytes are placed into the array. Each conversion specifier is replaced by appropriate characters as described in the following list. The appropriate characters are determined using the LC_TIME category of the current locale and by the values of zero or more members of the broken-down time structure pointed to by timeptr, as specified in brackets in the description. If any of the specified values are outside the normal range, the characters stored are unspecified.

The %O modifier is an opt-in to the locale's alternative numeral system.
For example, you might want dates with Arabic numerals but names in Hindi.

So "1 AM" can be either "१ पूर्वाह्न" or "1 पूर्वाह्न", depending on whether or not you want to use Devanagari numerals.


See also
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03_05
alt_digits
Define alternative symbols for digits, corresponding to the %O modified conversion specification. [...] The %O modifier shall indicate that the string corresponding to the value specified via the conversion specification shall be used instead of the value.


The way this is handled by CLDR (and therefore by most programming languages) is that the desired numbering system is attached to the locale name, or is provided as part of supplementary options.


Here is an example using JavaScript (tested locally with Node.js):
[Image: screenshot of the Node.js example]

Notice that:

  * The choice of numeral system is a separate concern from formatting.
  * Most locales default to Latin digits (but not Arabic in this example); I am not exactly sure why.
  * Few programming languages offer a per-specifier choice of numbering system; these things are not usually mixed.

Now, whether the POSIX %O modifier makes sense is an interesting question, but I wanted to point out that it is no more and no less locale-dependent than the other specifiers.

{:L%u} formats a weekday number using the locale's primary numeral system.
{:L%Ou} formats a weekday number using the locale's alternative numeral system.

What if you pass the C locale?
Well, the C locale's primary numeral system is Arabic numerals, and it does not have an alternative numeral system, so %Ou produces the same result as %u.

In all cases, it does what it says it does.

Sorry I didn't catch that concern during the meeting.
I hope you will reconsider the second poll, as we clearly missed some pretty critical information!



PS:
You will notice that this brings more questions than it answers.
What if the global locale uses a non-Arabic numeral system? What is the default numeral system? Why is there a primary and an alternative one? What if you need a third?
Why does time formatting care about that when none of the other locale facilities seem to?

But this is clearly out of scope of this issue!


More references:

http://cldr.unicode.org/translation/-core-data/numbering-systems
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat
https://lh.2xlibre.net/values/alt_digits/
https://unicode-org.github.io/icu/userguide/locale/

Received on 2021-04-28 21:52:01