Subject: Re: [SG16-Unicode] [isocpp-lib] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?
From: Billy O'Neal (VC LIBS) (bion_at_[hidden])
Date: 2019-11-07 05:23:24


> The library doesn't need to assume. An example implementation (ignoring support for non-char types) could be: [...]
That does not do the correct thing because the locale on the target is often not the locale when compiling. At compile time we usually consider our 'execution character set' to be the ASCII subset for maximum resistance to changes in locale at runtime, but the compiler will generally pass through more strict settings if the user has set them.

> I think the Windows 10 comment is only relevant with respect to the run-time locale and choice of encoding for the console/terminal. Execution character set is independent of both of those.
It is dependent on both of those, in that the choice of execution character set is constrained by the environment in which the program will run.

Billy3

________________________________
From: Tom Honermann <tom_at_[hidden]>
Sent: Wednesday, November 6, 2019 10:58:17 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden] <lib_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 10:20 PM, Billy O'Neal (VC LIBS) wrote:

> That isn't what it (is intended to) say, nor how I read it.

Then remove the qualifications about terminals or codecvt facets and talk only about the execution character set, and things are OK. (As Corentin's PR does)

> The intent of the wording was to allow Microsoft to use "µs" when the compiler is invoked with /execution-charset:utf-8 and to use "us" otherwise.

Given that UTF-8 support is still a rarely used user opt-in at this time, only available on recent versions of Windows 10, it isn't an assumption the library is going to be able to make soon (i.e. within the next decade).

The library doesn't need to assume. An example implementation (ignoring support for non-char types) could be:

template<class traits, class Rep, class Period>
void print_fancy_suffix(basic_ostream<char, traits>& os, const duration<Rep, Period>& d)
{
  // "\u00B5" is encoded in the execution character set when this is compiled.
  static const char micro_sign[] = "\u00B5s";
  // Compare as unsigned char so the test does not depend on the signedness of char.
  if (static_cast<unsigned char>(micro_sign[0]) == 0xC2u &&
      static_cast<unsigned char>(micro_sign[1]) == 0xB5u)
  {
    // execution character set smells like UTF-8.
    os << d.count() << micro_sign;
  } else {
    // execution character set smells like bad.
    os << d.count() << "us";
  }
}


There are, of course, better ways to do this if the compiler has the ability to inform the library what the execution character set really is (e.g., a predefined macro).
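
Short of such a macro, the same byte test can be evaluated at compile time. The following is a minimal sketch, not code from this thread: the names literal_micro_is_utf8, format_micro and format_micro_example are hypothetical, C++17 is assumed for `if constexpr`, and it assumes the compiler accepts "\u00B5" in an ordinary literal at all (what happens when the execution character set cannot represent it varies by implementation).

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <sstream>
#include <string>

// Hypothetical helper: true when the ordinary literal "\u00B5" was encoded
// as the two UTF-8 code units 0xC2 0xB5 (plus the terminating NUL).
constexpr bool literal_micro_is_utf8()
{
  constexpr char micro[] = "\u00B5";
  return sizeof(micro) == 3 &&
         static_cast<unsigned char>(micro[0]) == 0xC2u &&
         static_cast<unsigned char>(micro[1]) == 0xB5u;
}

// Hypothetical formatter: the branch is selected at compile time, so the
// run-time locale plays no part in choosing the suffix.
template<class Rep>
std::string format_micro(const std::chrono::duration<Rep, std::micro>& d)
{
  std::ostringstream os;
  os << d.count();
  if constexpr (literal_micro_is_utf8())
    os << "\u00B5s";
  else
    os << "us";
  return os.str();
}

// Usage: the result is one of two strings, depending only on how the
// translation unit was compiled.
inline void format_micro_example()
{
  const std::string r = format_micro(std::chrono::microseconds(5));
  assert(r == "5us" || r == "5\xC2\xB5s");
}
```

Whether the byte pattern 0xC2 0xB5 is a sufficient test for UTF-8 is a judgment call; it mirrors the heuristic in the example above rather than a guaranteed detection method.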

I'm not arguing for any particular choice on Microsoft's part.

I think the Windows 10 comment is only relevant with respect to the run-time locale and choice of encoding for the console/terminal. Execution character set is independent of both of those.

Tom.



Billy3



________________________________
From: Tom Honermann <tom_at_[hidden]>
Sent: Wednesday, November 6, 2019 5:38:34 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden] <lib_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 5:30 PM, Billy O'Neal (VC LIBS) wrote:

Corentin's PR says "if char (the execution encoding) can always represent µ for your implementation, use that. Otherwise use u." Which means that on my implementation, where char can't always represent such a thing because that is locale dependent, we will statically use u (and µ for wchar_t); but an implementation that assumes char is UTF-8 could use µ.
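
A static per-character-type choice of this kind could look like the following minimal sketch. It is not taken from any implementation: micro_suffix and micro_suffix_example are hypothetical names, and the sketch assumes the wide execution character set can represent U+00B5.

```cpp
#include <cassert>
#include <string>

// Hypothetical suffix table, fixed at compile time per character type.
template<class CharT> constexpr const CharT* micro_suffix();

// char: the narrow encoding may not always represent U+00B5, so fall back to "us".
template<> constexpr const char* micro_suffix<char>() { return "us"; }

// wchar_t: assumed here to always be able to represent U+00B5.
template<> constexpr const wchar_t* micro_suffix<wchar_t>() { return L"\u00B5s"; }

// Usage: the narrow suffix is plain ASCII; the wide suffix is two wide characters.
inline void micro_suffix_example()
{
  assert(std::string(micro_suffix<char>()) == "us");
  const std::wstring w = micro_suffix<wchar_t>();
  assert(w.size() == 2 && w[0] == L'\u00B5' && w[1] == L's');
}
```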


The LWG issue's PR says "if the stream can detect that it is targeting a console or codecvt facet that doesn't support µ, an implementation may use u; otherwise it uses µ". But streams have no means of doing that detection. (And the answer can even change if someone changes the streambuf.)
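
The streambuf point can be illustrated with a minimal sketch. The names write_micro and rdbuf_swap_example are hypothetical stand-ins for a duration inserter and its caller; only standard stream calls are used.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a duration inserter: it sees only the ostream,
// not whatever device or conversion sits behind the stream buffer.
inline void write_micro(std::ostream& os, long long count)
{
  os << count << "us";
}

// Usage: the same stream object is retargeted between two insertions, so
// nothing learned about the first destination holds for the second.
inline void rdbuf_swap_example()
{
  std::stringbuf first, second;
  std::ostream os(&first);
  write_micro(os, 1);   // lands in `first`
  os.rdbuf(&second);    // public interface: anyone can swap the buffer
  write_micro(os, 2);   // lands in `second`
  assert(first.str() == "1us");
  assert(second.str() == "2us");
}
```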

That isn't what it (is intended to) say, nor how I read it. It states that the suffix is determined by the execution character set (the character set used for string literals and known at compile time); that is in the first sentence. The second sentence acknowledges that if the native character set (the run-time locale dependent character set) lacks representation for the character, then all bets are off with regard to how the character is actually displayed (or converted by a codecvt facet).

The intent of the wording was to allow Microsoft to use "µs" when the compiler is invoked with /execution-charset:utf-8 and to use "us" otherwise.

Tom.



Billy3



________________________________
From: Tom Honermann <tom_at_[hidden]>
Sent: Wednesday, November 6, 2019 5:14:18 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden] <lib_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 4:30 PM, Billy O'Neal (VC LIBS) wrote:

> Please read the wording again. Note that it says that, if those conditions are true, then the result is unspecified.

If "the wording" means the P/R of https://cplusplus.github.io/LWG/issue3314, the wording there implies that we must make some effort to determine that the condition is true, which in practice we cannot do because the interface between streams and streambufs is public.

Yes, that is the wording I meant. The intent is to ensure the implementation does *not* have to put forth such effort. I don't understand where such an implication is coming from, but that wording has confused at least three experienced wordsmiths, so I acknowledge there is an issue, but I don't understand what it is.

I think it is important to say something here. Otherwise, one could claim that the terminal failing to display "µs" because it is configured for an incompatible encoding is non-conforming. Well, to the extent that the standard addresses such devices.

Tom.



Corentin's P/R below seems to not have this concern.



Billy3



________________________________
From: Lib <lib-bounces_at_[hidden]> on behalf of Tom Honermann via Lib <lib_at_[hidden]>
Sent: Wednesday, November 6, 2019 1:12:48 PM
To: Corentin <corentin.jabot_at_[hidden]>
Cc: Tom Honermann <tom_at_[hidden]>; C++ Library Evolution Working Group <lib-ext_at_[hidden]>; Library Working Group <lib_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

The intent of the wording is to say that implementors do *not* need to be aware of terminals or codecvt facets. Without this, the wording could be read that implementations must implement magic to make the character display correctly.

Please read the wording again. Note that it says that, if those conditions are true, then the result is unspecified.

Tom.

On Nov 6, 2019, at 12:07 PM, Corentin <corentin.jabot_at_[hidden]> wrote:

Then I would just say the execution encoding associated with charT

Extremely uncomfortable with involving stream, console or anything else not known at compile time

On Wed, 6 Nov 2019 at 04:51, Tom Honermann <tom_at_[hidden]> wrote:
On 11/6/19 8:30 AM, Howard Hinnant wrote:

You can comment on the LWG issue (if you want) by emailing said comment to lwgchair_at_[hidden], specifying which issue you wish to comment on and supplying the comment.

Howard

On Nov 5, 2019, at 10:32 PM, Corentin via Lib-Ext <lib-ext_at_[hidden]> wrote:


Not sure how to do that procedurally, but here is some alternative wording.
The "runtime" locale-tied encoding is *assumed to be* a superset of the execution encoding, to the extent the standard doesn't distinguish between the two.


If Period::type is micro, but the <ins>abstract</ins> character <ins>µ , which has the universal character name </ins> U+00B5 cannot be represented in the <ins>execution</ins> encoding <del>used for</del><ins> associated with the character type </ins> charT, the unit suffix "us" is used instead of "µs".

Howard and I discussed the wording I proposed today and we're now on the same page with regard to the intent.

With regard to Corentin's suggested wording above, "abstract character" and "execution encoding" are not current terms in the standard (well, the former is inherited from our reference to the Unicode standard but is otherwise unused at present). P1859R0 <http://wg21.link/p1859r0> does intend to standardize new terminology, but we don't yet have consensus for what the new terms should be named. I think we should avoid using candidate names until we have such consensus.

Tom.


On Mon, 4 Nov 2019 at 15:42, Tom Honermann via Lib-Ext <lib-ext_at_[hidden]> wrote:
A new LWG issue was filed for this question today:
- https://cplusplus.github.io/LWG/issue3314

This issue concerns the ostream inserters added for std::chrono::duration in C++20 and what the intended behavior is for a duration when period::type is micro.

[time.duration.io]p4 states:




If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "µs".



The question is with regard to which one of the encodings used for charT is referred to here; the compile-time execution character set or the run-time locale dependent native character set?

The proposed resolution specifies that the compile-time execution character set is the intended one. My expectation is that this aligns with existing implementations, but I haven't checked.
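
The distinction between the two character sets can be seen in a minimal sketch (micro_suffix_bytes and locale_independence_example are hypothetical names, and the sketch assumes the compiler can encode U+00B5 in an ordinary literal): the bytes of a string literal are fixed at compile time, so changing the run-time locale does not alter them.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// The bytes of an ordinary string literal are chosen when the program is
// compiled (execution character set), not when it runs.
inline std::string micro_suffix_bytes() { return "\u00B5s"; }

// Usage: the literal's bytes are identical under every run-time locale.
inline void locale_independence_example()
{
  const std::string before = micro_suffix_bytes();
  std::setlocale(LC_ALL, "C");  // the "C" locale is always available
  const std::string in_c = micro_suffix_bytes();
  std::setlocale(LC_ALL, "");   // the user's preferred locale, whatever it is
  const std::string in_env = micro_suffix_bytes();
  assert(before == in_c);
  assert(in_c == in_env);
}
```

How a terminal or codecvt facet later interprets those bytes is a separate, run-time matter.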

Tom.



_______________________________________________
Lib-Ext mailing list
Lib-Ext_at_[hidden]
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13309.php
_______________________________________________
Lib-Ext mailing list
Lib-Ext_at_[hidden]
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13325.php




_______________________________________________
SG16 Unicode mailing list
Unicode_at_[hidden]
http://www.open-std.org/mailman/listinfo/unicode






