
Re: [SG16-Unicode] [isocpp-lib] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

From: Billy O'Neal (VC LIBS)
Date: Tue, 12 Nov 2019 19:46:25 +0000
> This is exactly why the original wording I proposed stated that the result is unspecified if the run-time locale encoding is not compatible with the encoding used for the execution character set.

The problem is that the wording doesn’t say “us might be used”; it says “if one of these specific conditions holds, us may be used,” and streams have no means of detecting those specific conditions.

“If "µs" is used but the implementation's native character set lacks representation for U+00B5 and the stream is associated with a terminal or console, or if the stream is imbued with a std::codecvt facet that lacks conversion support for the character, then the result is unspecified.”

Streams can’t detect either of those conditions (the terminal/console condition or the codecvt facet condition). basic_filebuf might know if the target is a console, but the stream certainly doesn’t. And streams don’t talk to std::codecvt facets at all; again, that happens in basic_filebuf. It seems the P/R without this second sentence resolves the issue completely?
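To illustrate that code conversion belongs to the stream buffer rather than the stream, here is a minimal sketch (the helper name insert_suffix is hypothetical). Inserting the suffix into a std::ostringstream leaves the literal's bytes untouched whatever locale is imbued, because std::basic_stringbuf performs no std::codecvt conversion.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: inserts a count and a unit suffix into an ostream
// backed by a std::stringbuf, using whatever locale the caller supplies.
std::string insert_suffix(const std::locale& loc, const char* suffix)
{
    std::ostringstream os;
    os.imbue(loc);       // basic_ios::imbue also imbues the std::stringbuf
    os << 42 << suffix;  // narrow chars are copied into the buffer as-is
    return os.str();     // std::stringbuf applies no std::codecvt conversion
}
```

Only a std::basic_filebuf runs the characters through the imbued locale's std::codecvt facet when writing to a file; the ostream layer never does, which is why it cannot tell whether the facet supports the character.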

I think

Billy3

From: Tom Honermann <tom_at_[hidden]>
Sent: Thursday, November 7, 2019 3:37 AM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden]
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/7/19 11:23 AM, Billy O'Neal (VC LIBS) wrote:

> The library doesn't need to assume. An example implementation (ignoring support for non-char types) could be: […]
That does not do the correct thing because the locale on the target is often not the locale when compiling. At compile time we usually consider our ‘execution character set’ to be the ASCII subset for maximum resistance to changes in locale at runtime, but the compiler will generally pass through more strict settings if the user has set them.
This is exactly why the original wording I proposed stated that the result is unspecified if the run-time locale encoding is not compatible with the encoding used for the execution character set.


> I think the Windows 10 comment is only relevant with respect to the run-time locale and choice of encoding for the console/terminal. Execution character set is independent of both of those.
It is dependent on both of those in that the choice of execution character set is constrained by the environment in which the program will run.

Indeed. But if a programmer compiles their code with /execution-charset:utf-8, it seems a clear indication that they intend to constrain the environment in which the program is run to one that supports UTF-8 (e.g., Windows 10, with UTF-8 ACP, and the new Windows Terminal). I recognize that such a deployment target is an uncommon reality today, but that is a direction to be encouraged.

Tom.

Billy3


From: Tom Honermann <tom_at_honermann.net>
Sent: Wednesday, November 6, 2019 10:58:17 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden]
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 10:20 PM, Billy O'Neal (VC LIBS) wrote:

> That isn't what it (is intended to) say, nor how I read it.

Then remove the qualifications about terminals or codecvt facets and talk only about the execution character set, as Corentin’s PR does, and things are OK.

> The intent of the wording was to allow Microsoft to use "µs" when the compiler is invoked with /execution-charset:utf-8 and to use "us" otherwise.

Given that UTF-8 support is still a rarely used opt-in, available only on recent versions of Windows 10, it isn’t an assumption the library will be able to make soon (i.e., within the next decade).

The library doesn't need to assume. An example implementation (ignoring support for non-char types) could be:

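#include <chrono>
#include <ostream>

// as_unsigned is not defined in the original message; this is an assumed
// helper that reads a char as an unsigned byte value.
constexpr unsigned as_unsigned(char c) { return static_cast<unsigned char>(c); }

template<class traits, class Rep, class Period>
void print_fancy_suffix(std::basic_ostream<char, traits>& os,
                        const std::chrono::duration<Rep, Period>& d)
{
  // Encoded by the compiler using the execution character set.
  static const char micro_sign[] = "\u00B5s";
  if (as_unsigned(micro_sign[0]) == 0xC2u &&
      as_unsigned(micro_sign[1]) == 0xB5u)
  {
    // The execution character set smells like UTF-8.
    os << d.count() << micro_sign;
  } else {
    // The execution character set smells bad; fall back to ASCII.
    os << d.count() << "us";
  }
}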

There are, of course, better ways to do this if the compiler has the ability to inform the library what the execution character set really is (e.g., a predefined macro).
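Even without such a macro, the same check can be moved to compile time, since indexing a string literal is a constant expression. This is a sketch under that assumption only; literal_encoding_is_utf8 and micro_suffix are hypothetical names, not any vendor's implementation.

```cpp
// Hypothetical: true when narrow string literals appear to be UTF-8 encoded,
// judged by how the compiler encoded U+00B5 MICRO SIGN.
constexpr bool literal_encoding_is_utf8()
{
    constexpr char probe[] = "\u00B5";
    return sizeof(probe) == 3 &&
           static_cast<unsigned char>(probe[0]) == 0xC2u &&
           static_cast<unsigned char>(probe[1]) == 0xB5u;
}

// Hypothetical: the suffix for std::micro, selected once at compile time.
constexpr const char* micro_suffix()
{
    return literal_encoding_is_utf8() ? "\u00B5s" : "us";
}
```

An encoding that maps U+00B5 to a single byte (or substitutes '?') yields a two-element array, so the check conservatively selects "us".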

I'm not arguing for any particular choice on Microsoft's part.

I think the Windows 10 comment is only relevant with respect to the run-time locale and choice of encoding for the console/terminal. Execution character set is independent of both of those.
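A small sketch of that independence, assuming a hosted implementation: the bytes of a narrow string literal are fixed by the compiler, so switching the C locale at run time with std::setlocale cannot change them. The function names are hypothetical; setlocale(LC_ALL, "") selects the environment's locale and may fail, which the sketch tolerates.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: the code units of the literal as stored in the binary.
std::string literal_bytes()
{
    return std::string("\u00B5s");
}

// Returns true if the literal's bytes are identical before and after
// switching to the user's preferred locale taken from the environment.
bool literal_is_locale_independent()
{
    const std::string before = literal_bytes();
    const char* current = std::setlocale(LC_ALL, nullptr);
    const std::string saved = current ? current : "C";  // copy before changing
    std::setlocale(LC_ALL, "");             // may return nullptr; harmless here
    const std::string after = literal_bytes();
    std::setlocale(LC_ALL, saved.c_str());  // restore the original locale
    return before == after;
}
```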

Tom.



Billy3



From: Tom Honermann <tom_at_[hidden]>
Sent: Wednesday, November 6, 2019 5:38:34 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden]
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 5:30 PM, Billy O'Neal (VC LIBS) wrote:

Corentin’s PR says “if char (the execution encoding) can always represent µ for your implementation, use that. Otherwise use u.” This means that on my implementation, where char can’t always represent such a character because that is locale dependent, we will statically use u (and µ for wchar_t), but an implementation that assumes char is UTF-8 could use µ.
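Under that reading the choice is static and made per character type. A hedged C++17 sketch, with the hypothetical trait name micro_suffix_for: for char it uses the micro sign only when narrow literals are UTF-8 (a conservative proxy, and an assumption of this sketch); for wchar_t it assumes a UTF-16 or UTF-32 wide encoding, which can always represent U+00B5.

```cpp
#include <string_view>

// Hypothetical trait: the unit suffix to use for std::micro, per charT.
template <class CharT>
struct micro_suffix_for;

template <>
struct micro_suffix_for<char> {
    // Assumption: treat char as able to hold U+00B5 only if literals are UTF-8.
    static constexpr bool utf8 =
        sizeof("\u00B5") == 3 &&
        static_cast<unsigned char>("\u00B5"[0]) == 0xC2u &&
        static_cast<unsigned char>("\u00B5"[1]) == 0xB5u;
    // Otherwise fall back to ASCII "us", decided at compile time.
    static constexpr std::string_view value =
        utf8 ? std::string_view("\u00B5s") : std::string_view("us");
};

template <>
struct micro_suffix_for<wchar_t> {
    // Assumption: the wide literal encoding is UTF-16 or UTF-32.
    static constexpr std::wstring_view value = L"\u00B5s";
};
```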

The LWG issue’s PR says “if the stream can detect that it is targeting a console or codecvt facet that doesn’t support µ, an implementation may use u; otherwise it uses µ.” But streams have no means of doing that detection. (And the answer can even change if someone changes the streambuf.)
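The parenthetical point is easy to demonstrate: rdbuf is public, so an ostream's destination can be replaced between two insertions, and any conclusion the stream drew about its target would then be stale. A minimal sketch using two std::stringbuf objects (the function name is hypothetical):

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical demo: one ostream, two different destinations over its life.
std::pair<std::string, std::string> write_to_two_buffers()
{
    std::stringbuf first, second;
    std::ostream os(&first);
    os << "12us";
    os.rdbuf(&second);  // the target changes; the stream has no say in it
    os << "34us";
    return {first.str(), second.str()};
}
```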

That isn't what it (is intended to) say, nor how I read it. It states that the suffix is determined by the execution character set (the character set used for string literals and known at compile time); that is in the first sentence. The second sentence acknowledges that if the native character set (the run-time locale dependent character set) lacks representation for the character, then all bets are off with regard to how the character is actually displayed (or converted by a codecvt facet).

The intent of the wording was to allow Microsoft to use "µs" when the compiler is invoked with /execution-charset:utf-8 and to use "us" otherwise.

Tom.



Billy3



From: Tom Honermann <tom_at_[hidden]>
Sent: Wednesday, November 6, 2019 5:14:18 PM
To: Billy O'Neal (VC LIBS) <bion_at_[hidden]>; lib_at_[hidden]; Corentin <corentin.jabot_at_[hidden]>
Cc: C++ Library Evolution Working Group <lib-ext_at_[hidden]>; unicode_at_[hidden]
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

On 11/6/19 4:30 PM, Billy O'Neal (VC LIBS) wrote:

> Please read the wording again. Note that it says that, if those conditions are true, then the result is unspecified.

If “the wording” means the P/R of https://cplusplus.github.io/LWG/issue3314, the wording there implies that we must make some effort to determine that the condition is true, which in practice we cannot do because the interface between streams and streambufs is public.

Yes, that is the wording I meant. The intent is to ensure the implementation does *not* have to put forth such effort. I don't understand where such an implication is coming from, but that wording has confused at least three experienced wordsmiths, so I acknowledge there is an issue; I just don't understand what it is.

I think it is important to say something here. Otherwise, one could claim that the terminal failing to display "μs" because it is configured for an incompatible encoding is non-conforming. Well, to the extent that the standard addresses such devices.

Tom.



Corentin’s P/R below does not seem to have this concern.



Billy3



From: Lib <lib-bounces_at_[hidden]> on behalf of Tom Honermann via Lib <lib_at_[hidden]>
Sent: Wednesday, November 6, 2019 1:12:48 PM
To: Corentin <corentin.jabot_at_[hidden]>
Cc: Tom Honermann <tom_at_[hidden]>; C++ Library Evolution Working Group <lib-ext_at_[hidden]>; Library Working Group <lib_at_[hidden]>; unicode_at_[hidden]
Subject: Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

The intent of the wording is to say that implementors do *not* need to be aware of terminals or codecvt facets. Without this, the wording could be read that implementations must implement magic to make the character display correctly.

Please read the wording again. Note that it says that, if those conditions are true, then the result is unspecified.
Tom.

On Nov 6, 2019, at 12:07 PM, Corentin <corentin.jabot_at_[hidden]> wrote:
Then I would just say “the execution encoding associated with charT”.

I'm extremely uncomfortable with involving the stream, the console, or anything else not known at compile time.

On Wed, 6 Nov 2019 at 04:51, Tom Honermann <tom_at_[hidden]> wrote:
On 11/6/19 8:30 AM, Howard Hinnant wrote:


You can comment on the LWG issue (if you want) by emailing your comment to lwgchair_at_[hidden], specifying which issue you are commenting on and supplying the comment.



Howard



On Nov 5, 2019, at 10:32 PM, Corentin via Lib-Ext <lib-ext_at_[hidden]> wrote:

Not sure how to do that procedurally, but here is some alternative wording.

The "runtime" locale-tied encoding is *assumed to be* a superset of the execution encoding, to the extent that the standard doesn't distinguish between the two.





If Period::type is micro, but the <ins>abstract</ins> character <ins>µ, which has the universal character name </ins> U+00B5 cannot be represented in the <ins>execution</ins> encoding <del>used for</del><ins> associated with the character type </ins> charT, the unit suffix "us" is used instead of "µs".

Howard and I discussed the wording I proposed today and we're now on the same page with regard to the intent.

With regard to Corentin's suggested wording above, "abstract character" and "execution encoding" are not current terms in the standard (well, the former is inherited from our reference to the Unicode standard but is otherwise unused at present). P1859R0 <http://wg21.link/p1859r0> does intend to standardize new terminology, but we don't yet have consensus for what the new terms should be named. I think we should avoid using candidate names until we have such consensus.

Tom.


On Mon, 4 Nov 2019 at 15:42, Tom Honermann via Lib-Ext <lib-ext_at_[hidden]> wrote:

A new LWG issue was filed for this question today:

- https://cplusplus.github.io/LWG/issue3314



This issue concerns the ostream inserters added for std::chrono::duration in C++20 and what the intended behavior is for a duration when period::type is micro.



[time.duration.io]p4 states:





If Period​::​type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "μs".



The question is which of the encodings used for charT is referred to here: the compile-time execution character set or the run-time locale-dependent native character set?



The proposed resolution specifies that the compile-time execution character set is the intended one. My expectation is that this aligns with existing implementations, but I haven't checked.



Tom.



_______________________________________________

Lib-Ext mailing list

Lib-Ext_at_[hidden]

Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext

Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13309.php




_______________________________________________

SG16 Unicode mailing list

Unicode_at_[hidden]

http://www.open-std.org/mailman/listinfo/unicode











Received on 2019-11-12 20:46:32