Re: [SG16-Unicode] [isocpp-lib] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 7 Nov 2019 11:37:38 +0000
On 11/7/19 11:23 AM, Billy O'Neal (VC LIBS) wrote:
>
> > The library doesn't need to assume. An example implementation
> (ignoring support for non-char types) could be: […]
>
> That does not do the correct thing because the locale on the target is
> often not the locale when compiling. At compile time we usually
> consider our ‘execution character set’ to be the ASCII subset for
> maximum resistance to changes in locale at runtime, but the compiler
> will generally pass through more strict settings if the user has set them.
>
This is exactly why the original wording I proposed stated that the
result is unspecified if the run-time locale encoding is not compatible
with the encoding used for the execution character set.
>
> > I think the Windows 10 comment is only relevant with respect to the
> run-time locale and choice of encoding for the console/terminal.
> Execution character set is independent of both of those.
>
> It is dependent on both of those in that the choice of execution
> character set is constrained by the environment in which the program
> will run.
>
Indeed. But if a programmer compiles their code with
/execution-charset:utf-8, it seems a clear indication that they intend
to constrain the environment in which the program is run to one that
supports UTF-8 (e.g., Windows 10, with UTF-8 ACP, and the new Windows
Terminal). I recognize that such a deployment target is an uncommon
reality today, but that is a direction to be encouraged.

Tom.
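
The compile-time check discussed in this thread can be sketched as a self-contained example. This is a minimal sketch under stated assumptions, not any vendor's implementation: the names narrow_literals_look_like_utf8, print_microseconds, and microseconds_to_string are hypothetical, it assumes C++14 or later, and it assumes a compiler that accepts "\u00B5" in a narrow string literal (what a compiler does when the execution character set cannot represent that character varies by implementation).

```cpp
#include <chrono>
#include <ostream>
#include <ratio>
#include <sstream>
#include <string>

// Hypothetical helper: true when the compiler encoded the narrow literal
// "\u00B5" as the UTF-8 byte pair 0xC2 0xB5 (plus the terminating null).
constexpr bool narrow_literals_look_like_utf8() {
    constexpr char micro[] = "\u00B5";
    return sizeof(micro) == 3 &&
           static_cast<unsigned char>(micro[0]) == 0xC2u &&
           static_cast<unsigned char>(micro[1]) == 0xB5u;
}

// Hypothetical inserter: the suffix is chosen from the execution character
// set alone; the stream, its locale, and its streambuf are never inspected.
template <class Traits, class Rep>
void print_microseconds(std::basic_ostream<char, Traits>& os,
                        const std::chrono::duration<Rep, std::micro>& d) {
    if (narrow_literals_look_like_utf8()) {
        os << d.count() << "\u00B5s";  // micro sign is representable
    } else {
        os << d.count() << "us";       // ASCII fallback
    }
}

// Hypothetical convenience wrapper that returns the formatted text.
template <class Rep>
std::string microseconds_to_string(
    const std::chrono::duration<Rep, std::micro>& d) {
    std::ostringstream os;
    print_microseconds(os, d);
    return os.str();
}
```

Calling microseconds_to_string(std::chrono::microseconds(42)) should yield "42µs" when narrow literals are UTF-8 encoded (for example with /execution-charset:utf-8) and "42us" otherwise. The check says nothing about the run-time locale or the terminal, which is the case the proposed wording leaves unspecified.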

> Billy3
>
> ------------------------------------------------------------------------
> *From:* Tom Honermann <tom_at_[hidden]>
> *Sent:* Wednesday, November 6, 2019 10:58:17 PM
> *To:* Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
> lib_at_[hidden] <lib_at_[hidden]>; Corentin
> <corentin.jabot_at_[hidden]>
> *Cc:* C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
> unicode_at_[hidden] <unicode_at_[hidden]>
> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
> [time.duration.io] : Is stream insertion behavior locale dependent
> when Period::type is micro?
> On 11/6/19 10:20 PM, Billy O'Neal (VC LIBS) wrote:
>>
>> > That isn't what it (is intended to) say, nor how I read it.
>>
>> Then remove the qualifications about terminals or codecvt facets and
>> talk only about the execution character set, and things are OK. (As
>> Corentin’s PR does)
>>
>> > The intent of the wording was to allow Microsoft to use "µs" when
>> the compiler is invoked with /execution-charset:utf-8 and to use "us"
>> otherwise.
>>
>> Given that UTF-8 support is still a rarely used user opt-in at this
>> time only available on recent versions of Windows 10, it isn’t an
>> assumption the library is going to be able to make soon (i.e. the
>> next decade)
>>
> The library doesn't need to assume. An example implementation
> (ignoring support for non-char types) could be:
>
> template<class traits, class Rep, class Period>
> void print_fancy_suffix(basic_ostream<char, traits>& os,
>                         const duration<Rep, Period>& d) {
>   static const char micro_sign[] = "\u00B5s";
>   if (as_unsigned(micro_sign[0]) == 0xC2u &&
>       as_unsigned(micro_sign[1]) == 0xB5u) {
>     // execution character set smells like UTF-8.
>     os << d.count() << micro_sign;
>   } else {
>     // execution character set smells like bad.
>     os << d.count() << "us";
>   }
> }
>
> There are, of course, better ways to do this if the compiler has the
> ability to inform the library what the execution character set really
> is (e.g., a predefined macro).
>
> I'm not arguing for any particular choice on Microsoft's part.
>
> I think the Windows 10 comment is only relevant with respect to the
> run-time locale and choice of encoding for the console/terminal.
> Execution character set is independent of both of those.
>
> Tom.
>
>> Billy3
>>
>> ------------------------------------------------------------------------
>> *From:* Tom Honermann <tom_at_[hidden]>
>> *Sent:* Wednesday, November 6, 2019 5:38:34 PM
>> *To:* Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
>> lib_at_[hidden] <lib_at_[hidden]>; Corentin
>> <corentin.jabot_at_[hidden]>
>> *Cc:* C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
>> unicode_at_[hidden] <unicode_at_[hidden]>
>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
>> [time.duration.io] : Is stream insertion behavior locale dependent
>> when Period::type is micro?
>> On 11/6/19 5:30 PM, Billy O'Neal (VC LIBS) wrote:
>>>
>>> Corentin’s PR says “if char (the execution encoding) can always
>>> represent µ for your implementation, use that. Otherwise use u.”
>>> Which means that on my implementation, where char can’t always
>>> represent such a thing because that is locale dependent, we will
>>> statically use u (and µ for wchar_t); but an implementation that
>>> assumes char is UTF-8 could use µ.
>>>
>>> The LWG issue’s PR says “if the stream can detect that it is
>>> targeting a console or codecvt facet that don’t support µ, an
>>> implementation may use u, otherwise they use µ”. But streams have no
>>> means of doing that detection. (And the answer can even change if
>>> someone changes the streambuf)
>>>
>> That isn't what it (is intended to) say, nor how I read it. It
>> states that the suffix is determined by the execution character set
>> (the character set used for string literals and known at compile
>> time); that is in the first sentence. The second sentence
>> acknowledges that if the native character set (the run-time locale
>> dependent character set) lacks representation for the character, then
>> all bets are off with regard to how the character is actually
>> displayed (or converted by a codecvt facet).
>>
>> The intent of the wording was to allow Microsoft to use "µs" when the
>> compiler is invoked with /execution-charset:utf-8 and to use "us"
>> otherwise.
>>
>> Tom.
>>
>>> Billy3
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Tom Honermann <tom_at_[hidden]>
>>> *Sent:* Wednesday, November 6, 2019 5:14:18 PM
>>> *To:* Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
>>> lib_at_[hidden] <lib_at_[hidden]>; Corentin
>>> <corentin.jabot_at_[hidden]>
>>> *Cc:* C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
>>> unicode_at_[hidden] <unicode_at_[hidden]>
>>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
>>> [time.duration.io] : Is stream insertion behavior locale dependent
>>> when Period::type is micro?
>>> On 11/6/19 4:30 PM, Billy O'Neal (VC LIBS) wrote:
>>>>
>>>> > Please read the wording again. Note that it says that, if those
>>>> conditions are true, then the result is unspecified.
>>>>
>>>> If “the wording” means the P/R of
>>>> https://cplusplus.github.io/LWG/issue3314,
>>>> the wording there implies that we must make some effort to
>>>> determine that the condition is true, which in practice we cannot
>>>> do because the interface between streams and streambufs is public.
>>>>
>>> Yes, that is the wording I meant. The intent is to ensure the
>>> implementation does *not* have to put forth such effort. I don't
>>> understand where such an implication is coming from, but that
>>> wording has confused at least three experienced wordsmiths, so I
>>> acknowledge there is an issue, but I don't understand what it is.
>>>
>>> I think it is important to say something here. Otherwise, one could
>>> claim that the terminal failing to display "μs" because it is
>>> configured for an incompatible encoding is non-conforming. Well, to
>>> the extent that the standard addresses such devices.
>>>
>>> Tom.
>>>
>>>> Corentin’s P/R below seems to not have this concern.
>>>>
>>>> Billy3
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Lib <lib-bounces_at_[hidden]> on behalf of Tom Honermann
>>>> via Lib <lib_at_[hidden]>
>>>> *Sent:* Wednesday, November 6, 2019 1:12:48 PM
>>>> *To:* Corentin <corentin.jabot_at_[hidden]>
>>>> *Cc:* Tom Honermann <tom_at_[hidden]>;
>>>> C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
>>>> Library Working Group <lib_at_[hidden]>;
>>>> unicode_at_[hidden] <unicode_at_[hidden]>
>>>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
>>>> [time.duration.io] : Is stream insertion behavior locale dependent
>>>> when Period::type is micro?
>>>> The intent of the wording is to say that implementors do *not* need
>>>> to be aware of terminals or codecvt facets. Without this, the
>>>> wording could be read that implementations must implement magic to
>>>> make the character display correctly.
>>>>
>>>> Please read the wording again. Note that it says that, if those
>>>> conditions are true, then the result is unspecified.
>>>>
>>>> Tom.
>>>>
>>>> On Nov 6, 2019, at 12:07 PM, Corentin <corentin.jabot_at_[hidden]> wrote:
>>>>
>>>>> Then I would just say associated execution encoding with charT
>>>>>
>>>>> Extremely uncomfortable with involving stream, console or anything
>>>>> else not known at compile time
>>>>>
>>>>> On Wed, 6 Nov 2019 at 04:51, Tom Honermann <tom_at_[hidden]> wrote:
>>>>>
>>>>> On 11/6/19 8:30 AM, Howard Hinnant wrote:
>>>>>> You can comment on the LWG issue (if you want) by emailing said comment to lwgchair_at_[hidden], specifying which issue you wish to comment on and supplying the comment.
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>> On Nov 5, 2019, at 10:32 PM, Corentin via Lib-Ext <lib-ext_at_[hidden]> wrote:
>>>>>>> Not sure how to do that procedurally, but here is some alternative wording.
>>>>>>> The "runtime" locale-tied encoding is *assumed to be* a superset of the execution encoding - to the extent the standard doesn't distinguish between the two
>>>>>>>
>>>>>>>
>>>>>>> If Period::type is micro, but the <ins>abstract</ins> character <ins>µ , which has the universal character name </ins> U+00B5 cannot be represented in the <ins>execution</ins> encoding <del>used for</del><ins> associated with the character type </ins> charT, the unit suffix "us" is used instead of "µs".
>>>>>
>>>>> Howard and I discussed the wording I proposed today and we're
>>>>> now on the same page with regard to the intent.
>>>>>
>>>>> With regard to Corentin's suggested wording above, "abstract
>>>>> character" and "execution encoding" are not current terms in
>>>>> the standard (well, the former is inherited from our reference
>>>>> to the Unicode standard but is otherwise unused at present).
>>>>> P1859R0 <http://wg21.link/p1859r0>
>>>>> does intend to standardize new terminology, but we don't yet
>>>>> have consensus for what the new terms should be named. I
>>>>> think we should avoid using candidate names until we have such
>>>>> consensus.
>>>>>
>>>>> Tom.
>>>>>
>>>>>>>> On Mon, 4 Nov 2019 at 15:42, Tom Honermann via Lib-Ext <lib-ext_at_[hidden]> wrote:
>>>>>>>> A new LWG issue was filed for this question today:
>>>>>>>> - https://cplusplus.github.io/LWG/issue3314
>>>>>>>>
>>>>>>>> This issue concerns the ostream inserters added for std::chrono::duration in C++20 and what the intended behavior is for a duration when period::type is micro.
>>>>>>>>
>>>>>>>> [time.duration.io]p4 states:
>>>>>>>>
>>>>>>>>
>>>>>>>>> If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "µs".
>>>>>>>>>
>>>>>>>> The question is with regard to which one of the encodings used for charT is referred to here; the compile-time execution character set or the run-time locale dependent native character set?
>>>>>>>>
>>>>>>>> The proposed resolution specifies that the compile-time execution character set is the intended one. My expectation is that this aligns with existing implementations, but I haven't checked.
>>>>>>>>
>>>>>>>> Tom.
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lib-Ext mailing list
>>>>>>> Lib-Ext_at_[hidden]
>>>>>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>>> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13309.php
>>>>>>> _______________________________________________
>>>>>>> Lib-Ext mailing list
>>>>>>> Lib-Ext_at_[hidden]
>>>>>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>>> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13325.php
>>>>>>
>>>>>> _______________________________________________
>>>>>> SG16 Unicode mailing list
>>>>>> Unicode_at_[hidden]
>>>>>> http://www.open-std.org/mailman/listinfo/unicode
>>>>>
>>>>>
>>>
>>
>


Received on 2019-11-07 12:37:46