SG16

Subject: Re: [SG16-Unicode] [isocpp-lib] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?
From: Tom Honermann (tom_at_[hidden])
Date: 2019-11-06 16:58:17


On 11/6/19 10:20 PM, Billy O'Neal (VC LIBS) wrote:
>
> > That isn't what it (is intended to) say, nor how I read it.
>
> Then remove the qualifications about terminals or codecvt facets and
> talk only about the execution character set, and things are OK. (As
> Corentin's PR does)
>
> > The intent of the wording was to allow Microsoft to use "µs" when
> the compiler is invoked with /execution-charset:utf-8 and to use "us"
> otherwise.
>
> Given that UTF-8 support is still a rarely used user opt-in at this
> time only available on recent versions of Windows 10, it isn't an
> assumption the library is going to be able to make soon (i.e. the next
> decade)
>
The library doesn't need to assume.  An example implementation (ignoring
support for non-char types) could be:

template<class traits, class Rep, class Period>
void print_fancy_suffix(basic_ostream<char, traits>& os,
                        const duration<Rep, Period>& d)
{
  static const char micro_sign[] = "\u00B5s";
  if (as_unsigned(micro_sign[0]) == 0xC2u &&
      as_unsigned(micro_sign[1]) == 0xB5u)
  {
    // execution character set smells like UTF-8.
    os << d.count() << micro_sign;
  } else {
    // execution character set smells like bad.
    os << d.count() << "us";
  }
}

There are, of course, better ways to do this if the compiler has the
ability to inform the library what the execution character set really is
(e.g., a predefined macro).
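
As a rough, self-contained sketch of the approach above (not a description of what any shipping library does), the check can be made a compile-time constant. The names `micro_suffix`, `literal_looks_like_utf8`, `print_with_micro_suffix`, and `format_microseconds` are hypothetical, and the sketch assumes the compiler accepts U+00B5 in an ordinary string literal at all; some compilers warn or substitute a replacement character when the execution character set cannot represent it.

```cpp
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>

// Ordinary string literal; the compiler encodes it in the execution
// character set when this translation unit is compiled.
constexpr char micro_suffix[] = "\u00B5s";

// True when U+00B5 was encoded as the two UTF-8 code units 0xC2 0xB5,
// followed by 's' and the terminating null (4 chars in total).
constexpr bool literal_looks_like_utf8 =
    sizeof(micro_suffix) == 4 &&
    static_cast<unsigned char>(micro_suffix[0]) == 0xC2u &&
    static_cast<unsigned char>(micro_suffix[1]) == 0xB5u;

// Hypothetical inserter helper: picks the suffix from the compile-time
// check only; it never inspects the stream, its locale, or its streambuf.
template <class Traits, class Rep, class Period>
void print_with_micro_suffix(std::basic_ostream<char, Traits>& os,
                             const std::chrono::duration<Rep, Period>& d) {
    if (literal_looks_like_utf8) {
        os << d.count() << micro_suffix;
    } else {
        os << d.count() << "us";
    }
}

// Convenience wrapper for demonstration only.
std::string format_microseconds(std::chrono::microseconds d) {
    std::ostringstream out;
    print_with_micro_suffix(out, d);
    return out.str();
}
```

Calling `format_microseconds(std::chrono::microseconds{42})` yields "42" followed by either the UTF-8 encoded "µs" or the fallback "us", depending only on how the translation unit was compiled.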

I'm not arguing for any particular choice on Microsoft's part.

I think the Windows 10 comment is only relevant with respect to the
run-time locale and choice of encoding for the console/terminal. 
Execution character set is independent of both of those.
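
To illustrate that independence, here is a minimal sketch (the helper names `micro_literal_as_hex` and `literal_survives_locale_change` are made up for this example). It dumps the code units of the literal before and after changing the run-time C locale; since the literal was encoded at compile time, the two dumps are expected to match whatever locale the process runs under.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the code units of the literal "\u00B5s" (without the terminating
// null) as space-separated upper-case hex.
std::string micro_literal_as_hex() {
    static const char suffix[] = "\u00B5s";
    std::string hex;
    for (std::size_t i = 0; i + 1 < sizeof(suffix); ++i) {
        char buf[4];  // two hex digits, one space, terminating null
        std::snprintf(buf, sizeof(buf), "%02X ",
                      static_cast<unsigned>(static_cast<unsigned char>(suffix[i])));
        hex += buf;
    }
    return hex;
}

// Re-reads the literal after switching the C locale to the user's default.
bool literal_survives_locale_change() {
    const std::string before = micro_literal_as_hex();
    std::setlocale(LC_ALL, "");   // may fail and return nullptr; harmless here
    const std::string after = micro_literal_as_hex();
    std::setlocale(LC_ALL, "C");  // restore the startup locale
    return before == after;
}
```

What the terminal then does with those code units is a separate matter, governed by the run-time encoding rather than by the execution character set.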

Tom.

> Billy3
>
> ------------------------------------------------------------------------
> *From:* Tom Honermann <tom_at_[hidden]>
> *Sent:* Wednesday, November 6, 2019 5:38:34 PM
> *To:* Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
> lib_at_[hidden] <lib_at_[hidden]>; Corentin
> <corentin.jabot_at_[hidden]>
> *Cc:* C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
> unicode_at_[hidden] <unicode_at_[hidden]>
> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
> [time.duration.io] : Is stream insertion behavior locale dependent
> when Period::type is micro?
> On 11/6/19 5:30 PM, Billy O'Neal (VC LIBS) wrote:
>>
>> Corentin's PR says "if char (the execution encoding) can always
>> represent µ for your implementation, use that. Otherwise use u."
>> Which means that on my implementation, where char can't always
>> represent µ because that is locale dependent, we will statically use u
>> (and µ for wchar_t); but an implementation that assumes char is UTF-8
>> could use µ.
>>
>> The LWG issue's PR says "if the stream can detect that it is
>> targeting a console or codecvt facet that doesn't support µ, an
>> implementation may use u; otherwise it uses µ". But streams have no
>> means of doing that detection. (And the answer can even change if
>> someone changes the streambuf.)
>>
> That isn't what it (is intended to) say, nor how I read it. It states
> that the suffix is determined by the execution character set (the
> character set used for string literals and known at compile time);
> that is in the first sentence.  The second sentence acknowledges that
> if the native character set (the run-time locale dependent character
> set) lacks representation for the character, then all bets are off
> with regard to how the character is actually displayed (or converted
> by a codecvt facet).
>
> The intent of the wording was to allow Microsoft to use "µs" when the
> compiler is invoked with /execution-charset:utf-8 and to use "us"
> otherwise.
>
> Tom.
>
>> Billy3
>>
>> ------------------------------------------------------------------------
>> *From:* Tom Honermann <tom_at_[hidden]>
>> *Sent:* Wednesday, November 6, 2019 5:14:18 PM
>> *To:* Billy O'Neal (VC LIBS) <bion_at_[hidden]>;
>> lib_at_[hidden] <lib_at_[hidden]>; Corentin
>> <corentin.jabot_at_[hidden]>
>> *Cc:* C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
>> unicode_at_[hidden] <unicode_at_[hidden]>
>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
>> [time.duration.io] : Is stream insertion behavior locale dependent
>> when Period::type is micro?
>> On 11/6/19 4:30 PM, Billy O'Neal (VC LIBS) wrote:
>>>
>>> > Please read the wording again. Note that it says that, if those
>>> conditions are true, then the result is unspecified.
>>>
>>> If "the wording" means the P/R of
>>> https://cplusplus.github.io/LWG/issue3314,
>>> the wording there implies that we must make some effort to determine
>>> that the condition is true, which in practice we cannot do because
>>> the interface between streams and streambufs is public.
>>>
>> Yes, that is the wording I meant.  The intent is to ensure the
>> implementation does *not* have to put forth such effort.  I don't
>> understand where such an implication is coming from, but that wording
>> has confused at least three experienced wordsmiths, so I acknowledge
>> there is an issue, but I don't understand what it is.
>>
>> I think it is important to say something here. Otherwise, one could
>> claim that the terminal failing to display "µs" because it is
>> configured for an incompatible encoding is non-conforming.  Well, to
>> the extent that the standard addresses such devices.
>>
>> Tom.
>>
>>> Corentin's P/R below seems to not have this concern.
>>>
>>> Billy3
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Lib <lib-bounces_at_[hidden]> on behalf of Tom Honermann via
>>> Lib <lib_at_[hidden]>
>>> *Sent:* Wednesday, November 6, 2019 1:12:48 PM
>>> *To:* Corentin <corentin.jabot_at_[hidden]>
>>> *Cc:* Tom Honermann <tom_at_[hidden]>;
>>> C++ Library Evolution Working Group <lib-ext_at_[hidden]>;
>>> Library Working Group <lib_at_[hidden]>;
>>> unicode_at_[hidden] <unicode_at_[hidden]>
>>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext]
>>> [time.duration.io] : Is stream insertion behavior locale dependent
>>> when Period::type is micro?
>>> The intent of the wording is to say that implementors do *not* need
>>> to be aware of terminals or codecvt facets. Without this, the
>>> wording could be read that implementations must implement magic to
>>> make the character display correctly.
>>>
>>> Please read the wording again. Note that it says that, if those
>>> conditions are true, then the result is unspecified.
>>>
>>> Tom.
>>>
>>> On Nov 6, 2019, at 12:07 PM, Corentin <corentin.jabot_at_[hidden]> wrote:
>>>
>>>> Then I would just say associated execution encoding with charT
>>>>
>>>> Extremely uncomfortable with involving stream, console or anything
>>>> else not known at compile time
>>>>
>>>> On Wed, 6 Nov 2019 at 04:51, Tom Honermann <tom_at_[hidden]> wrote:
>>>>
>>>> On 11/6/19 8:30 AM, Howard Hinnant wrote:
>>>>> You can comment on the LWG issue (if you want) by emailing said comment to lwgchair_at_[hidden], specifying which issue you wish to comment on and supplying the comment.
>>>>>
>>>>> Howard
>>>>>
>>>>> On Nov 5, 2019, at 10:32 PM, Corentin via Lib-Ext <lib-ext_at_[hidden]> wrote:
>>>>>> Not sure how to do that procedurally, but here is some alternative wording.
>>>>>> The "runtime" locale-tied encoding is *assumed to be* a superset of the execution encoding - to the extent the standard doesn't distinguish between the two.
>>>>>>
>>>>>>
>>>>>> If Period::type is micro, but the <ins>abstract</ins> character <ins>µ , which has the universal character name </ins> U+00B5 cannot be represented in the <ins>execution</ins> encoding <del>used for</del><ins> associated with the character type </ins> charT, the unit suffix "us" is used instead of "µs".
>>>>
>>>> Howard and I discussed the wording I proposed today and we're
>>>> now on the same page with regard to the intent.
>>>>
>>>> With regard to Corentin's suggested wording above, "abstract
>>>> character" and "execution encoding" are not current terms in
>>>> the standard (well, the former is inherited from our reference
>>>> to the Unicode standard but is otherwise unused at present).
>>>> P1859R0 <http://wg21.link/p1859r0>
>>>> does intend to standardize new terminology, but we don't yet
>>>> have consensus for what the new terms should be named.  I think
>>>> we should avoid using candidate names until we have such consensus.
>>>>
>>>> Tom.
>>>>
>>>>>>> On Mon, 4 Nov 2019 at 15:42, Tom Honermann via Lib-Ext <lib-ext_at_[hidden]> wrote:
>>>>>>> A new LWG issue was filed for this question today:
>>>>>>> - https://cplusplus.github.io/LWG/issue3314
>>>>>>>
>>>>>>> This issue concerns the ostream inserters added for std::chrono::duration in C++20 and what the intended behavior is for a duration when period::type is micro.
>>>>>>>
>>>>>>> [time.duration.io]p4 states:
>>>>>>>
>>>>>>>
>>>>>>>> If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "µs".
>>>>>>>>
>>>>>>> The question is with regard to which one of the encodings used for charT is referred to here; the compile-time execution character set or the run-time locale dependent native character set?
>>>>>>>
>>>>>>> The proposed resolution specifies that the compile-time execution character set is the intended one. My expectation is that this aligns with existing implementations, but I haven't checked.
>>>>>>>
>>>>>>> Tom.
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Lib-Ext mailing list
>>>>>> Lib-Ext_at_[hidden]
>>>>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13309.php
>>>>>> _______________________________________________
>>>>>> Lib-Ext mailing list
>>>>>> Lib-Ext_at_[hidden]
>>>>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13325.php
>>>>>
>>>>> _______________________________________________
>>>>> SG16 Unicode mailing list
>>>>> Unicode_at_[hidden]
>>>>> http://www.open-std.org/mailman/listinfo/unicode
>>>>
>>>>
>>
>



SG16 list run by herb.sutter at gmail.com