Re: [SG16-Unicode] Hidden locale dependency in [time.duration.io]?

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 4 Nov 2019 00:27:07 +0000
By "device", do you mean a display device, e.g., a console/terminal? If
the intent is that the implementor can fall back at run-time, then that
does seem to imply a locale dependency to me, since whether "μs" can
actually be displayed is locale dependent. (Actually, it is worse than
locale dependent, since it also depends on console/terminal
configuration.)

I suspect what we really want to depend on here is the execution
character set known at compile time. If the characters can be
represented in that associated character set, then whether they can be
displayed correctly depends on having a sane choice for execution
character set and run-time locale. In this sense, this touches on
issues we're working to address in:
- P1854R0: Conversion to execution encoding should not lead to loss of
meaning
    - This paper proposes that character/string literals are ill-formed
if they contain a character not representable in the corresponding
execution character set; the intent of the wording we're discussing is
to avoid the problems this paper intends to address.
- P1859R0: Standard terminology for execution character set encodings
    - We've discussed adding a requirement or at least a note to this
paper stating that the contents of a character/string literal must be
representable in the dynamic (locale dependent) encoding to prevent ...
something (unspecified behavior?)

I suggest the following wording: (using terminology from P1859R0)

If Period::type is micro, but the character U+00B5 <del>cannot be
represented in the encoding used</del><ins>lacks representation in the
execution character set</ins> for charT, the unit suffix "us" is used
instead of "μs". <ins>If "μs" is used but the dynamic encoding lacks
representation for U+00B5 and the stream is associated with a terminal or
console, or if the stream is imbued with a std::codecvt facet that lacks
conversion support for the character, then the result is unspecified.</ins>
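
To make the compile-time half of that wording concrete, here is a rough sketch of how a suffix could be chosen without touching a locale. It is only an illustration, not how any standard library implements [time.duration.io]. The names micro_suffix and to_text and the MicroRepresentable parameter are hypothetical; the parameter stands in for whatever build-time knowledge the implementation has about the execution character set. The UTF-8 bytes for U+00B5 are hard-coded, which assumes a UTF-8 execution character set for char.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper, not part of the standard library. The template
// parameter stands in for the compile-time answer to "can U+00B5 be
// represented in the execution character set for char?".
template <bool MicroRepresentable>
std::string micro_suffix() {
    if constexpr (MicroRepresentable) {
        // Assumes a UTF-8 execution character set: U+00B5 is the byte
        // sequence C2 B5. The literal is split so that 's' is clearly
        // separate from the hex escape.
        return "\xC2\xB5" "s";
    } else {
        return "us";  // fallback using only the basic character set
    }
}

// Appends the chosen suffix to a microsecond count. The suffix choice is
// made at compile time and does not consult any locale.
template <bool MicroRepresentable>
std::string to_text(std::chrono::microseconds d) {
    return std::to_string(d.count()) + micro_suffix<MicroRepresentable>();
}
```

With this sketch, to_text<false>(std::chrono::microseconds(42)) yields "42us", while to_text<true> yields "42" followed by the three bytes C2 B5 73. Whether those bytes display as "µs" is exactly the run-time question the second <ins> sentence leaves unspecified.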

Looking for feedback, but am leaning towards filing an LWG issue.

Tom.

On 11/3/19 9:38 AM, Howard Hinnant wrote:
> If for some reason the device can’t deal with Unicode, the vendor can fall back to “us” in the basic character set.
>
> Howard
>
> On Nov 3, 2019, at 11:52 AM, Steve Downey <sdowney_at_[hidden]> wrote:
>> If the encoding is intended as the one used for literals of type charT, then locale does not have to be involved. That does run the risk of "?s" being produced instead of "μs".
>> If the encoding is latin-7, is it supposed to produce "μs"?
>>
>> On Sun, Nov 3, 2019, 09:47 Howard Hinnant <howard.hinnant_at_[hidden]> wrote:
>>> The intent is to use Unicode to get “μs” without involving a locale. That would be UTF-8 for char, UTF-16 for a 2-byte wchar_t and UTF-32 for a 4-byte wchar_t. And if for some reason the device can’t deal with Unicode, the vendor can fall back to “us” in the basic character set. In either event, it is not intended to involve a locale, and specifically does not involve the ctype facet and widening/narrowing. I’m not sure I would call it implementation defined, as the vendor isn’t required to document it. But the vendor can choose between the Unicode output and the “us” approximation.
>>>
>>> Feel free to submit an issue, but if you do I strongly recommend suggested wording as the LWG has already been over this paragraph in detail and the current wording is a product of that review.
>>>
>>> Howard
>>>
>> On Nov 3, 2019, at 8:16 AM, Tom Honermann <tom_at_[hidden]> wrote:
>>> I just came across [time.duration.io]p4:
>>>
>>>> If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "μs".
>>> How is the determination as to whether the character can be represented to be done? It seems this would involve consulting the locale. Or is this effectively implementation defined behavior?
>>>
>>> Perhaps this is worth an LWG issue to at least clarify the behavior?
>>>
>>> Tom.
>>>
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode



Received on 2019-11-04 01:27:13