Subject: Re: [SG16-Unicode] Hidden locale dependency in [time.duration.io]?
From: Tom Honermann (tom_at_[hidden])
Date: 2019-11-04 04:06:05
On 11/4/19 9:40 AM, Steve Downey wrote:
> I believe the wording around locale is merely warning that if μs isn't
> supported by the locale associated with a stream, then the results are
> unspecified, which is true, but unhelpful, and probably does not need
> to be in the normative wording for this.
I think it is helpful to make it clear that the implementation does not
(should not) make such cases "work".
> I'm unaware of any implementation that supports checking if string
> literals are actually encodable. All implementations are required to
> at least track \u00b5 until literals are encoded. This sounds like an
> implementation that supports targeting non-Unicode encodings of
> literals, such as MSVC, will have to use "us".
I believe gcc at least will warn in cases where the source encoding and
execution encoding are not the same.
I would argue that MSVC can use "μs" when compiling with the
/execution-charset:utf-8 or /utf-8 options (implicitly or explicitly).
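
As a rough illustration of choosing the suffix at compile time from the
literal encoding, here is a minimal C++17 sketch. The names
micro_sign_is_utf8 and micro_suffix are hypothetical and not taken from
any implementation. The sketch assumes the compiler accepts the literal
"\u00b5" at all; how a literal that the execution character set cannot
represent is handled is up to the implementation.

```cpp
#include <string_view>

// Hypothetical helper, for illustration only.
// True when the narrow literal "\u00b5" was encoded as the two UTF-8
// bytes C2 B5 (plus the terminating NUL, hence sizeof == 3).
constexpr bool micro_sign_is_utf8() {
    constexpr char lit[] = "\u00b5";
    return sizeof(lit) == 3
        && static_cast<unsigned char>(lit[0]) == 0xC2
        && static_cast<unsigned char>(lit[1]) == 0xB5;
}

// Hypothetical helper: picks the unit suffix from the literal encoding
// alone. No stream, locale, or codecvt facet is consulted.
constexpr std::string_view micro_suffix() {
    return micro_sign_is_utf8() ? std::string_view("\u00b5s")
                                : std::string_view("us");
}
```

With a UTF-8 execution character set (for example MSVC with /utf-8)
micro_suffix() would be expected to yield the three-byte form; otherwise
it falls back to "us".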
> On Mon, Nov 4, 2019 at 9:03 AM Howard Hinnant
> <howard.hinnant_at_[hidden] <mailto:howard.hinnant_at_[hidden]>> wrote:
> On Nov 4, 2019, at 8:45 AM, Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> > On 11/4/19 7:18 AM, Howard Hinnant wrote:
> >> On Nov 4, 2019, at 12:27 AM, Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> >>> I suggest the following wording: (using terminology from P1859R0)
> >>> If Period::type is micro, but the character U+00B5
> <del>cannot be represented in the encoding used</del><ins>lacks
> representation in the execution character set</ins> for charT, the
> unit suffix "us" is used instead of "μs". <ins>If
> >>> "μs" is used but the dynamic encoding lacks representation for
> U+00B5 and the stream is associated with a terminal or console, or
> if the stream is imbued with a std::codecvt facet that lacks
> conversion support for the character, then the result is
> >> I've no objection to an issue, but your proposed wording
> explicitly involves two things I'm strongly against:
> >> 1. Now the code has to check the locale, for this precision only.
> >> 2. Now the code has different behavior between cout and
> ostringstream. And the result of ostringstream is very commonly
> subsequently sent to cout (ostringstream is a common formatting aid).
> >> Imo, the proposed wording is much, much worse than the
> status-quo and I would vote strongly against it.
> > No, the wording I proposed doesn't check for locale. The
> execution character set is the character set used for string
> literals and is known at compile time; it is not the locale
> dependent run-time character set.
> Here is the processed form of what you wrote (the deletes deleted,
> the inserts inserted):
> If Period::type is micro, but the character U+00B5 lacks
> representation in the execution character set for charT, the unit
> suffix "us" is used instead of "μs". If "μs" is used but the
> dynamic encoding lacks representation for U+00B5 and the stream is
> associated with a terminal or console, or if the stream is imbued
> with a std::codecvt facet that lacks conversion support for the
> character, then the result is unspecified.
> The phrase "or if the stream is imbued with a std::codecvt facet
> that..." implies that the implementation gets the locale of the
> stream, extracts the codecvt facet from it, and does something
> with it.
> I do not believe the streaming of durations of any precision
> should involve the stream's locale.
> For microseconds precision the suffix should be "μs", but at the
> vendor's discretion may be "us" instead.
> I'm open to better ways of saying the sentence above. The above
> sentence isn't (and shouldn't be) stream-dependent or
> locale-dependent. It should not involve properties of the codecvt facet.
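
To make the "not stream-dependent" point concrete, here is a minimal
sketch of an inserter whose suffix is fixed at build time. put_micros is
a hypothetical name and is not the standard's operator<<. The sketch
assumes the vendor chose the ASCII fallback "us".

```cpp
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper, for illustration only.
// The suffix is a fixed literal, so this function never calls
// os.getloc() and never inspects a codecvt facet. The count itself
// still goes through the stream's ordinary integer formatting.
inline std::ostream& put_micros(std::ostream& os,
                                std::chrono::microseconds d) {
    static constexpr char suffix[] = "us";  // assumption: vendor's choice
    return os << d.count() << suffix;
}
```

Because the suffix does not depend on the stream, formatting into an
ostringstream and later writing that string to cout gives the same
suffix as writing to cout directly.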