sg16: Re: [SG16-Unicode] Unicode streams

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 20 Oct 2019 16:03:17 -0400

On 10/19/19 12:53 PM, Victor Zverovich wrote:
> STATICALLY-WIDEN is not a hack; it's just a convenient pseudo-function
> that simplifies and formalizes a bunch of wording that originated in
> P0355. In the case of chrono it will work identically for Unicode, and
> the only reason we didn't mention charX_t is that neither std::format
> nor iostreams support those. As Corentin wrote, you could provide
> formatter specializations, including for Unicode types, and once we
> have charX_t overloads of std::format (I plan to propose these for
> C++23) it will just pick up those specializations. In your case
> STATICALLY-WIDEN won't help because you have different representations
> of units for different character types.

I think STATICALLY-WIDEN will work for his use case as he intends to
provide distinct partial specializations for char, wchar_t, and
charN_t. Basically, Mat needs a STATICALLY-WIDEN-UTF variant that takes
a string literal of some kind (presumably UTF-8) and "widens" it to
UTF-16 or UTF-32. Note the char partial specialization below (that I
think should not be parameterized on CharT, just Traits).
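
To make the "widening" idea concrete, here is a minimal sketch of converting
a UTF-8 literal to UTF-32 at compile time. It is only an illustration under
stated assumptions: the names utf32_buffer and utf8_to_utf32 are hypothetical
(they are not part of the standard, fmt, or Mat's library), the input is
assumed to be well-formed UTF-8, and C++17 or later is assumed.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity result type. N code units can never decode to
// more than N code points, so an array of N elements is always large enough.
template <std::size_t N>
struct utf32_buffer {
    char32_t data[N] = {};
    std::size_t size = 0;
    constexpr std::u32string_view view() const { return {data, size}; }
};

// Hypothetical helper: decodes a UTF-8 literal (char in C++17, char8_t in
// C++20) into UTF-32. Assumes well-formed input; no validation is performed.
template <class In, std::size_t N>
constexpr utf32_buffer<N> utf8_to_utf32(const In (&s)[N]) {
    utf32_buffer<N> out{};
    std::size_t i = 0;
    while (i + 1 < N) {  // N counts the terminating null, which is skipped
        const auto b0 = static_cast<unsigned char>(s[i]);
        // The lead byte determines the length of the sequence.
        const std::size_t len = b0 < 0x80          ? 1
                                : (b0 >> 5) == 0x6 ? 2
                                : (b0 >> 4) == 0xE ? 3
                                                   : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? b0 : b0 & (0xFF >> (len + 1));
        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len && i + k < N; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.data[out.size++] = cp;
        i += len;
    }
    return out;
}

// Usage: the buffer is a constant with static storage, so the view into it
// stays valid for the whole program.
inline constexpr auto micro_second = utf8_to_utf32(u8"\u00b5s");
```

A UTF-16 variant would follow the same shape, with surrogate pairs emitted
for code points above U+FFFF. The char specialization would still need its
own ASCII spelling ("us"), since that text differs rather than just the
encoding.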

What isn't clear to me is how implementors will implement
STATICALLY-WIDEN. Victor, do you know what techniques implementors are
expected to employ?
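
For reference, one technique that would satisfy the char/wchar_t case is to
let the preprocessor produce both literals and select between them by type.
The sketch below is an assumption about how this could be done, not a
description of what any particular implementation does; choose_literal,
STATICALLY_WIDEN (as a macro), and milli_suffix are hypothetical names, and
C++17 is assumed.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper (not a standard or library name): returns the literal
// whose character type matches CharT.
template <class CharT>
constexpr std::basic_string_view<CharT> choose_literal(std::string_view narrow,
                                                       std::wstring_view wide) {
    if constexpr (std::is_same_v<CharT, char>) {
        return narrow;
    } else {
        static_assert(std::is_same_v<CharT, wchar_t>,
                      "only char and wchar_t are handled in this sketch");
        return wide;
    }
}

// The macro pastes an L prefix onto the same literal token, so the compiler
// produces both encodings and no run-time conversion takes place.
#define STATICALLY_WIDEN(CharT, literal) choose_literal<CharT>(literal, L##literal)

// Example: a function template that is generic over CharT, in the way a
// formatter specialization would be.
template <class CharT>
constexpr std::basic_string_view<CharT> milli_suffix() {
    return STATICALLY_WIDEN(CharT, "ms");
}
```

Extending the helper with u8, u, and U parameters would cover the charN_t
types in the same way, at the cost of one more pasted literal per type.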

Tom.

>
> Cheers,
> Victor
>
> On Thu, Oct 17, 2019 at 12:21 PM Mateusz Pusz <mateusz.pusz_at_[hidden]> wrote:
>
> Hi everyone,
>
> Right now I am in the process of designing and implementing a
> Physical Units library that hopefully will be a start for having
> such a feature in the C++ Standard Library. You can find more info
> on the library here: https://github.com/mpusz/units.
>
> Recently, I started to work on the text output of quantities.
> Quantities consist of a value and a unit symbol. The latter is a
> perfect use case for Unicode. Consider:
>
> 10 us vs 10 μs
> 2 kg*m/s^2 vs 2 kg⋅m/s²
>
> Before C++20 we could get away with a hack by providing Unicode
> characters to `char`-based types and streams, but with the
> introduction of `char8_t` in C++20 it seems it will be a bigger
> issue from now on. Library implementors will have to provide two
> separate implementations:
> 1. For `char`-based types (string_view, ostream), without Unicode symbols
> 2. For Unicode-character-based types
>
> However, there are a few issues here:
> 1. As of now, we do not have std::u8cout or even std::u8ostream.
> So there is really no easy way to create and use a stream for
> Unicode characters. So even if I implement
>
> template<class CharT, class Traits>
> friend std::basic_ostream<CharT, Traits>&
> operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)
>
> correctly, we do not have an easy way to use it.
>
> 2. In order to implement the above, I could imagine such an
> interface for a symbol prefix:
>
> template<typename CharT, typename Traits, typename Prefix, typename Ratio>
> inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;
>
> and its partial specializations for different prefixes/ratios:
>
> template<typename CharT, typename Traits>
> inline constexpr std::basic_string_view<char, Traits>
> prefix_symbol<char, Traits, si_prefix, std::micro> = "u";
> template<typename CharT, typename Traits>
> inline constexpr std::basic_string_view<CharT, Traits>
> prefix_symbol<CharT, Traits, si_prefix, std::micro> = u8"\u00b5"; // µ
> template<typename CharT, typename Traits>
> inline constexpr std::basic_string_view<CharT, Traits>
> prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";
>
> The problem is that the above code will not compile. A
> specialization that is generic over `CharT` cannot be initialized
> with a literal like "m". Also, there is no generic mechanism to
> initialize all Unicode-based versions of the type with the same
> literal, as each of them requires a different prefix (u8, u, U).
> Providing a specialization for every character type here is going
> to be a nightmare for library authors.
>
> To solve the second problem, fmt and chrono defined something
> called STATICALLY-WIDEN (http://wg21.link/time.general), but it
> seems to be more of a specification hack than an implementation
> technique. I call it a hack because it currently addresses only
> `char` and `wchar_t` and does not mention Unicode character types
> at all.
>
> Dear SG16 members, do you have any best known methods (BKMs) or
> suggestions on how to write a library that is Unicode-aware and
> safe in an easy and approachable way? Should we strive to provide
> a nice-looking representation of units for outputs that support
> Unicode (console, files, etc.), or should we, as before, support
> only `char` and `wchar_t` and ignore the existence of Unicode in
> C++?
>
> Please keep in mind that the library is hoped to target C++23.
>
> Best
>
> Mat
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden] <mailto:Unicode_at_[hidden]>
> http://www.open-std.org/mailman/listinfo/unicode

Received on 2019-10-20 22:03:25