sg16: Re: [SG16-Unicode] Unicode streams

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 17 Oct 2019 16:13:02 -0400

On 10/17/19 3:02 PM, Mateusz Pusz wrote:
> Hi everyone,
>
> Right now I am in the process of designing and implementing a Physical
> Units library that hopefully will be a start for having such a feature
> in the C++ Standard Library. You can find more info on the library
> here: https://github.com/mpusz/units.
>
> Recently, I started to work on the text output of quantities. A
> quantity consists of a value and a unit symbol. The latter is a
> perfect use case for Unicode. Consider:
>
> 10 us vs 10 μs
> 2 kg*m/s^2 vs 2 kg⋅m/s²
>
> Before C++20 we could get away with the hack of passing Unicode
> characters through `char`-based types and streams, but with the
> introduction of `char8_t` in C++20 it seems this will be a bigger
> issue from now on. Library implementors will have to provide two
> separate implementations:
> 1. For `char`-based types (string_view, ostream), without Unicode
> characters
> 2. For Unicode character types
>
> However, there are a few issues here:
> 1. As of now, we do not have std::u8cout or even std::u8ostream, so
> there is really no easy way to create and use a stream of Unicode
> characters. Even if I implement
>
> template<class CharT, class Traits>
> friend std::basic_ostream<CharT, Traits>&
> operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)
>
> correctly, we do not have an easy way to use it.
Yes, this is a problem that we need to figure out how to solve; ideally
in the C++23 time frame.
>
> 2. In order to implement the above, I could imagine such an interface
> for a symbol prefix:
>
> template<typename CharT, typename Traits, typename Prefix, typename Ratio>
> inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;
>
> and its partial specializations for different prefixes/ratios:
>
> template<typename Traits>
> inline constexpr std::basic_string_view<char, Traits>
> prefix_symbol<char, Traits, si_prefix, std::micro> = "u";
> template<typename CharT, typename Traits>
> inline constexpr std::basic_string_view<CharT, Traits>
> prefix_symbol<CharT, Traits, si_prefix, std::micro> = u8"\u00b5"; // µ
> template<typename CharT, typename Traits>
> inline constexpr std::basic_string_view<CharT, Traits>
> prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";
>
> The problem is that the above code will not compile. A specialization
> that is generic over `CharT` cannot be initialized with a narrow
> literal like "m". Also, there is no generic mechanism to initialize
> all Unicode-based versions of the type from the same literal, as each
> of them requires a different prefix (u8, u, U). Providing a
> specialization for every character type here is going to be a
> nightmare for library authors.
>
> To solve the second problem, fmt and chrono defined something called
> STATICALLY-WIDEN (http://wg21.link/time.general), but it seems to be
> more of a specification hack than an implementation technique. I call
> it a hack because it currently addresses only `char` and `wchar_t`
> and does not mention the Unicode character types at all.
The code would be ugly, but you could implement constexpr conversion
functions using the techniques used to implement the '_as_char' UDL in
https://github.com/tahonermann/char8_t-remediation/blob/master/char8_t-remediation.h.
> Dear SG16 members, do you have any best known methods (BKMs) or
> suggestions on how to write a library that is Unicode-aware and safe
> in an easy and approachable way? Should we strive to provide a
> nice-looking representation of units for outputs that support Unicode
> (console, files, etc.), or should we, as before, support only `char`
> and `wchar_t` and ignore the existence of Unicode in C++?

It would be great if you could (continue to) bring us these use cases
and experiment to help us figure out how to solve these problems.

One of our highest priorities right now is to get basic transcoding
support into the standard so that programs can at least use Unicode as
an internal encoding with relative ease. A suite of text maintenance
types comes next. Along the way we need to be thinking about the program
environment (as you are) and how we interact with it. Unfortunately, we
are having to build basically from the ground up.

Tom.

>
> Please keep in mind that I hope the library can target C++23.
>
> Best
>
> Mat

Received on 2019-10-17 22:13:06