sg16

Re: [SG16-Unicode] Unicode streams

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Sun, 20 Oct 2019 13:45:44 -0700
> do you know what techniques implementors are expected to employ?

One obvious technique that comes to mind is providing specializations of
some class template parameterized on the character type, with static members
returning the necessary strings (the general solution). If the wide and narrow
encodings represent the characters used in the same way, all of this can be
replaced with a simple copy. It might be worth checking what the date
library does.

- Victor

On Sun, Oct 20, 2019 at 1:03 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 10/19/19 12:53 PM, Victor Zverovich wrote:
>
> STATICALLY-WIDEN is not a hack; it's just a convenient pseudo-function
> that simplifies and formalizes a body of wording that originated in P0355.
> In the case of chrono it will work identically for Unicode, and the only
> reason we didn't mention charX_t is that neither std::format nor iostreams
> support those types. As Corentin wrote, you could provide formatter
> specializations, including for the Unicode types, and once we have charX_t
> overloads of std::format (I plan to propose these for C++23) they will just
> pick up those specializations. In your case STATICALLY-WIDEN won't help
> because you have different representations of units for different
> character types.
>
> I think STATICALLY-WIDEN will work for his use case, as he intends to
> provide distinct partial specializations for char, wchar_t, and charN_t.
> Basically, Mat needs a STATICALLY-WIDEN-UTF variant that takes a string
> literal of some kind (presumably UTF-8) and "widens" it to UTF-16 or
> UTF-32. Note the char partial specialization below, which I think should
> be parameterized only on Traits, not on CharT.
>
> What isn't clear to me is how implementors will implement
> STATICALLY-WIDEN. Victor, do you know what techniques implementors are
> expected to employ?
>
> Tom.
>
>
> Cheers,
> Victor
>
> On Thu, Oct 17, 2019 at 12:21 PM Mateusz Pusz <mateusz.pusz_at_[hidden]>
> wrote:
>
>> Hi everyone,
>>
>> Right now I am in the process of designing and implementing a Physical
>> Units library that hopefully will be a start for having such a feature in
>> the C++ Standard Library. You can find more info on the library here:
>> https://github.com/mpusz/units.
>>
>> Recently, I started to work on the text output of quantities. Quantities
>> consist of a value and a unit symbol. The latter is a perfect use case for
>> Unicode. Consider:
>>
>> 10 us vs 10 μs
>> 2 kg*m/s^2 vs 2 kg⋅m/s²
>>
>> Before C++20 we could get away with a hack by putting Unicode
>> characters into `char`-based types and streams, but with the introduction
>> of `char8_t` in C++20 it seems this will become a bigger issue. Library
>> implementors will have to provide two separate implementations:
>> 1. For `char`-based types (string_view, ostream), without Unicode symbols
>> 2. For Unicode character types
>>
>> However, there are a few issues here:
>> 1. As of now, we do not have std::u8cout or even std::u8ostream. So
>> there is really no easy way to create and use a stream for Unicode
>> characters. So even if I implement
>>
>> template<class CharT, class Traits>
>> friend std::basic_ostream<CharT, Traits>&
>> operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)
>>
>> correctly, we do not have an easy way to use it.
>>
>> 2. In order to implement the above, I could imagine such an interface for
>> a symbol prefix:
>>
>> template<typename CharT, typename Traits, typename Prefix, typename Ratio>
>> inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;
>>
>> and its partial specializations for different prefixes/ratios:
>>
>> template<typename CharT, typename Traits>
>> inline constexpr std::basic_string_view<char, Traits>
>>     prefix_symbol<char, Traits, si_prefix, std::micro> = "u";
>> template<typename CharT, typename Traits>
>> inline constexpr std::basic_string_view<CharT, Traits>
>>     prefix_symbol<CharT, Traits, si_prefix, std::micro> = u8"\u00b5"; // µ
>> template<typename CharT, typename Traits>
>> inline constexpr std::basic_string_view<CharT, Traits>
>>     prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";
>>
>> The problem is that the above code will not compile. A specialization for
>> all `CharT` cannot be initialized with a literal like "m". Also, there is
>> no generic mechanism to initialize all the Unicode-based versions of the
>> type with the same literal, as each of them requires a different prefix
>> (u8, u, U). Providing a specialization for every character type here is
>> going to be a nightmare for library authors.
>>
>> To solve the second problem, fmt and chrono define something called
>> STATICALLY-WIDEN (http://wg21.link/time.general), but it seems to be
>> more of a specification hack than an implementation technique. I call
>> it a hack because it currently addresses only `char` and `wchar_t` and
>> does not mention Unicode characters at all.
>>
>> Dear SG16 members, do you have any best known methods (BKMs) or
>> suggestions on how to write a library that is Unicode-aware and safe in an
>> easy and approachable way? Should we strive to provide a nice-looking
>> representation of units for outputs that support Unicode (console, files,
>> etc.), or should we, as always before, support only `char` and `wchar_t`
>> and ignore the existence of Unicode in C++?
>>
>> Please keep in mind that the library is hoped to target C++23.
>>
>> Best
>>
>> Mat
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode
>>
>

Received on 2019-10-20 22:45:58