On 10/19/19 12:53 PM, Victor Zverovich wrote:
STATICALLY-WIDEN is not a hack; it's just a convenient pseudo-function that simplifies and formalizes a body of wording that originated in P0355. In the case of chrono it will work identically for Unicode, and the only reason we didn't mention charX_t is that neither std::format nor iostreams support those types. As Corentin wrote, you could provide formatter specializations, including for Unicode types, and once we have charX_t overloads of std::format (I plan to propose these for C++23), it will just pick up those specializations. In your case STATICALLY-WIDEN won't help, because you have different representations of units for different character types.

I think STATICALLY-WIDEN will work for his use case, as he intends to provide distinct partial specializations for char, wchar_t, and charN_t.  Basically, Mat needs a STATICALLY-WIDEN-UTF variant that takes a string literal of some kind (presumably UTF-8) and "widens" it to UTF-16 or UTF-32.  Note the char partial specialization below (which I think should be parameterized only on Traits, not on CharT).

What isn't clear to me is how implementors will implement STATICALLY-WIDEN.  Victor, do you know what techniques implementors are expected to employ?
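
For reference, one technique that would satisfy the wording is sketched below with hypothetical names (`STATICALLY_WIDEN`, `choose_literal`, `dependent_false`, `default_spec`), not any vendor's actual code: a macro spells the literal twice, once plain and once with the L prefix, and a constexpr function template selects the one whose character type matches CharT. No transcoding is involved; the compiler simply encodes both literals.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper: defers the static_assert until the branch is instantiated.
template<typename T>
inline constexpr bool dependent_false = false;

// Hypothetical: select the narrow or wide spelling of the same literal.
template<typename CharT>
constexpr std::basic_string_view<CharT> choose_literal(std::string_view narrow, std::wstring_view wide) {
  if constexpr (std::is_same_v<CharT, char>) {
    return narrow;
  } else if constexpr (std::is_same_v<CharT, wchar_t>) {
    return wide;
  } else {
    static_assert(dependent_false<CharT>, "only char and wchar_t are handled");
  }
}

// S must be an unprefixed string literal; L##S produces its wide counterpart.
#define STATICALLY_WIDEN(CharT, S) choose_literal<CharT>(S, L##S)

// Example: a format string usable for both char and wchar_t.
template<typename CharT>
inline constexpr std::basic_string_view<CharT> default_spec = STATICALLY_WIDEN(CharT, "{}");
```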



On Thu, Oct 17, 2019 at 12:21 PM Mateusz Pusz <mateusz.pusz@gmail.com> wrote:
Hi everyone,

Right now I am designing and implementing a Physical Units library that will hopefully be a starting point for such a feature in the C++ Standard Library. You can find more info on the library here: https://github.com/mpusz/units.

Recently, I started working on the text output of quantities. A quantity consists of a value and a unit symbol. The latter is a perfect use case for Unicode. Consider:

10 us        vs   10 μs
2 kg*m/s^2   vs   2 kg⋅m/s²

Before C++20 we could get away with a hack by putting Unicode characters into `char`-based types and streams, but with the introduction of `char8_t` in C++20 this seems set to become a bigger issue. Library implementors will have to provide 2 separate implementations:
1. For `char`-based types (string_view, ostream), without Unicode symbols
2. For Unicode character types

However, there are a few issues here:
1. As of now, we have neither std::u8cout nor even std::u8ostream, so there is no easy way to create and use a stream of Unicode characters. So even if I implement

template<class CharT, class Traits>
friend std::basic_ostream<CharT, Traits>& operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)

correctly, we do not have an easy way to use it.
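
For illustration, a minimal sketch of such an operator, assuming a hypothetical `microseconds_quantity` type that always prints the ASCII symbol "us". Each symbol character is converted with `os.widen`, so one template serves both `std::ostream` and `std::wostream`. The standard only requires the `ctype` and `num_put` facets this relies on for `char` and `wchar_t`, so the same operator cannot be relied on for a `std::basic_ostream<char8_t>`.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a real quantity type: a value in microseconds.
struct microseconds_quantity {
  double value;

  template<class CharT, class Traits>
  friend std::basic_ostream<CharT, Traits>& operator<<(std::basic_ostream<CharT, Traits>& os,
                                                       const microseconds_quantity& q) {
    // widen() maps each narrow ASCII character through the stream's ctype<CharT> facet.
    return os << q.value << os.widen(' ') << os.widen('u') << os.widen('s');
  }
};
```

Writing `microseconds_quantity{10}` to a `std::wostringstream` yields `L"10 us"`.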

2. To implement the above, I could imagine an interface like this for a symbol prefix:

template<typename CharT, typename Traits, typename Prefix, typename Ratio>
inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;

and its partial specializations for different prefixes/ratios:

template<typename CharT, typename Traits>
inline constexpr std::basic_string_view<char, Traits> prefix_symbol<char, Traits, si_prefix, std::micro> = "u";
template<typename CharT, typename Traits>
inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol<CharT, Traits, si_prefix, std::micro> = u8"\u00b5";  // µ
template<typename CharT, typename Traits>
inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";

The problem is that the above code will not compile. A specialization for all `CharT` cannot be initialized with a narrow literal like "m". Also, there is no generic mechanism to initialize all Unicode-based versions of the type with the same literal, as each of them requires a different prefix (u8, u, U). Providing a specialization for every character type here is going to be a nightmare for library authors.

To solve the second problem, fmt and chrono defined something called STATICALLY-WIDEN (http://wg21.link/time.general), but it seems to be more of a specification hack than an implementation technique. I call it a hack because it currently addresses only `char` and `wchar_t` and does not mention Unicode characters at all.

Dear SG16 members, do you have any BKMs or suggestions on how to write a library that is Unicode-aware and safe in an easy and approachable way? Should we strive to provide a nice-looking representation of units for outputs that support Unicode (console, files, etc.), or should we, as before, support only `char` and `wchar_t` and ignore the existence of Unicode in C++?

Please keep in mind that I hope the library will target C++23.


SG16 Unicode mailing list