On Sat, Sep 7, 2019 at 11:39 PM Victor Zverovich via Lib <lib@lists.isocpp.org> wrote:

> if code units aren't used, then behavior should be different for LANG=C vs LANG=C.UTF-8.

In that case I agree with your proposed resolution of using code units. By design, all of std::format is locale-independent by default, and it would be very unfortunate to break this property and make the output depend on the global locale (or on the passed locale, for some overloads).

As a bit of a reminder, we spoke about this before in Rapperswil and, I believe, in a teleconference. For char and wchar_t, we said that to keep the design locale-independent we needed to keep treating strings as sequences of code units, because there was no other reasonable interpretation that did not drag in a std::locale or some other unspecified dependency for measuring field width. We then said that we intend char8_t, char16_t, and char32_t to play by nicer rules, contingent upon getting better encoding and decoding interfaces and rudimentary Unicode support in C++.
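
To make the difference concrete, here is a minimal sketch of the two ways of measuring width. It is not how std::format is implemented, and every function name in it is made up for illustration. The code-point counter only gives a meaningful answer under the assumption that the char string is valid UTF-8, which is exactly the kind of encoding assumption a locale-independent design cannot make for char.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Width as a code-unit count: needs no locale and no knowledge of the
// encoding, so the result is the same under LANG=C and LANG=C.UTF-8.
std::size_t width_in_code_units(std::string_view s) {
    return s.size();
}

// Width as a code-point count, ASSUMING s is valid UTF-8 (hypothetical
// helper). Every byte that is not a continuation byte (10xxxxxx) starts
// a new code point.
std::size_t width_in_code_points_assuming_utf8(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: pad s on the right with fill up to width,
// measuring s in code units only.
std::string pad_right_code_units(std::string_view s, std::size_t width,
                                 char fill = ' ') {
    std::string out(s);
    if (out.size() < width) {
        out.append(width - out.size(), fill);
    }
    return out;
}
```

For the UTF-8 bytes "\xC3\xA9" (a single U+00E9), the code-unit width is 2 while the code-point width is 1, so padding to a width of 3 by code units adds one fill character instead of two. That mismatch is the known breakage being accepted for char and wchar_t.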

Burdening std::format with encoding troubles now is not useful, and we are likely to get it wrong if we say things like "assume wchar_t is X, assume char is X". It's broken and we know it's broken: if we can't get weasel wording to allow it to fill a "glyph" right now (and still leave a code-unit based implementation as standards-conforming), then just go with the code unit implementation.

Besides, it's just one more reason to prefer charX_t over char/wchar_t when we get there. :D

Sincerely,

JeanHeyd