sg16

Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Sat, 7 Sep 2019 23:48:15 -0400
On Sat, Sep 7, 2019 at 11:39 PM Victor Zverovich via Lib <
lib_at_[hidden]> wrote:

> > if code units aren't used, then behavior should be different for LANG=C
> vs LANG=C.UTF-8.
>
> In that case I agree with your proposed resolution of using code units
> because all of std::format is locale-independent by default by design and
> it would be very unfortunate to break this property and make the output
> depend on the global locale (or the passed locale for some overloads).
>

     As a bit of a reminder, we spoke about this before in Rapperswil and, I
believe, in a teleconference: for char and wchar_t, we said that to keep the
design locale-independent we needed to keep measuring field width in code
units, because there was no other reasonable interpretation that did not
involve dragging in a std::locale or some other unspecified dependency. We
then said that we intend char8_t, char16_t, and char32_t to play by nicer
rules, contingent on getting better encoding and decoding interfaces and
rudimentary Unicode support into C++.

     Burdening std::format with encoding troubles now is not useful, and we
are likely to get it wrong if we say things like "assume wchar_t is X,
assume char is X". It's broken and we know it's broken: if we can't get
weasel wording that lets an implementation measure width in "glyphs" right
now (while still leaving a code-unit-based implementation
standards-conforming), then just go with the code-unit implementation.

     Besides, it's just one more reason to prefer charX_t over char/wchar_t
when we get there. :D

Sincerely,
JeanHeyd

Received on 2019-09-08 05:48:28