Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Tue, 10 Sep 2019 11:17:37 -0400
On Tue, Sep 10, 2019 at 10:36 AM Niall Douglas <s_sourceforge_at_[hidden]>
wrote:

>
> > Perhaps it would be helpful to enumerate what we expect to be portable
> > uses of field widths. My personal take is that they are useful to
> > specify widths for fields where the content is restricted to members of
> > the basic source character set where we already have a guarantee that
> > each character can be represented with one code unit.
>
> Most programmers would use field widths for padding items so they appear
> in a grid. They would expect that 𐐗 padded to eight characters yields
> seven spaces and 𐐗, not four spaces and 𐐗 (because 𐐗 consumes four
> bytes of UTF-8).
>
> That said, as we have no idea how unicode would get rendered (0, 1, or 4
> characters for 𐐗 being the most likely), I cannot improve on your
> proposal. The situation sucks, quite frankly.
>

     One of the benefits of using code units for char and wchar_t here is
that, even if it's visually wrong, it's *dependably* wrong. I can pass
char-based UTF-8 and know exactly how to mitigate the problem if I care, and
on all platforms I will have exactly the same problem, regardless of
whether the program is deployed on a Turkish, German, or Japanese machine.
This, combined with the ability to avoid std::locale entirely for
char and wchar_t, is extremely valuable (if frustrating for those who care).

     char and wchar_t are portability dead ends; let's leave them the mess
that they are and focus on having a really good story for char8_t,
char16_t, and char32_t.

Sincerely,
JeanHeyd

Received on 2019-09-10 17:17:52