Date: Tue, 10 Sep 2019 11:17:37 -0400
On Tue, Sep 10, 2019 at 10:36 AM Niall Douglas <s_sourceforge_at_[hidden]>
wrote:
>
> > Perhaps it would be helpful to enumerate what we expect to be portable
> > uses of field widths. My personal take is that they are useful to
> > specify widths for fields where the content is restricted to members of
> > the basic source character set where we already have a guarantee that
> > each character can be represented with one code unit.
>
> Most programmers would use field widths for padding items so they appear
> in a grid. They would expect that 𐐗 padded to eight characters yields
> seven spaces and 𐐗, not four spaces and 𐐗 (because 𐐗 consumes four
> bytes of UTF-8).
>
> That said, as we have no idea how unicode would get rendered (0, 1, or 4
> characters for 𐐗 being the most likely), I cannot improve on your
> proposal. The situation sucks, quite frankly.
>
One of the benefits of using code units for char and wchar_t here is
that, even if it's visually wrong, it's *dependably* wrong. I can pass
char-based UTF-8 and know exactly how to mitigate the problem if I care, and
on all platforms I will have exactly the same problem, regardless of
whether the program is deployed on a Turkish, German, or Japanese machine.
This, combined with the ability to not do anything with std::locale for
char and wchar_t, is extremely valuable (if frustrating for those who care).
char and wchar_t are portability dead ends; let's leave them as the mess
that they are and focus on having a really good story for char8_t,
char16_t, and char32_t.
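
To make the "dependably wrong" point concrete, here is a minimal sketch,
assuming the char string holds UTF-8 and using the C library's "%8s" as a
stand-in for any field width that counts code units. The helper names
(pad_code_units, count_code_points, pad_left_code_points) are hypothetical
and only for illustration; they are not part of any proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Pads via the C library: the width in "%8s" counts char code units (bytes),
// so the result is the same on every platform and in every locale.
std::string pad_code_units(const std::string& s) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%8s", s.c_str());
    if (n < 0) return {};
    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof buf) len = sizeof buf - 1;  // output was truncated
    return std::string(buf, len);
}

// Hypothetical mitigation: count code points by skipping UTF-8
// continuation bytes (those of the form 10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical mitigation: pad on the left so that the result holds
// `width` code points rather than `width` code units.
std::string pad_left_code_points(const std::string& utf8, std::size_t width) {
    std::size_t len = count_code_points(utf8);
    std::string out(len < width ? width - len : 0, ' ');
    out += utf8;
    return out;
}

// U+10417 is four code units in UTF-8: F0 90 90 97.
void check_examples() {
    const std::string s = "\xF0\x90\x90\x97";
    assert(pad_code_units(s).size() == 8);            // 4 spaces + 4 bytes
    assert(pad_left_code_points(s, 8).size() == 11);  // 7 spaces + 4 bytes
}
```

Counting code points is still not the same as display width (combining
marks and wide characters break it), so this only shows that the
code-unit behaviour is predictable enough to work around by hand.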
Sincerely,
JeanHeyd
Received on 2019-09-10 17:17:52