Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 13 Sep 2019 12:08:53 -0400
On 9/13/19 10:35 AM, Victor Zverovich wrote:
> I'll report back my findings in a paper. It may not be solvable
> perfectly but I think we can come up with a good practical
> approximation that addresses the main use case and I'm fine with not
> addressing esoteric ones. People somehow manage to write CLIs that do
> this and work with fancy emojis and Asian scripts even in C =).

Please make sure the paper addresses some of the more unusual
characters. Here are a few examples; I'm sure there are many more.

  * U+200B { ZERO WIDTH SPACE }
  * U+2063 { INVISIBLE SEPARATOR }
  * U+2064 { INVISIBLE PLUS }
  * Halfwidth and fullwidth characters
  * Family emoji

I tried an experiment a little while back. I thought it would be fun to
take Eric Niebler's range-v3 calendar example
(https://github.com/ericniebler/range-v3/blob/master/example/calendar.cpp)
and modify it to generate emoji for some holidays. I didn't actually go
so far as to modify his code, but rather just did a simple hack to test
output to a terminal.

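$ cat cal.cpp
#include <clocale>   // std::setlocale
#include <iostream>
int main() {
   // Adopt the locale from the environment.
   std::setlocale(LC_ALL, "");
   // Emoji are written as UTF-8 byte escapes:
   //   \xF0\x9F\xA6\x83  U+1F983 TURKEY          (November 26)
   //   \xF0\x9F\x8E\x84  U+1F384 CHRISTMAS TREE  (December 25)
   //   \xF0\x9F\x8E\x83  U+1F383 JACK-O-LANTERN  (October 31)
   std::cout <<
     " October November December\n"
     " 1 2 3 1 2 3 4 5 6 7 1 2 3 4 5\n"
     " 4 5 6 7 8 9 10 8 9 10 11 12 13 14 6 7 8 9 10 11 12\n"
     " 11 12 13 14 15 16 17 15 16 17 18 19 20 21 13 14 15 16 17 18 19\n"
     " 18 19 20 21 22 23 24 22 23 24 25 \xF0\x9F\xA6\x83 27 28 20 21 22 23 24 \xF0\x9F\x8E\x84 26\n"
     " 25 26 27 28 29 30 \xF0\x9F\x8E\x83 29 30 27 28 29 30 31\n";
}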

Here is what konsole on Ubuntu 18.04 displays for me today:

I find it interesting that misalignment is not consistent even when font
support is not present.

I wasn't able to get font fallback working in the time I allotted to
this. The only way I could get emoji to appear was to install the
"fonts-noto-color-emoji" package and then change konsole's font to
select it. This is a proportional font, so of course everything looks
ridiculous.

Tom.

>
> - Victor
>
> On Fri, Sep 13, 2019 at 6:57 AM Niall Douglas
> <s_sourceforge_at_[hidden]> wrote:
>
> On 13/09/2019 14:36, Victor Zverovich wrote:
> >> Instead of inventing something in the abstract, a good next step
> >> would be to figure out how (in UTF-8 mode) Apple Terminal, Gnome
> >> Terminal, Konsole, and the new Windows Terminal determine how many
> >> terminal display columns a string takes. (I'm not volunteering.)
> >
> > I'm volunteering to do this since improving handling of width is
> > already on my TODO list for the fmt library.
>
> I'll be interested in what you come up with on this, as I don't think
> this is solvable.
>
> For example, imagine formatting into a file, and then that file is
> rendered onto a console.
>
> Another example: imagine formatting into a clipboard, which on Windows
> and POSIX might involve three or four renditions into differing formats
> and encodings. Then the consumer of the clipboard chooses an unknown one
> of those renditions, and reinterprets it in some unknown way into a
> paste into some document.
>
> Personally speaking, I think the best course is to declare codepoint- or
> byte-based formatting widths, and draw a line under it.
>
> Niall
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>



Received on 2019-09-13 18:08:59