Date: Fri, 13 Sep 2019 12:08:53 -0400
On 9/13/19 10:35 AM, Victor Zverovich wrote:
> I'll report back my findings in a paper. It may not be solvable
> perfectly but I think we can come up with a good practical
> approximation that addresses the main use case and I'm fine with not
> addressing esoteric ones. People somehow manage to write CLIs that do
> this and work with fancy emojis and Asian scripts even in C =).
Please make sure the paper addresses some of the odder characters.
Here are a few examples, but I'm sure there are many more.
* U+200B { ZERO WIDTH SPACE }
* U+2063 { INVISIBLE SEPARATOR }
* U+2064 { INVISIBLE PLUS }
* Half and full width characters
* Family emoji
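To make the difficulty with these characters concrete, here is a rough sketch of a per-code-point column estimate. Everything in it is an assumption made for illustration: the names decode_utf8, estimated_width and estimated_columns are hypothetical, the ranges are a small hand-picked subset rather than the Unicode East Asian Width data, and no terminal is claimed to work this way. It assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; a stray
// continuation byte is passed through as its own code point, not rejected.
inline std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int extra = b >= 0xF0 ? 3 : b >= 0xE0 ? 2 : b >= 0xC0 ? 1 : 0;
        char32_t cp = extra == 3 ? (b & 0x07)
                    : extra == 2 ? (b & 0x0F)
                    : extra == 1 ? (b & 0x1F)
                                 : b;
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Hypothetical estimate: 0, 1, or 2 columns per code point.
inline int estimated_width(char32_t cp) {
    // Invisible format characters: U+200B..U+200F (includes ZERO WIDTH
    // SPACE and ZERO WIDTH JOINER) and U+2060..U+2064 (includes INVISIBLE
    // SEPARATOR and INVISIBLE PLUS).
    if ((cp >= 0x200B && cp <= 0x200F) || (cp >= 0x2060 && cp <= 0x2064))
        return 0;
    // Fullwidth forms plus two emoji-heavy blocks, treated as wide in
    // their entirety. This over-approximates and is far from exhaustive.
    if ((cp >= 0xFF01 && cp <= 0xFF60) ||
        (cp >= 0x1F300 && cp <= 0x1F64F) ||
        (cp >= 0x1F900 && cp <= 0x1F9FF))
        return 2;
    return 1;
}

// Sum of the per-code-point estimates; knows nothing about grapheme clusters.
inline int estimated_columns(std::string_view s) {
    int total = 0;
    for (char32_t cp : decode_utf8(s)) total += estimated_width(cp);
    return total;
}
```

With these assumed ranges, "a" + U+200B + "b" is estimated at 2 columns and fullwidth U+FF21 at 2, while halfwidth U+FF71 stays at 1. A family emoji built from U+1F468, U+200D, U+1F469, U+200D, U+1F467 comes out as 6, although a terminal that renders the sequence as one glyph would likely use 2. Summing per code point is not enough for that case.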
I tried an experiment a little while back. I thought it would be fun to
take Eric Niebler's range-v3 calendar example
(https://github.com/ericniebler/range-v3/blob/master/example/calendar.cpp)
and modify it to generate emoji for some holidays. I didn't actually go
so far as to modify his code, but rather just did a simple hack to test
output to a terminal.
$ cat cal.cpp
#include <clocale>
#include <iostream>

int main() {
    std::setlocale(LC_ALL, "");
    // Emoji are written as UTF-8 bytes: U+1F983 TURKEY,
    // U+1F384 CHRISTMAS TREE, U+1F383 JACK-O-LANTERN.
    std::cout <<
        "       October               November              December\n"
        "              1  2  3   1  2  3  4  5  6  7         1  2  3  4  5\n"
        "  4  5  6  7  8  9 10   8  9 10 11 12 13 14   6  7  8  9 10 11 12\n"
        " 11 12 13 14 15 16 17  15 16 17 18 19 20 21  13 14 15 16 17 18 19\n"
        " 18 19 20 21 22 23 24  22 23 24 25 \xF0\x9F\xA6\x83 27 28 "
        " 20 21 22 23 24 \xF0\x9F\x8E\x84 26\n"
        " 25 26 27 28 29 30 \xF0\x9F\x8E\x83  29 30                "
        " 27 28 29 30 31\n";
}
Here is what konsole on Ubuntu 18.04 displays for me today:
I find it interesting that misalignment is not consistent even when font
support is not present.
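As a small illustration of why the choice of unit matters for a calendar cell like the ones above, the sketch below pads a cell to three units, counting the cell first in bytes, then in code points, then in an assumed two display columns. The helpers count_code_points and pad_left are hypothetical names invented for this example, and the two-column figure for the emoji is an assumption about the terminal, not something the code can discover.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Number of code points in well-formed UTF-8: count every byte that is
// not a continuation byte (10xxxxxx).
inline std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}

// Left-pad `cell` with spaces up to `width` units. The caller states how
// many units the cell already occupies, so the same helper can be driven
// by a byte count, a code point count, or a display column estimate.
inline std::string pad_left(std::string_view cell, std::size_t occupied,
                            std::size_t width) {
    std::string out(occupied < width ? width - occupied : 0, ' ');
    out.append(cell.data(), cell.size());
    return out;
}
```

For the turkey emoji "\xF0\x9F\xA6\x83", size() is 4 and count_code_points gives 1. Padding to a three-unit cell therefore adds no spaces when counting bytes, two spaces when counting code points, and one space when the caller assumes two columns. Only the last matches the " 26" cell it replaces, and only if the terminal really does draw the emoji two columns wide.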
I wasn't able to get font fallback working in the time I allotted to
this. The only way I could get emoji to appear was to install the
"fonts-noto-color-emoji" package and then change konsole's font to
select it. This is a proportional font, so of course everything looks
ridiculous.
Tom.
>
> - Victor
>
> On Fri, Sep 13, 2019 at 6:57 AM Niall Douglas
> <s_sourceforge_at_[hidden]> wrote:
>
> On 13/09/2019 14:36, Victor Zverovich wrote:
> >> Instead of inventing something in the abstract, a good next step
> >> would be to figure out how (in UTF-8 mode) Apple Terminal, Gnome
> >> Terminal, Konsole, and the new Windows Terminal determine how many
> >> terminal display columns a string takes. (I'm not volunteering.)
> >
> > I'm volunteering to do this since improving handling of width is
> > already on my TODO list for the fmt library.
>
> I'll be interested in what you come up with on this, as I don't think
> this is solvable.
>
> For example, imagine formatting into a file, and then that file is
> rendered onto a console.
>
> Another example: imagine formatting into a clipboard, which on Windows
> and POSIX might involve three or four renditions into differing formats
> and encodings. Then the consumer of the clipboard chooses an unknown one
> of those renditions, and reinterprets it in some unknown way into a
> paste into some document.
>
> Personally speaking, I think the best course is to declare codepoint or
> byte based formatting widths, and draw a line under it.
>
> Niall
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden] <mailto:Unicode_at_[hidden]>
> http://www.open-std.org/mailman/listinfo/unicode
Received on 2019-09-13 18:08:59