On Fri, Sep 13, 2019, 2:55 AM Thiago Macieira <thiago@macieira.org> wrote:
On Wednesday, 11 September 2019 23:05:53 PDT Henri Sivonen wrote:
> Having stuff line up on a terminal grid in the Unicode context calls
> for East Asian Width (in the mode that resolves ambiguous characters
> as narrow). The concept ignores lots of scripts, but many of those
> scripts have properties such that combining those scripts with the
> notion of having stuff line up on a grid leads to a bad time
> regardless of exact definitions.
>
> Instead of inventing something in the abstract, a good next step would
> be to figure out how (in UTF-8 mode) Apple Terminal, Gnome Terminal,
> Konsole, and the new Windows Terminal determine how many terminal
> display columns a string takes. (I'm not volunteering.)

Hello Henri

First of all, thank you for the blog post you made on the encoding forms
versus EGCs and EAW. I learnt a lot and I suspect many others here did too.
I also saw it make the rounds in other SGs' Slack channels.

We can look into how those terminals do spacing, but it may not be sufficient.
There are also all the applications that display monospaced fonts, including
every source-code oriented text editor (whether they're IDEs or not) and
browsers. I think terminals are more constrained, because they really have a
cell grid concept (from what I remember of Konsole's source code), while
browsers and usual text renderers "just happen" to get monospace right, but I
might be wrong.

Moreover, I don't expect them to be right today. So inspecting their codebases
may just reveal that they're all wrong.

At the risk of just kicking the can down the road, isn't there something the
Unicode Consortium should provide? If there's a TR that the C++ standard and
the terminals can refer to, we can all agree on what the "right way" should
be.

Nothing useful. From UAX #11:

Note: The East_Asian_Width property is not intended for use by modern terminal emulators without appropriate tailoring on a case-by-case basis. Such terminal emulators need a way to resolve the halfwidth/fullwidth dichotomy that is necessary for such environments, but the East_Asian_Width property does not provide an off-the-shelf solution for all situations. The growing repertoire of the Unicode Standard has long exceeded the bounds of East Asian legacy character encodings, and terminal emulations often need to be customized to support edge cases and for changes in typographical behavior over time.

But short of an API to communicate width information between terminals and C++ applications magically appearing, we can only approximate.

The available approximations are code units, code points, EGCs, and tailored EGCs.

Trying to do anything more seems pointless.

I still think the only sensible approximation for text alignment is counting EGCs.
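
To make the gap between those approximations concrete, here is a minimal sketch (my own illustration, not something proposed for std::format) that counts UTF-8 code units and code points. The names code_unit_count and code_point_count are hypothetical, and the code assumes well-formed UTF-8 input. Counting EGCs would additionally need the UAX #29 segmentation data, which is left out here.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of UTF-8 code units, i.e. the byte length.
constexpr std::size_t code_unit_count(std::string_view s) {
    return s.size();
}

// Hypothetical helper: number of code points, ASSUMING well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
constexpr std::size_t code_point_count(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "e\xCC\x81" (e followed by U+0301 COMBINING ACUTE ACCENT) this gives 3 code units and 2 code points, although it is a single EGC. For "\xE6\x97\xA5\xE6\x9C\xAC" (日本) it gives 6 code units and 2 code points, while a terminal would typically use 4 columns.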
 
> Storage implies code unit count. Do people actually use, _with
> Unicode_, fields of storage that are so fixed-width that they need to
> be padded to the full storage width _and_ do they use std::format to
> do so? (I guess anything is possible, but this seems to me like a very
> specialized niche use case whose premise is a bad idea.)

I suspect no one pads for storage. Padding is done for alignment, which is
where EAW comes into play.

Storage usually implies maximum sizes, not minimum. Take this example in C:

        char buf[21];
        sprintf(buf, "%20s", somestr);

Every good static analysis tool and even most compilers will warn you of this
use of sprintf, but it's a common mistake that novice developers will make.
That 20 sets the minimum field width, not the maximum, so any somestr longer
than 20 bytes overflows buf.

Great point.
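
For comparison, here is a sketch of a bounded variant, written as C++ with the same printf-style format. The helper name pad_to_20 is hypothetical. It relies on the precision in "%20.20s" capping the output at 20 bytes and on snprintf never writing past the buffer. Both limits count bytes, so truncation can still cut a multi-byte UTF-8 sequence in half, which is one more reason such fields are a poor fit for Unicode text.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: right-align `s` in a field of exactly 20 bytes.
// In "%20.20s" the first 20 is the minimum field width (pads short input)
// and the second 20 is the precision (truncates long input), both in bytes.
inline std::string pad_to_20(const char* s) {
    char buf[21];  // 20 bytes of output plus the terminating NUL
    const int n = std::snprintf(buf, sizeof buf, "%20.20s", s);
    assert(n == 20);  // width and precision together pin the length
    (void)n;          // avoid an unused-variable warning under NDEBUG
    return std::string(buf);
}
```

With this, pad_to_20("abc") yields 17 spaces followed by "abc", and a 40-byte input is cut down to its first 20 bytes instead of overflowing.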

Now, for C++ we usually just allocate memory instead of trying to fit a buffer
size. And even GNU tools, which attempt to make sure there are no arbitrary
limits anywhere, do that through asprintf.
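
A rough portable C++ equivalent of that idea, as a sketch only: this is not how asprintf is implemented, and pad_left is a hypothetical name. It calls snprintf twice, first with a null buffer to measure the output and then to write into a std::string of exactly that size, so the minimum width never turns into a buffer limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: right-align `s` in a field of at least `width` bytes,
// letting the result grow as needed instead of fitting a fixed buffer.
inline std::string pad_left(const char* s, int width) {
    // First pass: a null buffer of size 0 only measures the output length.
    const int n = std::snprintf(nullptr, 0, "%*s", width, s);
    assert(n >= 0);
    if (n < 0) {
        return std::string();  // encoding error: give up with an empty result
    }
    std::string out(static_cast<std::size_t>(n), '\0');
    // Second pass: write n bytes plus the NUL into the string's own storage.
    std::snprintf(&out[0], out.size() + 1, "%*s", width, s);
    return out;
}
```

Here pad_left("abc", 8) gives five spaces followed by "abc", and pad_left("abcdefghij", 4) returns the full ten-byte string unchanged.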

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products



_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode