Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 12 Sep 2019 17:55:27 -0700
On Wednesday, 11 September 2019 23:05:53 PDT Henri Sivonen wrote:
> Having stuff line up on a terminal grid in the Unicode context calls
> for East Asian Width (in the mode that resolves ambiguous characters
> as narrow). The concept ignores lots of scripts, but many of those
> scripts have properties such that combining those scripts with the
> notion of having stuff line up on a grid leads to a bad time
> regardless of exact definitions.
>
> Instead of inventing something in the abstract, a good next step would
> be to figure out how (in UTF-8 mode) Apple Terminal, Gnome Terminal,
> Konsole, and the new Windows Terminal determine how many terminal
> display columns a string takes. (I'm not volunteering.)

Hello Henri

First of all, thank you for the blog post you wrote on encoding forms versus
EGCs and EAW. I learnt a lot and I suspect many others here did too. I also
saw it make the rounds in other SGs' Slack channels.

We can look into how those terminals do spacing, but it may not be sufficient.
There are also all the applications that display text in monospaced fonts,
including every source-code oriented text editor (whether it is an IDE or not)
and browsers. I think terminals are more constrained, because they really have
a cell grid concept (from what I remember of Konsole's source code), while
browsers and the usual text renderers "just happen" to get monospace right,
but I might be wrong.

Moreover, I don't expect them to be right today. So inspecting their codebases
may just reveal that they're all wrong.

At the risk of just kicking the can down the road, isn't there something the
Unicode Consortium should provide? If there's a TR that the C++ standard and
the terminals can refer to, we can all agree on what the "right way" should
be.

> Storage implies code unit count. Do people actually use, _with
> Unicode_, fields of storage that are so fixed-width that they need to
> be padded to the full storage width _and_ do they use std::format to
> do so? (I guess anything is possible, but this seems to me like a very
> specialized niche use case whose premise is a bad idea.)

I suspect no one pads for storage. Padding is done for alignment, which is
where EAW comes into play.
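
To make the candidate measures concrete, here is a minimal C sketch that counts
a UTF-8 string three ways: code units (bytes), code points, and approximate
terminal columns. Everything in it is an assumption made for illustration: the
names measure_utf8, decode_utf8 and approx_columns are hypothetical, the input
is assumed to be valid UTF-8, and the width table covers only a few well-known
Wide/Fullwidth ranges rather than the full East Asian Width data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s (input assumed valid; no error checking).
   Stores the code point in *cp and returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
          ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Illustrative stand-in for a real width lookup. Only a handful of ranges
   are listed; a real implementation would use the complete Unicode data. */
static int approx_columns(uint32_t cp)
{
    /* Combining Diacritical Marks: treated as zero columns here, which is
       what wcwidth-style code typically reports (an assumption; East Asian
       Width by itself does not assign a zero width). */
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;
    if ((cp >= 0x4E00 && cp <= 0x9FFF) ||   /* CJK Unified Ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul Syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))     /* Fullwidth ASCII variants */
        return 2;
    return 1;
}

struct text_measure {
    size_t code_units;   /* bytes in the UTF-8 encoding */
    size_t code_points;  /* Unicode scalar values */
    size_t columns;      /* approximate terminal cells */
};

struct text_measure measure_utf8(const char *s)
{
    assert(s != NULL);
    struct text_measure m = {0, 0, 0};
    const unsigned char *p = (const unsigned char *)s;
    while (*p != 0) {
        uint32_t cp;
        size_t n = decode_utf8(p, &cp);
        m.code_units += n;
        m.code_points += 1;
        m.columns += (size_t)approx_columns(cp);
        p += n;
    }
    return m;
}
```

For a pure ASCII string the three counts agree, which is why the question never
comes up with "%20s" and English text. For three CJK ideographs encoded in
UTF-8 they come out as 9 code units, 3 code points and 6 columns, so a field
width of 10 means something different under each definition.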

Storage usually implies maximum sizes, not minimum. Take this example in C:

        char buf[21];
        sprintf(buf, "%20s", somestr);

Every good static analysis tool and even most compilers will warn you about
this use of sprintf, but it's a common mistake that novice developers make.
That 20 is a minimum field width, not a maximum, so a somestr longer than 20
bytes overflows buf.
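
As a hedged illustration of the distinction, the sketch below uses snprintf so
that the write is bounded by the buffer size, and shows that it is the
precision (the part after the dot), not the field width, that caps how many
bytes of the argument are used. The helper names pad_right_aligned and
pad_and_truncate are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Right-align src in a field of at least `width` bytes. The width is only a
   minimum; dst_size is what keeps snprintf from writing past the buffer.
   Returns the length the full output would have had without truncation. */
int pad_right_aligned(char *dst, size_t dst_size, int width, const char *src)
{
    assert(dst != NULL && dst_size > 0 && src != NULL);
    /* "%*s" reads the minimum field width from an int argument. */
    return snprintf(dst, dst_size, "%*s", width, src);
}

/* Pad to `width` bytes and also use at most `width` bytes of src.
   The precision in "%*.*s" is what sets the maximum. */
int pad_and_truncate(char *dst, size_t dst_size, int width, const char *src)
{
    assert(dst != NULL && dst_size > 0 && src != NULL);
    return snprintf(dst, dst_size, "%*.*s", width, width, src);
}
```

Both the width and the precision count bytes here, so with UTF-8 input the
precision can cut a multi-byte sequence in half, and the padding will not line
up in columns for non-ASCII text.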

Now, for C++ we usually just allocate memory instead of trying to fit a buffer
size. And even GNU tools, which attempt to make sure there are no arbitrary
limits anywhere, do that through asprintf.
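
The allocate-to-fit approach can be sketched in standard C without the GNU
asprintf extension: call snprintf once with a null buffer to learn the length,
then allocate exactly that much. This is only an illustration of the idea, not
how any particular GNU tool is written, and alloc_padded is a hypothetical
name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, right-aligned copy of src padded to at least
   `width` bytes, or NULL on failure. The caller must free() the result. */
char *alloc_padded(int width, const char *src)
{
    assert(src != NULL);

    /* With a NULL buffer and size 0, snprintf writes nothing and returns
       the number of bytes the formatted output needs (excluding the NUL). */
    int needed = snprintf(NULL, 0, "%*s", width, src);
    if (needed < 0)
        return NULL;

    char *out = malloc((size_t)needed + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: format into a buffer that is known to be large enough. */
    snprintf(out, (size_t)needed + 1, "%*s", width, src);
    return out;
}
```

Because the buffer is sized from the formatted result, the field width acts
purely as padding and no maximum has to be chosen in advance.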

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-09-13 02:55:31