Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 13 Sep 2019 07:49:30 +0200
On Fri, Sep 13, 2019, 2:55 AM Thiago Macieira <thiago_at_[hidden]> wrote:

> On Wednesday, 11 September 2019 23:05:53 PDT Henri Sivonen wrote:
> > Having stuff line up on a terminal grid in the Unicode context calls
> > for East Asian Width (in the mode that resolves ambiguous characters
> > as narrow). The concept ignores lots of scripts, but many of those
> > scripts have properties such that combining those scripts with the
> > notion of having stuff line up on a grid leads to a bad time
> > regardless of exact definitions.
> >
> > Instead of inventing something in the abstract, a good next step would
> > be to figure out how (in UTF-8 mode) Apple Terminal, Gnome Terminal,
> > Konsole, and the new Windows Terminal determine how many terminal
> > display columns a string takes. (I'm not volunteering.)
>
> Hello Henri
>
> First of all, thank you for the blog post you made on the encoding forms
> versus EGCs and EAW. I learnt a lot and I suspect many others here did
> too.
> And I saw it make the rounds to other SGs' Slack channels.
>
> We can look into how those terminals do spacing, but it may not be
> sufficient.
> There's also all the applications that display monospaced fonts, including
> every source-code oriented text editor (whether they're IDEs or not) and
> browsers. I think terminals are more constrained, because they really have
> a cell grid concept (from what I remember of Konsole's source code), while
> browsers and usual text renderers "just happen" to get monospace right,
> but I might be wrong.
>
> Moreover, I don't expect them to be right today. So inspecting their
> codebases may just reveal that they're all wrong.
>
> At the risk of just kicking the can down the road, isn't there something
> the Unicode Consortium should provide? If there's a TR that the C++ standard
> and the terminals can refer to, we can all agree on what the "right way"
> should be.
>

Nothing useful.

*Note:* The East_Asian_Width property is not intended for use by modern
terminal emulators without appropriate tailoring on a case-by-case basis.
Such terminal emulators need a way to resolve the halfwidth/fullwidth
dichotomy that is necessary for such environments, but the East_Asian_Width
property does not provide an off-the-shelf solution for all situations. The
growing repertoire of the Unicode Standard has long exceeded the bounds of
East Asian legacy character encodings, and terminal emulations often need
to be customized to support edge cases and for changes in typographical
behavior over time.

But short of an API for communicating width information between terminals and
C++ applications magically appearing, we can only approximate.

The candidate units are code units, code points, EGCs, and tailored EGCs.

Trying to do anything more seems pointless.

I still think the only sensible approximation for text alignment is EGCs.
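
To make the difference between the first two candidates concrete, here is a
minimal sketch (not part of the original message) that counts UTF-8 code units
and code points. The function names are made up for illustration and the input
is assumed to be valid UTF-8. Counting EGCs is deliberately left out, because
it needs the grapheme segmentation rules and data from UAX #29.

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 code units: simply the length in bytes.
constexpr std::size_t code_units(std::string_view s) {
    return s.size();
}

// Number of code points, ASSUMING s is valid UTF-8: count every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
constexpr std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "e\xCC\x81" (an "e" followed by U+0301 COMBINING ACUTE ACCENT),
code_units gives 3 and code_points gives 2, while a reader sees a single
character, that is, one EGC. Neither count says how many terminal columns the
text occupies.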

>
> > Storage implies code unit count. Do people actually use, _with
> > Unicode_, fields of storage that are so fixed-width that they need to
> > be padded to the full storage width _and_ do they use std::format to
> > do so? (I guess anything is possible, but this seems to me like a very
> > specialized niche use case whose premise is a bad idea.)
>
> I suspect no one pads for storage. Padding is done for alignment, which is
> where EAW comes into play.
>
> Storage usually implies maximum sizes, not minimum. Take this example in C:
>
> char buf[21];
> sprintf(buf, "%20s", somestr);
>
> Every good static analysis tool and even most compilers will warn you of
> this use of sprintf, but it's a common mistake that novice developers will
> make. That 20 sets a minimum field width; it does not set the maximum.
>

Great point
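
To illustrate with something that can be run safely, here is a small sketch
(not part of the original message) that uses snprintf instead of sprintf, so a
long argument is truncated rather than overflowing the 21-byte buffer. The
function names are hypothetical. snprintf returns the length the complete
output would have had, which shows that a width of 20 is only a minimum, while
a precision of 20 caps the number of bytes taken from the argument.

```cpp
#include <cassert>
#include <cstdio>

// Formats s with "%20s" into a 21-byte buffer. snprintf never writes past
// the buffer; its return value is the length the full output would need.
int needed_length(const char* s) {
    assert(s != nullptr);
    char buf[21];
    return std::snprintf(buf, sizeof buf, "%20s", s);
}

// Same, but the precision ".20" limits how many bytes of s are used,
// so the output is exactly 20 bytes for any input.
int needed_length_capped(const char* s) {
    assert(s != nullptr);
    char buf[21];
    return std::snprintf(buf, sizeof buf, "%20.20s", s);
}
```

With a 25-byte argument, needed_length reports 25 (more than the buffer can
hold), whereas needed_length_capped reports 20. Note that the precision counts
bytes, so with UTF-8 input it can cut a multi-byte sequence in the middle.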

>
> Now, for C++ we usually just allocate memory instead of trying to fit a
> buffer size. And even GNU tools, which attempt to make sure there are no
> arbitrary limits anywhere, do that through asprintf.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products

Received on 2019-09-13 08:05:19