sg16: Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 9 Sep 2019 15:29:41 -0400

On 9/9/19 3:26 AM, Corentin wrote:
>
> On Mon, Sep 9, 2019, 4:34 AM Tom Honermann <tom_at_[hidden]> wrote:
>
> My preferred direction for exploration is a future extension that
> enables opt-in to field widths that are encoding dependent (and
> therefore locale dependent for char and wchar_t). For example
> (using 'L' appended to the width; 'L' doesn't conflict with the
> existing type options):
>
> std::format("{:3L}", "\xC3\x81"); // produces "\xC3\x81\x20\x20"; 3 EGCs.
>
> std::format("{:3L}", "ch"); What does that produce?
"ch " (one trailing space). The implied constraint with respect to
literals is that they must be compatible with whatever the
locale-dependent encoding is. If your question was intended to ask
whether transliteration should occur here or whether "ch" might be
presented with a ligature, well, that is yet another dimension of why
field widths don't really work for aligning text in general (they work
just fine for characters for which one code unit == one code point ==
one glyph that can be presented in a monospace font).
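
To make the distinction between units concrete, here is a minimal sketch
that counts code units and code points for the two literals above. It
assumes the bytes are well-formed UTF-8. count_code_points and
demo_counts are hypothetical names made up for this illustration and are
not part of std::format. Counting EGCs would additionally require Unicode
segmentation data, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts UTF-8 code points by counting every byte
// that is not a continuation byte (10xxxxxx).
// Assumption: the input is well-formed UTF-8.
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical demo: the two literals discussed above.
inline void demo_counts() {
    std::string_view a_acute = "\xC3\x81";  // U+00C1 encoded as UTF-8
    std::string_view ch = "ch";

    assert(a_acute.size() == 2);              // two code units
    assert(count_code_points(a_acute) == 1);  // one code point

    assert(ch.size() == 2);                   // two code units
    assert(count_code_points(ch) == 2);       // two code points
}
```

The two measures agree for "ch" and differ for "\xC3\x81", which is why
the choice of unit matters only once text leaves the one code unit == one
code point case.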
> Locale specifiers should only affect region specific rules, not
> whether something is interpreted as bytes or not
Ideally I agree, but that isn't the reality we are faced with.
>
> But again, I'm far from convinced that this is actually useful
> since EGCs don't suffice to ensure an aligned result anyway as
> nicely described in Henri's post (https://hsivonen.fi/string-length).
>
> Agreed, but I think you know that code units are the least useful
> option in this case, and I am concerned about choosing a bad option to
> make a fix easy.

I didn't propose code units in order to make an easy fix. The intent
was to choose the best option given the trade-offs involved. Since none
of code units, code points, scalar values, or EGCs would result in
reliable alignment, and most uses of such alignment (e.g., via printf)
occur in situations where characters outside the basic source
character set are unlikely to appear [citation needed], I felt that
avoiding the locale dependency was the more important goal.
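
As a rough illustration of that trade-off, the sketch below pads a field
by code units, which requires no knowledge of the encoding or the locale.
pad_code_units and demo_padding are hypothetical names written for this
example; this is not how std::format is specified. For text limited to
the basic source character set the padding comes out as expected. For the
UTF-8 literal from earlier it is one space short of three columns,
assuming the two bytes are displayed as a single glyph.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: left-aligns s in a field of `width` code units,
// padding with spaces. Strings longer than the field are not truncated.
inline std::string pad_code_units(std::string_view s, std::size_t width) {
    std::string out(s);
    if (out.size() < width) {
        out.append(width - out.size(), ' ');
    }
    return out;
}

// Hypothetical demo of the behaviour described above.
inline void demo_padding() {
    // Basic source character set: one space is added, three columns total.
    assert(pad_code_units("ch", 3) == "ch ");

    // UTF-8 for U+00C1 already occupies two code units, so only one space
    // is added even though the text is a single code point.
    assert(pad_code_units("\xC3\x81", 3) == "\xC3\x81 ");
}
```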

Tom.

Received on 2019-09-09 21:29:48