sg16: Re: [SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

From: Billy O'Neal (VC LIBS)
Date: Sun, 8 Sep 2019 07:48:38 +0000

> Grapheme breaking is simple, and requires no locale info.

The encoding that goes with char* is part of the locale. Where the breaks go in a Shift-JIS stream is probably different from where they go in a UTF-8 or Latin-1 stream.

Billy3

________________________________
From: Lib <lib-bounces_at_[hidden]> on behalf of Zach Laine via Lib <lib_at_[hidden]>
Sent: Saturday, September 7, 2019 6:11:47 PM
To: Library Working Group <lib_at_[hidden]>
Cc: Zach Laine <whatwasthataddress_at_[hidden]>; Tony V E <tvaneerd_at_[hidden]>; Tom Honermann <tom_at_[hidden]>; unicode_at_[hidden] <unicode_at_[hidden]>
Subject: Re: [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

On Sat, Sep 7, 2019 at 7:31 PM Tom Honermann via Lib <lib_at_[hidden]> wrote:
> On 9/7/19 8:27 PM, Tony V E wrote:
>> I think we would want it to be measured in glyphs.
>
> I agree that would be ideal, but...

Stop right there. If that's ideal, let's do that. Or at least, let's leave room for it to be done at some point. Specifying code units (CUs) now prevents the ideal from ever being realized.

>> Are you suggesting code points because glyphs are too hard?
>
> I don't know how to achieve that. Field width doesn't really work for alignment unless one assumes a monospace font. We could measure in terms of extended grapheme clusters (EGCs), but EGC width has changed over time (e.g., family emoji). That makes alignment dependent on both display properties and Unicode version. And, of course, this would drag in locale dependence as well.

If you just count EGCs (say there are N of them), you get the "right" answer. If your terminal shows more or fewer than N characters, get a new terminal. What I mean by this is that there should be no consideration of fonts.

As for the need for a locale, I don't get that. Grapheme breaking is simple, and requires no locale info. Do you mean Unicode data? Picking a version and sticking with it should be sufficient. No system that I know of has multiple Unicode versions to pick from programmatically.

Zach

Received on 2019-09-08 09:48:41