On 9/7/19 9:11 PM, Zach Laine wrote:

On Sat, Sep 7, 2019 at 7:31 PM Tom Honermann via Lib <lib@lists.isocpp.org> wrote:

On 9/7/19 8:27 PM, Tony V E wrote:

I think we would want it to be measured in glyphs.

I agree that would be ideal, but...

Stop right there. If that's ideal, let's do that. Or at least, let's leave room for it to be done at some point. Specifying CUs now prevents the ideal from ever being realized.

There are other options. For example, a future extension could allow specifying what units are to be used for field width.

Are you suggesting code points because glyphs are too hard?

I don't know how to achieve that. Field width doesn't really work for alignment unless one assumes a monospace font. We could measure in terms of extended grapheme clusters, but EGC width has changed over time (e.g., family emoji). That makes alignment dependent on both display properties and Unicode version. And, of course, this would drag in locale dependence as well.

If you just count N = EGCs, you get the "right" answer. If your terminal shows more or fewer than N characters, get a new terminal. What I mean by this is that there should be no consideration of fonts.

I see field width as either indicating storage (number of code units) or alignment. The number of user perceived characters is not useful for aligning text unless a monospace font is assumed. Therefore, storage seems like the more useful measurement. This also aligns with format_to_n and formatted_size which, unless I'm mistaken, work in code units. (It would be nice to clarify the wording for these as well; what is meant by "number of characters in the character representation"?)
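
To make the storage-versus-alignment distinction concrete, here is a minimal sketch (not taken from any proposal wording) that pads a UTF-8 string to a field width measured two ways. The names width_unit, measure, and pad_right are hypothetical, and the code assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: the unit in which a field width is measured.
enum class width_unit { code_units, code_points };

// Measure s in the requested unit. Assumes s is well-formed UTF-8.
std::size_t measure(const std::string& s, width_unit u) {
    if (u == width_unit::code_units) return s.size();
    std::size_t n = 0;
    for (unsigned char c : s) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Left-align s in a field of `width` units by appending spaces.
std::string pad_right(std::string s, std::size_t width, width_unit u) {
    const std::size_t used = measure(s, u);
    if (used < width) s.append(width - used, ' ');
    return s;
}
```

For "caf\xC3\xA9" (café, with é encoded as two bytes) and a width of 6, measuring in code units appends one space while measuring in code points appends two, so the two choices yield different padding. Neither count is an EGC count; that would need Unicode segmentation data.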

As for the need for a locale, I don't get that. Grapheme breaking is simple, and requires no locale info. Do you mean Unicode data? Picking a version and sticking with it should be sufficient. No system that I know of has multiple Unicode versions to pick from programmatically.

For char and wchar_t, encoding is locale dependent. Think POSIX LANG=C (probably ASCII or ISO-8859-1) vs LANG=C.UTF-8.
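
As a rough illustration of why the encoding matters, the sketch below counts characters in the same byte sequence under two assumed encodings. The assumed_encoding enum and character_count function are hypothetical; a real implementation would have to learn the encoding from the locale rather than take it as a parameter.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: the encoding the caller believes the bytes are in.
enum class assumed_encoding { iso_8859_1, utf_8 };

// Count characters (code points) in `bytes` under the assumed encoding.
// The UTF-8 branch assumes well-formed input.
std::size_t character_count(const std::string& bytes, assumed_encoding e) {
    if (e == assumed_encoding::iso_8859_1) {
        return bytes.size();  // single-byte encoding: one byte per character
    }
    std::size_t n = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) ++n;  // skip UTF-8 continuation bytes
    }
    return n;
}
```

The bytes 63 61 66 C3 A9 are five characters when read as ISO-8859-1 ("cafÃ©") and four when read as UTF-8 ("café"), so anything other than a code unit count depends on knowing which encoding is in effect.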

Zach