On 9/7/19 11:25 PM, Tom Honermann via Lib wrote:

On 9/7/19 9:11 PM, Zach Laine wrote:

On Sat, Sep 7, 2019 at 7:31 PM Tom Honermann via Lib <lib@lists.isocpp.org> wrote:

On 9/7/19 8:27 PM, Tony V E wrote:

I think we would want it to be measured in glyphs.

I agree that would be ideal, but...

Stop right there. If that's ideal, let's do that. Or at least, let's leave room for it to be done at some point. Specifying CUs now prevents the ideal from ever being realized.

There are other options. For example, a future extension could allow specifying what units are to be used for field width.

Are you suggesting code points because glyphs are too hard?

I don't know how to achieve that. Field width doesn't really work for alignment unless one assumes a monospace font. We could measure in terms of extended grapheme clusters (EGCs), but EGC width has changed over time (e.g., family emoji). That makes alignment dependent on both display properties and Unicode version. And, of course, this would drag in locale dependence as well.

If you just count N = EGCs, you get the "right" answer. If your terminal shows more or fewer than N characters, get a new terminal. What I mean by this is that there should be no consideration of fonts.

I see field width as either indicating storage (number of code units) or alignment. The number of user perceived characters is not useful for aligning text unless a monospace font is assumed. Therefore, storage seems like the more useful measurement. This also aligns with format_to_n and formatted_size which, unless I'm mistaken, work in code units. (It would be nice to clarify the wording for these as well; what is meant by "number of characters in the character representation"?)
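To make the difference between the candidate units concrete, here is a minimal sketch, assuming well-formed UTF-8 input in a std::string_view. The helper names (count_code_units, count_code_points, pad_to_code_points) are made up for illustration and are not part of std::format or any proposal. Counting EGCs is deliberately left out, since it needs the Unicode segmentation tables.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: number of UTF-8 code units, i.e. the storage size in bytes.
std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: number of code points, ASSUMING s is well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a new code point.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: left-align s in a field measured in code points,
// padding on the right with spaces. Strings already at or over the width
// are returned unchanged (no truncation).
std::string pad_to_code_points(std::string_view s, std::size_t width) {
    std::string out(s);
    const std::size_t n = count_code_points(s);
    if (n < width) {
        out.append(width - n, ' ');
    }
    return out;
}
```

For the family emoji U+1F468 U+200D U+1F469 U+200D U+1F467 encoded as UTF-8, these helpers report 18 code units and 5 code points, while a user perceives a single character (one EGC). A field width of 5 therefore means three different things depending on the unit chosen.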

Henri Sivonen just today posted a fantastic analysis of the various ways in which we think about the length/width of a string. Particularly relevant to this discussion is the "Display Space" section, but I encourage everyone to read the entire article. It's fascinating!
- https://hsivonen.fi/string-length

Tom.