Just a small follow-up on our discussion of P0645 Text Formatting at the previous meeting.

1. Interpretation of width with multibyte encodings and combining characters.

P0645R2 currently doesn't specify the units of width. Possible options are (from lower to higher abstraction level):

* Code units
* Code points
* Grapheme clusters
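
To make the three options concrete, here is a minimal Python sketch that counts each unit for the decomposed and precomposed forms of 'ö'. The helper name count_units is made up for illustration, the code unit count assumes UTF-8 (UTF-16 would give different numbers), and the grapheme cluster count is only a crude approximation based on canonical combining classes, not an implementation of UAX #29:

```python
import unicodedata

def count_units(s):
    """Return (UTF-8 code units, code points, approximate grapheme clusters)."""
    code_units = len(s.encode("utf-8"))  # assumes UTF-8 as the encoding
    code_points = len(s)                 # Python 3 str is indexed by code point
    # Crude approximation, NOT UAX #29: start a new cluster at every
    # code point whose canonical combining class is 0.
    clusters = sum(1 for c in s if unicodedata.combining(c) == 0)
    return code_units, code_points, clusters

decomposed = "o\u0308"   # 'o' followed by U+0308 COMBINING DIAERESIS
precomposed = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

print(count_units(decomposed))   # (3, 2, 1)
print(count_units(precomposed))  # (2, 1, 1)
```

Only the grapheme cluster count is the same for both forms.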

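Python 3 uses code points, as the following example shows. The first 'ö' is decomposed ('o' followed by U+0308 COMBINING DIAERESIS, two code points) and the second is precomposed (U+00F6, one code point):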

>>> o = b'\x6F\xCC\x88'.decode('utf8')
>>> o
'ö'
>>> '{:>2}'.format(o)
'ö' # note missing space
>>> o = b'\xC3\xB6'.decode('utf8')
>>> o
'ö'
>>> '{:>2}'.format(o)
' ö'

I have a slight preference for grapheme clusters because, according to Unicode Standard Annex #29, "Unicode Text Segmentation", they correspond to “user-perceived characters” (at least that seems to be the intention; whether they succeed in that is another question).

Zach provided an example of "👨\u200D👩\u200D👧\u200D👦", where \u200D is a zero-width joiner (ZWJ), rendered as a single glyph representing a family "👨‍👩‍👧‍👦". However, if I interpret the following part of http://www.unicode.org/reports/tr29/tr29-29.html#GB10 correctly:

> Do not break within emoji modifier sequences or emoji zwj sequences.

this is not a problem and "👨\u200D👩\u200D👧\u200D👦" will constitute a single grapheme cluster.
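
As a rough illustration of that reading, the sketch below groups code points into approximate clusters. It is a deliberate simplification and not the UAX #29 algorithm: it never breaks before a combining mark or a ZWJ, and never breaks after a ZWJ regardless of what follows (the real rule only applies between pictographic characters). The function name approx_clusters is hypothetical:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def approx_clusters(s):
    """Split s into approximate grapheme clusters (simplified, not UAX #29)."""
    clusters = []
    for c in s:
        # Combining marks and ZWJ attach to the preceding cluster.
        extends = unicodedata.category(c) in ("Mn", "Me", "Mc") or c == ZWJ
        # Simplification: anything right after a ZWJ also stays in the cluster.
        if clusters and (extends or clusters[-1].endswith(ZWJ)):
            clusters[-1] += c
        else:
            clusters.append(c)
    return clusters

# man, ZWJ, woman, ZWJ, girl, ZWJ, boy
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(len(family))                   # 7 code points
print(len(approx_clusters(family)))  # 1 approximate cluster
```

With code points as the width unit the family emoji would count as 7; with clusters it would count as 1.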

That said, using grapheme clusters as width units may add significant complexity to the implementation for only a minor benefit, so I'm fine with going with code points, especially since there is established precedent for this (Python) and it's already an improvement over stdio and iostreams.

2. Interpretation of fill.

It seems there was general agreement that fill should be a single code point, but please let me know if you have other ideas.
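
For comparison, Python's str.format already behaves this way: the fill is exactly one code point. The short sketch below shows a precomposed 'ö' accepted as fill and the two-code-point decomposed form rejected. This only illustrates existing Python behavior and is not a statement about what P0645 will specify:

```python
# Precomposed U+00F6 is a single code point, so it is accepted as a fill.
padded = "{:\u00f6>4}".format("x")
print(padded)  # öööx

# Decomposed 'o' + U+0308 is two code points, so the format spec is rejected.
try:
    "{:o\u0308>4}".format("x")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```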

3. There was a question about signed and unsigned char. I checked and there is no special handling for these types, which means that they are treated as integral types. Only char and wchar_t are treated specially as character types (and later charN_t will be added).

4. Interpretation of n in format_to_n.

There was no agreement on whether n should be specified in code units or code points. An argument in favor of code units is that n often gives the output buffer size. On the other hand, using code points would be more consistent with width.
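
The difference between the two interpretations can be sketched in Python as follows. Both helper names are hypothetical, UTF-8 is assumed as the encoding, and the choice to drop a partially written code point rather than split it is my assumption for the sketch, not something we discussed:

```python
def truncate_code_points(s, n):
    """Keep at most n code points; the byte size of the result varies."""
    return s[:n]

def truncate_code_units(s, n, encoding="utf-8"):
    """Keep at most n code units, dropping a trailing partial code point."""
    data = s.encode(encoding)[:n]
    # errors="ignore" discards an incomplete sequence left at the cut point.
    return data.decode(encoding, errors="ignore")

text = "\u00e4\u00f6\u00fc"  # 'äöü': 3 code points, 6 UTF-8 code units

by_points = truncate_code_points(text, 3)
by_units = truncate_code_units(text, 3)

print(by_points, len(by_points.encode("utf-8")))  # äöü 6
print(by_units, len(by_units.encode("utf-8")))    # ä 2
```

With n counted in code points the caller cannot size a buffer from n alone; with n counted in code units the output may contain fewer characters than n.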

I plan to add support for specifying width and fill as code points in fmt (Zach gave some useful pointers on how to do that, thanks!) and will report back with any user feedback.

Cheers,

Victor