On the specific point mentioned in the title of this email thread:
Did we actually specify something for C++23 that depends on or provides the breaking algorithms?
std::format width estimation requires clustering
For a sequence of characters in UTF-8, UTF-16, or UTF-32, an implementation should use as its field width the sum of the field widths of the first code point of each extended grapheme cluster. Extended grapheme clusters are defined by UAX #29 of the Unicode Standard. […]
That is "should", not "shall", so is there really a conformance requirement here, regardless of which version the phrase “the Unicode Standard” refers to?
- Hubert asked why the reference for extended grapheme clusters is non-normative.
- Jens replied that he thinks UAX #29 is referenced only in support of normative encouragement for an implementation direction.
- Charlie agreed with Jens' recollection.
Yes, that's a good point. If I use the field width of the first code point in <some cluster that bears a resemblance to an extended grapheme cluster as described by Unicode>, then that's still conforming.
So a best-effort approximation, or an older definition of extended grapheme clusters, is better than nothing.