Hello,

A couple of comments on P2572R0

> Change in 22.14.2.2 [format.string.std] paragraph 11:

The intent is not clear to me here.

I don't think there is a precondition that Unicode-encoded strings are well-formed in format (and if there were, the change would not be necessary, since a well-formed sequence cannot contain code points that are not scalar values). If we want to enforce well-formedness, I'd rather it be stated that way.
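
To make the terminology concrete, here is a minimal sketch of the distinction I have in mind. The function names are mine and purely illustrative; nothing here is proposed wording or an existing library API.

```cpp
#include <cstdint>
#include <string_view>

// A code point is any value in the Unicode codespace [0, 0x10FFFF].
constexpr bool is_code_point(std::uint32_t c) { return c <= 0x10FFFF; }

// Surrogate code points occupy [0xD800, 0xDFFF].
constexpr bool is_surrogate(std::uint32_t c) { return c >= 0xD800 && c <= 0xDFFF; }

// A scalar value is a code point that is not a surrogate.
constexpr bool is_scalar_value(std::uint32_t c) {
    return is_code_point(c) && !is_surrogate(c);
}

// Hypothetical helper: a UTF-32 sequence is well-formed exactly when
// every code unit is a scalar value.
constexpr bool is_well_formed_utf32(std::u32string_view s) {
    for (char32_t c : s) {
        if (!is_scalar_value(c)) return false;
    }
    return true;
}

static_assert(is_scalar_value(0x1F921));   // U+1F921 CLOWN FACE
static_assert(is_code_point(0xD800));      // a code point...
static_assert(!is_scalar_value(0xD800));   // ...but not a scalar value
```

In other words, once a sequence is known to be well-formed, "code point" and "scalar value" describe the same set of values in it.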

On the other hand, if we do not intend to ensure well-formedness, we should be mindful that, if we allow ill-formed sequences, the standard practice is to replace isolated surrogates with � (U+FFFD REPLACEMENT CHARACTER), whose width is definitely 1 [1]. So the changes in 22.14.2.2 require, in my mind, further discussion and a broader solution. I would keep that as a separate issue.

Unless you want to clearly state "the width of surrogates is undefined" or something like that.
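
For illustration, the replacement practice mentioned above could look roughly like this for UTF-16 input. This is only a sketch; `replace_isolated_surrogates` is a hypothetical name, not something from the paper or the standard library.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: returns a well-formed UTF-16 copy of `in`, with every
// isolated surrogate replaced by U+FFFD REPLACEMENT CHARACTER.
inline std::u16string replace_isolated_surrogates(std::u16string_view in) {
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() && is_low_surrogate(in[i + 1])) {
            // Valid surrogate pair: copy both code units unchanged.
            out.push_back(u);
            out.push_back(in[++i]);
        } else if (is_high_surrogate(u) || is_low_surrogate(u)) {
            // Isolated surrogate: substitute U+FFFD.
            out.push_back(u'\uFFFD');
        } else {
            out.push_back(u);
        }
    }
    return out;
}
```

After such a pass every code point in the result is a scalar value, so any width question only ever has to be answered for U+FFFD, never for a surrogate.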

Besides, I also don't think it would be useful to replace "code point" with "scalar value" in places where there is already a precondition, or where isolated surrogates cannot occur. "Scalar value", besides being a mouthful, only applies to Unicode (unlike "code point"), and in places where well-formedness is desirable, "Precondition: foo is a well-formed sequence in the bar encoding" would take care of scalar values.

> Change in 22.14.2.2 [format.string.std] paragraph 3:

> The 🤡 (U+1F921 CLOWN FACE) emoji has an estimated width of 2. The examples above that include that character illustrate the effect of the estimated width when that character is used as a fill character as opposed to when it is used as a formatting argument.

I had trouble understanding that. I would suggest stating that 🤡 has width 2 directly within the example, or in a separate note directly below the example.

The sentence "The examples above that include that character illustrate the effect of the estimated width when that character is used as a fill character as opposed to when it is used as a formatting argument." is probably superfluous and could be omitted.
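
For readers who want to see where the "estimated width of 2" comes from, here is a rough sketch of a range lookup. The ranges are the ones I recall from the C++20 wording of [format.string.std] and should be checked against the working draft; `estimated_width` and `code_point_range` are illustrative names only.

```cpp
#include <cstdint>

struct code_point_range { std::uint32_t first, last; };

// Assumption: code point ranges treated as width 2, as recalled from the
// C++20 wording. Verify against the current draft before relying on them.
constexpr code_point_range wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Returns 2 for code points in one of the ranges above, 1 otherwise.
constexpr int estimated_width(char32_t c) {
    for (const code_point_range& r : wide_ranges) {
        if (c >= r.first && c <= r.last) return 2;
    }
    return 1;
}

static_assert(estimated_width(U'\U0001F921') == 2);  // 🤡 CLOWN FACE
static_assert(estimated_width(U'x') == 1);
static_assert(estimated_width(U'\uFFFD') == 1);      // REPLACEMENT CHARACTER
```

A one-line comment of that kind next to the example would have been enough for me to follow it.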

Do we need examples with extended grapheme clusters?

i.e., 🐻‍❄️ has 3 code points, width 1, but its use as a fill option is invalid.

The rest looks great to me.

Thanks,

Corentin

[1] Unicode Standard 14.0 - 2.7 Unicode Strings

Whenever such strings are specified to be in a particular Unicode encoding form—even
one with the same code unit size—the string must not violate the requirements of that
encoding form. For example, isolated surrogates in a Unicode 16-bit string are not allowed
when that string is specified to be well-formed UTF-16. A number of techniques are available
for dealing with an isolated surrogate, such as omitting it, converting it into U+FFFD
replacement character to produce well-formed UTF-16, or simply halting the processing of
the string with an error. (See Section 3.9, Unicode Encoding Forms.)