sg16


Re: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 1 Dec 2021 17:50:30 +0100
On Wed, Dec 1, 2021 at 5:07 PM Victor Zverovich via SG16 <
sg16_at_[hidden]> wrote:

> Thanks Tom for putting this together.
>
> I am strongly opposed to the proposed resolution because:
>
> 1. It brings a novel design that has no implementation or usage experience.
> 2. It will likely have a nontrivial performance regression for the common
> case of width = 1.
> 3. Width > 1 will often result in misaligned output for all alignment
> options (not just >), and none of the options for handling it look satisfactory.
>
> Other comments:
>
> Misalignment can occur in all cases, not just when aligning to the end
> (>), e.g.
>
> std::format(
>   "1234|123\n"
>   "{:🤡<4}|foo", 123);
>
> > It is not specified what happens when alignment to the end of the field
> is requested, but the width of the formatted value exceeds the field width
> thereby making such alignment impossible
>
> It is specified elsewhere (http://eel.is/c++draft/format.string#std-8):
>
> The positive-integer in width is a decimal integer defining the minimum
> field width.
>
> which means that the field must overflow. That said, we might want to add
> a clarification elsewhere.
>
> > That ... is probably not a good idea.
>
> Yes, that would be extremely novel and surprising.
>
> The added example is incorrect:
>
> *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*
>
> The format string should be "{:*>6}".
>
> Cheers,
> Victor
>

So your preferred solution would be option 3 as suggested by Tom? That is,
if a fill character has a width greater than 1, an error is reported?




>
> On Tue, Nov 30, 2021 at 9:03 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> The following is in preparation for the SG16 telecon scheduled for
>> tomorrow.
>>
>> LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
>> handle fill characters that have an estimated width other than 1 (the
>> current proposed resolution for LWG3576
>> <https://cplusplus.github.io/LWG/issue3576> limits fill characters to
>> those encodable as a single code point in the ordinary literal encoding).
>> The issue discussion records three possible ways to resolve the issue:
>>
>> 1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
>> displayed on compatible terminals.
>> 2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
>> incorrectly displayed.
>> 3. Require the fill character to have the estimated width of 1.
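The fill counts in options 1 and 2 follow from simple arithmetic if we assume (the issue's exact code is not quoted here) a field width of 10 and the value 42: 8 columns remain, which is 4 fill characters of estimated width 2 or 8 fill characters of width 1. A minimal C++ sketch of that arithmetic, using a hypothetical `fill_count` helper that never exceeds the field width:

```cpp
#include <cstddef>

// Hypothetical helper (not from the thread): how many fill characters of
// estimated width fill_width fit into the columns left over once a value of
// width value_width is placed in a field of width field_width.
constexpr std::size_t fill_count(std::size_t field_width,
                                 std::size_t value_width,
                                 std::size_t fill_width) {
    if (fill_width == 0 || value_width >= field_width) return 0;
    return (field_width - value_width) / fill_width;
}

// Assumed example: width 10, value 42 (2 columns), fill U+1F921 CLOWN FACE.
static_assert(fill_count(10, 2, 2) == 4);  // option 1: clown counted as 2 columns
static_assert(fill_count(10, 2, 1) == 8);  // option 2: clown counted as 1 column
```

Option 1 fills the 8 columns exactly; option 2 emits 8 clowns, which occupy 16 columns on a terminal that renders them double-width.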
>>
>> Discussion:
>>
>> Assuming that we elect to allow characters with an estimated width other
>> than 1 to be used as fill characters (i.e., we reject option 3 above), I
>> think the normative guidance offered in [format.string.std]p11
>> <http://eel.is/c++draft/format.string.std#11> suffices to direct
>> implementations to choose between options 1 and 2; option 1 should be used
>> for Unicode encodings, and option 2 otherwise. This effectively limits the
>> options under consideration to just two.
>>
>> If characters that have estimated widths other than 1 are not permitted
>> as fill characters, then whether a program is correct or not may depend on
>> which encoding is used as the ordinary literal encoding. For example,
>> Shift-JIS supports encoding katakana as either half-width or full-width
>> (e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA LETTER
>> KA}). Since Shift-JIS is not a Unicode encoding, an implementation may
>> assign all such characters an estimated width of 1, but the full-width
>> variants would have an estimated width of 2 for a Unicode encoding. I have
>> no perspective on whether such full-width characters would ever be
>> desirable as fill characters and thus prefer not to prohibit them without
>> strong cause.
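To see how a Unicode width estimate sorts the characters mentioned above, here is a sketch of the [format.string.std]p11 rule. The double-width ranges are transcribed from the C++20 wording as it stood around the time of this thread; later revisions of the standard changed the list, so treat it as illustrative. `estimated_width` and `CodePointRange` are hypothetical names, and the sketch covers single code points only, not grapheme clusters.

```cpp
#include <cstddef>

// Code point ranges with an estimated width of 2 (C++20 [format.string.std]p11);
// every other code point has an estimated width of 1.
struct CodePointRange { char32_t first, last; };

constexpr CodePointRange wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Estimated width of a single code point in a Unicode encoding.
constexpr std::size_t estimated_width(char32_t cp) {
    for (const CodePointRange& r : wide_ranges)
        if (cp >= r.first && cp <= r.last) return 2;
    return 1;
}

static_assert(estimated_width(U'\U0001F921') == 2);  // CLOWN FACE
static_assert(estimated_width(U'\u30AB') == 2);      // KATAKANA LETTER KA
static_assert(estimated_width(U'\uFF76') == 1);      // HALFWIDTH KATAKANA LETTER KA
```

Under a Shift-JIS literal encoding no such rule applies (the width is unspecified), which is why the same fill character could be width 1 in one build and width 2 in another.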
>>
>> If the estimated width of the fill character is greater than 1, then
>> alignment to the end of the field might not be possible. For example:
>>
>>     std::format("{:🤡>4}", 123);
>>
>> There are a number of options available to handle such cases:
>>
>> 1. Underfill the available space leaving the field misaligned.
>> 2. Overfill the available space leaving the field misaligned.
>> 3. Substitute the default fill character (that has an estimated width
>> of 1) once the available space is reduced to less than the estimated width
>> of the requested fill character.
>> 4. Throw an exception.
>> 5. UB.
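For `std::format("{:🤡>4}", 123)` the value occupies 3 of the 4 columns, leaving 1, while 🤡 has an estimated width of 2. The sketch below computes the padding that each of options 1 to 3 would produce; options 4 and 5 have no output to show. The `Strategy` enum and `make_padding` function are hypothetical, and the fill text is assumed to be UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for options 1-3 in the list above.
enum class Strategy { underfill, overfill, substitute };

// Padding for a field that needs `needed` more columns, using a fill character
// whose encoded text is `fill` and whose estimated width is `fill_width`.
std::string make_padding(std::size_t needed, const std::string& fill,
                         std::size_t fill_width, Strategy strategy) {
    std::string pad;
    if (fill_width == 0) return pad;        // guard against a zero width
    std::size_t used = 0;
    while (used + fill_width <= needed) {   // whole fill characters that fit
        pad += fill;
        used += fill_width;
    }
    if (used < needed) {                    // columns no whole fill character covers
        switch (strategy) {
        case Strategy::underfill:  break;                                  // option 1: leave short
        case Strategy::overfill:   pad += fill; break;                     // option 2: overshoot
        case Strategy::substitute: pad.append(needed - used, ' '); break;  // option 3: U+0020
        }
    }
    return pad;
}
```

With `needed = 1` and the clown's 2 columns, underfill yields no padding (3 columns, one short), overfill yields one clown (5 columns, one over), and substitution yields a single space, the only result that is exactly 4 columns wide.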
>>
>> There are also two related wording omissions:
>>
>> 1. Table [tab.format.align] <http://eel.is/c++draft/tab:format.align>
>> doesn't specify how alignment is achieved for the '<' and '>' options (the
>> wording doesn't state to insert fill characters as it does for the '^'
>> option).
>> 2. It is not specified what happens when alignment to the end of the
>> field is requested, but the width of the formatted value exceeds the field
>> width, thereby making such alignment impossible (it is presumably intended
>> that the available space is overflowed, resulting in misalignment;
>> truncation, throwing an exception, or UB are alternate options).
>> std::format("{:X>1}", 9999);
>>
>> In some cases, it would be possible for field alignment to be restored by
>> tracking underfill and overfill counts and underfilling or overfilling
>> later fields. That ... is probably not a good idea.
>>
>> Proposal:
>>
>> - Allow characters that have an estimated width other than 1 to be
>> used as fill characters.
>> - Underfill the available field space when inserting an additional
>> fill character would otherwise lead to overfill.
>> - Do not substitute a default fill character, throw an exception, or
>> specify UB when alignment is not possible. This is based on the assumption
>> that misalignment is preferred over the other possibilities.
>> - Overflow the available field space when the formatted value exceeds
>> the field width. This is based on the assumption that outputting all
>> available data is preferred over the other possibilities.
>> - Do not try to cleverly count underfill and overfill and adjust
>> later fields.
>>
>> Proposed Resolution:
>>
>> The wording below is intended to address LWG3639
>> <https://cplusplus.github.io/LWG/issue3639> and to supersede the current
>> proposed resolution for LWG3576
>> <https://cplusplus.github.io/LWG/issue3576>.
>>
>> Change [format.string.std]p1 <http://eel.is/c++draft/format.string.std#1>:
>>
>> [...]
>>
>> The syntax of format specifications is as follows:
>>
>> [...]
>>
>> *fill*:
>>
>> any character *member of the translation character set ([lex.charset]
>> <http://eel.is/c++draft/lex.charset>)* other than { *U+007B LEFT CURLY
>> BRACKET* or } *U+007D RIGHT CURLY BRACKET*
>>
>> [...]
>>
>> Change [format.string.std]p2 <http://eel.is/c++draft/format.string.std#2>:
>>
>> *The **fill character** is the character denoted by the **fill**
>> specifier or, if the **fill** specifier is absent, U+0020 SPACE.*
>>
>> [*Note 2*: The *fill* character can be any character other than { or }. The
>> presence of a fill character *fill specifier* is signaled by the
>> character following it, which must be one of the alignment options. If the
>> second character of *std-format-spec* is not a valid alignment option,
>> then it is assumed that both the fill character and the alignment option
>> are *the fill-and-align specifier is* absent. — *end note*]
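Under the proposed wording the fill is a single member of the translation character set, so "the character following it" is the one after the first code point, not after the first code unit. A minimal sketch of that detection for UTF-8 input; `FillAlign`, `utf8_len`, and `parse_fill_align` are hypothetical names, and rejection of `{`/`}` as fill and of malformed UTF-8 is omitted.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

struct FillAlign { std::string_view fill; char align; };

// Byte length of the UTF-8 sequence that starts with lead byte b
// (assumes well-formed input).
constexpr std::size_t utf8_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

constexpr bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Detects an optional fill-and-align at the start of a std-format-spec.
constexpr std::optional<FillAlign> parse_fill_align(std::string_view spec) {
    if (spec.empty()) return std::nullopt;
    std::size_t n = utf8_len(static_cast<unsigned char>(spec[0]));
    if (n < spec.size() && is_align(spec[n]))        // fill code point, then align
        return FillAlign{spec.substr(0, n), spec[n]};
    if (is_align(spec[0]))                           // align only: default fill U+0020
        return FillAlign{" ", spec[0]};
    return std::nullopt;                             // no fill-and-align specifier
}

static_assert(parse_fill_align("*>6")->fill == "*");
static_assert(parse_fill_align("\xF0\x9F\xA4\xA1>4")->fill.size() == 4);  // U+1F921
static_assert(!parse_fill_align("*6>").has_value());
```

The last assertion mirrors the `{:*6>}` spelling of s7: `*` is not followed by an alignment option, so it is not read as a fill, and the remainder of that specification is then invalid.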
>>
>> Change [format.string.std]p3 <http://eel.is/c++draft/format.string.std#3>:
>>
>> The *align* specifier applies to all argument types. The meaning of the
>> various alignment options is as specified in Table 62
>> <http://eel.is/c++draft/format.string.std#tab:format.align>.
>>
>> [*Example 1*:
>> char c = 120;
>> string s0 = format("{:6}", 42); // value of s0 is "    42"
>> string s1 = format("{:6}", 'x'); // value of s1 is "x     "
>> string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
>> string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
>> string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
>> string s5 = format("{:6d}", c); // value of s5 is "   120"
>> string s6 = format("{:6}", true); // value of s6 is "true  "
>> *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*
>> — *end example*]
>>
>> [*Note 3*: Unless a minimum field width is defined, the field width is
>> determined by the size of the content and the alignment option has no
>> effect. — *end note*]
>>
>> *[Note 4: If the width of the formatting argument value exceeds the field
>> width, then the alignment option has no effect. — end note]*
>>
>> *[Note 5: It may not be possible to exactly align the formatting argument
>> value within the available space when the fill character has an estimated
>> width other than 1. — end note]*
>>
>> Table 62 <http://eel.is/c++draft/format.string.std#tab:format.align>:
>> Meaning of *align* options [tab:format.align]
>> <http://eel.is/c++draft/tab:format.align>
>>
>> *Option*: *Meaning*
>>
>> '<': Forces the field to be aligned to the start of the available space *by
>> inserting n fill characters after the formatting argument value where n is
>> the number of fill characters needed to most closely align to the field
>> width without exceeding it*. This is the default for non-arithmetic
>> types, charT, and bool, unless an integer presentation type is specified.
>>
>> '>': Forces the field to be aligned to the end of the available space *by
>> inserting n fill characters before the formatting argument value where n is
>> the number of fill characters needed to most closely align to the
>> field width without exceeding it*. This is the default for arithmetic
>> types other than charT and bool or when an integer presentation type is
>> specified.
>>
>> '^': Forces the field to be centered within the available space by inserting
>> ⌊*n*/2⌋ *fill* characters before and ⌈*n*/2⌉ *fill* characters after the
>> *formatting argument* value, where *n* is the total number of fill characters
>> to insert *needed to most closely align to the field width without exceeding
>> it*.
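Read together, the three rows compute a single n, the largest number of fill characters whose combined estimated width fits in the leftover space, and differ only in where those characters go. A minimal sketch of that reading; `align_proposed` is a hypothetical name, and the caller supplies the estimated widths rather than having them computed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the proposed Table 62 wording: underfill rather than overfill.
std::string align_proposed(const std::string& value, std::size_t value_width,
                           std::size_t field_width, char align,
                           const std::string& fill, std::size_t fill_width) {
    std::size_t n = 0;  // fill characters that fit without exceeding the width
    if (field_width > value_width && fill_width > 0)
        n = (field_width - value_width) / fill_width;

    auto repeat = [&](std::size_t k) {
        std::string s;
        for (std::size_t i = 0; i < k; ++i) s += fill;
        return s;
    };

    switch (align) {
    case '<': return value + repeat(n);                          // fill after
    case '>': return repeat(n) + value;                          // fill before
    case '^': return repeat(n / 2) + value + repeat(n - n / 2);  // floor before, ceil after
    default:  return value;                                      // no alignment given
    }
}
```

For `{:🤡>4}` with 123 this yields just "123": the underfill that the proposal prefers over overfilling, substituting a default fill character, throwing, or UB.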
>>
>> Tom.
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>

Received on 2021-12-01 10:50:47