Re: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 1 Dec 2021 09:01:05 -0800
Yes, option 3 from the original issue (
https://cplusplus.github.io/LWG/issue3639). Anything else would require a
substantial investigation and is unlikely to make it into C++23. One of the
good things about option 3 is that it can be made open to extension if
someone decides to do the work.
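
To make option 3 concrete, here is a minimal sketch; it is not taken from the issue or from any implementation. `estimated_width` and `check_fill` are hypothetical names, the wide ranges are a small illustrative subset rather than the full table in [format.string.std], and `std::invalid_argument` stands in for `std::format_error` so that the snippet does not depend on <format>.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical width estimate: a few well-known wide ranges count as
// 2 columns, everything else as 1. Illustrative subset only.
int estimated_width(char32_t cp) {
    if ((cp >= 0x1100 && cp <= 0x115F) ||    // Hangul Jamo initial consonants
        (cp >= 0x3040 && cp <= 0x30FF) ||    // Hiragana and Katakana
        (cp >= 0xFF00 && cp <= 0xFF60) ||    // Fullwidth forms
        (cp >= 0x1F300 && cp <= 0x1F64F) ||  // Pictographs and emoticons
        (cp >= 0x1F900 && cp <= 0x1F9FF))    // Supplemental pictographs
        return 2;
    return 1;
}

// Option 3: accept a fill character only if its estimated width is 1.
// Assumption: the error is reported by throwing, as std::format does
// for other invalid format strings.
void check_fill(char32_t fill) {
    if (estimated_width(fill) != 1)
        throw std::invalid_argument(
            "fill character must have an estimated width of 1");
}
```

Under this rule `check_fill(U'*')` passes and `check_fill(U'\U0001F921')` (the clown face) throws. Relaxing the check later would only turn rejected format strings into accepted ones, which is presumably what keeps the option open to extension.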

- Victor

On Wed, Dec 1, 2021 at 8:50 AM Corentin Jabot <corentinjabot_at_[hidden]>
wrote:

>
>
> On Wed, Dec 1, 2021 at 5:07 PM Victor Zverovich via SG16 <
> sg16_at_[hidden]> wrote:
>
>> Thanks Tom for putting this together.
>>
>> I am strongly opposed to the proposed resolution because:
>>
>> 1. It brings a novel design that has no implementation or usage
>> experience.
>> 2. It will likely have a nontrivial performance regression for the common
>> case of width = 1.
>> 3. Width > 1 will often result in misaligned output for all alignment
>> options (not just >) and none of the options for handling it looks satisfactory.
>>
>> Other comments:
>>
>> Misalignment can occur in all cases, not just when aligning to the end
>> (>), e.g.
>>
>> std::format(
>> "1234|123"
>> "{:🤡<4}|foo", 123);
>>
>> > It is not specified what happens when alignment to the end of the field
>> is requested, but the width of the formatted value exceeds the field width
>> thereby making such alignment impossible
>>
>> It is specified elsewhere (http://eel.is/c++draft/format.string#std-8):
>>
>> The positive-integer in width is a decimal integer defining the minimum
>> field width.
>>
>> which means that the field must overflow. That said, we might want to add
>> a clarification elsewhere.
>>
>> > That ... is probably not a good idea.
>>
>> Yes, that would be extremely novel and surprising.
>>
>> The added example is incorrect:
>>
>> string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"
>>
>> The format string should be "{:*>6}".
>>
>> Cheers,
>> Victor
>>
>
> So your preferred solution would be option 3 as suggested by Tom? If a fill
> character has width > 1 it reports an error then?
>
>
>
>
>>
>> On Tue, Nov 30, 2021 at 9:03 PM Tom Honermann via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> The following is in preparation for the SG16 telecon scheduled for
>>> tomorrow.
>>>
>>> LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
>>> handle fill characters that have an estimated width other than 1 (the
>>> current proposed resolution for LWG3576
>>> <https://cplusplus.github.io/LWG/issue3576> limits fill characters to
>>> those encodable as a single code point in the ordinary literal encoding).
>>> The issue discussion records three possible ways to resolve the issue:
>>>
>>> 1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
>>> displayed on compatible terminals.
>>> 2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
>>> incorrectly displayed.
>>> 3. Require the fill character to have the estimated width of 1.
>>>
>>> Discussion:
>>>
>>> Assuming that we elect to allow characters with an estimated width other
>>> than 1 to be used as fill characters (e.g., we reject option 3 above), I
>>> think the normative guidance offered in [format.string.std]p11
>>> <http://eel.is/c++draft/format.string.std#11> suffices to direct
>>> implementations to choose between options 1 and 2; option 1 should be used
>>> for Unicode encodings, and option 2 otherwise. This effectively limits the
>>> options under consideration to just two.
>>>
>>> If characters that have estimated widths other than 1 are not permitted
>>> as fill characters, then whether a program is correct or not may depend on
>>> what encoding is used for the ordinary literal encoding. For example,
>>> Shift-JIS supports encoding katakana as either half-width or full-width
>>> (e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA LETTER
>>> KA}). Since Shift-JIS is not a Unicode encoding, an implementation may
>>> assign all such characters an estimated width of 1, but the full-width
>>> variants would have an estimated width of 2 for a Unicode encoding. I have
>>> no perspective on whether such full-width characters would ever be
>>> desirable as fill characters and thus prefer not to prohibit them without
>>> strong cause.
>>>
>>> If the estimated width of the fill character is greater than 1, then
>>> alignment to the end of the field might not be possible. For example:
>>> std::format("{:🤡>4}", 123);
>>> There are a number of options available to handle such cases:
>>>
>>> 1. Underfill the available space leaving the field misaligned.
>>> 2. Overfill the available space leaving the field misaligned.
>>> 3. Substitute the default fill character (that has an estimated
>>> width of 1) once the available space is reduced to less than the estimated
>>> width of the requested fill character.
>>> 4. Throw an exception.
>>> 5. UB.
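
The first three options above can be compared with a small sketch. This is an illustration under stated assumptions, not proposed wording or an existing implementation: `Policy` and `align_right` are hypothetical names, display widths are passed in by the caller rather than estimated, and the fill is given as a UTF-8 string.

```cpp
#include <string>

enum class Policy { Underfill, Overfill, SubstituteDefault };

// Right-aligns `value` (display width `value_width`) within `field_width`
// columns using `fill`, whose estimated width `fill_width` is assumed >= 1.
std::string align_right(const std::string& value, int value_width,
                        int field_width, const std::string& fill,
                        int fill_width, Policy policy) {
    // Columns left to fill; zero when the value already overflows the field.
    int gap = field_width > value_width ? field_width - value_width : 0;
    int count = gap / fill_width;     // fill characters that fit entirely
    int leftover = gap % fill_width;  // columns a whole fill cannot cover

    if (policy == Policy::Overfill && leftover > 0) {
        ++count;       // one extra fill character spills past the field
        leftover = 0;
    }

    std::string out;
    for (int i = 0; i < count; ++i) out += fill;
    if (policy == Policy::SubstituteDefault) {
        // Cover the remaining columns with the default fill (space).
        for (int i = 0; i < leftover; ++i) out += ' ';
    }
    return out + value;
}
```

For the `{:🤡>4}` example with 123 (value width 3, fill width 2), underfill yields "123" (3 columns), overfill yields "🤡123" (5 columns), and substitution yields " 123" (4 columns).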
>>>
>>> There are also two related wording omissions:
>>>
>>> 1. Table [tab.format.align] <http://eel.is/c++draft/tab:format.align>
>>> doesn't specify how alignment is achieved for the '<' and '>' options (the
>>> wording doesn't state to insert fill characters as it does for the '^'
>>> option).
>>> 2. It is not specified what happens when alignment to the end of the
>>> field is requested, but the width of the formatted value exceeds the field
>>> width thereby making such alignment impossible (it is presumably intended
>>> that the available space is overflowed resulting in misalignment;
>>> truncation, throwing an exception, or UB are alternate options).
>>> std::format("{:X>1}", 9999);
>>>
>>> In some cases, it would be possible for field alignment to be restored
>>> by tracking underfill and overfill counts and underfilling or overfilling
>>> later fields. That ... is probably not a good idea.
>>> Proposal:
>>>
>>> - Allow characters that have an estimated width other than 1 to be
>>> used as fill characters.
>>> - Underfill the available field space when inserting an additional
>>> fill character would otherwise lead to overfill.
>>> - Do not substitute a default fill character, throw an exception, or
>>> specify UB when alignment is not possible. This is based on the assumption
>>> that misalignment is preferred over the other possibilities.
>>> - Overflow the available field space when the formatted value
>>> exceeds the field width. This is based on the assumption that outputting
>>> all available data is preferred over the other possibilities.
>>> - Do not try to cleverly count underfill and overfill and adjust
>>> later fields.
>>>
>>> Proposed Resolution:
>>>
>>> The wording below is intended to address LWG3639
>>> <https://cplusplus.github.io/LWG/issue3639> and to supersede the
>>> current proposed resolution for LWG3576
>>> <https://cplusplus.github.io/LWG/issue3576>.
>>>
>>> Change [format.string.std]p1
>>> <http://eel.is/c++draft/format.string.std#1>:
>>>
>>> [...]
>>>
>>> The syntax of format specifications is as follows:
>>>
>>> [...]
>>>
>>> *fill*:
>>>
>>> any character*member of the translation character set **([lex.charset]
>>> <http://eel.is/c++draft/lex.charset>)* other than {*U+007B LEFT CURLY
>>> BRACKET* or }*U+007D RIGHT CURLY BRACKET*
>>>
>>> [...]
>>>
>>> Change [format.string.std]p2
>>> <http://eel.is/c++draft/format.string.std#2>:
>>>
>>> *The **fill character** is the character denoted by the **fill**
>>> specifier or, if the **fill** specifier is absent, U+0020 SPACE.*
>>>
>>> [*Note 2*: The *fill* character can be any character other than { or }. The
>>> presence of a fill character*fill specifier* is signaled by the
>>> character following it, which must be one of the alignment options. If the
>>> second character of *std-format-spec* is not a valid alignment option,
>>> then it is assumed that both the fill character and the alignment
>>> option are*the fill-and-align specifier is* absent. — *end note*]
>>>
>>> Change [format.string.std]p3
>>> <http://eel.is/c++draft/format.string.std#3>:
>>>
>>> The *align* specifier applies to all argument types. The meaning of the
>>> various alignment options is as specified in Table 62
>>> <http://eel.is/c++draft/format.string.std#tab:format.align>.
>>>
>>> [*Example 1*:
>>> char c = 120;
>>> string s0 = format("{:6}", 42); // value of s0 is "    42"
>>> string s1 = format("{:6}", 'x'); // value of s1 is "x     "
>>> string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
>>> string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
>>> string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
>>> string s5 = format("{:6d}", c); // value of s5 is "   120"
>>> string s6 = format("{:6}", true); // value of s6 is "true  "
>>> *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*
>>> — *end example*]
>>>
>>> [*Note 3*: Unless a minimum field width is defined, the field width is
>>> determined by the size of the content and the alignment option has no
>>> effect. — *end note*]
>>>
>>> *[Note 4: If the width of the formatting argument value exceeds the
>>> field width, then the alignment option has no effect. *
>>>
>>>
>>> *— end note] [Note 5: It may not be possible to exactly align the
>>> formatting argument value within the available space when the fill
>>> character has an estimated width other than 1. — end note] *
>>> Table 62 <http://eel.is/c++draft/format.string.std#tab:format.align>:
>>> Meaning of *align* options [tab:format.align]
>>> <http://eel.is/c++draft/tab:format.align>
>>> *Option*
>>> *Meaning*
>>> <
>>> Forces the field to be aligned to the start of the available space* by
>>> inserting n fill characters after the formatting argument value where n is
>>> the number of fill characters needed to most closely align to the field
>>> width without exceeding it*. This is the default for non-arithmetic
>>> types, charT, and bool, unless an integer presentation type is
>>> specified.
>>> >
>>> Forces the field to be aligned to the end of the available space* by
>>> inserting n fill characters before the formatting argument value where n is
>>> the number of fill characters ** needed to most closely align to the
>>> field width without exceeding it*. This is the default for arithmetic
>>> types other than charT and bool or when an integer presentation type is
>>> specified.
>>> ^
>>> Forces the field to be centered within the available space by inserting ⌊
>>> *n*/2⌋ *fill *characters before and ⌈*n*/2⌉ *fill *characters after the *formatting
>>> argument *value, where *n* is the total number of fill characters to
>>> insert*needed to most closely align to the field width without
>>> exceeding it*.
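
The table wording above can be read as the following sketch, offered as one possible interpretation rather than as the intended implementation. `repeat` and `apply_align` are hypothetical names, display widths are supplied by the caller, and n is taken to be the largest number of fill characters that fits without exceeding the field width.

```cpp
#include <stdexcept>
#include <string>

// Returns `s` repeated `n` times.
std::string repeat(const std::string& s, int n) {
    std::string out;
    for (int i = 0; i < n; ++i) out += s;
    return out;
}

// Applies one of the align options '<', '>' or '^' to `value`.
// `fill_width` is the estimated width of `fill` and is assumed >= 1.
std::string apply_align(char align, const std::string& value, int value_width,
                        int field_width, const std::string& fill,
                        int fill_width) {
    // No fill is inserted when the value is at least as wide as the field.
    int gap = field_width > value_width ? field_width - value_width : 0;
    int n = gap / fill_width;  // closest fit without exceeding the width

    switch (align) {
    case '<': return value + repeat(fill, n);
    case '>': return repeat(fill, n) + value;
    case '^': // floor(n/2) before the value, ceil(n/2) after it
        return repeat(fill, n / 2) + value + repeat(fill, n - n / 2);
    default:
        throw std::invalid_argument("unknown align option");
    }
}
```

With a fill of width 1 this reproduces s2, s3, and s4 from Example 1, and a value wider than the field is returned unchanged, as Note 4 describes.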
>>>
>>> Tom.
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>>
>

Received on 2021-12-01 11:01:19