Re: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 1 Dec 2021 18:41:26 +0100
On Wed, Dec 1, 2021 at 6:01 PM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:

> Yes, option 3 from the original issue (
> https://cplusplus.github.io/LWG/issue3639). Anything else would require a
> substantial investigation and will unlikely make it to C++23. One of the
> good things about option 3 is that it can be made open to extension if
> someone decides to do the work.
>

Option 3 still requires checking the width at compile time, but as specified
in the standard, estimating the width doesn't seem tricky.

I gave it a crack:


*The fill character is the code point denoted by the fill specifier or, if
the fill specifier is absent, U+0020 SPACE. For a string in a Unicode
encoding, the fill character can be any scalar value other than { or }.
For a string in a non-Unicode encoding, the fill character can be any
code point represented as a single code unit other than { or }. If the
estimated width of the fill character is not 1, an exception of type
format_error is thrown.*
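
For illustration, the check this wording asks for could look roughly like
the sketch below. It covers only the Unicode case and assumes the fill has
already been decoded to a single code point. The width-2 ranges are the ones
listed in C++20 [format.string.std]p11 as transcribed here, so treat them as
an assumption and verify them against the draft. The names estimated_width,
is_valid_fill, checked_fill and fill_error are hypothetical; fill_error only
stands in for std::format_error so the sketch does not depend on <format>.

```cpp
#include <stdexcept>

// Hypothetical stand-in for std::format_error.
struct fill_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct cp_range { char32_t first, last; };

// Code point ranges given an estimated width of 2 (assumption: transcribed
// from C++20 [format.string.std]p11; every other code point has width 1).
constexpr cp_range wide_ranges[] = {
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
};

// Estimated display width of a single code point: 2 if it falls in one of
// the ranges above, 1 otherwise.
constexpr int estimated_width(char32_t cp) {
    for (const cp_range& r : wide_ranges)
        if (cp >= r.first && cp <= r.last) return 2;
    return 1;
}

// A Unicode scalar value is any code point except the surrogates.
constexpr bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// True if cp is acceptable as a fill character under the wording above.
constexpr bool is_valid_fill(char32_t cp) {
    return is_scalar_value(cp) && cp != U'{' && cp != U'}' &&
           estimated_width(cp) == 1;
}

// Returns cp unchanged if it is a valid fill; otherwise throws at run time,
// or makes the enclosing constant expression ill-formed at compile time.
constexpr char32_t checked_fill(char32_t cp) {
    if (!is_valid_fill(cp)) throw fill_error("invalid fill character");
    return cp;
}
```

Because checked_fill is constexpr, using it in a constant expression with a
fill such as U+1F921 is rejected by the compiler, while the same call at run
time throws.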

What do y'all think?


>
> - Victor
>
> On Wed, Dec 1, 2021 at 8:50 AM Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>
>>
>>
>> On Wed, Dec 1, 2021 at 5:07 PM Victor Zverovich via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> Thanks Tom for putting this together.
>>>
>>> I am strongly opposed to the proposed resolution because:
>>>
>>> 1. It brings a novel design that has no implementation or usage
>>> experience.
>>> 2. It will likely have a nontrivial performance regression for the
>>> common case of width = 1.
>>> 3. Width > 1 will often result in misaligned output for all alignment
>>> options (not just >) and none of the options to handle look satisfactory.
>>>
>>> Other comments:
>>>
>>> Misalignment can occur in all cases, not just when aligning to the end
>>> (>), e.g.
>>>
>>> std::format(
>>> "1234|123"
>>> "{:🤡<4}|foo", 123);
>>>
>>> > It is not specified what happens when alignment to the end of the
>>> field is requested, but the width of the formatted value exceeds the field
>>> width thereby making such alignment impossible
>>>
>>> It is specified elsewhere (http://eel.is/c++draft/format.string#std-8):
>>>
>>> The positive-integer in width is a decimal integer defining the
>>> minimum field width.
>>>
>>> which means that the field must overflow. That said, we might want to
>>> add a clarification elsewhere.
>>>
>>> > That ... is probably not a good idea.
>>>
>>> Yes, that would be extremely novel and surprising.
>>>
>>> The added example is incorrect:
>>>
>>> *string s7 = format("{:*6>}", "12345678"); // value of s7 is
>>> "12345678"*
>>>
>>> The format string should be "{:*>6}".
>>>
>>> Cheers,
>>> Victor
>>>
>>
>> So your preferred solution would be option 3, as suggested by Tom? If a
>> fill character has width > 1, it reports an error?
>>
>>
>>
>>
>>>
>>> On Tue, Nov 30, 2021 at 9:03 PM Tom Honermann via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> The following is in preparation for the SG16 telecon scheduled for
>>>> tomorrow.
>>>>
>>>> LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
>>>> handle fill characters that have an estimated width other than 1 (the
>>>> current proposed resolution for LWG3576
>>>> <https://cplusplus.github.io/LWG/issue3576> limits fill characters to
>>>> those encodeable as a single code point in the ordinary literal encoding).
>>>> The issue discussion records three possible ways to resolve the issue:
>>>>
>>>> 1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
>>>> displayed on compatible terminals.
>>>> 2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
>>>> incorrectly displayed.
>>>> 3. Require the fill character to have the estimated width of 1.
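
The difference between options 1 and 2 comes down to how many fill
characters are inserted. A minimal sketch, assuming a field width of 10 and
the value 42, which matches the counts in the two strings above (the format
string itself is not quoted in this thread). The helper fill_count is
hypothetical.

```cpp
// Hypothetical helper: number of fill characters to insert when each one is
// counted as occupying fill_width columns. Rounds down, so the field is
// never overfilled; yields 0 when the content already fills the field.
constexpr int fill_count(int field_width, int content_width, int fill_width) {
    return content_width >= field_width
               ? 0
               : (field_width - content_width) / fill_width;
}

// "42" occupies 2 columns; U+1F921 has an estimated width of 2.
constexpr int option1_fills = fill_count(10, 2, 2);  // use the estimated width
constexpr int option2_fills = fill_count(10, 2, 1);  // assume a width of 1

// Columns actually occupied on a terminal that renders the fill 2 wide.
constexpr int option1_columns = option1_fills * 2 + 2;
constexpr int option2_columns = option2_fills * 2 + 2;
```

With these inputs option 1 inserts 4 fill characters and occupies the
requested 10 columns, while option 2 inserts 8 and occupies 18.
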
>>>>
>>>> Discussion:
>>>>
>>>> Assuming that we elect to allow characters with an estimated width
>>>> other than 1 to be used as fill characters (e.g., we reject option 3
>>>> above), I think the normative guidance offered in
>>>> [format.string.std]p11 <http://eel.is/c++draft/format.string.std#11>
>>>> suffices to direct implementations to choose between options 1 and 2;
>>>> option 1 should be used for Unicode encodings, and option 2 otherwise. This
>>>> effectively limits the options under consideration to just two.
>>>>
>>>> If characters that have estimated widths other than 1 are not permitted
>>>> as fill characters, then whether a program is correct or not may depend on
>>>> what encoding is used for the ordinary literal encoding. For example,
>>>> Shift-JIS supports encoding katakana as either half-width or full-width
>>>> (e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA LETTER
>>>> KA}). Since Shift-JIS is not a Unicode encoding, an implementation may
>>>> assign all such characters an estimated width of 1, but the full-width
>>>> variants would have an estimated width of 2 for a Unicode encoding. I have
>>>> no perspective on whether such full-width characters would ever be
>>>> desirable as fill characters and thus prefer not to prohibit them without
>>>> strong cause.
>>>>
>>>> If the estimated width of the fill character is greater than 1, then
>>>> alignment to the end of the field might not be possible. For example:
>>>> std::format("{:🤡>4}", 123);
>>>> There are a number of options available to handle such cases:
>>>>
>>>> 1. Underfill the available space leaving the field misaligned.
>>>> 2. Overfill the available space leaving the field misaligned.
>>>> 3. Substitute the default fill character (that has an estimated
>>>> width of 1) once the available space is reduced to less than the estimated
>>>> width of the requested fill character.
>>>> 4. Throw an exception.
>>>> 5. UB.
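
The first three handling options can be compared with a small sketch. The
names fill_policy, fill_plan and plan_fill are hypothetical, and the sketch
assumes the available space (field width minus content width) is not
negative.

```cpp
// Hypothetical policies for an available space that is not a multiple of
// the fill character's estimated width (options 1 to 3 in the list above).
enum class fill_policy { underfill, overfill, substitute_default };

struct fill_plan {
    int fills;     // copies of the requested fill character
    int defaults;  // copies of the width-1 default fill character
    int columns;   // total columns occupied by the padding
};

// available: columns left after the formatted value (assumed >= 0).
// fill_width: estimated width of the requested fill character (>= 1).
constexpr fill_plan plan_fill(int available, int fill_width, fill_policy policy) {
    const int whole = available / fill_width;  // fills that fit entirely
    const int rest = available % fill_width;   // columns left over
    if (rest == 0) return {whole, 0, available};
    switch (policy) {
        case fill_policy::underfill:
            return {whole, 0, whole * fill_width};
        case fill_policy::overfill:
            return {whole + 1, 0, (whole + 1) * fill_width};
        case fill_policy::substitute_default:
            return {whole, rest, available};
    }
    return {whole, 0, whole * fill_width};  // unreachable for valid policies
}
```

For the example above, 123 in a field of width 4 leaves 1 column for a fill
of width 2: underfilling pads 0 columns, overfilling pads 2, and substituting
the default fill pads exactly 1.
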
>>>>
>>>> There are also two related wording omissions:
>>>>
>>>> 1. Table [tab.format.align]
>>>> <http://eel.is/c++draft/tab:format.align> doesn't specify how
>>>> alignment is achieved for the '<' and '>' options (the wording doesn't
>>>> state to insert fill characters as it does for the '^' option).
>>>> 2. It is not specified what happens when alignment to the end of
>>>> the field is requested, but the width of the formatted value exceeds the
>>>> field width thereby making such alignment impossible (it is presumably
>>>> intended that the available space is overflowed resulting in misalignment;
>>>> truncation, throwing an exception, or UB are alternate options).
>>>> std::format("{:X>1}", 9999);
>>>>
>>>> In some cases, it would be possible for field alignment to be restored
>>>> by tracking underfill and overfill counts and underfilling or overfilling
>>>> later fields. That ... is probably not a good idea.
>>>> Proposal:
>>>>
>>>> - Allow characters that have an estimated width other than 1 to be
>>>> used as fill characters.
>>>> - Underfill the available field space when inserting an additional
>>>> fill character would otherwise lead to overfill.
>>>> - Do not substitute a default fill character, throw an exception,
>>>> or specify UB when alignment is not possible. This is based on the
>>>> assumption that misalignment is preferred over the other possibilities.
>>>> - Overflow the available field space when the formatted value
>>>> exceeds the field width. This is based on the assumption that outputting
>>>> all available data is preferred over the other possibilities.
>>>> - Do not try to cleverly count underfill and overfill and adjust
>>>> later fields.
>>>>
>>>> Proposed Resolution:
>>>>
>>>> The wording below is intended to address LWG3639
>>>> <https://cplusplus.github.io/LWG/issue3639> and to supersede the
>>>> current proposed resolution for LWG3576
>>>> <https://cplusplus.github.io/LWG/issue3576>.
>>>>
>>>> Change [format.string.std]p1
>>>> <http://eel.is/c++draft/format.string.std#1>:
>>>>
>>>> [...]
>>>>
>>>> The syntax of format specifications is as follows:
>>>>
>>>> [...]
>>>>
>>>> *fill*:
>>>>
>>>> any character*member of the translation character set **([lex.charset]
>>>> <http://eel.is/c++draft/lex.charset>)* other than {*U+007B LEFT CURLY
>>>> BRACKET* or }*U+007D RIGHT CURLY BRACKET*
>>>>
>>>> [...]
>>>>
>>>> Change [format.string.std]p2
>>>> <http://eel.is/c++draft/format.string.std#2>:
>>>>
>>>> *The **fill character** is the character denoted by the **fill**
>>>> specifier or, if the **fill** specifier is absent, U+0020 SPACE.*
>>>>
>>>> [*Note 2*: The *fill* character can be any character other than { or
>>>> }. The presence of a fill character*fill specifier* is signaled by the
>>>> character following it, which must be one of the alignment options. If the
>>>> second character of *std-format-spec* is not a valid alignment option,
>>>> then it is assumed that both the fill character and the alignment
>>>> option are*the fill-and-align specifier is* absent. — *end note*]
>>>>
>>>> Change [format.string.std]p3
>>>> <http://eel.is/c++draft/format.string.std#3>:
>>>>
>>>> The *align* specifier applies to all argument types. The meaning of
>>>> the various alignment options is as specified in Table 62
>>>> <http://eel.is/c++draft/format.string.std#tab:format.align>.
>>>>
>>>> [*Example 1*:
>>>> char c = 120;
>>>> string s0 = format("{:6}", 42); // value of s0 is " 42"
>>>> string s1 = format("{:6}", 'x'); // value of s1 is "x "
>>>> string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
>>>> string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
>>>> string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
>>>> string s5 = format("{:6d}", c); // value of s5 is " 120"
>>>> string s6 = format("{:6}", true); // value of s6 is "true "
>>>> *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*
>>>> — *end example*]
>>>>
>>>> [*Note 3*: Unless a minimum field width is defined, the field width is
>>>> determined by the size of the content and the alignment option has no
>>>> effect. — *end note*]
>>>>
>>>> *[Note 4: If the width of the formatting argument value exceeds the
>>>> field width, then the alignment option has no effect. — end note]*
>>>>
>>>> *[Note 5: It may not be possible to exactly align the formatting
>>>> argument value within the available space when the fill character has
>>>> an estimated width other than 1. — end note]*
>>>>
>>>> Table 62 <http://eel.is/c++draft/format.string.std#tab:format.align>:
>>>> Meaning of *align* options [tab:format.align]
>>>> <http://eel.is/c++draft/tab:format.align>
>>>>
>>>> *Option*: '<'
>>>> *Meaning*: Forces the field to be aligned to the start of the available
>>>> space* by inserting n fill characters after the formatting argument value
>>>> where n is the number of fill characters needed to most closely align to
>>>> the field width without exceeding it*. This is the default for
>>>> non-arithmetic types, charT, and bool, unless an integer presentation
>>>> type is specified.
>>>>
>>>> *Option*: '>'
>>>> *Meaning*: Forces the field to be aligned to the end of the available
>>>> space* by inserting n fill characters before the formatting argument value
>>>> where n is the number of fill characters needed to most closely align to
>>>> the field width without exceeding it*. This is the default for arithmetic
>>>> types other than charT and bool or when an integer presentation type is
>>>> specified.
>>>>
>>>> *Option*: '^'
>>>> *Meaning*: Forces the field to be centered within the available space by
>>>> inserting ⌊*n*/2⌋ *fill* characters before and ⌈*n*/2⌉ *fill* characters
>>>> after the *formatting argument* value, where *n* is the total number of
>>>> fill characters to insert*needed to most closely align to the field width
>>>> without exceeding it*.
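
As a rough sketch of how the table wording could translate into code: n is
computed once by rounding down, then distributed according to the alignment
option. The names padding, split_fill and apply are hypothetical, and the
caller is assumed to supply the estimated widths of the value and the fill.

```cpp
#include <string>

struct padding {
    int before;  // fill characters inserted before the value
    int after;   // fill characters inserted after the value
};

// Computes n, the number of fill characters that most closely fills the
// field without exceeding it, and splits it according to align
// ('<', '>' or '^'). If the value is at least as wide as the field, n is 0
// and the value simply overflows the field.
constexpr padding split_fill(char align, int field_width, int content_width,
                             int fill_width) {
    const int n = content_width >= field_width
                      ? 0
                      : (field_width - content_width) / fill_width;
    return align == '<'   ? padding{0, n}
           : align == '>' ? padding{n, 0}
                          : padding{n / 2, n - n / 2};  // floor before, ceil after
}

// Builds the padded field; fill holds the encoded fill character
// (possibly several code units).
inline std::string apply(const std::string& value, const std::string& fill,
                         padding p) {
    std::string out;
    for (int i = 0; i < p.before; ++i) out += fill;
    out += value;
    for (int i = 0; i < p.after; ++i) out += fill;
    return out;
}
```

For the centered case in Example 1, split_fill('^', 6, 1, 1) gives 2 fill
characters before and 3 after, which matches "**x***". With a fill of width 2
and a value of width 3 in a field of width 4, n is 0 and the field is
underfilled by one column.
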
>>>>
>>>> Tom.
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>>
>>

Received on 2021-12-01 11:41:44