Re: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 1 Dec 2021 14:25:14 -0500
On 12/1/21 11:06 AM, Victor Zverovich wrote:
> Thanks Tom for putting this together.
>
> I am strongly opposed to the proposed resolution because:
>
> 1. It brings a novel design that has no implementation or usage
> experience.
> 2. It will likely have a nontrivial performance regression for the
> common case of width = 1.
> 3. Width > 1 will often result in misaligned output for all alignment
> options (not just >) and none of the options to handle it look satisfactory.
I'm fine with option 3. I proposed option 1 (option 2 for the
non-Unicode case) because I expected that to have consensus, but I'm
content with either approach.
>
> Other comments:
>
> Misalignment can occur in all cases, not just when aligning to the end
> (>), e.g.
>
> std::format(
> "1234|123"
> "{:🤡<4}|foo", 123);
The formatted value would still appear aligned at the beginning of the
available space (which may not be itself aligned due to overflow of
prior fields).
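
As a rough illustration of the "underfill" rule in the proposed resolution,
the number of fill characters could be computed as in the sketch below. This
is only a sketch: fill_count is a hypothetical helper, not standard library
API and not proposed wording, and it assumes the estimated display widths are
already known and that 🤡 has an estimated width of 2.

```cpp
#include <cstddef>

// Hypothetical helper (not standard library API, not proposed wording):
// number of fill characters to insert so that the field is filled as
// closely as possible without exceeding field_width.
// All widths are estimated display widths in columns; fill_width must be > 0.
constexpr std::size_t fill_count(std::size_t field_width,
                                 std::size_t value_width,
                                 std::size_t fill_width) {
    if (value_width >= field_width) {
        return 0;  // value fills or overflows the field; no fill is inserted
    }
    return (field_width - value_width) / fill_width;  // rounds down: underfill
}

// "{:🤡<4}" with 123: field 4, value 3, fill width 2 -> no fill fits.
static_assert(fill_count(4, 3, 2) == 0);
// Field 10, value "42" (width 2), fill width 2 -> four fill characters.
static_assert(fill_count(10, 2, 2) == 4);
```

With a field width of 4, the value 123 is followed by no fill at all, so the
field ends one column short: the value itself still starts at the beginning of
the field, but whatever follows it shifts left by one column.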
>
> > It is not specified what happens when alignment to the end of the
> field is requested, but the width of the formatted value exceeds the
> field width thereby making such alignment impossible
>
> It is specified elsewhere (http://eel.is/c++draft/format.string#std-8):
>
> The positive-integer in width is a decimal integer defining the
> minimum field width.
>
> which means that the field must overflow. That said, we might want to
> add a clarification elsewhere.
That doesn't actually state what happens when the minimum field width is
exceeded though. Is there wording elsewhere that requires overflow
(perhaps implied)?
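
For concreteness, the "minimum width, overflow otherwise" reading can be
sketched as follows. This is an illustration under stated assumptions, not
what any implementation is required to do: align_end is a hypothetical
helper, and every character (including the fill) is assumed to have an
estimated width of 1.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: align value to the end of a field that is at least
// min_width columns wide, assuming every character has a width of 1.
std::string align_end(const std::string& value, std::size_t min_width, char fill) {
    if (value.size() >= min_width) {
        // The value already meets or exceeds the minimum width, so it is
        // emitted whole (overflow) and the alignment option has no effect.
        return value;
    }
    // Otherwise pad on the left with exactly enough fill characters.
    return std::string(min_width - value.size(), fill) + value;
}
```

Under this reading, align_end("12345678", 6, '*') yields the value unchanged,
which matches the intent of the corrected s7 example; truncating, throwing, or
UB would each require different logic in the first branch.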
>
> > That ... is probably not a good idea.
>
> Yes, that would be extremely novel and surprising.
>
> The added example is incorrect:
>
> _string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"_
>
> The format string should be "{:*>6}".

Bah, yes, thanks.

Tom.

>
> Cheers,
> Victor
>
>
> On Tue, Nov 30, 2021 at 9:03 PM Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> The following is in preparation for the SG16 telecon scheduled for
> tomorrow.
>
> LWG3639 <https://wg21.link/lwg3639> tracks how implementations
> should handle fill characters that have an estimated width other
> than 1 (the current proposed resolution for LWG3576
> <https://cplusplus.github.io/LWG/issue3576> limits fill characters
> to those encodeable as a single code point in the ordinary literal
> encoding). The issue discussion records three possible ways to
> resolve the issue:
>
> 1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
> displayed on compatible terminals.
> 2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
> incorrectly displayed.
> 3. Require the fill character to have the estimated width of 1.
>
>
> Discussion:
>
> Assuming that we elect to allow characters with an estimated width
> other than 1 to be used as fill characters (e.g., we reject option
> 3 above), I think the normative guidance offered in
> [format.string.std]p11
> <http://eel.is/c++draft/format.string.std#11> suffices to direct
> implementations to choose between options 1 and 2; option 1 should
> be used for Unicode encodings, and option 2 otherwise. This
> effectively limits the options under consideration to just two.
>
> If characters that have estimated widths other than 1 are not
> permitted as fill characters, then whether a program is correct or
> not may depend on what encoding is used for the ordinary literal
> encoding. For example, Shift-JIS supports encoding katakana as
> either half-width or full-width (e.g., ｶ U+FF76 {HALFWIDTH
> KATAKANA LETTER KA} or カ U+30AB {KATAKANA LETTER KA}). Since
> Shift-JIS is not a Unicode encoding, an implementation may assign
> all such characters an estimated width of 1, but the full-width
> variants would have an estimated width of 2 for a Unicode
> encoding. I have no perspective on whether such full-width
> characters would ever be desirable as fill characters and thus
> prefer not to prohibit them without strong cause.
>
> If the estimated width of the fill character is greater than 1,
> then alignment to the end of the field might not be possible. For
> example:
> std::format("{:🤡>4}", 123);
> There are a number of options available to handle such cases:
>
> 1. Underfill the available space leaving the field misaligned.
> 2. Overfill the available space leaving the field misaligned.
> 3. Substitute the default fill character (that has an estimated
> width of 1) once the available space is reduced to less than
> the estimated width of the requested fill character.
> 4. Throw an exception.
> 5. UB.
>
> There are also two related wording omissions:
>
> 1. Table [tab.format.align]
> <http://eel.is/c++draft/tab:format.align> doesn't specify how
> alignment is achieved for the '<' and '>' options (the wording
> doesn't state to insert fill characters as it does for the '^'
> option).
> 2. It is not specified what happens when alignment to the end of
> the field is requested, but the width of the formatted value
> exceeds the field width thereby making such alignment
> impossible (it is presumably intended that the available space
> is overflowed resulting in misalignment; truncation, throwing
> an exception, or UB are alternate options).
> std::format("{:X>1}", 9999);
>
> In some cases, it would be possible for field alignment to be
> restored by tracking underfill and overfill counts and
> underfilling or overfilling later fields. That ... is probably not
> a good idea.
>
>
> Proposal:
>
> * Allow characters that have an estimated width other than 1 to
> be used as fill characters.
> * Underfill the available field space when inserting an
> additional fill character would otherwise lead to overfill.
> * Do not substitute a default fill character, throw an
> exception, or specify UB when alignment is not possible. This
> is based on the assumption that misalignment is preferred over
> the other possibilities.
> * Overflow the available field space when the formatted value
> exceeds the field width. This is based on the assumption that
> outputting all available data is preferred over the other
> possibilities.
> * Do not try to cleverly count underfill and overfill and adjust
> later fields.
>
>
> Proposed Resolution:
>
> The wording below is intended to address LWG3639
> <https://cplusplus.github.io/LWG/issue3639> and to supersede the
> current proposed resolution for LWG3576
> <https://cplusplus.github.io/LWG/issue3576>.
>
> Change [format.string.std]p1
> <http://eel.is/c++draft/format.string.std#1>:
>
> [...]
>
> The syntax of format specifications is as follows:
>
> [...]
>
> /fill/:
>
> any character_member of the translation character set
> __([lex.charset]
> <http://eel.is/c++draft/lex.charset>)_ other than
> {_U+007B LEFT CURLY BRACKET_ or }_U+007D RIGHT CURLY
> BRACKET_
>
> [...]
>
> Change [format.string.std]p2
> <http://eel.is/c++draft/format.string.std#2>:
>
> _The __/fill character/__is the character denoted by the
> __/fill/__specifier or, if the __/fill/__specifier is absent,
> U+0020 SPACE._
>
> [/Note 2/: The /fill/ character can be any character other
> than { or }. The presence of a fill character_/fill/
> specifier_ is signaled by the character following it, which
> must be one of the alignment options. If the second character
> of /std-format-spec/ is not a valid alignment option, then it
> is assumed that both the fill character and the alignment
> option are_the /fill-and-align/ specifier is_ absent. — /end
> note/]
>
> Change [format.string.std]p3
> <http://eel.is/c++draft/format.string.std#3>:
>
> The /align/ specifier applies to all argument types. The
> meaning of the various alignment options is as specified in
> Table 62
> <http://eel.is/c++draft/format.string.std#tab:format.align>.
>
> [/Example 1/:
> char c = 120;
> string s0 = format("{:6}", 42); // value of s0 is
> " 42"
> string s1 = format("{:6}", 'x'); // value of s1 is
> "x "
> string s2 = format("{:*<6}", 'x'); // value of s2 is
> "x*****"
> string s3 = format("{:*>6}", 'x'); // value of s3 is
> "*****x"
> string s4 = format("{:*^6}", 'x'); // value of s4 is
> "**x***"
> string s5 = format("{:6d}", c); // value of s5 is
> " 120"
> string s6 = format("{:6}", true); // value of s6 is
> "true "
> _string s7 = format("{:*6>}", "12345678"); // value of s7 is
> "12345678"_
> — /end example/]
>
> [/Note 3/: Unless a minimum field width is defined, the field
> width is determined by the size of the content and the
> alignment option has no effect. — /end note/]
>
> __[/Note 4/: If the width of the formatting argument value
> exceeds the field width, then the alignment option has no
> effect. _____— /end note/]_
>
> _[/Note 5/: It may not be possible to exactly align the
> formatting argument value within the available space when the
> fill character has an estimated width other than 1. — /end note/]
> _
> Table 62
> <http://eel.is/c++draft/format.string.std#tab:format.align>:
> Meaning of /align/ options [tab:format.align]
> <http://eel.is/c++draft/tab:format.align>
> *Option*
> *Meaning*
> <
> Forces the field to be aligned to the start of the available
> space_by inserting /n/ fill characters after the formatting
> argument value where /n/ is the number of fill characters
> needed to most closely align to the field width without
> exceeding it_. This is the default for non-arithmetic types,
> charT, and bool, unless an integer presentation type is specified.
> >
> Forces the field to be aligned to the end of the available
> space_by inserting /n/ fill characters before the formatting
> argument value where /n/ is the number of fill characters
> __needed to most closely align to the field width without
> exceeding it_. This is the default for arithmetic types other
> than charT and bool or when an integer presentation type is
> specified.
> ^
> Forces the field to be centered within the available space by
> inserting ⌊/n//2⌋ _fill _characters before and ⌈/n//2⌉ _fill
> _characters after the _formatting argument _value, where /n/
> is the total number of fill characters to insert_needed to
> most closely align to the field width without exceeding it_.
>
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>


Received on 2021-12-01 13:25:22