Re: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 1 Dec 2021 08:06:58 -0800
Thanks Tom for putting this together.

I am strongly opposed to the proposed resolution because:

1. It brings a novel design that has no implementation or usage experience.
2. It will likely cause a nontrivial performance regression for the common
case of width = 1.
3. Width > 1 will often result in misaligned output for all alignment
options (not just >), and none of the options for handling this looks
satisfactory.
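
To make point 3 concrete, here is a minimal sketch of the arithmetic implied by the proposed "underfill" rule. The helper names fill_count and padded_width are hypothetical, not part of any library, and the sketch assumes the fill character (such as 🤡) has an estimated width of 2.

```cpp
#include <cstddef>

// Hypothetical helper: number of fill characters that fit into the padding
// without exceeding the field width (the "underfill" rule).
constexpr std::size_t fill_count(std::size_t field_width,
                                 std::size_t content_width,
                                 std::size_t fill_width) {
    if (fill_width == 0 || content_width >= field_width) {
        return 0;  // nothing to insert; the content may overflow the field
    }
    return (field_width - content_width) / fill_width;
}

// Hypothetical helper: columns actually occupied by the padded field.
constexpr std::size_t padded_width(std::size_t field_width,
                                   std::size_t content_width,
                                   std::size_t fill_width) {
    return content_width +
           fill_count(field_width, content_width, fill_width) * fill_width;
}
```

With a field width of 4, a 3-column value, and a 2-column fill, fill_count yields 0 and the field occupies 3 columns instead of 4, regardless of which alignment option is used.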

Other comments:

Misalignment can occur in all cases, not just when aligning to the end (>),
e.g.

  std::format(
    "1234|123"
    "{:🤡<4}|foo", 123);

> It is not specified what happens when alignment to the end of the field
is requested, but the width of the formatted value exceeds the field width
thereby making such alignment impossible

It is specified elsewhere (http://eel.is/c++draft/format.string#std-8):

  The positive-integer in width is a decimal integer defining the minimum
field width.

which means that the field must overflow. That said, we might want to add a
clarification elsewhere.

> That ... is probably not a good idea.

Yes, that would be extremely novel and surprising.

The added example is incorrect:

  *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*

The format string should be "{:*>6}".

Cheers,
Victor


On Tue, Nov 30, 2021 at 9:03 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> The following is in preparation for the SG16 telecon scheduled for
> tomorrow.
>
> LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
> handle fill characters that have an estimated width other than 1 (the
> current proposed resolution for LWG3576
> <https://cplusplus.github.io/LWG/issue3576> limits fill characters to
> those encodeable as a single code point in the ordinary literal encoding).
> The issue discussion records three possible ways to resolve the issue:
>
> 1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
> displayed on compatible terminals.
> 2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
> incorrectly displayed.
> 3. Require the fill character to have the estimated width of 1.
>
> Discussion:
>
> Assuming that we elect to allow characters with an estimated width other
> than 1 to be used as fill characters (e.g., we reject option 3 above), I
> think the normative guidance offered in [format.string.std]p11
> <http://eel.is/c++draft/format.string.std#11> suffices to direct
> implementations to choose between options 1 and 2; option 1 should be used
> for Unicode encodings, and option 2 otherwise. This effectively limits the
> options under consideration to just two.
>
> If characters that have estimated widths other than 1 are not permitted as
> fill characters, then whether a program is correct or not may depend on
> what encoding is used for the ordinary literal encoding. For example,
> Shift-JIS supports encoding katakana as either half-width or full-width
> (e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA LETTER
> KA}). Since Shift-JIS is not a Unicode encoding, an implementation may
> assign all such characters an estimated width of 1, but the full-width
> variants would have an estimated width of 2 for a Unicode encoding. I have
> no perspective on whether such full-width characters would ever be
> desirable as fill characters and thus prefer not to prohibit them without
> strong cause.
>
> If the estimated width of the fill character is greater than 1, then
> alignment to the end of the field might not be possible. For example:
> std::format("{:🤡>4}", 123);
> There are a number of options available to handle such cases:
>
> 1. Underfill the available space leaving the field misaligned.
> 2. Overfill the available space leaving the field misaligned.
> 3. Substitute the default fill character (that has an estimated width
> of 1) once the available space is reduced to less than the estimated width
> of the requested fill character.
> 4. Throw an exception.
> 5. UB.
>
> There are also two related wording omissions:
>
> 1. Table [tab.format.align] <http://eel.is/c++draft/tab:format.align>
> doesn't specify how alignment is achieved for the '<' and '>' options (the
> wording doesn't state to insert fill characters as it does for the '^'
> option).
> 2. It is not specified what happens when alignment to the end of the
> field is requested, but the width of the formatted value exceeds the field
> width thereby making such alignment impossible (it is presumably intended
> that the available space is overflowed resulting in misalignment;
> truncation, throwing an exception, or UB are alternate options).
> std::format("{:X>1}", 9999);
>
> In some cases, it would be possible for field alignment to be restored by
> tracking underfill and overfill counts and underfilling or overfilling
> later fields. That ... is probably not a good idea.
> Proposal:
>
> - Allow characters that have an estimated width other than 1 to be
> used as fill characters.
> - Underfill the available field space when inserting an additional
> fill character would otherwise lead to overfill.
> - Do not substitute a default fill character, throw an exception, or
> specify UB when alignment is not possible. This is based on the assumption
> that misalignment is preferred over the other possibilities.
> - Overflow the available field space when the formatted value exceeds
> the field width. This is based on the assumption that outputting all
> available data is preferred over the other possibilities.
> - Do not try to cleverly count underfill and overfill and adjust later
> fields.
>
> Proposed Resolution:
>
> The wording below is intended to address LWG3639
> <https://cplusplus.github.io/LWG/issue3639> and to supersede the current
> proposed resolution for LWG3576
> <https://cplusplus.github.io/LWG/issue3576>.
>
> Change [format.string.std]p1 <http://eel.is/c++draft/format.string.std#1>:
>
> [...]
>
> The syntax of format specifications is as follows:
>
> [...]
>
> *fill*:
>
> any character*member of the translation character set **([lex.charset]
> <http://eel.is/c++draft/lex.charset>)* other than {*U+007B LEFT CURLY
> BRACKET* or }*U+007D RIGHT CURLY BRACKET*
>
> [...]
>
> Change [format.string.std]p2 <http://eel.is/c++draft/format.string.std#2>:
>
> *The **fill character** is the character denoted by the **fill**
> specifier or, if the **fill** specifier is absent, U+0020 SPACE.*
>
> [*Note 2*: The *fill* character can be any character other than { or }. The
> presence of a fill character*fill specifier* is signaled by the character
> following it, which must be one of the alignment options. If the second
> character of *std-format-spec* is not a valid alignment option, then it
> is assumed that both the fill character and the alignment option are*the
> fill-and-align specifier is* absent. — *end note*]
>
> Change [format.string.std]p3 <http://eel.is/c++draft/format.string.std#3>:
>
> The *align* specifier applies to all argument types. The meaning of the
> various alignment options is as specified in Table 62
> <http://eel.is/c++draft/format.string.std#tab:format.align>.
>
> [*Example 1*:
> char c = 120;
> string s0 = format("{:6}", 42); // value of s0 is "    42"
> string s1 = format("{:6}", 'x'); // value of s1 is "x     "
> string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
> string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
> string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
> string s5 = format("{:6d}", c); // value of s5 is "   120"
> string s6 = format("{:6}", true); // value of s6 is "true  "
> *string s7 = format("{:*6>}", "12345678"); // value of s7 is "12345678"*
> — *end example*]
>
> [*Note 3*: Unless a minimum field width is defined, the field width is
> determined by the size of the content and the alignment option has no
> effect. — *end note*]
>
> *[Note 4: If the width of the formatting argument value exceeds the field
> width, then the alignment option has no effect. — end note]*
>
> *[Note 5: It may not be possible to exactly align the formatting argument
> value within the available space when the fill character has an estimated
> width other than 1. — end note]*
>
> Table 62 <http://eel.is/c++draft/format.string.std#tab:format.align>:
> Meaning of *align* options [tab:format.align]
> <http://eel.is/c++draft/tab:format.align>
> *Option*
> *Meaning*
> <
> Forces the field to be aligned to the start of the available space* by
> inserting n fill characters after the formatting argument value where n is
> the number of fill characters needed to most closely align to the field
> width without exceeding it*. This is the default for non-arithmetic
> types, charT, and bool, unless an integer presentation type is specified.
> >
> Forces the field to be aligned to the end of the available space* by
> inserting n fill characters before the formatting argument value where n is
> the number of fill characters needed to most closely align to the
> field width without exceeding it*. This is the default for arithmetic
> types other than charT and bool or when an integer presentation type is
> specified.
> ^
> Forces the field to be centered within the available space by inserting ⌊
> *n*/2⌋ *fill *characters before and ⌈*n*/2⌉ *fill *characters after the *formatting
> argument *value, where *n* is the total number of fill characters to
> insert*needed to most closely align to the field width without exceeding
> it*.
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
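
For the '^' row of the quoted table, the split of n fill characters into ⌊n/2⌋ before and ⌈n/2⌉ after the value can be sketched as follows. The helper name center_split is hypothetical and only illustrates the arithmetic.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: split n fill characters for the '^' option.
// Returns {before, after} = {floor(n / 2), ceil(n / 2)}.
constexpr std::pair<std::size_t, std::size_t> center_split(std::size_t n) {
    const std::size_t before = n / 2;  // integer division rounds down
    return {before, n - before};       // the remainder goes after the value
}
```

For s4 in the quoted example, n = 5 gives 2 fill characters before and 3 after, which matches "**x***".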

Received on 2021-12-01 10:07:13