sg16: [SG16] Proposed resolution for LWG3639: Handling of fill character width is underspecified in std::format

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 1 Dec 2021 00:02:56 -0500

The following is in preparation for the SG16 telecon scheduled for tomorrow.

LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
handle fill characters that have an estimated width other than 1 (the
current proposed resolution for LWG3576
<https://cplusplus.github.io/LWG/issue3576> limits fill characters to
those encodable as a single code point in the ordinary literal
encoding). The issue discussion records three possible ways to resolve
the issue:

1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
    displayed on compatible terminals.
2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
    incorrectly displayed.
3. Require the fill character to have the estimated width of 1.
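The fill counts in options 1 and 2 follow from simple arithmetic. The sketch below assumes the example under discussion is the value 42 right-aligned in a field of width 10, which is consistent with both strings quoted above; `fill_count` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many fill characters fit between a value that
// occupies `content_width` columns and a field of `field_width` columns,
// when each fill character is assumed to occupy `fill_width` columns.
inline std::size_t fill_count(std::size_t field_width,
                              std::size_t content_width,
                              std::size_t fill_width) {
    assert(fill_width > 0);
    if (content_width >= field_width) return 0;          // nothing to fill
    return (field_width - content_width) / fill_width;   // rounds down
}
```

With a field width of 10 and a content width of 2, `fill_count(10, 2, 2)` yields 4 (option 1, the clown face counted as two columns) and `fill_count(10, 2, 1)` yields 8 (option 2, every fill character counted as one column).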

  Discussion:

Assuming that we elect to allow characters with an estimated width other
than 1 to be used as fill characters (i.e., we reject option 3 above), I
think the normative guidance offered in [format.string.std]p11
<http://eel.is/c++draft/format.string.std#11> suffices to direct
implementations to choose between options 1 and 2; option 1 should be
used for Unicode encodings, and option 2 otherwise. This effectively
limits the options under consideration to just two: allowing such fill
characters or prohibiting them (option 3).

If characters that have estimated widths other than 1 are not permitted
as fill characters, then whether a program is correct or not may depend
on what encoding is used for the ordinary literal encoding. For example,
Shift-JIS supports encoding katakana as either half-width or full-width
(e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA
LETTER KA}). Since Shift-JIS is not a Unicode encoding, an
implementation may assign all such characters an estimated width of 1,
but the full-width variants would have an estimated width of 2 for a
Unicode encoding. I have no perspective on whether such full-width
characters would ever be desirable as fill characters and thus prefer
not to prohibit them without strong cause.
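To illustrate the difference for a Unicode encoding, here is a rough sketch of a per-code-point width estimate. The ranges are recalled from the C++20 wording of [format.string.std]p11 and should be checked against the draft; `estimated_width` is a hypothetical name.

```cpp
#include <cassert>

// Sketch only: estimated display width of one code point for a Unicode
// encoding. Range list recalled from [format.string.std]p11 (C++20);
// verify against the current draft before relying on it.
inline int estimated_width(char32_t cp) {
    struct range { char32_t first, last; };
    static const range wide[] = {
        {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
        {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
        {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
        {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
        {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
    };
    for (const range& r : wide)
        if (cp >= r.first && cp <= r.last) return 2;  // listed as wide
    return 1;                                         // everything else
}
```

Under these ranges, U+FF76 (half-width KA) gets width 1, while U+30AB (full-width KA) and U+1F921 (the clown face) get width 2.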

If the estimated width of the fill character is greater than 1, then
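alignment to the end of the field might not be possible. For example:

    std::format("{:🤡>4}", 123);

Here the formatted value occupies 3 of the 4 columns, leaving a single
column that a fill character with an estimated width of 2 cannot fill
exactly. There are a number of options available to handle such cases: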

1. Underfill the available space leaving the field misaligned.
2. Overfill the available space leaving the field misaligned.
3. Substitute the default fill character (that has an estimated width
    of 1) once the available space is reduced to less than the estimated
    width of the requested fill character.
4. Throw an exception.
5. UB.
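The first three options can be compared with a small sketch. It assumes the value occupies one column per character and that the caller supplies the fill character's estimated width; `leftover` and `pad_right_aligned` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical policy names for options 1 to 3 above.
enum class leftover { underfill, overfill, substitute };

// Right-align `value` (assumed one column per character) in a field of
// `field_width` columns, using a fill character `fill_width` columns wide.
inline std::u32string pad_right_aligned(const std::u32string& value,
                                        std::size_t field_width,
                                        char32_t fill, std::size_t fill_width,
                                        leftover policy) {
    assert(fill_width > 0);
    const std::size_t gap =
        value.size() < field_width ? field_width - value.size() : 0;
    const std::size_t rest = gap % fill_width;    // columns no whole fill covers
    std::u32string out(gap / fill_width, fill);   // whole fill characters
    if (rest != 0) {
        if (policy == leftover::overfill)
            out += fill;                          // one extra, exceeds the field
        else if (policy == leftover::substitute)
            out.append(rest, U' ');               // default fill for the rest
        // underfill: leave the remaining columns empty
    }
    return out + value;
}
```

For the `{:🤡>4}` example with 123, underfill produces "123" (3 columns), overfill produces "🤡123" (5 columns), and substitution produces " 123" (4 columns).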

There are also two related wording omissions:

1. Table [tab.format.align] <http://eel.is/c++draft/tab:format.align>
    doesn't specify how alignment is achieved for the '<' and '>'
    options (the wording doesn't state to insert fill characters as it
    does for the '^' option).
2. It is not specified what happens when alignment to the end of the
    field is requested, but the width of the formatted value exceeds the
    field width thereby making such alignment impossible (it is
    presumably intended that the available space is overflowed resulting
    in misalignment; truncation, throwing an exception, or UB are
    alternate options).
    std::format("{:X>1}", 9999);

In some cases, it would be possible for field alignment to be restored
by tracking underfill and overfill counts and underfilling or
overfilling later fields. That ... is probably not a good idea.

  Proposal:

  * Allow characters that have an estimated width other than 1 to be
    used as fill characters.
  * Underfill the available field space when inserting an additional
    fill character would otherwise lead to overfill.
  * Do not substitute a default fill character, throw an exception, or
    specify UB when alignment is not possible. This is based on the
    assumption that misalignment is preferred over the other possibilities.
  * Overflow the available field space when the formatted value exceeds
    the field width. This is based on the assumption that outputting all
    available data is preferred over the other possibilities.
  * Do not try to cleverly count underfill and overfill and adjust later
    fields.

  Proposed Resolution:

The wording below is intended to address LWG3639
<https://cplusplus.github.io/LWG/issue3639> and to supersede the current
proposed resolution for LWG3576 <https://cplusplus.github.io/LWG/issue3576>.

Change [format.string.std]p1 <http://eel.is/c++draft/format.string.std#1>:

    [...]

      The syntax of format specifications is as follows:

        [...]

        /fill/:

            any character _member of the translation character set
            ([lex.charset] <http://eel.is/c++draft/lex.charset>)_
            other than { _U+007B LEFT CURLY BRACKET_ or } _U+007D RIGHT
            CURLY BRACKET_

        [...]

Change [format.string.std]p2 <http://eel.is/c++draft/format.string.std#2>:

    _The /fill character/ is the character denoted by the /fill/
    specifier or, if the /fill/ specifier is absent, U+0020 SPACE._

    [/Note 2/: The /fill/ character can be any character other than { or
    }. The presence of a fill character _/fill/ specifier_ is signaled by
    the character following it, which must be one of the alignment
    options. If the second character of /std-format-spec/ is not a valid
    alignment option, then it is assumed that both the fill character
    and the alignment option are _the /fill-and-align/ specifier is_
    absent. — /end note/]
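The lookahead rule described in Note 2 can be sketched as follows. This is a simplified illustration over code points, assuming the spec has already been decoded; `fill_and_align`, `is_align`, and `parse_fill_and_align` are hypothetical names, and a lone leading alignment option (no fill) is also accepted, as the grammar allows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct fill_and_align {
    char32_t fill = U' ';      // U+0020 SPACE when no fill is given
    char32_t align = 0;        // '<', '>', '^', or 0 when absent
    std::size_t consumed = 0;  // code points consumed from the spec
};

inline bool is_align(char32_t c) {
    return c == U'<' || c == U'>' || c == U'^';
}

// `spec` is the text after the ':' of a replacement field.
inline fill_and_align parse_fill_and_align(const std::u32string& spec) {
    fill_and_align r;
    if (spec.size() >= 2 && is_align(spec[1]) &&
        spec[0] != U'{' && spec[0] != U'}') {
        // The second code point is an alignment option, so the first is a fill.
        r.fill = spec[0];
        r.align = spec[1];
        r.consumed = 2;
    } else if (!spec.empty() && is_align(spec[0])) {
        // Alignment option without a fill.
        r.align = spec[0];
        r.consumed = 1;
    }
    return r;
}
```

For example, `U"*>6"` yields fill `*` with align `>`, `U">6"` yields the default space fill with align `>`, and `U"6d"` yields no fill-and-align at all.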

Change [format.string.std]p3 <http://eel.is/c++draft/format.string.std#3>:

    The /align/ specifier applies to all argument types. The meaning of
    the various alignment options is as specified in Table 62
    <http://eel.is/c++draft/format.string.std#tab:format.align>.

    [/Example 1/:
    char c = 120;
    string s0 = format("{:6}", 42);    // value of s0 is "    42"
    string s1 = format("{:6}", 'x');   // value of s1 is "x     "
    string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
    string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
    string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
    string s5 = format("{:6d}", c);    // value of s5 is "   120"
    string s6 = format("{:6}", true);  // value of s6 is "true  "
    _string s7 = format("{:*>6}", "12345678"); // value of s7 is "12345678"_
    — /end example/]

    [/Note 3/: Unless a minimum field width is defined, the field width
    is determined by the size of the content and the alignment option
    has no effect. — /end note/]

    _[/Note 4/: If the width of the formatting argument value exceeds
    the field width, then the alignment option has no effect. —
    /end note/]_

    _[/Note 5/: It may not be possible to exactly align the formatting
    argument value within the available space when the fill character
    has an estimated width other than 1. — /end note/]_

    Table 62
    <http://eel.is/c++draft/format.string.std#tab:format.align>: Meaning
    of /align/ options [tab:format.align]
    <http://eel.is/c++draft/tab:format.align>
    *Option*  *Meaning*

    <   Forces the field to be aligned to the start of the available
        space _by inserting /n/ fill characters after the formatting
        argument value where /n/ is the number of fill characters needed
        to most closely align to the field width without exceeding it_.
        This is the default for non-arithmetic types, charT, and bool,
        unless an integer presentation type is specified.

    >   Forces the field to be aligned to the end of the available space
        _by inserting /n/ fill characters before the formatting argument
        value where /n/ is the number of fill characters needed to most
        closely align to the field width without exceeding it_. This is
        the default for arithmetic types other than charT and bool or
        when an integer presentation type is specified.

    ^   Forces the field to be centered within the available space by
        inserting ⌊/n//2⌋ _fill_ characters before and ⌈/n//2⌉ _fill_
        characters after the _formatting argument_ value, where /n/ is
        the total number of fill characters to insert _needed to most
        closely align to the field width without exceeding it_.
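As a minimal sketch of the table as reworded above (not an implementation of std::format), the three alignment options can be expressed with a single fill count /n/ that never exceeds the field width. `align_field` is a hypothetical name, and the caller is assumed to supply the estimated widths of the value and of the fill character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: pad `value` (occupying `value_width` columns) to `field_width`
// columns according to `align` ('<', '>' or '^').
inline std::u32string align_field(const std::u32string& value,
                                  std::size_t value_width,
                                  std::size_t field_width,
                                  char32_t align,
                                  char32_t fill, std::size_t fill_width) {
    assert(fill_width > 0);
    // n: the most fill characters that fit without exceeding the field
    // width; 0 when the value already fills or overflows the field.
    const std::size_t n = value_width < field_width
                              ? (field_width - value_width) / fill_width
                              : 0;
    std::size_t before = 0;                  // '<': all fill goes after
    if (align == U'>') before = n;           // '>': all fill goes before
    else if (align == U'^') before = n / 2;  // '^': floor(n/2) before
    return std::u32string(before, fill) + value +
           std::u32string(n - before, fill); // '^': ceil(n/2) after
}
```

With these rules, `align_field(U"x", 1, 6, U'^', U'*', 1)` gives "**x***" as in Example 1, a value wider than the field is emitted unchanged, and a clown-face fill of width 2 leaves "123" unpadded in a field of width 4.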


Tom.

Received on 2021-11-30 23:03:00