Date: Wed, 1 Dec 2021 00:02:56 -0500
The following is in preparation for the SG16 telecon scheduled for tomorrow.
LWG3639 <https://wg21.link/lwg3639> tracks how implementations should
handle fill characters that have an estimated width other than 1 (the
current proposed resolution for LWG3576
<https://cplusplus.github.io/LWG/issue3576> limits fill characters to
those encodeable as a single code point in the ordinary literal
encoding). The issue discussion records three possible ways to resolve
the issue:
1. s == "🤡🤡🤡🤡42": use the estimated display width, correctly
displayed on compatible terminals.
2. s == "🤡🤡🤡🤡🤡🤡🤡🤡42": assume the display width of 1,
incorrectly displayed.
3. Require the fill character to have the estimated width of 1.
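The difference between options 1 and 2 comes down to which width is assumed for each fill character. Below is a minimal sketch, assuming a right-aligned field of width 10 holding "42" (the format string is not shown above, so the width of 10 is inferred from the two results) and an estimated width of 2 for the clown character. `fill_count` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of fill characters inserted when each one is
// assumed to occupy `assumed_fill_width` columns.
// Precondition: assumed_fill_width > 0.
std::size_t fill_count(std::size_t field_width,
                       std::size_t content_width,
                       std::size_t assumed_fill_width) {
    assert(assumed_fill_width > 0);
    if (content_width >= field_width) {
        return 0;  // nothing to fill
    }
    return (field_width - content_width) / assumed_fill_width;
}
```

Under these assumptions, `fill_count(10, 2, 2)` gives the 4 fill characters of option 1 and `fill_count(10, 2, 1)` gives the 8 fill characters of option 2.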
Discussion:
Assuming that we elect to allow characters with an estimated width other
than 1 to be used as fill characters (i.e., we reject option 3 above), I
think the normative guidance offered in [format.string.std]p11
<http://eel.is/c++draft/format.string.std#11> suffices to direct
implementations to choose between options 1 and 2; option 1 should be
used for Unicode encodings, and option 2 otherwise. This effectively
limits the options under consideration to just two.
If characters that have estimated widths other than 1 are not permitted
as fill characters, then whether a program is correct or not may depend
on what encoding is used for the ordinary literal encoding. For example,
Shift-JIS supports encoding katakana as either half-width or full-width
(e.g., ｶ U+FF76 {HALFWIDTH KATAKANA LETTER KA} or カ U+30AB {KATAKANA
LETTER KA}). Since Shift-JIS is not a Unicode encoding, an
implementation may assign all such characters an estimated width of 1,
but the full-width variants would have an estimated width of 2 for a
Unicode encoding. I have no perspective on whether such full-width
characters would ever be desirable as fill characters and thus prefer
not to prohibit them without strong cause.
If the estimated width of the fill character is greater than 1, then
alignment to the end of the field might not be possible. For example:
std::format("{:🤡>4}", 123);
There are a number of options available to handle such cases:
1. Underfill the available space leaving the field misaligned.
2. Overfill the available space leaving the field misaligned.
3. Substitute the default fill character (that has an estimated width
of 1) once the available space is reduced to less than the estimated
width of the requested fill character.
4. Throw an exception.
5. UB.
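The first three options can be compared with a small sketch. It assumes the space left after the formatted value and the estimated width of the fill character are already known. `Strategy`, `Padding`, and `plan_padding` are hypothetical names for illustration only, and options 4 and 5 are not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, for illustration only.
enum class Strategy { Underfill, Overfill, SubstituteDefault };

struct Padding {
    std::size_t requested_fill;  // copies of the requested fill character
    std::size_t default_fill;    // copies of the width-1 default fill character
};

// `available` is the field width minus the width of the formatted value.
// Precondition: fill_width > 0.
Padding plan_padding(std::size_t available, std::size_t fill_width, Strategy s) {
    assert(fill_width > 0);
    const std::size_t whole = available / fill_width;
    const std::size_t leftover = available % fill_width;
    switch (s) {
        case Strategy::Overfill:
            // One extra fill character covers the leftover columns and overshoots.
            return {whole + (leftover != 0 ? 1u : 0u), 0};
        case Strategy::SubstituteDefault:
            // Leftover columns are covered by the default fill character.
            return {whole, leftover};
        case Strategy::Underfill:
        default:
            // Leftover columns stay empty.
            return {whole, 0};
    }
}
```

For the `"{:🤡>4}"` example with 123, and assuming an estimated width of 2 for the fill character, `available` is 1: underfill inserts no fill characters (3 columns in total), overfill inserts one (5 columns), and substitution inserts one default fill character (4 columns).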
There are also two related wording omissions:
1. Table [tab:format.align] <http://eel.is/c++draft/tab:format.align>
doesn't specify how alignment is achieved for the '<' and '>'
options (the wording doesn't state to insert fill characters as it
does for the '^' option).
2. It is not specified what happens when alignment to the end of the
field is requested, but the width of the formatted value exceeds the
field width thereby making such alignment impossible (it is
presumably intended that the available space is overflowed resulting
in misalignment; truncation, throwing an exception, or UB are
alternate options).
std::format("{:X>1}", 9999);
In some cases, it would be possible for field alignment to be restored
by tracking underfill and overfill counts and underfilling or
overfilling later fields. That ... is probably not a good idea.
Proposal:
* Allow characters that have an estimated width other than 1 to be
used as fill characters.
* Underfill the available field space when inserting an additional
fill character would otherwise lead to overfill.
* Do not substitute a default fill character, throw an exception, or
specify UB when alignment is not possible. This is based on the
assumption that misalignment is preferred over the other possibilities.
* Overflow the available field space when the formatted value exceeds
the field width. This is based on the assumption that outputting all
available data is preferred over the other possibilities.
* Do not try to cleverly count underfill and overfill and adjust later
fields.
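Taken together, the proposal can be sketched for the '>' option as follows. This is an illustration under stated assumptions, not an implementation of `std::format`: `pad_to_end` is a hypothetical function, and the estimated widths are passed in by the caller rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the proposed behavior for the '>' option.
// Estimated widths are supplied by the caller; no width estimation is done here.
// Precondition: fill_width > 0.
std::string pad_to_end(const std::string& value, std::size_t value_width,
                       const std::string& fill, std::size_t fill_width,
                       std::size_t field_width) {
    assert(fill_width > 0);
    std::string out;
    if (value_width < field_width) {
        // Underfill: round down so the fill never exceeds the field width.
        const std::size_t n = (field_width - value_width) / fill_width;
        for (std::size_t i = 0; i < n; ++i) {
            out += fill;
        }
    }
    // Overflow: the value is always written in full, never truncated.
    out += value;
    return out;
}
```

Assuming an estimated width of 2 for the fill character, `pad_to_end("123", 3, "🤡", 2, 4)` returns "123", leaving the field one column short, and `pad_to_end("9999", 4, "X", 1, 1)` returns "9999", overflowing the field.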
Proposed Resolution:
The wording below is intended to address LWG3639
<https://cplusplus.github.io/LWG/issue3639> and to supersede the current
proposed resolution for LWG3576 <https://cplusplus.github.io/LWG/issue3576>.
Change [format.string.std]p1 <http://eel.is/c++draft/format.string.std#1>:
[...]
The syntax of format specifications is as follows:
[...]
/fill/:
any character _member of the translation character set
([lex.charset] <http://eel.is/c++draft/lex.charset>)_
other than { _U+007B LEFT CURLY BRACKET_ or } _U+007D RIGHT
CURLY BRACKET_
[...]
Change [format.string.std]p2 <http://eel.is/c++draft/format.string.std#2>:
_The /fill character/ is the character denoted by the
/fill/ specifier or, if the /fill/ specifier is absent, U+0020
SPACE._
[/Note 2/: The /fill/ character can be any character other than { or
}. The presence of a fill character _/fill/ specifier_ is signaled by
the character following it, which must be one of the alignment
options. If the second character of /std-format-spec/ is not a valid
alignment option, then it is assumed that both the fill character
and the alignment option are _the /fill-and-align/ specifier is_
absent. — /end note/]
Change [format.string.std]p3 <http://eel.is/c++draft/format.string.std#3>:
The /align/ specifier applies to all argument types. The meaning of
the various alignment options is as specified in Table 62
<http://eel.is/c++draft/format.string.std#tab:format.align>.
[/Example 1/:
char c = 120;
string s0 = format("{:6}", 42); // value of s0 is "    42"
string s1 = format("{:6}", 'x'); // value of s1 is "x     "
string s2 = format("{:*<6}", 'x'); // value of s2 is "x*****"
string s3 = format("{:*>6}", 'x'); // value of s3 is "*****x"
string s4 = format("{:*^6}", 'x'); // value of s4 is "**x***"
string s5 = format("{:6d}", c); // value of s5 is "   120"
string s6 = format("{:6}", true); // value of s6 is "true  "
_string s7 = format("{:*>6}", "12345678"); // value of s7 is "12345678"_
— /end example/]
[/Note 3/: Unless a minimum field width is defined, the field width
is determined by the size of the content and the alignment option
has no effect. — /end note/]
_[/Note 4/: If the width of the formatting argument value exceeds
the field width, then the alignment option has no effect. —
/end note/]_
_[/Note 5/: It may not be possible to exactly align the formatting
argument value within the available space when the fill character
has an estimated width other than 1. — /end note/]_
Table 62
<http://eel.is/c++draft/format.string.std#tab:format.align>: Meaning
of /align/ options [tab:format.align]
<http://eel.is/c++draft/tab:format.align>
*Option*
*Meaning*
<
Forces the field to be aligned to the start of the available
space _by inserting /n/ fill characters after the formatting argument
value where /n/ is the number of fill characters needed to most
closely align to the field width without exceeding it_. This is the
default for non-arithmetic types, charT, and bool, unless an integer
presentation type is specified.
>
Forces the field to be aligned to the end of the available space _by
inserting /n/ fill characters before the formatting argument value
where /n/ is the number of fill characters needed to most closely
align to the field width without exceeding it_. This is the default
for arithmetic types other than charT and bool or when an integer
presentation type is specified.
^
Forces the field to be centered within the available space by
inserting ⌊/n//2⌋ _fill_ characters before and ⌈/n//2⌉ _fill_
characters after the _formatting argument_ value, where /n/ is the
total number of fill characters to insert _needed to most closely
align to the field width without exceeding it_.
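The '^' row can be illustrated with a short sketch, assuming the field width, the width of the formatting argument value, and the estimated width of the fill character are already known. `center_split` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: returns {fill characters before, fill characters after}
// for the '^' option, where n is the largest number of fill characters that
// fits in the field without exceeding its width.
// Precondition: fill_width > 0.
std::pair<std::size_t, std::size_t> center_split(std::size_t field_width,
                                                 std::size_t content_width,
                                                 std::size_t fill_width) {
    assert(fill_width > 0);
    const std::size_t n = content_width < field_width
                              ? (field_width - content_width) / fill_width
                              : 0;
    const std::size_t before = n / 2;      // floor(n / 2)
    const std::size_t after = n - before;  // ceil(n / 2)
    return {before, after};
}
```

For `"{:*^6}"` with 'x', `center_split(6, 1, 1)` yields 2 fill characters before and 3 after, matching the "**x***" result in Example 1. With a fill character of estimated width 2 in a field of width 7, `center_split(7, 1, 2)` yields 1 before and 2 after.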
Tom.
Received on 2021-11-30 23:03:00