Re: [SG16] Questions for LEWG for P2093R4: Formatted output

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Fri, 16 Apr 2021 13:29:26 -0400
On Thu, Mar 11, 2021 at 12:26 AM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> std::print("╟≥σσ⌠Θετ≤ ßεΣ πß∞⌡⌠ß⌠Θ∩επ!\n");
>
> The following are questions/concerns that came up during SG16 review of
> P2093 <https://wg21.link/p2093> that are worthy of further discussion in
> SG16 and/or LEWG. Most of these issues were discussed in SG16 and were
> determined either not to be SG16 concerns or were deemed issues for
> which we did not want to hold back forward progress. These sentiments were
> not unanimous.
>
> The SG16 poll to forward P2093R3 <https://wg21.link/p2093r3> was taken
> during our February 10th telecon. The poll was:
>
> Poll: Forward P2093R3 to LEWG.
> - Attendance: 9
>
>   SF   F   N   A  SA
>    4   2   2   0   1
>
> Minutes for prior SG16 reviews of P2093 <https://wg21.link/p2093> are
> available at:
>
> - December 9th, 2020 telecon
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2020.md#december-9th-2020>;
> review of P2093R2 <https://wg21.link/p2093r2>.
> - February 10th, 2021 telecon
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md>;
> review of P2093R3 <https://wg21.link/p2093r3>.
>
> Questions raised include:
>
> 1. How should errors in transcoding be handled?
>    The Unicode recommendation is to substitute a replacement character
>    for invalid code unit sequences. P2093R4 <https://wg21.link/p2093r4>
>    added wording to this effect.
> 2. Should this feature move forward without a parallel proposal to
>    provide the underlying implementation-dependent features needed to
>    implement std::print()?
>    Specifically, should this feature be blocked on exposing interfaces to
>    1) determine if a stream is connected directly to a terminal/console, and
>    2) write directly to a terminal/console (potentially bypassing a stream)
>    using native interfaces where applicable? These features would be
>    necessary in order to implement a portable version of std::print().
>    (I believe Victor is already working on a companion paper).
> 3. The choice to base behavior on the compile-time choice of execution
>    character set results in locale settings being ignored at run-time. Is
>    that ok?
>    1. This choice will lead to unexpected results if a program runs in
>       a non-UTF-8 locale and consumes non-Unicode input (e.g., from stdin) and
>       then attempts to echo it back.
>
>
Out of the meeting, we were asked to continue the discussion on-list (and
also in SG16).

Regarding this point, the non-Unicode "input" can be the result of the
formatting facility.

The conversation so far seems to indicate that the locales are not
constrained to use UTF-8 even in modes where the encoding used for string
literals is UTF-8.
That seems to indicate that something like:

std::print("{:%r}\n", std::chrono::system_clock::now().time_since_epoch());

which uses only C++ library facilities in an attempt to present a
localized string, runs the risk of generating replacement characters.

This could happen if, for example,

std::locale::global(std::locale(""));

had been run when the environment had a non-UTF-8 locale.

In that case, "下午" in Big5 could end up being "�U��".
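
To make the Big5 case concrete, here is a minimal sketch of what replacement-character substitution would do to those bytes. It assumes the output path treats the formatted string as UTF-8 and emits U+FFFD for anything that is not well formed. The function name replace_invalid_utf8 is hypothetical and is not taken from P2093 or any library. For simplicity it emits one U+FFFD per offending byte, which can differ from the Unicode-recommended "maximal subpart" practice for truncated multi-byte sequences, though not for this input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, for illustration only: copies `in` to the result,
// replacing every byte that does not begin a well-formed UTF-8 sequence
// with U+FFFD (encoded as EF BF BD).
std::string replace_invalid_utf8(std::string_view in) {
    static constexpr std::string_view replacement = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                // 0 means "not a valid lead byte"
        unsigned char lo = 0x80, hi = 0xBF; // allowed range for the 2nd byte
        if (b0 < 0x80) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;      // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;      // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;      // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;      // reject values above U+10FFFF
        }
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            ok = (k == 1) ? (b >= lo && b <= hi) : (b >= 0x80 && b <= 0xBF);
        }
        if (ok) {
            out.append(in.substr(i, len));  // well-formed sequence: copy as is
            i += len;
        } else {
            out.append(replacement);        // substitute and resynchronize
            ++i;
        }
    }
    return out;
}
```

The Big5 encoding of "下午" is the byte sequence A4 55 A4 C8. Passed through the sketch above, A4 is not a valid lead byte, 55 is ASCII 'U', the second A4 is again invalid, and C8 is a lead byte with no continuation byte after it, so the result is U+FFFD, 'U', U+FFFD, U+FFFD.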


>    1. Additionally, it means that a program that uses only ASCII
>       characters in string literals will nevertheless behave differently at
>       run-time depending on the choice of execution character set (which
>       historically has only affected the encoding of string literals).
> 1. When the execution character set is not UTF-8, should conversion
>    to Unicode be performed when writing directly to a Unicode enabled
>    terminal/console?
>    1. If so, should conversions be based on the compile-time literal
>       encoding or the locale dependent run-time execution encoding?
>    2. If the latter, that creates an odd asymmetry with the behavior
>       when the execution character set is UTF-8. Is that ok?
> 2. What are the implications for future support of
>    std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?
>    1. As proposed, std::print() only produces unambiguously encoded
>       output when the execution character set is UTF-8 and it is clear how these
>       cases should be handled in that case.
>    2. But how would the behavior be defined when the execution
>       character set is not UTF-8? Would the arguments be converted to the
>       execution character set? Or to the locale dependent encoding?
>    3. Note that these concerns are relevant for std::format() as well.
>
> An additional issue that was not discussed in SG16 relates to Unicode
> normalization. As proposed, the expected output will match expectations if
> the UTF-8 text does not contain any uses of combining characters. However,
> if combining characters are present, either because the text is in NFD or
> because there is no precomposed character defined, then the combining
> characters may be rendered separately from their base character as a result
> of terminal/console interfaces mapping code points rather than grapheme
> clusters to columns. Should std::print() also perform NFC normalization
> so that characters with precomposed forms are displayed correctly? (These
> concerns were explored in P1868 <https://wg21.link/p1868> when it was
> adopted for C++20; see that paper for example screenshots; in practice,
> this is only an issue with the Windows console).
>
> It would not be unreasonable for LEWG to send some of these questions back
> to SG16 for more analysis.
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-04-16 12:29:57