Re: [SG16] Questions for LEWG for P2093R4: Formatted output

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 14 Mar 2021 16:24:19 -0400
Sorry, folks. I'm unable to see the content of emails before approving
them. Will not approve any more for this account. Obviously.

Tom.

On 3/14/21 3:18 PM, THOMAS CATALANO tomcatalano.0 gmail account via SG16
wrote:
> .uH!! . ..I. ..jErK! ..oFF! ..tO! ..tHaT! ..bEUtIful! ..expose! ..oF!?
> ..unicode8«««»«. ..something. ..in. ..mAtH. ..or. .something. ..that.
> !! ..sOmethIng. ..wE! ..beEn. ..butating. ..fOr. ..yEaRs. // ..call!?
> ..angle–bracket. ..iF . ..yZou. ..mAY!
>
> Sent from my iPhone
>
>> On Mar 10, 2021, at 10:26 PM, Tom Honermann via SG16
>> <sg16_at_[hidden]> wrote:
>>
>> 
>>
>> std::print("╟≥σσ⌠Θετ≤ ßεΣ πß∞⌡⌠ß⌠Θ∩επ!\n");
>>
>> The following are questions/concerns that came up during SG16 review
>> of P2093 <https://wg21.link/p2093> that are worthy of further
>> discussion in SG16 and/or LEWG. Most of these issues were discussed
>> in SG16 and were determined either not to be SG16 concerns or were
>> deemed issues for which we did not want to hold back forward
>> progress. These sentiments were not unanimous.
>>
>> The SG16 poll to forward P2093R3 <https://wg21.link/p2093r3> was
>> taken during our February 10th telecon. The poll was:
>>
>> Poll: Forward P2093R3 to LEWG.
>> - Attendance: 9
>>
>>   SF    F    N    A   SA
>>    4    2    2    0    1
>>
>> Minutes for prior SG16 reviews of P2093 <https://wg21.link/p2093>
>> are available at:
>>
>> * December 9th, 2020 telecon
>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2020.md#december-9th-2020>;
>> review of P2093R2 <https://wg21.link/p2093r2>.
>> * February 10th, 2021 telecon
>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md>;
>> review of P2093R3 <https://wg21.link/p2093r3>.
>>
>> Questions raised include:
>>
>> 1. How should errors in transcoding be handled?
>> The Unicode recommendation is to substitute a replacement
>> character for invalid code unit sequences. P2093R4
>> <https://wg21.link/p2093r4> added wording to this effect.
>> 2. Should this feature move forward without a parallel proposal to
>> provide the underlying implementation-dependent features needed to
>> implement std::print()?
>> Specifically, should this feature be blocked on exposing
>> interfaces to 1) determine if a stream is connected directly to a
>> terminal/console, and 2) write directly to a terminal/console
>> (potentially bypassing a stream) using native interfaces where
>> applicable? These features would be necessary in order to
>> implement a portable version of std::print(). (I believe Victor
>> is already working on a companion paper).
>> 3. The choice to base behavior on the compile-time choice of
>> execution character set results in locale settings being ignored
>> at run-time. Is that ok?
>> 1. This choice will lead to unexpected results if a program runs
>> in a non-UTF-8 locale and consumes non-Unicode input (e.g.,
>> from stdin) and then attempts to echo it back.
>> 2. Additionally, it means that a program that uses only ASCII
>> characters in string literals will nevertheless behave
>> differently at run-time depending on the choice of execution
>> character set (which historically has only affected the
>> encoding of string literals).
>> 4. When the execution character set is not UTF-8, should conversion
>> to Unicode be performed when writing directly to a Unicode-enabled
>> terminal/console?
>> 1. If so, should conversions be based on the compile-time
>> literal encoding or the locale-dependent run-time execution
>> encoding?
>> 2. If the latter, that creates an odd asymmetry with the
>> behavior when the execution character set is UTF-8. Is that ok?
>> 5. What are the implications for future support of std::print("{} {}
>> {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32
>> text")?
>> 1. As proposed, std::print() only produces unambiguously encoded
>> output when the execution character set is UTF-8; how to handle
>> these cases is clear in that situation.
>> 2. But how would the behavior be defined when the execution
>> character set is not UTF-8? Would the arguments be converted
>> to the execution character set? Or to the locale-dependent
>> encoding?
>> 3. Note that these concerns are relevant for std::format() as well.
>>
>> An additional issue that was not discussed in SG16 relates to Unicode
>> normalization. As proposed, the output will match expectations
>> if the UTF-8 text does not contain any uses of combining
>> characters. However, if combining characters are present, either
>> because the text is in NFD or because there is no precomposed
>> character defined, then the combining characters may be rendered
>> separately from their base character as a result of terminal/console
>> interfaces mapping code points rather than grapheme clusters to
>> columns. Should std::print() also perform NFC normalization so that
>> characters with precomposed forms are displayed correctly? (These
>> concerns were explored in P1868 <https://wg21.link/p1868> when it was
>> adopted for C++20; see that paper for example screenshots; in
>> practice, this is only an issue with the Windows console).
>>
>> It would not be unreasonable for LEWG to send some of these questions
>> back to SG16 for more analysis.
>>
>> Tom.
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
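Question 1 in the quoted message refers to the Unicode recommendation of substituting a replacement character for invalid code unit sequences. The following is a minimal sketch of that idea for UTF-8 input, not the wording added in P2093R4. The function name replace_invalid_utf8 is hypothetical, and the sketch assumes the "maximal subpart" policy (one U+FFFD per longest ill-formed prefix), which is one common reading of the recommendation.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: copies well-formed UTF-8 sequences unchanged and
// emits U+FFFD (encoded as EF BF BD) for each maximal ill-formed subpart.
std::string replace_invalid_utf8(std::string_view in) {
    static constexpr std::string_view replacement = "\xEF\xBF\xBD";
    std::string out;
    out.reserve(in.size());

    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t needed = 0;                  // continuation bytes expected
        unsigned char lower = 0x80, upper = 0xBF; // allowed range for next byte

        if (lead <= 0x7F) {                      // ASCII: copy as-is
            out.push_back(in[i]);
            ++i;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            needed = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            needed = 2;
            if (lead == 0xE0) lower = 0xA0;      // reject overlong forms
            if (lead == 0xED) upper = 0x9F;      // reject surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            needed = 3;
            if (lead == 0xF0) lower = 0x90;      // reject overlong forms
            if (lead == 0xF4) upper = 0x8F;      // reject values above U+10FFFF
        } else {                                 // stray continuation or invalid lead
            out += replacement;
            ++i;
            continue;
        }

        std::size_t j = i + 1;
        std::size_t seen = 0;
        while (seen < needed && j < in.size()) {
            const unsigned char b = static_cast<unsigned char>(in[j]);
            if (b < lower || b > upper) break;   // offending byte is reprocessed
            lower = 0x80;                        // only the first continuation
            upper = 0xBF;                        // byte has a restricted range
            ++j;
            ++seen;
        }

        if (seen == needed) {
            out.append(in.substr(i, j - i));     // well-formed sequence
        } else {
            out += replacement;                  // truncated or ill-formed
        }
        i = j;
    }
    return out;
}
```

For example, replace_invalid_utf8("a\xFFz") yields "a", U+FFFD, "z", while well-formed input is returned unchanged.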

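Question 2 in the quoted message asks about an interface to determine whether a stream is connected directly to a terminal/console. As a rough illustration of the platform calls such an interface would likely wrap, the sketch below uses POSIX isatty() and the Microsoft CRT _isatty(). The wrapper name is_terminal is hypothetical and is not a proposed interface. Note that _isatty() reports any character device, so a real Windows implementation would probably need a stricter console check.

```cpp
#include <stdio.h>
#if defined(_WIN32)
#include <io.h>      // _isatty, _fileno
#else
#include <unistd.h>  // isatty
#endif

// Hypothetical wrapper: true if the C stream appears to be attached
// to a terminal (POSIX) or a character device (Windows CRT).
bool is_terminal(FILE* stream) {
#if defined(_WIN32)
    return _isatty(_fileno(stream)) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}
```

A caller could use is_terminal(stdout) to choose between a native console write and ordinary stream output. The result is false when output is redirected to a regular file.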

Received on 2021-03-14 15:24:23
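The normalization concern in the quoted message comes down to the same visible character having different code point counts in NFC and NFD. The sketch below makes that concrete by counting code points in a UTF-8 string. The helper name count_code_points is hypothetical, and the code assumes its input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in well-formed UTF-8 by
// counting every byte that is not a continuation byte (10xxxxxx).
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (const char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// NFC: U+00E9 (precomposed e with acute)   -> count_code_points("\xC3\xA9")  == 1
// NFD: U+0065 followed by combining U+0301 -> count_code_points("e\xCC\x81") == 2
```

A console that maps code points rather than grapheme clusters to columns would give the NFD form two columns, which is the rendering problem described above.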