ISOCPP sg16 List: Re: Agenda for the 2022-07-27 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 26 Jul 2022 17:54:02 -0400

This is your friendly reminder that this meeting is taking place tomorrow.

Tom.

On 7/21/22 6:46 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, July 27th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220727T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> Please note that this message is being sent to the WG14 mailing list.
>
> Interested WG14 members are encouraged to attend this meeting. A
> calendar event (.ics) file containing the meeting details is attached.
> Alternatively, meeting details can be found here
> <https://documents.isocpp.org/index.php/apps/calendar/p/R7imgS2LJD9xfeWN/dayGridMonth/now/view/sidebar/L3JlbW90ZS5waHAvZGF2L3B1YmxpYy1jYWxlbmRhcnMvUjdpbWdTMkxKRDl4ZmVXTi81QkE1NTVFRC0xOTFCLTRERUQtQUFFMi01Q0Q1OTQwMDM4NjYuaWNz/1658950200>.
>
> The agenda is:
>
> * WG14 N3016: Unicode Length Modifiers v3
> <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3016.pdf>
>
> The linked paper proposes additional length modifiers (U8, U16, and
> U32) for the printf() and scanf() family of functions that enable them
> to write and read UTF-8, UTF-16, and UTF-32 encoded text in char8_t,
> char16_t, and char32_t based storage via conversion from/to the
> (locale sensitive) execution encoding (consistent with conversions
> that are performed for text in wchar_t based storage). For example:
>
> printf("From a UTF-8 string: %U8s\n", u8"text");
> printf("From a UTF-16 character: %U16c\n", u'X');
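
For comparison, here is a minimal sketch of what a program can do today without the proposed modifiers: convert UTF-32 text to the execution encoding by hand with c32rtomb() from <uchar.h> (C11), then print the result with a plain %s. The helper name utf32_to_execution is hypothetical and does not come from N3016, and the sketch assumes an implementation that defines __STDC_UTF_32__.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper (not part of N3016): converts a null-terminated
   UTF-32 string to the multibyte execution encoding of the current locale.
   Returns the number of bytes written, excluding the terminator, or
   (size_t)-1 if a character cannot be represented or dst is too small. */
size_t utf32_to_execution(char *dst, size_t dst_size, const char32_t *src)
{
    assert(dst != NULL && src != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t used = 0;
    for (;;) {
        char buf[MB_LEN_MAX];
        /* Converting the terminating U+0000 also emits any bytes needed
           to return to the initial shift state, followed by a null byte. */
        size_t n = c32rtomb(buf, *src, &state);
        if (n == (size_t)-1 || n > dst_size - used)
            return (size_t)-1;
        memcpy(dst + used, buf, n);
        used += n;
        if (*src == 0)
            return used - 1;           /* do not count the terminator */
        ++src;
    }
}
```

A caller would then write something like printf("From a UTF-32 string: %s\n", buffer). The result depends on the locale in effect at the time of the call, which is the same locale sensitivity the paper inherits.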
>
> WG14 discussed the paper during their committee meeting this week but
> declined to adopt it for C23 due to general concerns about encoding
> issues, a desire to consider the larger design space, and dependencies
> on text conversion facilities not currently required by the C
> standard. The encoding concerns match those we've discussed before and
> underscore the reasons that none of std::format(), std::print(), or
> C++ iostreams support output from UTF encoded text in char8_t,
> char16_t, and char32_t based storage.
>
> Consider the following code, which relies on the text conversion
> support already required for wide strings (the contents of ws are
> converted to the locale-sensitive execution encoding).
>
> wchar_t ws[] = L"...";
> printf("<text>: %ls\n", ws);
>
> Programmers using an implementation that encodes string literals as
> UTF-8 will most likely expect the example to produce UTF-8 output
> regardless of the execution encoding associated with the run-time
> locale. However, if run in an environment that uses a locale with a
> different encoding (e.g., Windows-1252 as is the common case for
> Windows machines located in the United States), then the output will
> contain a mix of UTF-8 and non-UTF-8 encoded text.
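
The two halves of that output can be seen in a small sketch. The bytes of the format string are copied as written, while the %ls argument goes through the locale's conversion, which can also fail outright. The helper name format_wide is hypothetical, and snprintf() is used instead of printf() only so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats "<text>: " followed by ws into dst.
   The literal part of the format string is copied byte for byte; the
   wide string is converted to the execution encoding of the current
   locale. Returns snprintf()'s result: the length of the complete
   output, or a negative value if an encoding error occurs. */
int format_wide(char *dst, size_t dst_size, const wchar_t *ws)
{
    assert(dst != NULL && ws != NULL);
    return snprintf(dst, dst_size, "<text>: %ls", ws);
}
```

Which wide characters convert successfully, and to which bytes, depends on the locale selected with setlocale(), so only ASCII input behaves the same everywhere.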
>
> The same problem occurs for C++ with:
>
> std::cout << "<text>: " << ws << "\n";
>
> WG21 has so far avoided these concerns with regard to char8_t,
> char16_t, and char32_t; no support is currently provided for
> formatting text in storage of these types with any of std::format(),
> std::print(), or iostreams. This limits the usability of these types
> and portable support for UTF encoded text in general.
>
> In this meeting, we'll discuss these concerns and the design space for
> improving the situation. Some items to consider:
>
> 1. When designing std::format() and std::print(), WG21 chose to
>    ignore the locale dependent execution encoding in several
>    situations when the encoding of string literals is known to be UTF-8.
> 2. Many implementations offer text conversion facilities as part of
>    their I/O environment:
>     1. Microsoft's fopen() implementation allows a file to be opened
>        as text with a specified encoding. From Microsoft's
>        documentation
>        <https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170>:
>        FILE *fp = fopen("newfile.txt", "rt+, ccs=UTF-8");
>     2. GNU libc's fopen() implementation similarly allows an
>        encoding to be associated with a stream via a ",ccs="
>        suffix in the mode string. See Linux documentation
>        <https://man7.org/linux/man-pages/man3/fopen.3.html>.
>     3. IBM's z/OS allows an encoding to be associated with a file as
>        a filesystem attribute. See IBM's chtag documentation
>        <https://www.ibm.com/docs/en/zos/2.3.0?topic=descriptions-chtag-change-file-tag-information>.
>        z/OS also supports associating an encoding and enabling
>        conversions for file streams, but I wasn't able to find
>        documentation just now.
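
As a rough illustration of the GNU libc facility, the sketch below opens a stream with a ",ccs=UTF-8" suffix in the mode string, which marks the stream as wide-oriented, and writes a wide string through it. The helper name write_wide_as_utf8 is hypothetical. The mode string follows the Linux fopen(3) page; other C libraries may ignore or reject the suffix, and Microsoft's implementation spells the mode differently, as shown in the list above.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes text to path through a stream opened with
   glibc's ",ccs=" extension, so that the stream converts wide characters
   to UTF-8 on output. Returns 0 on success and -1 on failure. */
int write_wide_as_utf8(const char *path, const wchar_t *text)
{
    assert(path != NULL && text != NULL);

    /* glibc looks for ",ccs=" directly after the recognized mode flags. */
    FILE *fp = fopen(path, "w,ccs=UTF-8");
    if (fp == NULL)
        return -1;

    int ok = (fputws(text, fp) != EOF);   /* wide output, converted by the stream */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The point of interest is that the conversion is attached to the stream rather than to the format string, so the caller does not need a length modifier at each call site.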
>
> Tom.
>
>

Received on 2022-07-26 21:54:07