ISOCPP sg16 List: Agenda for the 2024-02-21 SG16 meeting

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 19 Feb 2024 22:56:34 -0500

SG16 will hold a meeting on Wednesday, February 21st, at 19:30 UTC
(timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20240221T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).

The agenda follows.

  * CWG 2843: Undated reference to Unicode makes C++ a moving target
    <https://cplusplus.github.io/CWG/issues/2843.html>
      o Identify updates needed for UAX #31 changes in Unicode 15.1.0.
  * LWG 4043: "ASCII" is not a registered character encoding
    <https://wg21.link/lwg4043>
  * LWG 4044: Confusing requirements for std::print on POSIX platforms
    <https://wg21.link/lwg4044>

We reached consensus to recommend Unicode 15.1.0 as the minimum Unicode
version and normative reference during the 2024-02-07 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings?tab=readme-ov-file#february-7th-2024>.
I thought this last discussion brought this issue to a conclusion for
us, but an email sent to the WG14 mailing list by Joseph Myers (on
2024-02-13 with subject "D.2.1 and UAX#31 revision 39") reminded me of
an earlier email Corentin sent to the SG16 mailing list
<https://lists.isocpp.org/sg16/2024/01/4041.php> (on 2024-01-06 with
subject "UAX Profiles"). Changes made to UAX #31 (Unicode Identifiers
and Syntax) <https://unicode.org/reports/tr31/> for Unicode 15.1.0 will
require us to decide whether to accept new character allowances in
identifiers or to adopt a profile that retains the Unicode 15.0.0
allowances. In either case, changes to annex E (Conformance with
UAX #31) <http://eel.is/c++draft/uaxid> will be required to reflect that
rule UAX31-R1a (Restricted Format Characters) has been removed
<https://www.unicode.org/reports/tr31/tr31-39.html#R1a>.

A summary of the UAX #31 changes for Unicode 15.1.0 is provided in the
"Modifications" section
<https://www.unicode.org/reports/tr31/tr31-39.html#Modifications>. A
diff of the changes relative to 15.0.0
<https://www.unicode.org/reports/tr31/tr31-38.html> is also available.
My understanding of the changes is that U+200C (ZERO WIDTH NON-JOINER)
and U+200D (ZERO WIDTH JOINER) have been added to XID_Continue
<https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AXID_continue%3A%5D&g=&i=>
to permit characters that native speakers of some languages (e.g.,
Persian) would expect to be able to use in identifiers. Spoofing
concerns, including those that depend on the presence of the ZWNJ and
ZWJ characters, remain the subject matter of UTS #39 (Unicode Security
Mechanisms) <https://unicode.org/reports/tr39/>. I expect that Robin,
Corentin, and Steve will be able to provide more details of the change
and its motivation. As I understand things, our choices will be to:

1. Accept the changes to XID_Continue, or
2. Reject the changes to XID_Continue by adjusting the profile
    specified in [uaxid.def.general]
    <http://eel.is/c++draft/uaxid.def.general>, possibly by including
    the Default-Ignorable Exclusion Profile
    <https://www.unicode.org/reports/tr31/tr31-39.html#Default_Ignorable_Exclusion_Profile>,
    though that would exclude many code points
    <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AXID_continue%3A%5D%26%5B%3ADefault_Ignorable_Code_Point%3A%5D&g=&i=>
    beyond ZWNJ and ZWJ.
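To make the narrowest form of option 2 concrete, here is a minimal C++ sketch of the check a compiler could apply if a profile excluded only ZWNJ and ZWJ. It is neither proposed wording nor any implementation's code. It assumes identifiers have already been decoded to UTF-32 and validated against XID_Start/XID_Continue elsewhere, and the name `contains_zwnj_or_zwj` is hypothetical.

```cpp
#include <string_view>

// The two format characters that, per the understanding above, Unicode
// 15.1.0 added to XID_Continue.
constexpr char32_t zero_width_non_joiner = U'\u200C';
constexpr char32_t zero_width_joiner = U'\u200D';

// Hypothetical helper: returns true if an already-validated UTF-32
// identifier contains ZWNJ or ZWJ. A profile that retains the Unicode
// 15.0.0 allowances would diagnose such identifiers; option 1 would
// accept them.
constexpr bool contains_zwnj_or_zwj(std::u32string_view identifier) {
  for (char32_t c : identifier) {
    if (c == zero_width_non_joiner || c == zero_width_joiner) {
      return true;
    }
  }
  return false;
}

static_assert(!contains_zwnj_or_zwj(U"plain_identifier"));
static_assert(contains_zwnj_or_zwj(U"a\u200Cb"));
```

Adopting the Default-Ignorable Exclusion Profile instead would require a table of every Default_Ignorable_Code_Point in XID_Continue rather than two constants. This sketch does not attempt that.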

Once consensus for a direction is established, a volunteer will be
needed to draft wording changes for [uaxid] <http://eel.is/c++draft/uaxid>.

LWG 4043 was recently filed by Jonathan Wakely. It reports a
straightforward concern: the set of encodings recognized by
std::text_encoding does not include "ASCII", despite that name being
unambiguous and recognized by common encoding libraries. The proposed
resolution is to add "ASCII" to the set of aliases for the IANA-registered
"US-ASCII" encoding, even though the IANA character set registry
<https://www.iana.org/assignments/character-sets/character-sets.xhtml>
does not list it as an alias.
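A rough C++ sketch of what the proposed resolution amounts to follows: a name-to-MIBenum alias table in which "ASCII" resolves to the same entry as the registered US-ASCII names (MIBenum 3). The table holds only a few of the registered US-ASCII aliases. The name `find_mib` is hypothetical, and the ASCII case-insensitive comparison is a simplification, not the name-matching algorithm specified for std::text_encoding.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Lowercase an ASCII letter; other characters are returned unchanged.
constexpr char to_lower_ascii(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Simplified case-insensitive comparison (an assumption for this sketch).
constexpr bool equal_ignoring_case(std::string_view a, std::string_view b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (to_lower_ascii(a[i]) != to_lower_ascii(b[i])) return false;
  }
  return true;
}

struct alias_entry {
  std::string_view name;
  int mib;  // IANA MIBenum value
};

// A subset of the IANA-registered names for US-ASCII, plus the "ASCII"
// alias that LWG 4043 proposes even though IANA does not list it.
constexpr std::array<alias_entry, 4> aliases{{
    {"US-ASCII", 3},
    {"ANSI_X3.4-1968", 3},
    {"csASCII", 3},
    {"ASCII", 3},  // proposed addition
}};

// Hypothetical lookup: returns the MIBenum for a name, or -1 if unknown.
constexpr int find_mib(std::string_view name) {
  for (const auto& entry : aliases) {
    if (equal_ignoring_case(entry.name, name)) return entry.mib;
  }
  return -1;
}
```

Without the last table entry, a lookup of "ASCII" would fall through to -1 even though every common encoding library understands the name. That mismatch is what the issue reports.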

LWG 4044 was also recently filed by Jonathan Wakely while working to
implement std::print() in libstdc++. Jonathan's initial implementation
followed the C++ standard wording and detected ill-formed code units
written to a stream directed to a terminal so that they could be
diagnosed. He found that calling isatty() on Linux to determine whether
a stream is directed to a terminal was prohibitively expensive and
started questioning why the standard was
directing him to do this. In private correspondence, it was clarified
that the intent of the "native Unicode API" terminology was to
generically refer to the Windows WriteConsoleW() function and that there
is no need to do anything special on POSIX systems. That discussion also
questioned what it means to diagnose invalid code units written to a
console at run time. Jonathan has been kind enough to draft a proposed
resolution to clarify the intent.
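The clarified intent can be sketched, very roughly, in C++. This is a hypothetical illustration, not Jonathan's proposed resolution or libstdc++ code, and the name `write_utf8` is made up. On POSIX systems the UTF-8 code units are written unchanged, with no isatty() probe. Only a Windows implementation would need to detect a console and transcode for WriteConsoleW(), and that path is left as a comment here.

```cpp
#include <cstdio>
#include <string_view>

// Hypothetical helper; returns true if all bytes were written.
bool write_utf8(std::FILE* stream, std::string_view utf8) {
#ifdef _WIN32
  // A Windows implementation would check here whether `stream` refers to
  // a console and, if so, transcode to UTF-16 and call WriteConsoleW().
  // That path is intentionally omitted from this sketch.
#endif
  // Fallback used on every platform in this sketch: pass the code units
  // through as-is, with no isatty() call and no UTF-8 validation.
  return std::fwrite(utf8.data(), 1, utf8.size(), stream) == utf8.size();
}
```

Because nothing is inspected on the POSIX path, even ill-formed code units reach the stream exactly as they were formatted.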

Tom.

Received on 2024-02-20 03:56:35