This is your friendly reminder that this meeting is taking place tomorrow.
Tom.
SG16 will hold a meeting on Wednesday, February 21st, at 19:30 UTC (timezone conversion).
The agenda follows.
- CWG 2843: Undated reference to Unicode makes C++ a moving target
  - Identify updates needed for UAX #31 changes in Unicode 15.1.0.
- LWG 4043: "ASCII" is not a registered character encoding
- LWG 4044: Confusing requirements for std::print on POSIX platforms
We reached consensus to recommend Unicode 15.1.0 as the minimum Unicode version and normative reference during the 2024-02-07 SG16 meeting. I thought that discussion had brought this issue to a conclusion for us, but an email sent to the WG14 mailing list by Joseph Myers (on 2024-02-13 with subject "D.2.1 and UAX#31 revision 39") reminded me of an earlier email Corentin sent to the SG16 mailing list (on 2024-01-06 with subject "UAX Profiles"). Changes made to UAX #31 (Unicode Identifiers and Syntax) for Unicode 15.1.0 will require us to decide whether to accept the new character allowances in identifiers or to adopt a profile that retains the Unicode 15.0.0 allowances. In either case, changes to annex E (Conformance with UAX #31) will be required to reflect that rule UAX31-R1a (Restricted Format Characters) has been removed.
A summary of the UAX #31 changes for Unicode 15.1.0 is provided in the "Modifications" section. A diff of the changes relative to 15.0.0 is also available. My understanding of the changes is that U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) have been added to XID_continue to allow for characters that native speakers of some languages (e.g., Persian) would expect to be able to use in identifiers. Spoofing concerns, including those that depend on the presence of the ZWNJ and ZWJ characters, remain the subject matter of UTS #39 (Unicode Security Mechanisms). I expect that Robin, Corentin, and Steve will be able to provide more details of the change and its motivation. As I understand things, our choices will be to:
- Accept the changes to XID_continue, or
- Reject the changes to XID_continue by adjusting the profile specified in [uaxid.def.general], possibly by including the Default-Ignorable Exclusion Profile, though that would exclude many code points beyond ZWNJ and ZWJ.
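As a rough illustration of the two choices, the sketch below layers a policy on top of an existing identifier-continue test. It is only a sketch under stated assumptions: `JoinControlPolicy`, `is_join_control`, and `allowed_in_identifier_continue` are hypothetical names, `base_allows` stands in for whatever Unicode 15.1.0 based test an implementation already has, and the "reject" policy shown here excludes only ZWNJ and ZWJ, which is narrower than the Default-Ignorable Exclusion Profile.

```cpp
// Hypothetical sketch; none of these names come from the standard or UAX #31.
constexpr char32_t zwnj = 0x200C;  // U+200C ZERO WIDTH NON-JOINER
constexpr char32_t zwj  = 0x200D;  // U+200D ZERO WIDTH JOINER

constexpr bool is_join_control(char32_t c) {
    return c == zwnj || c == zwj;
}

enum class JoinControlPolicy {
    accept,  // first choice: take the Unicode 15.1.0 allowances as-is
    reject   // second choice: keep ZWNJ and ZWJ out of identifiers
};

// base_allows is the result of the implementation's existing
// identifier-continue test for c (assumed to follow Unicode 15.1.0).
constexpr bool allowed_in_identifier_continue(char32_t c,
                                              bool base_allows,
                                              JoinControlPolicy policy) {
    if (policy == JoinControlPolicy::reject && is_join_control(c)) {
        return false;  // excluded by the (hypothetical) narrow profile
    }
    return base_allows;  // otherwise defer to the existing test
}
```

A real profile would be expressed in the wording of [uaxid.def.general] rather than in code; the sketch only shows that the second choice amounts to filtering the result of the first.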
Once consensus for a direction is established, a volunteer will be needed to draft wording changes for [uaxid].
LWG 4043 was recently filed by Jonathan Wakely. It reports a straightforward concern: the set of encodings recognized by std::text_encoding does not include "ASCII", even though that name is unambiguous and is recognized by common encoding libraries. The proposed resolution is to add "ASCII" to the set of aliases for the IANA-specified "US-ASCII" encoding, even though the IANA character set registry does not do so.
LWG 4044 was also recently filed by Jonathan Wakely while working to implement std::print() in libstdc++. Jonathan's initial implementation attempted to do what the C++ standard wording stated: it detected ill-formed code units written to a stream that is directed to a terminal so that they could be diagnosed. He found that the overhead of calling isatty() on Linux to determine whether a stream is directed to a terminal was prohibitive and started questioning why the standard was directing him to do this. In private correspondence, it was clarified that the intent of the "native Unicode API" terminology was to generically refer to the Windows WriteConsoleW() function and that there is no need to do anything special on POSIX systems. That discussion also questioned what it means to diagnose invalid code units written to a console at run-time. Jonathan has been kind enough to draft a proposed resolution to clarify the intent.
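For context, the kind of terminal check at issue can be sketched as follows on a POSIX system. This is a minimal illustration and not libstdc++'s code: `is_directed_to_terminal` is a hypothetical helper name, while fileno() and isatty() are the POSIX functions.

```cpp
#include <stdio.h>   // FILE, fileno() (POSIX)
#include <unistd.h>  // isatty() (POSIX)

// Hypothetical helper: reports whether a C stream is attached to a terminal.
// Calling this before every write is the per-call cost described above.
bool is_directed_to_terminal(FILE* stream) {
    const int fd = fileno(stream);       // -1 if the stream has no descriptor
    return fd != -1 && isatty(fd) == 1;  // isatty() returns 1 for a terminal
}
```

Given the clarified intent, a POSIX implementation of std::print() would presumably not need a check like this at all.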
Tom.