Re: Agenda for the 2024-02-21 SG16 meeting

From: Peter Bindels <peterbindels_at_[hidden]>
Date: Tue, 20 Feb 2024 09:12:38 +0100
LWG 4044 reads like somebody wrote it thinking of Windows first and added
the rest as an afterthought. I agree with Jonathan on the resolution but
would like to adjust the approach:

    - For platforms that have separate methods for outputting Unicode and
      non-Unicode text, it should determine if the output is to a Unicode
      terminal and use the appropriate API, flushing the other API if
      necessary.
    - For platforms that have a single Unicode-compatible output, just use
      the output.

but in legalese. Splitting platforms first on whether or not they are
Windows (though expressed in a platform-neutral way), and only then adding
the complexity required for those platforms, seems like the best way to
spare implementers complexity they do not need. As in Corentin's email
(which just came in): let platforms other than Windows avoid all the
complexity, and give Windows the room for its runtime debugging hooks and
the conversions required for Unicode to work properly.
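
A minimal sketch of that split, under stated assumptions: write_text is a
hypothetical helper (not the standard's wording and not any library's
actual implementation), the input is assumed to be UTF-8, and falling back
to a plain byte write for ill-formed input is just one possible choice. On
Windows it checks for a console, flushes the byte-oriented stream, and
calls WriteConsoleW(); everywhere else it just writes the bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

#ifdef _WIN32
#include <io.h>
#include <windows.h>
#endif

// Hypothetical helper for illustration only.
// Writes UTF-8 text to `stream` and returns true on success.
bool write_text(std::FILE* stream, std::string_view utf8) {
    assert(stream != nullptr);
#ifdef _WIN32
    // Platform with separate Unicode and non-Unicode output paths:
    // find out whether the stream is attached to a console.
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
    DWORD mode = 0;
    if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
        // Flush bytes already buffered through the other API so that
        // output ordering is preserved.
        std::fflush(stream);
        if (utf8.empty()) return true;
        const int len = static_cast<int>(utf8.size());
        const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                          utf8.data(), len, nullptr, 0);
        if (n > 0) {
            std::wstring wide(static_cast<std::size_t>(n), L'\0');
            MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8.data(), len, wide.data(), n);
            DWORD written = 0;
            // A production version would also handle partial writes.
            return WriteConsoleW(h, wide.data(), static_cast<DWORD>(n),
                                 &written, nullptr) != 0;
        }
        // Ill-formed UTF-8: fall through to the byte write below
        // (an assumption of this sketch, not a requirement).
    }
#endif
    // Single Unicode-compatible output path: just use the output.
    return std::fwrite(utf8.data(), 1, utf8.size(), stream) == utf8.size();
}
```

Note that the non-Windows path never calls isatty(); all the branching
lives behind the platform check.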

I expect a vigorous discussion on UAX#31 since this was the main impetus to
get us to adopt XID_Continue and XID_Start to begin with.

Regards,
Peter

On Tue, Feb 20, 2024 at 4:56 AM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> SG16 will hold a meeting on Wednesday, February 21st, at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20240221T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>
> ).
>
> The agenda follows.
>
> - CWG 2843: Undated reference to Unicode makes C++ a moving target
> <https://cplusplus.github.io/CWG/issues/2843.html>
> - Identify updates needed for UAX #31 changes in Unicode 15.1.0.
> - LWG 4043: "ASCII" is not a registered character encoding
> <https://wg21.link/lwg4043>
> - LWG 4044: Confusing requirements for std::print on POSIX platforms
> <https://wg21.link/lwg4044>
>
> We reached consensus to recommend Unicode 15.1.0 as the minimum Unicode
> version and normative reference during the 2024-02-07 SG16 meeting
> <https://github.com/sg16-unicode/sg16-meetings?tab=readme-ov-file#february-7th-2024>.
> I thought this last discussion brought this issue to a conclusion for us,
> but an email sent to the WG14 mailing list by Joseph Myers (on 2024-02-13
> with subject "D.2.1 and UAX#31 revision 39") reminded me of an earlier email
> Corentin sent to the SG16 mailing list
> <https://lists.isocpp.org/sg16/2024/01/4041.php> (on 2024-01-06 with
> subject "UAX Profiles"). Changes made to UAX #31 (Unicode Identifiers and
> Syntax) <https://unicode.org/reports/tr31/> for Unicode 15.1.0 will
> require us to make a decision regarding accepting new character allowances
> in identifiers or adopting a profile to retain the Unicode 15.0.0
> allowances. In either case, changes to annex E (Conformance with UAX #31)
> <http://eel.is/c++draft/uaxid> will be required to reflect that rule
> UAX31-R1a (Restricted Format Characters) has been removed
> <https://www.unicode.org/reports/tr31/tr31-39.html#R1a>.
>
> A summary of the UAX #31 changes for Unicode 15.1.0 is provided in the "Modifications"
> section <https://www.unicode.org/reports/tr31/tr31-39.html#Modifications>.
> A diff of the changes relative to 15.0.0
> <https://www.unicode.org/reports/tr31/tr31-38.html> is also available. My
> understanding of the changes is that U+200C (ZERO WIDTH NON-JOINER) and
> U+200D (ZERO WIDTH JOINER) have been added to XID_continue
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AXID_continue%3A%5D&g=&i=>
> to allow for characters that native speakers of some languages (e.g.,
> Persian) would expect to be able to use in identifiers. Spoofing concerns,
> including those that depend on the presence of the ZWNJ and ZWJ characters,
> remain the subject matter of UTS #39 (Unicode Security Mechanisms)
> <https://unicode.org/reports/tr39/>. I expect that Robin, Corentin, and
> Steve will be able to provide more details of the change and its
> motivation. As I understand things, our choices will be to:
>
> 1. Accept the changes to XID_continue, or
> 2. Reject the changes to XID_continue by adjusting the profile
> specified in [uaxid.def.general]
> <http://eel.is/c++draft/uaxid.def.general>, possibly by including the Default-Ignorable
> Exclusion Profile
> <https://www.unicode.org/reports/tr31/tr31-39.html#Default_Ignorable_Exclusion_Profile>,
> though that would exclude many code points
> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AXID_continue%3A%5D%26%5B%3ADefault_Ignorable_Code_Point%3A%5D&g=&i=>
> beyond ZWNJ and ZWJ.
>
> Once consensus for a direction is established, a volunteer will be needed
> to draft wording changes for [uaxid] <http://eel.is/c++draft/uaxid>.
>
> LWG 4043 was recently filed by Jonathan Wakely. It reports a
> straightforward concern: the set of encodings recognized by
> std::text_encoding does not include "ASCII" despite that name being
> unambiguous and recognized by common encoding libraries. The proposed
> resolution is to add "ASCII" to the set of aliases for the IANA-specified
> "US-ASCII" encoding even though the IANA character set registry
> <https://www.iana.org/assignments/character-sets/character-sets.xhtml>
> does not do so.
>
> LWG 4044 was also recently filed by Jonathan Wakely while working to
> implement std::print() in libstdc++. Jonathan's initial implementation
> attempted to do what the C++ standard wording stated and detected
> ill-formed code units written to a stream that is directed to a terminal so
> that they could be diagnosed. He found that the overhead of calling
> isatty() on Linux to determine if a stream is directed to a terminal was
> prohibitively expensive and started questioning why the standard was
> directing him to do this. In private correspondence, it was clarified that
> the intent of the "native Unicode API" terminology was to generically refer
> to the Windows WriteConsoleW() function and that there is no need to do
> anything special on POSIX systems. That discussion also questioned what it
> means to diagnose invalid code units written to a console at run-time.
> Jonathan has been kind enough to draft a proposed resolution to clarify the
> intent.
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

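The LWG 4043 question in the quoted agenda can be illustrated with a small
alias lookup. This is a rough sketch, not std::text_encoding itself: the
function names are hypothetical, the alias table is only a partial copy of
the IANA entries for US-ASCII, and the name comparison is a simplification
(case-insensitive, ignoring non-alphanumeric characters) rather than the
exact matching rules the standard specifies. The accept_bare_ascii flag
stands in for the proposed resolution.

```cpp
#include <array>
#include <cassert>  // for assert in quick checks
#include <cctype>
#include <string>
#include <string_view>

// Simplified normalization (an assumption of this sketch):
// drop non-alphanumeric characters and lowercase the rest.
std::string normalize(std::string_view name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Hypothetical lookup: does `name` denote the US-ASCII encoding?
// `accept_bare_ascii` models adding "ASCII" as an extra alias.
bool is_us_ascii_alias(std::string_view name, bool accept_bare_ascii) {
    // Partial list of IANA-registered names for this encoding.
    static constexpr std::array<std::string_view, 6> iana_aliases = {
        "US-ASCII", "ANSI_X3.4-1968", "ISO646-US",
        "IBM367",   "cp367",          "csASCII"};
    const std::string key = normalize(name);
    for (std::string_view alias : iana_aliases) {
        if (key == normalize(alias)) return true;
    }
    return accept_bare_ascii && key == "ascii";
}
```

With the flag off, "ASCII" is rejected even though "us-ascii" and
"csASCII" are accepted, which is the gap the issue reports.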
Received on 2024-02-20 08:12:51