Date: Mon, 6 May 2024 20:13:00 -0400
On May 6, 2024, at 7:36 PM, Victor Zverovich <victor.zverovich_at_[hidden]> wrote:
I'll likely be ~half an hour late and more interested in P2996. Is the plan to look into Corentin's draft first and then P2996?
- Victor
On Mon, May 6, 2024 at 10:22 AM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
SG16 will hold a meeting on Wednesday, May 8th, at 19:30 UTC (timezone conversion).
The agenda follows.
D3258R0 was hastily produced by Corentin following the review of P2996R2 during the 2024-04-24 SG16 meeting with the goal of providing a convenient solution for printing UTF-8 text held in char8_t-based storage. It proposes extending std::format() and std::print() to support formatting arguments of Unicode character type (characters and strings of char8_t, char16_t, or char32_t type). It does not propose a solution for iostreams. We won't poll this paper during this meeting for two reasons: 1) the paper is hot off the press and I don't expect everyone to have already read it and internalized all the implications, and 2) I'm going to limit discussion of it to the first half of the meeting so that we continue to make progress on P2996. The intent in discussing it, particularly with the P2996 authors present, is to build a sense of whether it suffices to at least minimally address the printing requirements posed by the P2996 authors; we may take a poll on that point.
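For context, here is a minimal sketch of the kind of workaround that is needed today, on the assumption that the standard formatting functions do not accept char8_t arguments directly. The helper name bytes_of is hypothetical and is not taken from D3258R0 or P2996. It copies the code units of a single-byte character string into a std::string so the result can be passed to std::format() or std::print().

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not from D3258R0 or P2996): copies the code units of a
// string with a one-byte character type into a std::string, unchanged.
// The bytes are NOT transcoded; the caller is assumed to know that the
// ordinary literal encoding (or the output device) is UTF-8 compatible.
template <class CharT>
std::string bytes_of(std::basic_string_view<CharT> text) {
    static_assert(sizeof(CharT) == 1, "only one-byte code units are supported");
    // Inspecting any object through a char pointer is permitted.
    return std::string(reinterpret_cast<const char*>(text.data()), text.size());
}
```

In C++20 a call might look like bytes_of(std::u8string_view(u8"name")). Since nothing is transcoded, the output is only correct where the receiving end expects UTF-8, which is part of why a library-level solution is being discussed.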
Our recent review of P2996R2 was constructive but not conclusive. We'll continue discussion with a goal of establishing consensus on the following points. Please review the meeting summary from the last review as well as the ensuing "Follow up on SG16 review of P2996R2" discussion on the SG16 mailing list prior to the meeting.
- The character type(s) and encoding(s) used for names produced and consumed by reflection interfaces. My sense is that we're leaning in the following direction (not unanimously though):
- Names will be produced and consumed in both the ordinary literal encoding via type char and UTF-8 via type char8_t.
- Production of names that contain characters that are not representable in the ordinary literal encoding will produce a string that contains a UCN-like escape sequence for such characters.
- Consumption of names in the ordinary literal encoding will accept a UCN-like escape sequence for characters not in the basic literal character set that may lack representation in the ordinary literal encoding.
- The use of a distinct type for names (e.g., a type that stores names in an internal representation and exposes them via char and char8_t interfaces).
- Unicode NFC requirements (see below).
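To make the escape-sequence direction above concrete, here is a rough sketch under stated assumptions. It is not proposed wording. The function name escape_non_ascii is hypothetical, the input is assumed to be well-formed UTF-8, the target encoding is assumed to be plain ASCII, and the delimited \u{...} spelling is just one possible "UCN-like" form.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper, not part of P2996: rewrites a well-formed UTF-8 name so
// that every non-ASCII code point becomes a delimited escape such as \u{E9}.
// Assumption: the ordinary literal encoding can represent ASCII and nothing else.
std::string escape_non_ascii(std::string_view utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {            // ASCII is copied through unchanged
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // Sequence length is taken from the lead byte (input assumed well-formed).
        const int len = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
        std::uint32_t cp = lead & (0xFF >> (len + 1));   // payload bits of the lead byte
        for (int k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        char buf[16];
        std::snprintf(buf, sizeof buf, "\\u{%lX}", static_cast<unsigned long>(cp));
        out += buf;
        i += static_cast<std::size_t>(len);
    }
    return out;
}
```

With these assumptions, the UTF-8 bytes for "café" (the final character encoded as 0xC3 0xA9) come out as caf\u{E9}. A consuming interface would need the inverse mapping for names to round-trip.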
We briefly discussed Unicode normalization form C (NFC) last time. Following adoption of P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) as a DR for C++23, identifiers are required to be written in NFC. Conversion to the ordinary literal encoding could result in names that are not in NFC. It will presumably be necessary for P2996 to specify that, for round-trip purposes, conversion to the ordinary literal encoding will not perform character substitutions (e.g., UCN-like escape sequences will be generated instead). Likewise, it will be necessary to specify how names that do not conform to NFC will be handled by reflection interfaces that consume user-provided names. Note that current compiler releases exhibit implementation divergence with respect to enforcement of the NFC requirement (https://godbolt.org/z/E35r1K7hE; gcc does diagnose, Clang and EDG do not, MSVC does not yet implement P1949R7).
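As a small illustration of why normalization matters for names, the sketch below spells the same visible character, "é", in two ways: precomposed (NFC) and as a base letter plus a combining mark (not NFC). The variable names are hypothetical. The point is that a name lookup based on code unit comparison treats the two spellings as different names.

```cpp
#include <string_view>

// Both constants hold UTF-8 bytes and render as "é"; only the first is in NFC.
constexpr std::string_view precomposed = "\xC3\xA9";   // U+00E9
constexpr std::string_view decomposed  = "e\xCC\x81";  // U+0065 followed by U+0301

// A comparison of code units sees two distinct names.
static_assert(precomposed != decomposed, "the spellings differ byte for byte");
static_assert(precomposed.size() == 2 && decomposed.size() == 3, "different lengths too");
```

The standard library offers no normalization facility, so an interface that accepts user-provided names would have to either reject the second spelling or specify how it is treated.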
Finally, and as a separable issue that can be discussed at another time, I think we should discuss differentiating between names and identifiers in the reflection interfaces. This isn't an issue for data_member_spec() since data members are always identifiers (or are unnamed; that is another interesting case, but isn't an SG16 concern), but could be an issue for a hypothetical function_spec() or member_function_spec() interface used for named functions, constructors and destructors, overloaded operators, conversion operators, user-defined literals, etc. Distinguishing between names and identifiers would avoid the need to parse, e.g., operator bool or ""_udl, when consuming names.
Tom.
SG16 mailing list
SG16_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/sg16
Received on 2024-05-07 00:13:15