Date: Mon, 6 May 2024 18:17:23 +0000
It seems to me that having the ability to write a compile-time transcoder would solve some of the problems.
I think providing transcoding functions and having them work in a constant-evaluated context should be a priority.
It may not necessarily be the most ergonomic approach, but when you can transcode from one encoding to another at little to no cost, the encoding in which those names are provided stops being relevant.
I don’t think P2728R6<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2728r6.html> does it for me, but I think we should be able to find an ergonomic solution for it.
And I agree with the sentiment of simply not supporting iostreams. To me they are only there for legacy reasons, and every job one would care to do with them has been replaced by a better tool somewhere else. I don’t think deprecating them would be a bad idea (were it not for their widespread use). There’s no need to spend energy there, though I believe the problem fixes itself once there’s a transcoder.
My 2c,
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: Monday, May 6, 2024 19:22
To: SG16 <sg16_at_[hidden]>; Daveed Vandevoorde <daveed_at_[hidden]>; Faisal Vali <faisalv_at_[hidden]>; Andrew Sutton <andrew.n.sutton_at_[hidden]>; Barry Revzin <barry.revzin_at_[hidden]>; Dan Katz <dkatz85_at_[hidden]>; Peter Dimov <pdimov_at_[hidden]>; Wyatt Childers <wcc_at_[hidden]>
Cc: Tom Honermann <tom_at_[hidden]>
Subject: [SG16] Agenda for the 2024-05-08 SG16 meeting
SG16 will hold a meeting on Wednesday, May 8th, at 19:30 UTC (timezone conversion<https://www.timeanddate.com/worldclock/converter.html?iso=20240508T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
The agenda follows.
* D3258R0: Formatting of charN_t<https://wg21.link/d3258r0>.
* P2996R2: Reflection for C++26<http://wg21.link/p2996r2>.
D3258R0 was hastily produced by Corentin following the review of P2996R2 during the 2024-04-24 SG16 meeting<https://github.com/sg16-unicode/sg16-meetings/#april-24th-2024> with the goal of providing a convenient solution for printing UTF-8 text held in char8_t-based storage. It proposes extending std::format() and std::print() to support formatting arguments of Unicode character type (characters and strings of char8_t, char16_t, or char32_t type). It does not propose a solution for iostreams. We won't poll this paper during this meeting for two reasons: 1) the paper is hot off the press and I don't expect everyone to have already read it and internalized all the implications, and 2) I'm going to limit discussion of it to the first half of the meeting so that we continue to make progress on P2996. The intent in discussing it, particularly with the P2996 authors present, is to build a sense of whether it suffices to at least minimally address the printing requirements posed by the P2996 authors; we may take a poll on that point.
Our recent review of P2996R2 was constructive but not conclusive. We'll continue discussion with a goal of establishing consensus on the following points. Please review the meeting summary from the last review<https://github.com/sg16-unicode/sg16-meetings/#april-24th-2024> as well as the ensuing "Follow up on SG16 review of P2996R2" discussion on the SG16 mailing list<https://lists.isocpp.org/sg16/2024/04/index.php> prior to the meeting.
1. The character type(s) and encoding(s) used for names produced and consumed by reflection interfaces. My sense is that we're leaning in the following direction (not unanimously though):
* Names will be produced and consumed in both the ordinary literal encoding via type char and UTF-8 via type char8_t.
* Production of names that contain characters that are not representable in the ordinary literal encoding will produce a string that contains a UCN-like escape sequence for such characters.
* Consumption of names in the ordinary literal encoding will accept a UCN-like escape sequence for characters not in the basic literal character set that may lack representation in the ordinary literal encoding.
2. The use of a distinct type for names (e.g., a type that stores names in an internal representation and exposes them via char and char8_t interfaces).
3. Unicode NFC requirements (see below).
We briefly discussed Unicode normalization form C (NFC) last time. Following adoption of P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31)<https://wg21.link/p1949r7> as a DR for C++23, identifiers are required to be written in NFC. Conversion to the ordinary literal encoding could result in names that are not in NFC. It will presumably be necessary for P2996 to specify that, for round-trip purposes, conversion to the ordinary literal encoding will not perform character substitutions (e.g., UCN-like escape sequences will be generated instead). Likewise, it will be necessary to specify how names that do not conform to NFC will be handled by reflection interfaces that consume user provided names. Note that current compiler releases exhibit implementation divergence with respect to enforcement of the NFC requirement (https://godbolt.org/z/E35r1K7hE; gcc does diagnose, Clang and EDG do not, MSVC does not yet implement P1949R7).
Finally, and as a separable issue that can be discussed at another time, I think we should discuss differentiating between names and identifiers in the reflection interfaces. This isn't an issue for data_member_spec() since data members are always identifiers (or are unnamed; that is another interesting case, but isn't an SG16 concern), but could be an issue for a hypothetical function_spec() or member_function_spec() interface used for named functions, constructors and destructors, overloaded operators, conversion operators, user-defined literals, etc. Distinguishing between names and identifiers would avoid the need to parse, e.g., operator bool or ""_udl, when consuming names.
Tom.
Received on 2024-05-06 18:17:31