Date: Mon, 6 May 2024 13:22:28 -0400
SG16 will hold a meeting on Wednesday, May 8th, at 19:30 UTC (timezone
conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20240508T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
The agenda follows.
* D3258R0: Formatting of charN_t <https://wg21.link/d3258r0>.
* P2996R2: Reflection for C++26 <http://wg21.link/p2996r2>.
D3258R0 was hastily produced by Corentin following the review of P2996R2
during the 2024-04-24 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings/#april-24th-2024> with
the goal of providing a convenient solution for printing UTF-8 text held
in char8_t-based storage. It proposes extending std::format() and
std::print() to support formatting arguments of Unicode character type
(characters and strings of char8_t, char16_t, or char32_t type). It does
not propose a solution for iostreams. We won't poll this paper during
this meeting for two reasons: 1) the paper is hot off the press and I
don't expect everyone to have already read it and internalized all the
implications, and 2) I'm going to limit discussion of it to the first
half of the meeting so that we continue to make progress on P2996. The
intent in discussing it, particularly with the P2996 authors present, is
to build a sense of whether it suffices to at least minimally address
the printing requirements posed by the P2996 authors; we may take a poll
on that point.
Our recent review of P2996R2 was constructive but not conclusive. We'll
continue discussion with a goal of establishing consensus on the
following points. Please review the meeting summary from the last review
<https://github.com/sg16-unicode/sg16-meetings/#april-24th-2024> as well
as the ensuing "Follow up on SG16 review of P2996R2" discussion on the
SG16 mailing list <https://lists.isocpp.org/sg16/2024/04/index.php>
prior to the meeting.
1. The character type(s) and encoding(s) used for names produced and
   consumed by reflection interfaces. My sense is that we're leaning in
   the following direction (not unanimously though):
    1. Names will be produced and consumed in both the ordinary literal
       encoding via type char and UTF-8 via type char8_t.
    2. Production of names that contain characters that are not
       representable in the ordinary literal encoding will produce a
       string that contains a UCN-like escape sequence for such characters.
    3. Consumption of names in the ordinary literal encoding will
       accept a UCN-like escape sequence for characters not in the
       basic literal character set that may lack representation in the
       ordinary literal encoding.
2. The use of a distinct type for names (e.g., a type that stores names
   in an internal representation and exposes them via char and char8_t
   interfaces).
3. Unicode NFC requirements (see below).
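
To make the escaping direction in point 1 concrete, here is a rough sketch of producing a name with UCN-like escapes. It rests on two assumptions that are not settled by anything above: that only ASCII code points are representable in the ordinary literal encoding, and that the escape is spelled \u{X...} (the exact spelling of a "UCN-like" escape has not been decided). The function name escape_name is hypothetical.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: convert a name given as Unicode code points into a
// char-based string. Assumption: only ASCII code points are representable in
// the ordinary literal encoding; every other code point is written as an
// escape of the (assumed) form \u{X...} with uppercase hex digits.
std::string escape_name(std::u32string_view name) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t cp : name) {
        if (cp < 0x80) {
            // Representable under the stated assumption; emit directly.
            out.push_back(static_cast<char>(cp));
            continue;
        }
        out += "\\u{";
        bool started = false;
        // Emit hex digits from most to least significant, skipping
        // leading zeros.
        for (int shift = 28; shift >= 0; shift -= 4) {
            unsigned digit = (cp >> shift) & 0xF;
            if (digit != 0 || started || shift == 0) {
                out.push_back(hex[digit]);
                started = true;
            }
        }
        out.push_back('}');
    }
    return out;
}
```

Because identifiers cannot contain a backslash, an escape produced this way is unambiguous, which is what would allow a consuming interface to map it back to the original code point.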
We briefly discussed Unicode normalization form C (NFC) last time.
Following adoption of P1949R7 (C++ Identifier Syntax using Unicode
Standard Annex 31) <https://wg21.link/p1949r7> as a DR for C++23,
identifiers are required to be written in NFC. Conversion to the
ordinary literal encoding could result in names that are not in NFC. It
will presumably be necessary for P2996 to specify that, for round-trip
purposes, conversion to the ordinary literal encoding will not perform
character substitutions (e.g., UCN-like escape sequences will be
generated instead). Likewise, it will be necessary to specify how names
that do not conform to NFC will be handled by reflection interfaces that
consume user provided names. Note that current compiler releases exhibit
implementation divergence with respect to enforcement of the NFC
requirement (https://godbolt.org/z/E35r1K7hE; gcc does diagnose, Clang
and EDG do not, MSVC does not yet implement P1949R7).
Finally, and as a separable issue that can be discussed at another time,
I think we should discuss differentiating between names and identifiers
in the reflection interfaces. This isn't an issue for data_member_spec()
since data members are always identifiers (or are unnamed; that is
another interesting case, but isn't an SG16 concern), but could be an
issue for a hypothetical function_spec() or member_function_spec()
interface used for named functions, constructors and destructors,
overloaded operators, conversion operators, user-defined literals,
etc. Distinguishing between names and identifiers would avoid the
need to parse, e.g., operator bool or ""_udl, when consuming names.
Tom.
Received on 2024-05-06 17:22:33