On Sat, Sep 10, 2022 at 3:47 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
SG16 will hold a telecon on Wednesday, September 14th, at 19:30 UTC.
The agenda is:
- Report on the on-going interactions between WG21 and the Unicode Consortium.
- Report on the backward compatibility impact of P1949 (C++ Identifier Syntax using Unicode Standard Annex 31).
- Continued discussion of P2626R0: charN_t incremental adoption: Casting pointers of UTF character types.
SG16 has previously engaged with the Unicode Message Format Working Group (MFWG) (in our March 11th, 2020 telecon) and the Unicode Source Code Ad Hoc Group (SCAHG) (in our May 25th, 2022 telecon with Robin Leroy). Peter and Tom have continued engaging with these groups. Tom will report on their activities (Peter is not expected to be available for this meeting).
Implementation of P1949 in Clang (as a DR unconditionally applied to older language modes) prompted some users to report substantial impact to projects that previously enjoyed use of mathematical symbols in identifiers. Tom will report on some specifics of the impact, discussion between Clang implementors, and on-going work by the SCAHG that may provide a solution in the future. Further discussion should focus on guidance that we may want to offer to implementors.
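A minimal illustration of the kind of identifier affected. The specific characters are illustrative choices, not taken from the user reports, and `scale_by_α` is a hypothetical name. This assumes a compiler that accepts UTF-8 encoded source, such as GCC 10+ or Clang.

```cpp
// Assumes a compiler that accepts UTF-8 encoded source (e.g., GCC 10+ or Clang).

// Still well-formed under P1949: U+03B1 GREEK SMALL LETTER ALPHA is XID_Start.
constexpr double α = 0.5;

// Greek letters also remain valid after the first character (XID_Continue).
// scale_by_α is a hypothetical function used only for illustration.
constexpr double scale_by_α(double xα) { return α * xα; }

// Within the C++11 Annex E allowed ranges (U+2070..U+218F), but ill-formed
// under P1949: U+2080 SUBSCRIPT ZERO (category No) is not XID_Continue.
// constexpr double x₀ = 1.0;
```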
To clarify, the change substantially impacted a very small number of users. The DR status is somewhat beside the point, as these users have communicated that being stuck on C++20 forever is not desirable for them.

Pending Corentin's availability, we'll continue discussion of P2626R0 from our August 24th, 2022 telecon.
I do believe there are core-level concerns to be resolved, the main ones being:
* Can we specify that destroying a range of charX through a pointer to a range of charY is well defined? We have established that the interface is not usable in common cases if a restoring conversion is required prior to destruction. This is the main issue.
* Can we control the effect on parent objects? It would be reasonable for any access to an object to be undefined while one of its subobjects is cast to a different type, but once the cast is reverted, the object needs to be usable again.
A third one is:
* Is an implementation required to diagnose, as UB, access to an object through the wrong type during constant evaluation? For example:
constexpr char test() {
    char8_t c8a[] = u8"text";
    char *cp = cast_utf_to<char>(c8a);
    c8a[0];        // Ok? Or UB?
    c8a[0] = 'X';  // Ok? Or UB?
    return cp[0];  // Ok? Or UB because an object of char8_t type was written there?
}
constexpr char g = test();
static_assert(g == 'X'); // Ok? If so, this implies aliasing awareness.
static_assert(g == 't'); // Ok? If so, this implies distinct objects in overlapping regions of storage.
Corentin's prototype implementation doesn't diagnose UB in this
example and accepts the first static_assert (and fails the
second).
Tom.