Date: Sun, 30 May 2021 21:33:21 -0400
I am seeking review feedback on a draft of N2653: char8_t: A type for
UTF-8 characters and strings (Revision 1)
<https://rawgit.com/sg16-unicode/sg16/master/papers/n2653.html>. This
paper revises an earlier paper, N2231
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2231.htm>, from 2018.
The revision is a rewrite of much of the original paper and follows the
C++20 adoption of P0482R6 <https://wg21.link/p0482r6>. The primary
motivation is to maintain source code compatibility between C and C++.
Notable differences between what was adopted in C++20 and what is
proposed for C2X in N2653
<https://rawgit.com/sg16-unicode/sg16/master/papers/n2653.html> are:
1. In C++20, char8_t is a fundamental type. The C2X proposal instead
   makes char8_t a typedef name for unsigned char. This is consistent
   with existing differences between the languages for wchar_t,
   char16_t, and char32_t.
2. In C++20, a UTF-8 string literal may no longer be used to initialize
   an array of char, signed char, or unsigned char. The C2X proposal
   retains these initializations. This is also consistent with
   existing differences for array initialization by a string literal
   with a mismatched encoding prefix.
The Design Options section discusses these design decisions in more detail.
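As a rough illustration of the two differences listed above, here is a
minimal C sketch. It is not taken from the paper. The typedef name
char8_sketch and the helper count_code_units are hypothetical names
used only for this example; under the proposal, <uchar.h> would provide
the real char8_t. The sketch assumes a compiler that supports C11 or
later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed typedef. Under N2653, <uchar.h>
   would declare the real char8_t; this name is for illustration only. */
typedef unsigned char char8_sketch;

/* Difference 1: a typedef name does not introduce a new type, so the
   generic selection below picks the unsigned char association. In C++20,
   char8_t is a distinct fundamental type instead. */
_Static_assert(_Generic((char8_sketch)0, unsigned char: 1, default: 0),
               "char8_sketch names the same type as unsigned char");

/* Difference 2: in C, a UTF-8 string literal may initialize an array of
   any character type. C++20 rejects the first two declarations. */
const char          as_char[]  = u8"caf\u00e9";
const unsigned char as_uchar[] = u8"caf\u00e9";
const char8_sketch  as_c8[]    = u8"caf\u00e9";

/* Counts UTF-8 code units (bytes) before the terminating null. */
size_t count_code_units(const char8_sketch *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Because char8_sketch is only another name for unsigned char, as_uchar
and as_c8 have the same type, and either can be passed to
count_code_units without a cast.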
I intend to submit this revision to WG14 later this week. Any feedback
is appreciated.
Tom.
Received on 2021-05-30 20:33:25