I am seeking review feedback on a draft of N2653: char8_t: A type for UTF-8 characters and strings (Revision 1).  This paper revises an earlier paper, N2231, from 2018.

This revision rewrites much of the original paper and follows the adoption of P0482R6 in C++20.  Its primary motivation is to maintain source code compatibility between C and C++.

Notable differences between what was adopted in C++20 and what is proposed for C2X in N2653 are:

  1. In C++20, char8_t is a fundamental type.  The C2X proposal instead makes char8_t a typedef name for unsigned char.  This is consistent with existing differences between the languages for wchar_t, char16_t, and char32_t.
  2. In C++20, a UTF-8 string literal may no longer be used to initialize an array of char, signed char, or unsigned char.  The C2X proposal retains these initializations.  This is also consistent with existing differences for array initialization by a string literal with a mismatched encoding prefix.
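
To make the two differences concrete, here is a minimal sketch in C.  It is not taken from the paper: utf8_unit, utf8_unit_is_unsigned_char, utf8_length, and the array names are hypothetical stand-ins (the proposal itself would have <uchar.h> declare char8_t), chosen so that the example should compile as C11 or later without colliding with a library-provided char8_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed typedef.  Under N2653 the name
   would be char8_t, declared in <uchar.h>. */
typedef unsigned char utf8_unit;

/* Item 2: a UTF-8 string literal may initialize an array of any of the
   three character types.  The proposal retains this for C.
   "caf\u00e9" encodes as 5 UTF-8 code units plus the terminating null. */
const char          as_char[]  = u8"caf\u00e9";
const signed char   as_schar[] = u8"caf\u00e9";
const unsigned char as_uchar[] = u8"caf\u00e9";

/* Item 1: because the name is a typedef, it denotes the same type as
   unsigned char, so generic selection picks the unsigned char branch. */
int utf8_unit_is_unsigned_char(void)
{
    return _Generic((utf8_unit)0, unsigned char: 1, default: 0);
}

/* Counts code units (not code points) up to the terminating null. */
size_t utf8_length(const utf8_unit *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Compiled as C++20, the three array definitions above would be rejected, and char8_t would be a type distinct from unsigned char.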

The Design Options section discusses these design decisions in more detail.

I intend to submit this revision to WG14 later this week.  Any feedback is appreciated.