On Tue, Sep 8, 2020 at 7:21 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
> Also, I'd like to point out that Unicode apparently
> has expressly declared control characters as
> out-of-scope (because control characters are not
> related to glyphs at all, I guess), but C++ does
> expressly recognize several control characters
> during lexing ("new-line", "whitespace") as well as
> in string-literals. This feels a bit like an
> impedance mismatch.

To be clear, Unicode does assign semantics to a subset of control codes (see Table 23-1 of Unicode 13), and the only ones that C++ recognises that are not part of that subset are backspace (\b) and alert (\a). AFAICT C++ doesn't really assign semantics to those two; it just allows them to be used in string and character literals.
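
For anyone who wants to check that comparison mechanically, here is a rough sketch. It assumes my reading of Table 23-1 is right (U+0009..U+000D, U+001C..U+001F and U+0085) and that the simple escape sequences map to their usual ASCII/Unicode code points; the names `unicode_specified_control`, `Escape` and `count_outside_unicode_subset` are made up for illustration and don't come from either standard.

```cpp
// Assumption: control codes with Unicode-specified semantics per Table 23-1
// of Unicode 13 are U+0009..U+000D, U+001C..U+001F and U+0085.
constexpr bool unicode_specified_control(char32_t c) {
    return (c >= 0x09 && c <= 0x0D) || (c >= 0x1C && c <= 0x1F) || c == 0x85;
}

// Hypothetical helper type: a C++ simple escape sequence and the code point
// it conventionally denotes (assuming an ASCII-compatible mapping).
struct Escape {
    const char* spelling;
    char32_t code_point;
};

constexpr Escape cpp_control_escapes[] = {
    {"\\a", 0x07}, {"\\b", 0x08}, {"\\t", 0x09}, {"\\n", 0x0A},
    {"\\v", 0x0B}, {"\\f", 0x0C}, {"\\r", 0x0D},
};

// Counts the escapes whose code points fall outside the Unicode subset.
constexpr int count_outside_unicode_subset() {
    int n = 0;
    for (const Escape& e : cpp_control_escapes) {
        if (!unicode_specified_control(e.code_point)) {
            ++n;
        }
    }
    return n;
}

// Under the assumptions above, only \a and \b are outside the subset.
static_assert(!unicode_specified_control(0x07), "\\a is not in the subset");
static_assert(!unicode_specified_control(0x08), "\\b is not in the subset");
static_assert(count_outside_unicode_subset() == 2, "exactly two escapes differ");
```

This needs C++14 or later for the loop in the constexpr function.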

But still, I don't think this is an issue. See section 23.1 of Unicode 13:

> In general, the use of control codes constitutes a higher-level protocol and is beyond the scope of the Unicode Standard. For example, the use of ISO/IEC 6429 control sequences for controlling bidirectional formatting would be a legitimate higher-level protocol layered on top of the plain text of the Unicode Standard. Higher-level protocols are not specified by the Unicode Standard; their existence cannot be assumed without a separate agreement between the parties interchanging such data.

The way I see it, even if \b and \a do have semantics in C++, the C++ standard is an instance of such a "separate agreement", and thus can define any "higher-level protocol" for their use.