std-proposals

Re: [std-proposals] TBAA and extended floating-point types

From: Jan Schultke <janschultke_at_[hidden]>
Date: Sun, 24 Aug 2025 08:45:15 +0200
> In any case, that doesn't require a char8_t type to exist. We could reduce to
> two cases: char is arbitrarily encoded (and the encoding information is kept
> out of band somewhere) or char is UTF8. That's what we've lived with for 20
> years since the transition to UTF8 began, and I don't see char8_t's existence
> helping move the needle here. From my experience, it's hurt more than helped.

No, we couldn't. char is a terrible type in general, and even if the
literal encoding is UTF-8, that doesn't really obsolete char8_t.

Firstly, char* is being used for strings with execution character
encoding in various OS APIs, or for file paths (which are simply byte
sequences, not even text necessarily). Even if the platform uses UTF-8
as a default and string literals are UTF-8, you still end up using
char* all over the place for things that are not UTF-8 at all.

Secondly, char has all sorts of special properties, such as strict
aliasing relaxations, that are totally unnecessary for text and that
have some negative performance implications.

Thirdly, the standard library (and various other hugely portable
libraries) will never adopt the model of char encoding being
completely arbitrary or char being UTF-8. The former model makes text
processing unimplementable (how do you specify is_digit(char) if char
can be encoded in infinitely many ways?) or at least requires you to assume
or specify encoding as an extra parameter case-by-case. Either option
is bad. The latter model would make the C++ standard library
incompatible with EBCDIC platforms, and that's not happening either.

We clearly need a char8_t type that tells APIs we are working with
UTF-8 code units. I expect char8_t support in the standard library
and in OS APIs to improve; even C has a char8_t type (alias) now.
char8_t will become a whole lot more usable over time, but it's not
going to happen overnight.

Received on 2025-08-24 06:45:30