std-proposals

Re: [std-proposals] TBAA and extended floating-point types

From: Jan Schultke <janschultke_at_[hidden]>
Date: Sun, 24 Aug 2025 18:01:28 +0200
> Sorry, but the third one I disagree. I think it's putting the cart ahead of
> the oxen.
>
> We don't *need* the char8_t library APIs you're talking about. We've had those
> APIs with char for 50 years and all the existing text-processing libraries for
> the arbitrarily-encoded char already do exist that fill in this niche. We must
> support arbitrarily-encoded char anyway, so there's little incremental value
> in having an identical API that supplies the encoding by the type's name
> instead of an extra parameter somewhere.
>

If we already had those APIs, adding the char8_t overloads would be quite
trivial: a println wrapper that accepts char8_t could just reinterpret_cast
to char and forward, and you're good to go. In reality, we don't have those
APIs. We have some APIs that work with char, and then you need to go through
OS-specific ways to manipulate some global locale to make things work.
There's no way to simply print a bunch of UTF-8 code units, which is what we
actually want.

We don't even have a cross-platform way to bundle char* with some enum
which specifies the encoding of the string. There's not a single function
in the C standard library that would let you bundle char* with encoding.
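
For comparison, this is roughly what bundling the bytes with a runtime
encoding tag could look like if you rolled it yourself. Everything here
(encoding_tag, encoded_text, count_code_points) is hypothetical and invented
for this sketch; none of it is an existing library facility. Note how every
consumer has to branch on the tag at run time and needs an answer for the
"unknown" case.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical runtime tag describing how the bytes are encoded.
enum class encoding_tag { utf8, latin1, unknown };

// Hypothetical bundle of bytes plus their encoding.
struct encoded_text {
    std::string_view bytes;
    encoding_tag encoding;
};

// Counts code points if the encoding is known, otherwise returns nullopt.
// For UTF-8, every byte that is not a continuation byte (10xxxxxx) starts
// a code point. The input is assumed to be well-formed.
inline std::optional<std::size_t> count_code_points(encoded_text t) {
    switch (t.encoding) {
    case encoding_tag::latin1:
        return t.bytes.size();  // one byte per code point
    case encoding_tag::utf8: {
        std::size_t n = 0;
        for (char c : t.bytes) {
            if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
        }
        return n;
    }
    case encoding_tag::unknown:
        break;
    }
    return std::nullopt;
}
```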

Given the total absence of any sound and portable functionality, and given
that we're basically starting from scratch (not counting OS-specific
functionality), we may as well design it properly, with a type that is
intended purely for UTF-8 text, as opposed to char, which is a bit of an
integer, a bit of a byte type, and a bit of a text type with unspecified
encoding, but ... maybe UTF-8. Why can't we have the sound design that
virtually every language after C++ managed to get right?


> Plus, all those APIs are currently inaccessible if you started your text's
> lifetime with char8_t. So I fear char8_t is locked into a chicken-and-the-egg
> problem: you can't use the existing char-based APIs with it, which means
> people won't use char8_t, which means there won't be new char8_t-based APIs...
>

Not every part of the problem is chicken-and-egg. Unicode transcoding,
character property tests, std::format, std::from_chars, and many other
utilities don't even require any OS-specific functionality.

And if we "already have" those APIs for UTF-8, then just use them, right?
You can reinterpret_cast char8_t* to char* just fine. You're making it
sound like we have the chicken already then.

Received on 2025-08-24 16:01:45