Re: [std-proposals] TBAA and extended floating-point types

From: Thiago Macieira <thiago_at_[hidden]>
Date: Sun, 24 Aug 2025 09:10:13 -0500
On Sunday, 24 August 2025 01:45:15 Central Daylight Time Jan Schultke wrote:
> Secondly, char has all sorts of special properties such as strict
> aliasing relaxations that are totally unnecessary for text, which has
> some negative performance implications.

That is true.

> Thirdly, the standard library (and various other hugely portable
> libraries) will never adopt the model of char encoding being
> completely arbitrary or char being UTF-8. The former model makes text
> processing unimplementable (how do you specify is_digit(char) if char
> can be encoded in infinite ways?) or at least requires you to assume
> or specify encoding as an extra parameter case-by-case. Either option
> is bad. The latter model would make the C++ standard library
> incompatible with EBCDIC platforms, and that's not happening either.
>
> It's clear that we need a char8_t type which clearly communicates to
> APIs that we are working with UTF-8 code units. I expect that the
> char8_t support in the standard library and in OS APIs will improve
> over time; even C has a char8_t type (alias) now, and char8_t will
> become a whole lot more usable over time. It's not going to happen
> overnight.

Sorry, but I disagree with the third point. I think it's putting the cart
ahead of the oxen.

We don't *need* the char8_t library APIs you're talking about. We've had those
APIs with char for 50 years, and the existing text-processing libraries for
arbitrarily-encoded char already fill this niche. We must support
arbitrarily-encoded char anyway, so there's little incremental value in having
an identical API that supplies the encoding through the type's name instead of
through an extra parameter somewhere.

Plus, all those APIs are currently inaccessible if you started your text's
lifetime with char8_t. So I fear char8_t is locked into a chicken-and-egg
problem: you can't use the existing char-based APIs with it, which means
people won't use char8_t, which means there won't be new char8_t-based APIs...

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering

Received on 2025-08-24 14:10:25