std-proposals

Re: [std-proposals] TBAA and extended floating-point types

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Sun, 24 Aug 2025 12:01:06 -0400
On Sun, Aug 24, 2025 at 10:10 AM Thiago Macieira via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> On Sunday, 24 August 2025 01:45:15 Central Daylight Time Jan Schultke wrote:
> > Secondly, char has all sorts of special properties such as strict
> > aliasing relaxations that are totally unnecessary for text, which has
> > some negative performance implications.
>
> That is true.
>
> > Thirdly, the standard library (and various other hugely portable
> > libraries) will never adopt the model of char encoding being
> > completely arbitrary or char being UTF-8. The former model makes text
> > processing unimplementable (how do you specify is_digit(char) if char
> > can be encoded in infinite ways?) or at least requires you to assume
> > or specify encoding as an extra parameter case-by-case. Either option
> > is bad. The latter model would make the C++ standard library
> > incompatible with EBCDIC platforms, and that's not happening either.
> >
> > It's clear that we need a char8_t type which clearly communicates to
> > APIs that we are working with UTF-8 code units. I expect that the
> > char8_t support in the standard library and in OS APIs will improve
> > over time; even C has a char8_t type (alias) now, and char8_t will
> > become a whole lot more usable over time. It's not going to happen
> > overnight.
>
> Sorry, but I disagree with the third one. I think it's putting the cart
> before the oxen.
>
> We don't *need* the char8_t library APIs you're talking about. We've had those
> APIs with char for 50 years, and text-processing libraries for
> arbitrarily-encoded char already exist to fill this niche. We must
> support arbitrarily-encoded char anyway, so there's little incremental value
> in having an identical API that supplies the encoding by the type's name
> instead of an extra parameter somewhere.
>
> Plus, all those APIs are currently inaccessible if you started your text's
> lifetime with char8_t. So I fear char8_t is locked into a chicken-and-the-egg
> problem: you can't use the existing char-based APIs with it, which means
> people won't use char8_t, which means there won't be new char8_t-based APIs...

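You can `reinterpret_cast` a `char8_t*` string to a `char*` for those
APIs, since `char` is allowed to alias any object type. You can't do it
the other way around though, because `char8_t` has no such aliasing
exemption.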

Received on 2025-08-24 16:01:20