Re: [std-proposals] TBAA and extended floating-point types

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 23 Aug 2025 20:55:14 -0400
On 8/19/25 8:09 PM, Thiago Macieira via Std-Proposals wrote:
> On Tuesday, 19 August 2025 13:47:53 Pacific Daylight Time Lénárd Szolnoki via
> Std-Proposals wrote:
>> I think this is also a major hurdle in adopting char8_t when a ton of
>> existing libraries already store utf8 strings in char arrays.
> Echoing this part. char8_t by itself is useless to me: all char arrays are
> UTF-8 if they contain text. I don't need a different type to tell me that and a
> different type only makes things worse if everything where I need it is based
> on char (std::string, std::string_view, C API, QByteArray, etc.)
>
> u8"" producing char8_t string literals is a net negative in the Standard (in
> my opinion) because it made u8"" useless. Some previous code still compiles
> with C++20 and some does not, and the rules for which are not obvious. I'm not
> vocal about it because I have no use for u8"": for all I care and support, ""
> produces UTF-8.

For what it is worth, these concerns are well known in SG16 and were
known at the time that char8_t was adopted.
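
To make the breakage concrete, a minimal illustration: in C++17 a u8""
literal has type "array of const char", while in C++20 it has type
"array of const char8_t", so code like the following compiles as C++17
but not as C++20 (unless a switch such as GCC/Clang's -fno-char8_t or
MSVC's /Zc:char8_t- restores the old behavior).

    #include <string>

    // OK in C++17 (u8"..." is const char[N]); ill-formed in C++20
    // (u8"..." is const char8_t[N], which no longer converts to
    // char-based types).
    std::string s = u8"text";
    const char *p = u8"text";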

As you know, there is a long history of C++ implementations that do not
target ASCII-based platforms. The typical C++ developer neither uses
these implementations nor writes code directly for their target
platforms, but they remain relevant to the computing ecosystem and to
the general C++ marketplace. char8_t (along with char16_t and char32_t)
enables code that reads and writes text in the UTF encodings to be
portable across all C++ implementations. It was never the expectation
that all code would be migrated to char8_t.
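
As a small illustration of what that portability buys: the value of an
ordinary character literal depends on the implementation-defined
ordinary literal encoding, while the u8/u/U forms are pinned to the UTF
encodings on every implementation.

    // 'A' is 0x41 on ASCII-derived targets but 0xC1 on EBCDIC ones;
    // u8'A', u'A', and U'A' are 0x41 / 0x0041 / 0x00000041 everywhere.
    constexpr char     a    = 'A';   // implementation-defined value
    constexpr char8_t  u8a  = u8'A'; // always 0x41 (UTF-8)
    constexpr char16_t u16a = u'A';  // always 0x0041 (UTF-16)
    constexpr char32_t u32a = U'A';  // always 0x00000041 (UTF-32)

    static_assert(u8a == 0x41);      // holds on every implementation
    // static_assert(a == 0x41);     // may fail on an EBCDIC target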

The lack of proper support for the charN_t encodings in the C++
standard library is acknowledged. We are working towards improving the
situation, but progress has been slow. I wish I had more time and
energy to personally devote to such improvements.

There are a few programming models actively deployed in the C++
ecosystem today for handling UTF-8.

  * char is assumed to be some implementation-defined character
    encoding, probably something derived from ASCII though not
    necessarily UTF-8, but is also used for UTF-8. Programs that use
    this model must track character encoding themselves and, in general,
    don't do a great job of it. This is the historic programming model
    and remains the dominant model (note that modern POSIX
    implementations still respect locale settings that affect the choice
    of character encoding).
  * char is assumed to be some implementation-defined character
    encoding; char8_t (or wchar_t or char16_t) is used as an internal
    encoding, with transcoding performed at program boundaries (system,
    Win32, I/O, etc.). This is the traditional programming model for
    Windows; char8_t provides an alternative to wchar_t with portable
    semantics (a sketch of this model follows the list).
  * char is assumed to be UTF-8. This is very common on Linux and macOS.
    Support for this has improved in the Windows ecosystem, but rough
    corners remain. Note that compiling with MSVC's /utf-8 option is not
    sufficient by itself to enable an assumption of UTF-8 for text held
    in char-based storage; the environment encoding, console encoding,
    and locale region settings also have to be considered.
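
A minimal sketch of the second model above, with the boundary helper
name (to_system_string) being hypothetical rather than an existing API;
a real boundary layer would wrap platform facilities such as
MultiByteToWideChar/WideCharToMultiByte on Windows or iconv on POSIX:

    #include <string>
    #include <string_view>

    // Internal code works exclusively in char8_t-based strings, so the
    // encoding is unambiguous and portable.
    std::u8string greet(std::u8string_view name)
    {
        return std::u8string(u8"Hello, ") + std::u8string(name);
    }

    // Boundary layer: convert to whatever the platform expects. Shown
    // here assuming the system narrow encoding is UTF-8, in which case
    // the conversion is a byte-for-byte copy; on other platforms a real
    // transcoding step replaces it.
    std::string to_system_string(std::u8string_view s)
    {
        return std::string(reinterpret_cast<const char *>(s.data()),
                           s.size());
    }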

These all remain valid programming models and the right choice remains
dependent on project goals. The C++ ecosystem is not a monoculture.

Tom.

>
> This is almost the same for std::float32_t and std::float64_t, though there's an
> important difference: those types don't promote.
>
>

Received on 2025-08-24 00:55:17