ISOCPP std-proposals List: Re: [std-proposals] charN_t (was: TBAA and extended floating-point types)

From: Thiago Macieira <thiago_at_[hidden]>
Date: Wed, 27 Aug 2025 10:44:52 -0700

On Wednesday, 27 August 2025 01:47:21 Pacific Daylight Time zxuiji wrote:
> Correct me if I'm wrong but isn't the purpose of the char8/16/32_t types not
> to guarantee the encoding used but that the types are unsigned and big
> enough for encodings using the respective amount of bits so that string
> literals like u8"...", u"..." and U"..." can map to a consistent type
> rather than the inconsistent wchar_t? If so then what's the issue? The
> types don't stop arbitrary bytes in files being read as X encoding, only
> convey to the compiler that you'll be working with that many bytes at a
> time, making it easier to process the encoding in the code.

You're wrong.

We didn't need new types to have a type with the necessary bit widths, because
we already had them: uint_leastNN_t where NN is 8, 16 and 32. Those are the
types that the Standard specifies the charNN_t types should match in size and
representation.

It's true the compiler cannot enforce that the data pointed to by a charNN_t
pointer is properly encoded UTF-8/16/32, when it comes from the user. But by
convention, that's what the type is for: to indicate that it is encoded under
the expected meaning. Whether the called function will misbehave if the
requirement is violated or not is up to the implementation.

And like std::float64_t vs double, this allows us to create an overload set of

  f(const char *)
  f(const char8_t *)
  f(const char16_t *)

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering

Received on 2025-08-27 17:45:09