Date: Sat, 30 Aug 2025 21:33:02 +0100
I'm not saying APIs can't assume Unicode usage of those types; that'd be
documented anyway. What I'm saying is that the default assumption that the
type can *only* be used for Unicode is wrong. For example, one could make APIs
that specifically use char16_t for wrapping iconv() and
MultiByteToWideChar() so that older encodings can be used during compile time
and still have a consistent ABI across systems instead of the 2 vs 4 bytes
crap we have with wchar_t.
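
A minimal sketch of that idea, assuming a made-up helper: pack_double_byte()
is hypothetical, not a real iconv() or Win32 call, and the byte pairs are
only assumed to come from some legacy double-byte encoding. It just shows
char16_t used as fixed-width storage for 16-bit units that are not UTF-16:

```cpp
#include <climits>
#include <cstddef>
#include <string>

// char16_t is at least 16 bits wide on every conforming implementation,
// whereas the width of wchar_t differs between platforms.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds 16-bit units");

// Hypothetical helper: packs each (lead, trail) byte pair of a double-byte
// legacy stream into one char16_t unit. A trailing odd byte is ignored.
// The result is NOT UTF-16; its meaning is whatever the API documents.
std::u16string pack_double_byte(const std::string& bytes) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = static_cast<unsigned char>(bytes[i]);
        const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

The caller gets the same element type on every system; which encoding the
units carry is a matter of documentation, not of the type.
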
On Wed, 27 Aug 2025 at 18:45, Thiago Macieira <thiago_at_[hidden]> wrote:
> On Wednesday, 27 August 2025 01:47:21 Pacific Daylight Time zxuiji wrote:
> > Correct me if I'm wrong but isn't the purpose of the char8/16/32_t types
> > not to guarantee the encoding used but that the types are unsigned and big
> > enough for encodings using the respective amount of bits so that string
> > literals like u8"...", u"..." and U"..." can map to a consistent type
> > rather than the inconsistent wchar_t? If so then what's the issue? The
> > types don't stop arbitrary bytes in files being read as X encoding, only
> > convey to the compiler that you'll be working with that many bytes at a
> > time, making it easier to process the encoding in the code.
>
> You're wrong.
>
> We didn't need new types to have a type with the necessary bit widths,
> because we already had them: uint_leastNN_t where NN is 8, 16 and 32. Those
> are the types that the Standard specifies the charNN_t types should match
> in size and representation.
>
> It's true the compiler cannot enforce that the data pointed to by a
> charNN_t pointer is properly encoded UTF-8/16/32, when it comes from the
> user. But by convention, that's what the type is for: to indicate that it
> is encoded under the expected meaning. Whether the called function will
> misbehave if the requirement is violated or not is up to the
> implementation.
>
> And like std::float64_t vs double, this allows us to create an overload
> set of
>
> f(const char *)
> f(const char8_t *)
> f(const char16_t *)
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel Platform & System Engineering
>
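
The overload set quoted above can be sketched roughly as follows. The
function name f comes from the quoted message; the returned labels and the
extra uint_least16_t overload are illustrative assumptions, added only to
show that char16_t is a distinct type even though it matches uint_least16_t
in size. The char8_t overload is guarded because it needs C++20:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Same size and signedness as uint_least16_t, but a separate type.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size");
static_assert(std::is_unsigned<char16_t>::value, "unsigned");
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "distinct type, so it can take part in overloading");

// Each overload reports which one was selected (labels are illustrative).
std::string f(const char*) { return "char"; }
#ifdef __cpp_char8_t
std::string f(const char8_t*) { return "char8_t"; }
#endif
std::string f(const char16_t*) { return "char16_t"; }
std::string f(const std::uint_least16_t*) { return "uint_least16_t"; }
```

A u"..." literal selects the char16_t overload, while a plain array of
uint_least16_t selects the integer one, so the type alone tells the callee
which convention the caller claims to follow.
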
Received on 2025-08-30 20:18:50