Date: Wed, 27 Aug 2025 14:26:50 +0000
I.e., it's just convention.
char8_t just means "I expect the encoded bits to be valid UTF-8", but that cannot be enforced.
The data doesn't have to be valid UTF-8, but if a function in the standard takes char8_t as input, it is simply going to assume that the data is UTF-8.
So it partially uses the type system to loosely inform about the encoding.
-----Original Message-----
From: Std-Proposals <std-proposals-bounces_at_lists.isocpp.org> On Behalf Of Jason McKesson via Std-Proposals
Sent: Wednesday, August 27, 2025 16:22
To: std-proposals_at_[hidden]
Cc: Jason McKesson <jmckesson_at_gmail.com>
Subject: Re: [std-proposals] charN_t (was: TBAA and extended floating-point types)
On Wed, Aug 27, 2025 at 4:33 AM zxuiji via Std-Proposals <std-proposals_at_lists.isocpp.org> wrote:
>
> Correct me if I'm wrong but isn't the purpose of the char8/16/32_t types not to guarantee the encoding used but that the types are unsigned and big enough for encodings using the respective amount of bits so that string literals like u8"...", u"..." and U"..." can map to a consistent type rather than the inconsistent wchar_t? If so then what's the issue? The types don't stop arbitrary bytes in files being read as X encoding, only convey to the compiler that you'll be working with that many bytes at a time, making it easier to process the encoding in the code.
While you *can* put whatever data you want into them, the assumption when using such types is that they represent valid data within that encoding. If you pass `std::filesystem::path`'s constructor a `char16_t const*`, it will assume that the string is a valid UTF-16-encoded string and undefined behavior will result if it is not.
The types themselves don't "guarantee" anything, but all of the functions and constructs that consume or generate them *do* make such guarantees/requirements. `u8` literals *will* be in UTF-8 or you get a compile error. Functions that take `char32_t`s should be expected to fail if you pass an invalid codepoint. Etc.
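As an illustration of the `char32_t` point, a consumer might apply a precondition check along these lines. This is only a sketch: `is_scalar_value` is a hypothetical name, not a standard function, and it assumes "valid codepoint" means a Unicode scalar value.

```cpp
// Hypothetical helper: true if `c` is a Unicode scalar value, i.e. in the
// range U+0000..U+10FFFF and not a surrogate (U+D800..U+DFFF).
constexpr bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// char32_t can hold all of these values; only the first is a scalar value.
static_assert(is_scalar_value(U'A'));
static_assert(!is_scalar_value(static_cast<char32_t>(0xD800)));
static_assert(!is_scalar_value(static_cast<char32_t>(0x110000)));
```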
--
Std-Proposals mailing list
Std-Proposals_at_lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2025-08-27 14:26:55