Date: Wed, 27 Aug 2025 10:22:06 -0400
On Wed, Aug 27, 2025 at 4:33 AM zxuiji via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> Correct me if I'm wrong but isn't the purpose of the char8/16/32_t types not to guarantee the encoding used but that the types are unsigned and big enough for encodings using the respective amount of bits so that string literals like u8"...", u"..." and U"..." can map to a consistent type rather than the inconsistent wchar_t? If so then what's the issue? The types don't stop arbitrary bytes in files being read as X encoding, only convey to the compiler that you'll be working with that many bytes at a time, making it easier to process the encoding in the code.
While you *can* put whatever data you want into them, the assumption
when using such types is that they represent valid data within that
encoding. If you pass `std::filesystem::path`'s constructor a
`char16_t const*`, it will assume that the string is a valid
UTF-16-encoded string and undefined behavior will result if it is not.
The types themselves don't "guarantee" anything, but all of the
functions and constructs that consume or generate them *do* make such
guarantees/requirements. `u8` literals *will* be in UTF-8 or you get a
compile error. Functions that take `char32_t`s should be expected to
fail if you pass an invalid codepoint. Etc.
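
As a rough illustration of the distinction above (the type accepts any
value, but consumers expect valid data), here is a minimal sketch. The
helper names `is_scalar_value` and `is_well_formed_utf16` are
hypothetical and not part of the standard library; the sketch assumes
C++17.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `c` is a Unicode scalar value, i.e. at most
// U+10FFFF and outside the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && (c < 0xD800 || c > 0xDFFF);
}

// Hypothetical helper: true if every surrogate code unit in `s` is part of a
// high-then-low pair, which is what well-formed UTF-16 requires.
constexpr bool is_well_formed_utf16(std::u16string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= s.size()) return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;  // skip the low surrogate that was just checked
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            // A low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}

// The types happily store values that are not valid in the encoding.
constexpr char16_t lone_high[] = {0xD83D};

static_assert(is_scalar_value(U'\u00E9'));
static_assert(!is_scalar_value(char32_t{0xD800}));    // surrogate
static_assert(!is_scalar_value(char32_t{0x110000}));  // beyond U+10FFFF
static_assert(is_well_formed_utf16(u"\U0001F600"));   // proper surrogate pair
static_assert(!is_well_formed_utf16(std::u16string_view(lone_high, 1)));
```

Nothing in the type system rejects `lone_high`; it is the functions
consuming a `char16_t` string that would need a check along these lines,
or that simply assume it has already been done.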
Received on 2025-08-27 14:22:18