Date: Fri, 3 May 2024 02:10:38 +0300
> On Thu, May 2, 2024 at 11:25 PM Tom Honermann <tom_at_[hidden]> wrote:
>
> The (well recognized) problem with iostreams is the implicit use of the
> imbued locale. The consistent behavior for iostreams would be that inserters
> and extractors for charN_t would transcode to the encoding of the imbued
> locale.
The _streams_ do not transcode using the codecvt facet of the imbued
locale. The inserter of `char const*`, for instance, doesn't transcode using
codecvt. It passes the NTCS to the streambuf as-is. (*)
https://eel.is/c++draft/ostream.inserters.character#4
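To make the "as-is" claim concrete, here is a small sketch (my own illustration, not from the standard). `recording_buf` and `what_streambuf_sees` are hypothetical names. The streambuf has no put area, so every character the `char const*` inserter hands down arrives in `overflow()`, where it is recorded untouched. The bytes for "é" are written as UTF-8 escapes, so the result does not depend on the literal encoding.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical streambuf with no put area: every character the stream
// hands down arrives in overflow(), where we simply record it.
class recording_buf : public std::streambuf {
public:
    std::string received;

protected:
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            received.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
};

// Inserts an NTCS through the ordinary `char const*` inserter and returns
// exactly what the streambuf received.
std::string what_streambuf_sees(const char* ntcs) {
    recording_buf buf;
    std::ostream os(&buf);
    os << ntcs;
    return buf.received;
}
```

With this, `what_streambuf_sees("caf\xC3\xA9")` yields the same five bytes that went in.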
The _streambuf_ then transcodes using the codecvt facet of the imbued
locale.
https://eel.is/c++draft/filebuf#general-7
https://eel.is/c++draft/filebuf#virtuals-10
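For a `char` stream, the facet the filebuf consults is `codecvt<char, char, mbstate_t>`. The standard specialization is a degenerate conversion, so with the classic locale nothing is transcoded. A locale carrying a user-provided `codecvt<char, char, mbstate_t>` facet is what would make the streambuf actually transcode. A minimal check, assuming only the standard facets (the function name is hypothetical):

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// basic_filebuf<char> obtains codecvt<char, char, mbstate_t> from its
// imbued locale. The standard specialization does not convert at all,
// which always_noconv() reports as true.
bool classic_char_codecvt_is_noconv() {
    const auto& cvt = std::use_facet<std::codecvt<char, char, std::mbstate_t>>(
        std::locale::classic());
    return cvt.always_noconv();
}
```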
From the fact that everyone expects inserting an NTCS in the literal encoding
to work:
std::cout << "Hello, world!" << std::endl;
we can deduce that the streambuf takes a character sequence in the
literal encoding, which it then transcodes using codecvt.
Therefore, the inserter needs to produce a character sequence in the literal
encoding. It can't transcode to the final encoding using codecvt, because the
streambuf will transcode a second time, ruining everything.
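A toy demonstration of the double-transcoding problem. As an assumption chosen purely for illustration, the streambuf below treats incoming bytes as Latin-1 (standing in for the literal encoding) and emits UTF-8 (standing in for the final encoding), much as a codecvt facet would. Passing the Latin-1 byte for "é" gives the right output. Passing bytes that were already transcoded to UTF-8 gets them transcoded again, producing the familiar "Ã©" mojibake. `latin1_to_utf8_buf` and `write_through` are hypothetical names.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Toy stand-in for a transcoding streambuf: treats each incoming byte as
// Latin-1 and appends its UTF-8 encoding to `out`.
class latin1_to_utf8_buf : public std::streambuf {
public:
    std::string out;

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        auto b = static_cast<unsigned char>(traits_type::to_char_type(ch));
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
        return ch;
    }
};

// Inserts `s` with the ordinary inserter and returns the streambuf's output.
std::string write_through(const char* s) {
    latin1_to_utf8_buf buf;
    std::ostream os(&buf);
    os << s;
    return buf.out;
}
```

`write_through("\xE9")` produces the UTF-8 bytes `C3 A9`, whereas `write_through("\xC3\xA9")` (pre-transcoded by the "inserter") produces `C3 83 C2 A9`.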
Therefore, the inserters of `char8_t const*`, `char16_t const*` and `char32_t
const*` need to transcode from UTF-8, UTF-16 and UTF-32, respectively, to a
character sequence in the literal encoding (or a superset of it), which is then
fed to the streambuf (which in turn transcodes it using codecvt).
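A sketch of what such a `char32_t` inserter would do, under the loud assumption that the literal encoding is UTF-8. The real `char32_t const*` inserters into a `char` stream are deleted since C++20, so this uses hypothetical free functions (`utf32_to_utf8`, `write_utf32`) rather than `operator<<`. The transcoded bytes are written to the stream unchanged, leaving the streambuf's codecvt to be applied exactly once afterwards. Replacing invalid code points with U+FFFD is my choice, not something the thread specifies.

```cpp
#include <ostream>
#include <string>
#include <string_view>

// Hypothetical helper: UTF-32 -> literal encoding, ASSUMED to be UTF-8.
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string utf32_to_utf8(std::u32string_view s) {
    std::string bytes;
    for (char32_t c : s) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;
        if (c < 0x80) {
            bytes += static_cast<char>(c);
        } else if (c < 0x800) {
            bytes += static_cast<char>(0xC0 | (c >> 6));
            bytes += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            bytes += static_cast<char>(0xE0 | (c >> 12));
            bytes += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            bytes += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            bytes += static_cast<char>(0xF0 | (c >> 18));
            bytes += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            bytes += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            bytes += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return bytes;
}

// Hypothetical "inserter": hands the literal-encoding bytes to the stream
// as-is; any codecvt transcoding is left to the streambuf.
std::ostream& write_utf32(std::ostream& os, std::u32string_view s) {
    const std::string bytes = utf32_to_utf8(s);
    return os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```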
That's coincidentally exactly what inserting the result of std::format does
after Corentin's proposed additions.
(*) Well, technically it does "transcode" from char to the stream type using
ctype::widen, but that's useless for multibyte encodings, so we can reasonably
assume that widening a char to a char is the identity, or a literal encoding of
UTF-8 would stand no chance of working.
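The footnote's assumption can be checked for a given locale. The standard only asks `ctype<char>::do_widen` for "the simplest reasonable transformation", so this is a test rather than a guarantee, although the common implementations simply return the character unchanged. The function name is hypothetical.

```cpp
#include <climits>
#include <locale>

// Returns true if ctype<char>::widen maps every char value to itself
// in the given locale.
bool char_widen_is_identity(const std::locale& loc) {
    const auto& ct = std::use_facet<std::ctype<char>>(loc);
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        const char c = static_cast<char>(i);
        if (ct.widen(c) != c)
            return false;
    }
    return true;
}
```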
So:
input -> inserter -> character sequence in literal encoding -> streambuf ->
output in final encoding determined by locale codecvt
Received on 2024-05-02 23:10:43