Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Fri, 3 May 2024 09:26:16 +0200
On 03/05/2024 01.10, Peter Dimov via SG16 wrote:
>> On Thu, May 2, 2024 at 11:25 PM Tom Honermann <tom_at_[hidden]> wrote:
>>
>> The (well recognized) problem with iostreams is the implicit use of the
>> imbued locale. The consistent behavior for iostreams would be that inserters
>> and extractors for charN_t would transcode to the encoding of the imbued
>> locale.
>
> The _streams_ do not transcode using the codecvt facet of the imbued
> locale. The inserter of `char const*`, for instance, doesn't transcode using
> codecvt. It passes the NTCS to the streambuf as-is. (*)
>
> https://eel.is/c++draft/ostream.inserters.character#4
>
> The _streambuf_ then transcodes using the codecvt facet of the imbued
> locale.
>
> https://eel.is/c++draft/filebuf#general-7
> https://eel.is/c++draft/filebuf#virtuals-10

But only for filebufs (i.e. std::fstream). Is there any
evidence the standard prescribes that for std::cout as well?

> From the fact that everyone expects inserting an NTCS in the literal encoding
> to work:
>
> std::cout << "Hello, world!" << std::endl;
>
> we can deduce that the streambuf takes a character sequence in the
> literal encoding, which it then transcodes using codecvt.
>
> Therefore, the inserter needs to produce a character sequence in the literal
> encoding. It can't transcode to the final encoding using codecvt, because the
> streambuf will transcode a second time, ruining everything.

Agreed; the streambuf _might_ transcode, so the inserter can't / shouldn't
transcode, at least not to the final (imbued) encoding.
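The "as-is" claim can be checked with a small sketch. The `recording_buf` class and the `bytes_seen_by_streambuf` helper below are hypothetical names introduced only for illustration; the buffer stores whatever bytes the `char const*` inserter hands it and does no conversion of its own.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical streambuf that records every byte it receives, unchanged.
class recording_buf : public std::streambuf {
public:
    const std::string& bytes() const { return bytes_; }

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            bytes_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for bulk writes.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        bytes_.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string bytes_;
};

// Inserts an NTCS with operator<< and returns what the streambuf saw.
std::string bytes_seen_by_streambuf(const char* ntcs) {
    recording_buf buf;
    std::ostream os(&buf);
    os << ntcs;
    return buf.bytes();
}
```

Passing a multibyte sequence such as `"h\xC3\xA9llo"` should return the same bytes, which is consistent with any transcoding being left to the streambuf.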

> Therefore, the inserters of `char8_t const*`, `char16_t const*` and `char32_t
> const*` need to transcode from UTF-8, UTF-16 and UTF-32, respectively, to a
> character sequence in the literal encoding (or a superset of it), which is then
> fed to the streambuf (which will then transcode using codecvt.)
>
> That's coincidentally exactly what inserting the result of std::format does
> after Corentin's proposed additions.
>
> (*) Well, technically it does "transcode" from char to the stream type using
> ctype::widen, but that's useless for multibyte encodings, so we can reasonably
> assume that widening a char to a char is the identity,

We say that in the std::locale section, I think.

> or a literal encoding of
> UTF-8 would stand no chance of working.
>
> So:
>
> input -> inserter -> character sequence in literal encoding -> streambuf ->
> output in final encoding determined by locale codecvt
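The pipeline above can be sketched as follows for `char32_t` input. This is only an illustration under a loud assumption: the ordinary literal encoding is taken to be UTF-8, which the standard does not guarantee. `append_utf8`, `insert_utf32` and `render` are hypothetical helpers, not proposed or existing standard library functions.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: append one code point as UTF-8.
// ASSUMPTION: the ordinary literal encoding is UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    // Replace surrogates and out-of-range values with U+FFFD.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical "inserter": transcode UTF-32 to the (assumed UTF-8) literal
// encoding, then hand the char sequence to the stream. Any conversion to
// the final encoding is left to the streambuf.
std::ostream& insert_utf32(std::ostream& os, const std::u32string& text) {
    std::string narrow;
    for (char32_t cp : text) append_utf8(narrow, cp);
    return os << narrow;
}

// Convenience wrapper: a stringbuf does no codecvt step, so the result is
// exactly the character sequence the inserter produced.
std::string render(const std::u32string& text) {
    std::ostringstream os;
    insert_utf32(os, text);
    return os.str();
}
```

With a `std::ofstream` in place of the `std::ostringstream`, the filebuf would then apply the codecvt facet of its imbued locale to the same character sequence.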

Jens

Received on 2024-05-03 07:26:28