Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Peter Dimov <pdimov_at_[hidden]>
Date: Fri, 3 May 2024 02:10:38 +0300
> On Thu, May 2, 2024 at 11:25 PM Tom Honermann <tom_at_[hidden]> wrote:
>
> The (well recognized) problem with iostreams is the implicit use of the
> imbued locale. The consistent behavior for iostreams would be that inserters
> and extractors for charN_t would transcode to the encoding of the imbued
> locale.

The _streams_ do not transcode using the codecvt facet of the imbued
locale. The inserter of `char const*`, for instance, doesn't transcode using
codecvt. It passes the NTCS to the streambuf as-is. (*)

https://eel.is/c++draft/ostream.inserters.character#4
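
A minimal sketch of how the pass-through can be observed. The `capture_buf`
class and the `bytes_seen_by_streambuf` function are hypothetical names
introduced only for this illustration; the example assumes a default-constructed
stream (classic locale, no field width).

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical helper: a streambuf that records every char it is handed,
// without converting anything.
class capture_buf : public std::streambuf {
public:
    const std::string& bytes() const { return bytes_; }

protected:
    // Called for single characters when there is no put area.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            bytes_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for character runs.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        bytes_.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string bytes_;
};

// Returns the bytes that the `char const*` inserter hands to the streambuf.
inline std::string bytes_seen_by_streambuf(const char* ntcs) {
    capture_buf buf;
    std::ostream os(&buf);
    os << ntcs;
    return buf.bytes();
}

inline void demo_inserter_passthrough() {
    // "h\xC3\xA9" is "h" followed by U+00E9 spelled out as UTF-8 bytes.
    assert(bytes_seen_by_streambuf("h\xC3\xA9") == "h\xC3\xA9");
}
```

The multibyte sequence arrives at the streambuf byte for byte; any conversion
to the external encoding would have to happen after this point.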

The _streambuf_ then transcodes using the codecvt facet of the imbued
locale.

https://eel.is/c++draft/filebuf#general-7
https://eel.is/c++draft/filebuf#virtuals-10
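
As a sketch of where that facet comes from: basic_filebuf looks it up in its
locale as codecvt<charT, char, traits::state_type>. The function name
`narrow_codecvt_is_noconv` below is hypothetical. Note that for charT == char
the facet supplied by the standard library is the degenerate one that performs
no conversion, so an actual conversion at this stage would need a locale
carrying a different facet (for the narrow case that is an assumption, not
something the standard locales provide).

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>

// Looks up the facet a std::filebuf imbued with `loc` would consult and
// reports whether it claims to never convert.
inline bool narrow_codecvt_is_noconv(const std::locale& loc) {
    using facet = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<facet>(loc).always_noconv();
}

inline void demo_codecvt_lookup() {
    // The standard-provided narrow facet is the identity conversion.
    assert(narrow_codecvt_is_noconv(std::locale::classic()));
}
```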

From the fact that everyone expects inserting an NTCS in the literal encoding
to work:

std::cout << "Hello, world!" << std::endl;

we can deduce that the streambuf takes a character sequence in the
literal encoding, which it then transcodes using codecvt.

Therefore, the inserter needs to produce a character sequence in the literal
encoding. It can't transcode to the final encoding using codecvt, because the
streambuf will transcode a second time, ruining everything.

Therefore, the inserters of `char8_t const*`, `char16_t const*` and `char32_t
const*` need to transcode from UTF-8, UTF-16 and UTF-32, respectively, to a
character sequence in the literal encoding (or a superset of it), which is then
fed to the streambuf (which will then transcode using codecvt).

That's coincidentally exactly what inserting the result of std::format does
after Corentin's proposed additions.
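
A rough sketch of the transcoding step such an inserter would perform, shown
for UTF-32 input. It ASSUMES the ordinary literal encoding is UTF-8; for any
other literal encoding the conversion would differ. `to_literal_encoding` and
`insert_utf32` are hypothetical names, and a named function is used instead of
operator<< because the charN_t stream inserters are deleted in C++20.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper. ASSUMPTION: the literal encoding is UTF-8, so
// "transcode to the literal encoding" means "encode as UTF-8".
inline std::string to_literal_encoding(const std::u32string& in) {
    std::string out;
    for (char32_t c : in) {
        // Surrogates and out-of-range values become U+FFFD.
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) c = 0xFFFD;
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical inserter-like function: transcode first, then hand the result
// to the ordinary char inserter, which passes it to the streambuf unchanged.
inline std::ostream& insert_utf32(std::ostream& os, const std::u32string& s) {
    return os << to_literal_encoding(s);
}

inline void demo_insert_utf32() {
    std::ostringstream ss;
    insert_utf32(ss, U"h\u00E9");
    assert(ss.str() == "h\xC3\xA9");
}
```

The inserter never touches codecvt here; the external conversion remains the
streambuf's job.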

(*) Well, technically it does "transcode" from char to the stream type using
ctype::widen, but that's useless for multibyte encodings, so we can reasonably
assume that widening a char to a char is the identity, or a literal encoding of
UTF-8 would stand no chance of working.
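
A small sketch for checking that assumption on a given implementation.
`widen_each` is a hypothetical helper; it applies ctype<char>::widen to each
char separately, which is how the inserter sees the input.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Widens every char of `s` individually using the ctype<char> facet of `loc`.
inline std::string widen_each(const std::string& s, const std::locale& loc) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
    std::string out;
    for (char c : s) out += ct.widen(c);
    return out;
}

inline void demo_widen_identity() {
    // UTF-8 bytes for "h" followed by U+00E9; each byte should survive as-is.
    assert(widen_each("h\xC3\xA9", std::locale::classic()) == "h\xC3\xA9");
}
```

Because widen operates on one char at a time, it has no way to handle a
multibyte sequence other than leaving each byte alone.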

So:

input -> inserter -> character sequence in literal encoding -> streambuf ->
  output in final encoding determined by locale codecvt

Received on 2024-05-02 23:10:43