ISOCPP sg16 List: Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Peter Dimov <pdimov_at_[hidden]>
Date: Tue, 30 Apr 2024 19:24:09 +0300

> Jens Maurer wrote:
> > On 30/04/2024 13.32, Peter Dimov via SG16 wrote:
> > > Corentin Jabot wrote:
> > >> Very rough draft https://isocpp.org/files/papers/D3258R0.pdf
> > >
> > > Looks good. I'm however not sure that it makes sense to format
> > > char8_t (char16_t is borderline.)
> > >
> > > Or maybe that's only intended for {:?} ?
> > >
> > >> What about iostream?
> > >> This is a story for another paper (One that an enthusiastic reader
> > >> is encouraged to write!)
> > >
> > > Here's that paper:
> > >
> > > Insert at the end of
> > > https://eel.is/c++draft/ostream.inserters.character
> > > the following:
> > >
> > > template<class charT, class traits>
> > > basic_ostream<charT, traits>&
> > > operator<<(basic_ostream<charT, traits>& out, const char8_t* s);
> > >
> > > template<class charT, class traits>
> > > basic_ostream<charT, traits>&
> > > operator<<(basic_ostream<charT, traits>& out, const char16_t* s);
> > >
> > > template<class traits>
> > > basic_ostream<char, traits>&
> > > operator<<(basic_ostream<char, traits>& out, const char32_t* s);
> >
> > We should also (or maybe: only?) support u8string_view etc.
>
> Yes, probably. std::string_view and std::string don't support mixed output
> now, but of course it would make sense to add it, doing the same thing.

There's a subtlety here. Unicode char sequences must be treated as
a whole and processing them is not equivalent to processing each char_type
(code unit) sequentially. This is not an issue for char_type const* literals
and basic_string<char_type>, because we can assume that these hold an
entire string. But it becomes an issue for string_view, because you can in
principle split a string into two consecutive string_views in the middle of
an encoded code point.

I'm not overly concerned by this case breaking, but it's one way in which
string_views are a bit different.
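
To make the string_view case concrete, here is a rough sketch (not proposed
wording). It uses plain char holding UTF-8 bytes, and the helper name
on_code_point_boundaries is made up for illustration. It only checks whether
a view starts and ends on a code point boundary; it is not a full UTF-8
validator.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if the view neither starts with a UTF-8
// continuation byte nor ends in the middle of a multi-byte sequence.
// This is a boundary check only, not a complete UTF-8 validation.
inline bool on_code_point_boundaries(std::string_view s) {
    if (s.empty()) return true;
    auto is_cont = [](unsigned char c) { return (c & 0xC0) == 0x80; };
    if (is_cont(static_cast<unsigned char>(s.front()))) return false;

    // Walk back from the end to the lead byte of the last sequence.
    std::size_t i = s.size() - 1;
    while (i > 0 && is_cont(static_cast<unsigned char>(s[i]))) --i;

    const unsigned char lead = static_cast<unsigned char>(s[i]);
    const std::size_t need = lead < 0x80          ? 1
                           : (lead >> 5) == 0x06 ? 2
                           : (lead >> 4) == 0x0E ? 3
                           : (lead >> 3) == 0x1E ? 4
                                                 : 0;  // invalid lead byte
    // The last sequence is complete only if all of its bytes are present.
    return need != 0 && need == s.size() - i;
}
```

With whole = "a\xE2\x82\xAC" ('a' followed by U+20AC encoded as three
bytes), the whole view passes the check, but whole.substr(0, 2) ends
mid-sequence and whole.substr(2) starts with a continuation byte, so neither
half can be transcoded correctly on its own.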

Another peculiarity (which at present makes it impossible for wcout to
be able to output char const* even when both literal encodings are
Unicode) is that for char -> wchar_t the stream is obligated to convert
by using widen() which only supports 1->1 mapping (and not e.g. 3->2
needed for UTF-8 -> UTF-16.) But for the new overloads we can just
ignore the requirement to call widen() and defer to Corentin's section.
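
A small sketch of why widen() cannot transcode, assuming the classic locale;
widen_each is a made-up name, and this is only an illustration of the
existing ctype facet, not of anything in Corentin's draft. The range form of
std::ctype<wchar_t>::widen writes exactly one wchar_t per input char, so a
multi-byte UTF-8 sequence can never collapse into fewer code units.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <string_view>

// Hypothetical helper: widen every char of s independently through the
// ctype<wchar_t> facet of loc. The output always has s.size() elements.
inline std::wstring widen_each(std::string_view s,
                               const std::locale& loc = std::locale::classic()) {
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring out(s.size(), L'\0');
    // One wchar_t is produced per input char: a strict 1->1 mapping.
    ct.widen(s.data(), s.data() + s.size(), out.data());
    return out;
}
```

For ASCII input this does the expected thing (widen_each("abc") yields
L"abc"), but the three UTF-8 bytes of U+20AC come out as three separate
wchar_t values whose contents depend on the locale, rather than as a single
code unit.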

Received on 2024-04-30 16:24:15