
Re: [isocpp-sg16] Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Thu, 9 May 2024 14:36:19 +0000

> But there's no single "input", there are three.

Which is just a number. Let's say the output encoding is EBCDIC.


You want to do:
std::cout << u8"My string"

How many transcodings do you need?
One: you transcode UTF-8 > EBCDIC.

What ends up in the stream buffer?
EBCDIC


You want to do:
std::cout << u16"My string"

How many transcodings do you need?
One: you transcode UTF-16 > EBCDIC.

What ends up in the stream buffer?
EBCDIC

Do I need UTF-16 > UTF-8 > EBCDIC?
No.


You want to do:
std::cout << u32"My string"

How many transcodings do you need?
One: you transcode UTF-32 > EBCDIC.

What ends up in the stream buffer?
EBCDIC

Do I need UTF-32 > UTF-8 > EBCDIC?
No.


> You have to transcode to a single encoding which to pass to the stream buffer

Yes.

> and that single encoding is the intermediate one.

Why not have it match the output encoding?
Yes, that encoding changes from stream to stream, but so what? A conversion to it must happen regardless.


> So having fixed UTF-8 as that one single encoding has the advantages of being fixed (at compile time) and being able to represent each of the input three.
> Having it variable and runtime-dependent... does not.

We have established that the output encoding is not fixed at compile time. Do we agree with that?

So, for this alternative mechanism to work, where UTF-8 is the intermediate encoding (UTF-32 > UTF-8 > EBCDIC) and UTF-8 is what ends up in your stream buffer, there must be a run-time conversion from UTF-8 to the output encoding (e.g. EBCDIC). Do we agree with this statement?

If we agree with both statements, why not just transcode before the data reaches the stream buffer rather than after?
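The two routes can be compared side by side in a hedged C++20 sketch. It reuses a small EBCDIC subset (space, digits, Latin letters, invariant across common code pages such as 037) as a stand-in for the run-time output encoding, and every function name is illustrative only. `transcode_before` converts UTF-32 to EBCDIC once, before the bytes reach the buffer; `transcode_after` first puts UTF-8 in the buffer and then performs the run-time UTF-8 > EBCDIC conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Stand-in for the run-time output encoding (hypothetical helper).
char to_ebcdic(char32_t cp) {
    if (cp == U' ') return static_cast<char>(0x40);
    if (cp >= U'0' && cp <= U'9') return static_cast<char>(0xF0 + (cp - U'0'));
    if (cp >= U'a' && cp <= U'i') return static_cast<char>(0x81 + (cp - U'a'));
    if (cp >= U'j' && cp <= U'r') return static_cast<char>(0x91 + (cp - U'j'));
    if (cp >= U's' && cp <= U'z') return static_cast<char>(0xA2 + (cp - U's'));
    if (cp >= U'A' && cp <= U'I') return static_cast<char>(0xC1 + (cp - U'A'));
    if (cp >= U'J' && cp <= U'R') return static_cast<char>(0xD1 + (cp - U'J'));
    if (cp >= U'S' && cp <= U'Z') return static_cast<char>(0xE2 + (cp - U'S'));
    throw std::invalid_argument("code point outside this sketch's EBCDIC subset");
}

// Route 1: one conversion, before the buffer (UTF-32 > EBCDIC).
std::string transcode_before(std::u32string_view s) {
    std::string buffer;
    for (char32_t cp : s) buffer.push_back(to_ebcdic(cp));
    return buffer;
}

// Route 2, step 1: UTF-32 > UTF-8; this is what would sit in the buffer.
std::string utf32_to_utf8(std::u32string_view s) {
    std::string out;
    for (char32_t cp : s) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Route 2, step 2: the run-time UTF-8 > EBCDIC conversion after the buffer.
// Assumes well-formed UTF-8 (no validation in this sketch).
std::string utf8_to_ebcdic(std::string_view s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        char32_t cp = len == 1 ? b : len == 2 ? (b & 0x1F) : len == 3 ? (b & 0x0F) : (b & 0x07);
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(to_ebcdic(cp));
        i += len;
    }
    return out;
}

// Route 2 end to end: UTF-32 > UTF-8 (buffer) > EBCDIC.
std::string transcode_after(std::u32string_view s) {
    return utf8_to_ebcdic(utf32_to_utf8(s));
}
```

For text that both routes can represent, the final bytes are identical, so the user sees no difference; in this sketch the only difference is that route 2 performs two conversions instead of one.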

What exactly looks different to a user who writes
std::cout << u16"My string"
if the transcoding happens after the data is placed in the underlying buffer rather than before?

I can tell you what is different if you do it before: you can do A > C even when A > B > C doesn't work. Do it after, and you have no way out of this dilemma.

Can we agree that if the input encoding matches the output encoding, there should be no transcoding whatsoever? And that every character of that encoding should be printed exactly as-is, including any character that cannot be mapped to UTF-8?
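A minimal sketch of the pass-through case, assuming a hypothetical single-byte legacy encoding in which bytes 0x00 to 0x7F match the Unicode code points of the same value and bytes 0x80 to 0xFF have no Unicode mapping (illustrative only, not a description of any real code page). A Unicode code point stands in for the UTF-8 intermediate, since any code point can be encoded in UTF-8. When input and output encodings match, `write_direct` copies the bytes untouched; `write_via_unicode` fails on the first unmappable byte, which is the A > B > C dead end. Both function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical legacy decoder: 0x00-0x7F map to Unicode, 0x80-0xFF do not.
std::optional<char32_t> legacy_to_unicode(unsigned char b) {
    if (b < 0x80) return static_cast<char32_t>(b);
    return std::nullopt;
}

// Hypothetical inverse, valid for the mappable range only.
std::optional<char> unicode_to_legacy(char32_t cp) {
    if (cp < 0x80) return static_cast<char>(cp);
    return std::nullopt;
}

// Input encoding == output encoding: no transcoding, every byte survives.
std::string write_direct(std::string_view in) { return std::string(in); }

// Forced intermediate: legacy > Unicode > legacy. Returns nullopt as soon as
// a byte has no Unicode mapping, even though the output could represent it.
std::optional<std::string> write_via_unicode(std::string_view in) {
    std::string out;
    for (char c : in) {
        auto cp = legacy_to_unicode(static_cast<unsigned char>(c));
        if (!cp) return std::nullopt;  // A > B fails, so A > B > C fails
        auto back = unicode_to_legacy(*cp);
        if (!back) return std::nullopt;
        out.push_back(*back);
    }
    return out;
}
```

With an input such as `"Ab"` followed by the byte 0xFF, `write_direct` returns all three bytes unchanged, while `write_via_unicode` returns an empty `std::optional`.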

Received on 2024-05-09 14:36:23