Date: Thu, 9 May 2024 03:26:19 +0300
Tiago Freire wrote:
> That is why I think this model:
>
> > input encoding -> (program uses intermediate UTF-8 throughout) -> output
> encoding
>
> is misguided, that middle step doesn’t actually solve anything, it just
> introduces an extra middleman where more things can go wrong.
It's not misguided at all.
If we designed std::cout today, we'd make it do exactly that. In
`std::cout << x;`, the inserter (operator<<) would serialize `x` to
a sequence of characters in an encoding that can represent
everything (i.e. UTF-8). The stream would then pass that UTF-8
to the stream buffer, and the stream buffer would transcode it
to the output encoding and write it out.
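
To make that layering concrete, here is a minimal sketch, not a proposal for actual wording. It assumes the output encoding is Latin-1 purely for illustration. The names `latin1_transcoding_buf` and `to_latin1` are hypothetical. The decoder is deliberately simplistic: it does not reject overlong forms or surrogates, and an incomplete sequence at the very end of the output is dropped.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stream buffer: the stream hands it UTF-8, and it
// transcodes to the output encoding (Latin-1 here, as an assumption)
// before forwarding to the real sink.
class latin1_transcoding_buf : public std::streambuf {
public:
    explicit latin1_transcoding_buf(std::streambuf* sink) : sink_(sink) {}

protected:
    // No put area is set up, so every character arrives through overflow().
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const unsigned char byte =
            static_cast<unsigned char>(traits_type::to_char_type(ch));

        if (pending_ == 0) {
            if (byte < 0x80) return emit(byte);  // ASCII passes through
            if ((byte & 0xE0) == 0xC0) { code_point_ = byte & 0x1F; pending_ = 1; }
            else if ((byte & 0xF0) == 0xE0) { code_point_ = byte & 0x0F; pending_ = 2; }
            else if ((byte & 0xF8) == 0xF0) { code_point_ = byte & 0x07; pending_ = 3; }
            else return emit(U'?');              // invalid lead byte
            return ch;                           // lead byte consumed
        }

        if ((byte & 0xC0) != 0x80) {
            // Sequence was cut short: report it, then reprocess this byte.
            pending_ = 0;
            if (traits_type::eq_int_type(emit(U'?'), traits_type::eof()))
                return traits_type::eof();
            return overflow(ch);
        }

        code_point_ = (code_point_ << 6) | (byte & 0x3F);
        if (--pending_ == 0) return emit(code_point_);
        return ch;                               // continuation byte consumed
    }

    int sync() override { return sink_->pubsync(); }

private:
    // Write one code point in the output encoding, or '?' if the
    // output encoding cannot represent it.
    int_type emit(char32_t cp) {
        const char out =
            cp <= 0xFF ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
        return sink_->sputc(out);
    }

    std::streambuf* sink_;
    char32_t code_point_ = 0;
    int pending_ = 0;
};

// Hypothetical helper: run UTF-8 text through the stream and the
// transcoding buffer, and return the bytes that reached the sink.
std::string to_latin1(const std::string& utf8) {
    std::ostringstream sink;
    latin1_transcoding_buf buf(sink.rdbuf());
    std::ostream out(&buf);
    out << utf8;
    out.flush();
    return sink.str();
}
```

The point of the sketch is only the division of labor: everything above the stream buffer sees UTF-8, and the one place that knows the output encoding is the buffer.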
We can't achieve this today because of backward compatibility,
but we _can_ achieve it in the special case when the literal
encoding is UTF-8. So we should do that.
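
As an illustration of what "the literal encoding is UTF-8" means in practice, one rough way to probe for it at compile time is to look at how an ordinary string literal encodes a non-ASCII character. This is only a heuristic sketch. The name `ordinary_literal_looks_utf8` is hypothetical, and it assumes the literal encoding can represent U+00E9 at all.

```cpp
// Hypothetical helper, not a standard facility: checks whether U+00E9
// in an ordinary string literal is encoded as the UTF-8 bytes C3 A9.
constexpr bool ordinary_literal_looks_utf8() {
    constexpr char probe[] = "\u00E9";
    return sizeof(probe) == 3  // two code units plus the terminating null
        && static_cast<unsigned char>(probe[0]) == 0xC3
        && static_cast<unsigned char>(probe[1]) == 0xA9;
}
```

A library could branch on such a check (for example with `if constexpr`) to enable the UTF-8 path only when it holds.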
Similarly, we can achieve it in the wide case when the wide
literal encoding is UTF-16 or UTF-32, and we should try to do
that, as well.
Received on 2024-05-09 00:26:23