Date: Thu, 9 May 2024 20:04:20 +0300
> Tiago Freire wrote:
> > Why not have it match the output one?
>
> But I already answered this in my initial reply, and in my previous one.
OK, let's do it again.
void print( std::ostream& os )
{
    os << "Hello";
    os << u8", ";
    os << L"world!";
}
Suppose `os` has a teebuf that outputs to a file and the terminal.
We have three input encodings:
- ordinary literal; varies at compile time, fixed at runtime; e.g. EBCDIC
- u8 literal; fixed at UTF-8, never varies
- wide literal; varies at compile time, fixed at runtime; e.g. some Japanese double-byte IBM encoding
and two output encodings:
- file encoding; varies at runtime; e.g. EUC-JP
- terminal encoding; varies at runtime; e.g. whatever the EBCDIC equivalent of EUC-JP is
Now at least two of the three inserters will need to transcode to
the streambuf encoding. If the streambuf encoding is fixed at UTF-8,
the u8 literal is passed through, and the other two are transcoded
without loss of information and without dependence on the runtime
environment.
The streambufs then transcode UTF-8 into the output encodings.
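A minimal sketch of the inserter side of this first approach, under stated assumptions. It is not a proposed interface: `utf8_buffer`, `put_utf8`, `put_utf32` and `append_utf8` are hypothetical names, and UTF-32 input stands in for a wide literal whose encoding is known at compile time.

```cpp
#include <string>

// Hypothetical stand-in for a streambuf whose encoding is fixed at UTF-8.
struct utf8_buffer
{
    std::string bytes;
};

// Append one Unicode code point as UTF-8.
// Assumption: cp is a valid scalar value; no surrogate or range checks.
inline void append_utf8( std::string& out, char32_t cp )
{
    if( cp < 0x80 )
    {
        out += static_cast<char>( cp );
    }
    else if( cp < 0x800 )
    {
        out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else if( cp < 0x10000 )
    {
        out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    else
    {
        out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
}

// Text that is already UTF-8 is passed through unchanged.
inline void put_utf8( utf8_buffer& buf, std::string const& utf8 )
{
    buf.bytes += utf8;
}

// Text in another compile-time-known encoding (UTF-32 here) is transcoded
// to UTF-8 without consulting the runtime environment.
inline void put_utf32( utf8_buffer& buf, std::u32string const& text )
{
    for( char32_t cp : text )
    {
        append_utf8( buf.bytes, cp );
    }
}
```

Neither inserter looks at a locale or at the output encodings; only the code draining `bytes` to the file or the terminal would need to know them.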
If we pick one of the two output encodings for the streambuf
encoding, the conversions adapt accordingly. Note however that
(a) now all of them depend on the runtime environment and
(b) you now have a quadratic number of transcodings to test
for the output1 -> output2 case.
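To put rough numbers on (b), here is a simple counting model. It is an illustration, not a measurement: it assumes K possible runtime output encodings and counts one transcoding per ordered (from, to) pair. Both function names are hypothetical.

```cpp
#include <cstddef>

// Streambuf fixed at UTF-8: one transcoding from UTF-8 to each of the
// K possible output encodings.
constexpr std::size_t pivot_transcodings( std::size_t k )
{
    return k;
}

// Streambuf encoding chosen from the output encodings: any output encoding
// may need to be transcoded into any other, giving K * (K - 1) ordered pairs.
constexpr std::size_t output_to_output_transcodings( std::size_t k )
{
    return k * ( k - 1 );
}
```

With K = 10 this model gives 10 transcodings to test in the first case and 90 in the second.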
For me, the superiority of the first approach from a software
engineering and testability perspective is obvious.
Received on 2024-05-09 17:04:24