Date: Thu, 9 May 2024 19:14:34 +0000
The answer is actually neither.
There's an encoding here that went unmentioned: the one your tee interface expects as its input. That is what it should use.
Writing data to a stream buffer that is associated with a tee doesn't magically split it into two outputs with differing encodings (nor would it be able to synchronize two different devices from the same buffer).
Precisely because the two other ends have different encodings, at least for one of them the following must occur:
1. The tee must be associated with a routine that gets called and run.
2. That routine must read from the buffer and transcode the data into a separate buffer in order to service at least one of the two destinations (if not both of them); see the sketch after this list.
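A minimal sketch of such a tee buffer, assuming UTF-8 as the tee's expected input encoding and using a hypothetical transcode() helper and made-up destination encodings; it only shows where the per-destination re-encoding has to happen, not how to do it:

#include <streambuf>
#include <string>

// Hypothetical helper: converts from the tee's expected input encoding
// (assumed UTF-8 here) into to_encoding. A real implementation would use
// iconv, ICU, or similar; this placeholder just copies the bytes through.
std::string transcode( std::string const& in, char const* to_encoding )
{
    (void)to_encoding;
    return in;
}

class teebuf: public std::streambuf
{
    std::streambuf* file_;
    std::streambuf* term_;

public:

    teebuf( std::streambuf* file, std::streambuf* term ): file_( file ), term_( term ) {}

protected:

    int_type overflow( int_type ch ) override
    {
        if( traits_type::eq_int_type( ch, traits_type::eof() ) ) return ch;

        // For brevity this handles one char at a time; a real tee would
        // accumulate bytes until a complete code point can be transcoded.
        std::string unit( 1, traits_type::to_char_type( ch ) );

        std::string for_file = transcode( unit, "EUC-JP" );   // made-up file encoding
        std::string for_term = transcode( unit, "IBM-1390" ); // made-up terminal encoding

        file_->sputn( for_file.data(), (std::streamsize)for_file.size() );
        term_->sputn( for_term.data(), (std::streamsize)for_term.size() );

        return ch;
    }

    int sync() override
    {
        file_->pubsync();
        term_->pubsync();
        return 0;
    }
};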
The fact that the tee splits one stream into two whose destinations have completely different encodings is an implementation detail that is relevant only to the tee.
The job of the std::ostream in this case is not to service a file or a console, but to service the tee.
If the best solution is to use UTF-8 as the input to the tee, because it has the best chance of transcoding into multiple distinct encodings, then use that... to implement your tee.
That is what the tee should "tell" std::ostream to use as its input. This is an implementation detail of the tee, not of std::ostream.
But if you implement your tee such that it expects some other encoding, then that is what you should use on your stream buffer.
In the same way, if your stream is not a tee, it should use whatever encoding that stream expects.
And I fail to see how any of this makes things more or less testable.
You have inputs I, J, and K, and you have outputs Y and Z.
If you want to make sure the result comes out right, you are going to have to test the combinations of all of them.
I have never heard of an application that only cares whether the input makes it correctly as far as the stream buffer, regardless of whether the output still comes out correctly in the file or terminal.
Adding an extra encoding to the stream buffer can only make things go wrong in more cases, not fewer.
A > B > C implies A > C, but A > C does not imply A > B > C (being able to transcode directly does not mean the same conversion survives a detour through an intermediate encoding).
-----Original Message-----
From: Peter Dimov <pdimov_at_[hidden]m>
Sent: Thursday, May 9, 2024 19:04
To: 'Tiago Freire' <tmiguelf_at_hotmail.com>; sg16_at_lists.isocpp.org; 'Victor Zverovich' <victor.zverovich_at_gmail.com>
Cc: 'Tom Honermann' <tom_at_honermann.net>; 'Faisal Vali' <faisalv_at_gmail.com>; 'Dan Katz' <dkatz85_at_bloomberg.net>; 'Barry Revzin' <barry.revzin_at_gmail.com>; 'Andrew Sutton' <andrew.n.sutton_at_gmail.com>; 'Daveed Vandevoorde' <daveed_at_edg.com>; 'Wyatt Childers' <wcc_at_edg.com>
Subject: RE: [isocpp-sg16] [SG16] Follow up on SG16 review of P2996R2 (Reflection for C++26)
> Tiago Freire wrote:
> > Why not have it match the output one?
>
> But I already answered this in my initial reply, and in my previous one.
OK, let's do it again.
void print( std::ostream& os )
{
    os << "Hello";
    os << u8", ";
    os << L"world!";
}
Suppose `os` has a teebuf that outputs to a file and the terminal.
We have three input encodings:
- ordinary literal; varies at compile time, fixed at runtime; e.g. EBCDIC
- u8 literal; fixed at UTF-8, never varies
- wide literal; varies at compile time, fixed at runtime; e.g. some Japanese double-byte IBM encoding
and two output encodings:
- file encoding; varies at runtime; e.g. EUC-JP
- terminal encoding; varies at runtime; e.g. whatever the EBCDIC equivalent of EUC-JP is
Now at least two of the three inserters will need to transcode to the streambuf encoding. In the case of the latter being fixed at UTF-8, the u8 literal is passed through, and the other two are transcoded without loss of information and without dependence on the runtime environment.
The streambufs then transcode UTF-8 into the output encodings.
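A minimal sketch of that pipeline, using hypothetical to_utf8_from_* helpers as stand-ins for the compile-time-known literal-encoding conversions (they are not standard facilities, and their bodies here are placeholders):

#include <iostream>
#include <string>

// Placeholder stand-ins for the fixed, compile-time-known conversions from
// the ordinary and wide literal encodings into UTF-8.
std::string to_utf8_from_ordinary( char const* s ) { return s; }
std::string to_utf8_from_wide( wchar_t const* ) { return "world!"; }

int main()
{
    std::string buf; // stands in for the contents of the UTF-8 streambuf

    buf += to_utf8_from_ordinary( "Hello" );        // ordinary literal: transcoded
    buf += reinterpret_cast<char const*>( u8", " ); // u8 literal: passed through unchanged
    buf += to_utf8_from_wide( L"world!" );          // wide literal: transcoded

    // The tee's underlying streambufs would then transcode buf from UTF-8
    // into the file and terminal encodings at output time.
    std::cout << buf << '\n';
}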
If we pick one of the two output encodings for the streambuf encoding, the conversions adapt accordingly. Note however that
(a) now all of them depend on the runtime environment and
(b) you now have a quadratic number of transcodings to test for the output1 -> output2 case.
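As a rough illustration of that count (the formulas below are one reading of the argument, not part of the message), with N possible runtime output encodings:

#include <cstdio>

int main()
{
    int const N = 10; // hypothetical number of possible runtime output encodings

    // Fixed UTF-8 streambuf encoding: three compile-time-known input
    // conversions into UTF-8, plus UTF-8 into each possible output encoding.
    int fixed_pivot = 3 + N;

    // Streambuf encoding taken from one of the outputs: each of the three
    // inputs must be tested against every possible streambuf encoding, and
    // the other output may use any of the N encodings too, so the
    // output1 -> output2 conversions alone form an N x N matrix.
    int runtime_pivot = 3 * N + N * N;

    std::printf( "fixed pivot: %d cases, runtime pivot: %d cases\n", fixed_pivot, runtime_pivot );
}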
For me, the superiority of the first approach from a software engineering and testability perspective is obvious.
Received on 2024-05-09 19:14:37