I think mixing text and binary in the same layer is the fatal flaw of iostream.
A binary stream that offers more than a bag of bytes would suffer the same issue as iostream.
Separation of text, locales, and bytes into different layers is really important.
If you have a binary_stream object and you can somehow set a BOM on it, which is not only text-specific but, worse, Unicode-specific, something has gone wrong.
When reading, we can implicitly handle the BOM and the UTF-X -> UTF-Y conversion, and move the state from the stream to the view:
view = stream | asUtfX
When writing, having the transcoding step handle the BOM lets you handle the case where the text already has a BOM, without introducing state on the stream:
text | toUtf8BOM >> stream
text | toUtf16BOM(stream.endianness) >> stream
> stdout is a special case
I don't believe it is, and I believe iostream should not have made this assumption.
Maybe consoles are a special case, and maybe the best way to handle them would be to have an encoding-aware console API.
But short of that, stdin/stdout should deal with bags of bytes, and text abstractions should be
layered on top rather than baked in.
Another solution would be to have both a text_stream and a binary_stream, but, as you say, cout can be either depending on the use case.
Note that everything that applies to text probably applies to everything else.
Serialization of an arbitrary type onto a stream should depend on the specifics of the application rather than be a property of the stream. The latter is arguably nicer but gives applications no control over how things are serialized.
> 1/ Text _is_ bytes. Unicode specifies BOM and little/big endian versions
> so I think prepending a bom ( should you want to ) and byte-swapping can be
> done along transcoding
Maybe separate BOM handling into a format state of the stream. Consider:

    namespace std {
      enum class bom_handling {
        none,     ///< Do not read or write a BOM.
        not_utf8, ///< Read and write a BOM only in UTF-16 and UTF-32.
        all       ///< Read and write a BOM in all three encoding forms.
      };
    }
This feels like a more elegant design because it separates concerns better.
> 3/ stream don't have an explicit encoding ( or locale, for that matter ),
> but they might expect one - think for example about writing to stdout.
stdout is a special case and we have a way to handle it: the execution
character set. The only problem is that in order to go from Unicode to
the ECS right now you'd need to use std::codecvt, which is a horrible mess.
Still, some utilities treat stdin/stdout as streams of raw bytes, so they don't always represent a text stream.
SG16 Unicode mailing list