I think mixing text and binary in the same layer is the fatal flaw of iostream.
A binary stream that offers more than a bag of bytes would suffer from the same issue as iostream.

Separating text, locales, and bytes into different layers is really important.

If you have a binary_stream object and you can somehow set a BOM on it (which is not only text-specific but, worse, Unicode-specific), something has gone wrong.

When reading, we can handle the BOM and the UTF-X -> UTF-Y conversion implicitly as part of transcoding, which moves that state from the stream to the view:

view = stream | asUtfX

When writing, having the transcoder handle the BOM lets you deal with the case where the text already has a BOM, without introducing any state on the stream:

text | toUtf8BOM >> stream 
text | toUtf16BOM(stream.endianness) >> stream 

> stdout is a special case
I don't believe it is, and I believe iostream should not have made this assumption.

Maybe consoles are a special case, and maybe the best way to handle them would be to have an encoding-aware console API.

But short of that, stdin/stdout should deal with bags of bytes, and text abstractions should be
layered on top rather than baked in.

Another solution would be to have both a text_stream and a binary_stream, but as you say, cout can be either depending on the use case.


Note that everything that applies to text probably applies to everything else:
how an arbitrary type is serialized to a stream should depend on the specifics of the application rather than be a property of the stream.

stream.write(std::as_bytes(date));
vs
stream.write(date);

The latter is arguably nicer, but gives applications no control over how things are serialized.



On Thu, 7 Feb 2019 at 12:42 Lyberta <lyberta@lyberta.net> wrote:
Corentin:
> 1/  Text _is_ bytes. Unicode specifies BOM and little/big endian versions
> so I think prepending a bom ( should you want to ) and byte-swapping can be
> done along transcoding

Maybe separate BOM handling into a format state of the stream. Consider
this:

enum class std::bom_handling
{
        none, ///< Do not read or write BOM.
        not_utf8, ///< Read and write BOM only in UTF-16 and UTF-32.
        all ///< Read and write BOM in all 3 encoding forms.
};

BinaryOutputStream& stream;
stream.GetFormat().SetBOMHandling(std::bom_handling::all);

This feels like a more elegant design because it separates concerns better.

>
> 3/ stream don't have an explicit encoding ( or locale, for that matter ),
> but they might expect one - think for example about writing to stdout.

stdout is a special case and we have a way to handle it - execution
character set. The only problem is that in order to go from Unicode to
ECS right now you'd need to use std::codecvt which is a horrible mess.

Still, there are some utilities that treat stdin/stdout as a stream of
raw bytes so it doesn't always represent a text stream.

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode