Subject: Re: [SG16-Unicode] std::byte based I/O library
From: Corentin (corentin.jabot_at_[hidden])
Date: 2019-02-07 06:23:03
I think mixing text and binary in the same layer is the fatal flaw of iostream.
A binary stream that offers more than a bag of bytes would suffer the same
issue as iostream.
Separation of text, locales and bytes at different layers is really
If you have a binary_stream object and you can somehow set a BOM on it,
which not only is text specific, but worse, Unicode specific, something has
gone wrong.
When reading, we can implicitly handle the BOM and the UTF-X -> UTF-Y
conversion in a view, which moves the state from the stream to the view:
view = stream | asUtfX
When writing, having the transcoding step handle the BOM lets you handle the
case where the text already has a BOM without introducing state on the stream:
text | toUtf8BOM >> stream
text | toUtf16BOM(stream.endianness) >> stream
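
To make the idea a bit more concrete, here is a rough sketch of BOM handling
living in the transcoding step rather than in the stream. Everything named
here (byte_sink, strip_utf8_bom, write_utf8_with_bom) is hypothetical and only
stands in for the asUtfX / toUtf8BOM adaptors above; it covers UTF-8 only and
is not a proposed API.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical binary stream: a bag of bytes with no BOM or encoding state.
struct byte_sink {
    std::vector<std::byte> bytes;
    void write(std::string_view chunk) {
        for (char c : chunk) bytes.push_back(static_cast<std::byte>(c));
    }
};

// The UTF-8 encoding of U+FEFF.
inline constexpr std::string_view utf8_bom = "\xEF\xBB\xBF";

// Reading side: the view drops a leading BOM if there is one.
std::string_view strip_utf8_bom(std::string_view text) {
    if (text.substr(0, utf8_bom.size()) == utf8_bom)
        text.remove_prefix(utf8_bom.size());
    return text;
}

// Writing side: emit exactly one BOM, whether or not the text already
// starts with one. The sink itself never learns what a BOM is.
void write_utf8_with_bom(byte_sink& sink, std::string_view text) {
    sink.write(utf8_bom);
    sink.write(strip_utf8_bom(text));
}
```

The point of the sketch is only that the "has a BOM been written" question is
answered by looking at the text, so the stream needs no flag for it.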
> stdout is a special case
I don't believe it is, and I believe iostream should not have made this
Maybe consoles are a special case, and maybe the best way to handle them
would be to have an encoding-aware console API.
But short of that, stdin/stdout should deal with bags of bytes, with text
abstractions layered on top rather than baked in.
Another solution would be to have both text_stream and binary_stream, but
as you say, cout can be either depending on the use case.
Note that everything that applies to text probably applies to everything
Serialization of a given arbitrary type on a stream should depend on the
specifics of the application rather than be a property of the stream.
The latter is arguably nicer but gives no control to applications over how
things are serialized.
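
As a small illustration of serialization being an application choice, assume
a hypothetical byte_bag stream that only accepts bytes. The two writer
functions below are made-up names, and byte order is just one example of a
decision the stream should not make for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical binary stream: stores bytes, has no format state at all.
struct byte_bag {
    std::vector<std::byte> bytes;
    void write(std::byte b) { bytes.push_back(b); }
};

// Application choice 1: least significant byte first.
void write_u32_le(byte_bag& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.write(static_cast<std::byte>((v >> shift) & 0xFF));
}

// Application choice 2: most significant byte first.
void write_u32_be(byte_bag& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.write(static_cast<std::byte>((v >> shift) & 0xFF));
}
```

The same std::uint32_t ends up as different bytes depending on which function
the application calls, and byte_bag is identical in both cases.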
On Thu, 7 Feb 2019 at 12:42 Lyberta <lyberta_at_[hidden]> wrote:
> > 1/ Text _is_ bytes. Unicode specifies BOM and little/big endian versions
> > so I think prepending a BOM (should you want to) and byte-swapping can
> > be done along with transcoding
> Maybe separate BOM handling into a format state of the stream. Consider
> enum class std::bom_handling
> {
>     none,     ///< Do not read or write BOM.
>     not_utf8, ///< Read and write BOM only in UTF-16 and UTF-32.
>     all       ///< Read and write BOM in all 3 encoding forms.
> };
> BinaryOutputStream& stream;
> This feels like a more elegant design because it separates concerns better.
> > 3/ streams don't have an explicit encoding ( or locale, for that matter ),
> > but they might expect one - think for example about writing to stdout.
> stdout is a special case and we have a way to handle it - execution
> character set. The only problem is that in order to go from Unicode to
> ECS right now you'd need to use std::codecvt which is a horrible mess.
> Still, there are some utilities that treat stdin/stdout as a stream of
> raw bytes so it doesn't always represent a text stream.