sg16: Re: [SG16-Unicode] std::byte based I/O library

From: Lyberta <lyberta_at_[hidden]>
Date: Thu, 07 Feb 2019 14:04:00 +0000

Corentin:
> I think mixing text and binary in the same layer is the fatal flaw of
> iostream.

The problem is deeper. iostream doesn't have binary IO at all. You can
do .read and .write but those are half-baked at best. Also you have
the madness of std::fpos, which is std::ios::pos_type.
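
A minimal sketch of what the .read/.write route looks like in practice.
The helper names raw_write_u32 and raw_read_u32 are made up for this
illustration. Because the interface is typed on char, every other type
needs a reinterpret_cast, and the byte order is whatever the host uses:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: push a 32-bit value through ostream::write.
// write() only takes const char*, so the cast is unavoidable, and the
// resulting byte order is simply the host's byte order.
inline std::string raw_write_u32(std::uint32_t value)
{
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}

// Hypothetical helper: read the value back the same way.
inline std::uint32_t raw_read_u32(const std::string& bytes)
{
    std::istringstream in(bytes);
    std::uint32_t value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}
```

The round trip works on a single machine, but nothing in the interface
says which byte order ends up in the output.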

> A binary stream that offers more than a bag of bytes would suffer the same
> issue as iostream.

No, there is real usefulness in having reading and writing as
customization points, the same way >> and << were done for text, just
not as operators.

> Separation of text, locales and bytes at different layers is really
> important

Sure, you can probably do

std::io::write(stream, std::format(...));

to format text and write as bytes but you need a way to convert that
text to bytes.

> If you have a binary_stream object and you can somehow set a bom on it,
> which not only is text specific, but worse, Unicode specific, something has
> gone wrong

A BOM is not really different from endianness, floating point format and
other properties of low level types. I consider strings to be low level
because they are a link between the binary layer and the text layer.
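
To make the comparison concrete, here is a sketch in which the BOM is one
more serialization option next to byte order. All names (byte_order,
utf16_options, put_u16, write_utf16) are hypothetical and not taken from
any proposal:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

enum class byte_order { big, little };  // hypothetical name

// Hypothetical options: the BOM is just one more knob beside endianness.
struct utf16_options {
    byte_order order = byte_order::big;
    bool write_bom = false;
};

// Append one UTF-16 code unit in the requested byte order.
inline void put_u16(std::vector<std::byte>& out, char16_t unit,
                    byte_order order)
{
    const auto hi = static_cast<std::byte>((unit >> 8) & 0xFF);
    const auto lo = static_cast<std::byte>(unit & 0xFF);
    if (order == byte_order::big) {
        out.push_back(hi);
        out.push_back(lo);
    } else {
        out.push_back(lo);
        out.push_back(hi);
    }
}

// Serialize UTF-16 code units, optionally preceded by U+FEFF.
inline std::vector<std::byte> write_utf16(std::u16string_view text,
                                          utf16_options opt)
{
    std::vector<std::byte> out;
    if (opt.write_bom)
        put_u16(out, static_cast<char16_t>(0xFEFF), opt.order);
    for (char16_t unit : text)
        put_u16(out, unit, opt.order);
    return out;
}
```

In this sketch the BOM is written by the same code path as any other code
unit, so the chosen byte order applies to it automatically.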

> When reading, we can implicitly handle bom and UTF-X -> UTF-Y conversion
> and moves the state from the stream to the view
>
> view = stream | asUtfX

That feels weird, why would you suddenly convert stream to text? The
usual use case is to read some text and keep reading binary data after
that. Or... hmm, gonna recheck after libstdc++ gets ranges.

> when writing, having transcoding handle the bom let you handle the case
> where the text already has a bom without introducing a state on the stream.
>
> text | toUtf8BOM >> stream
> text | toUtf16BOM(stream.endianness) >> stream

I think that using >> and << is a bad idea. Also, why would you want to
transcode text before serializing? I think std::text should already
carry enough information. And I think storing the bytes of a BOM inside
std::text is a terrible idea.

> But short of that stdin/out deal with bag of bytes and text abstractions
> should be
> layered on top rather than baked in.

Sure, the usual case is reading some string and then doing text parsing
on top of it. std::scan as the inverse of std::format?

>
> Another solution would be to have both text_stream and binary_stream, but
> as you say, cout can be either depending on use cases

It should be trivial to construct text_stream from binary_stream. But,
do we really need separate classes? Consider

std::io::read
std::io::write

for binary io and

std::io::format
std::io::scan

for text io. I know the names may not be the best. I think Isabella
Muerte proposed those names. I don't see a reason why a single stream
class can't implement both of those.
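
A toy sketch of that idea: one stream type with binary and text
operations layered on it as free functions. Everything here is an
assumption made for illustration (the memory_stream type, the io
namespace, big-endian integers, newline-terminated decimal text), not the
proposed std::io interface:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory stream: just bytes and a read position.
struct memory_stream {
    std::vector<std::byte> buffer;
    std::size_t pos = 0;
};

namespace io {

// Binary layer: fixed-width, big-endian 16-bit integer.
inline void write(memory_stream& s, std::uint16_t v)
{
    s.buffer.push_back(static_cast<std::byte>(v >> 8));
    s.buffer.push_back(static_cast<std::byte>(v & 0xFF));
}

inline std::uint16_t read_u16(memory_stream& s)
{
    const auto hi = std::to_integer<unsigned>(s.buffer.at(s.pos));
    const auto lo = std::to_integer<unsigned>(s.buffer.at(s.pos + 1));
    s.pos += 2;
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Text layer on the same stream: decimal digits terminated by '\n'.
inline void format(memory_stream& s, int v)
{
    for (char c : std::to_string(v) + '\n')
        s.buffer.push_back(
            static_cast<std::byte>(static_cast<unsigned char>(c)));
}

inline int scan_int(memory_stream& s)
{
    std::string digits;
    while (s.pos < s.buffer.size()) {
        const char c = static_cast<char>(
            std::to_integer<unsigned char>(s.buffer[s.pos++]));
        if (c == '\n')
            break;
        digits += c;
    }
    return std::stoi(digits);  // throws if no digits were read
}

}  // namespace io
```

With this layout a caller can scan a line of text and then keep reading
binary data from the same stream object.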

> Note that everything that applies to text probably applies to everything
> else
> Serialization of a given arbitrary type on a stream should depend on the
> specifics of the application rather than be a property of the stream.

In practice, there are usually very few different ways to serialize
built-in types. There's LEB128 for integers in WebAssembly, and for that
reason I have a LEB128::Integer class there, but in 90% of cases there
exists an obvious way to serialize data and it should be the default.
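
For readers unfamiliar with LEB128, a short sketch of the unsigned
variant: the value is split into 7-bit groups, least significant first,
and the high bit of each byte says whether more bytes follow. The
function names are made up here and this is not the LEB128::Integer class
mentioned above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned integer as unsigned LEB128.
inline std::vector<std::byte> encode_uleb128(std::uint64_t value)
{
    std::vector<std::byte> out;
    do {
        auto group = static_cast<std::uint8_t>(value & 0x7F);
        value >>= 7;
        if (value != 0)
            group |= 0x80;  // continuation bit: more groups follow
        out.push_back(static_cast<std::byte>(group));
    } while (value != 0);
    return out;
}

// Decode unsigned LEB128; stops at the first byte without the
// continuation bit, or once 64 bits have been consumed.
inline std::uint64_t decode_uleb128(const std::vector<std::byte>& in)
{
    std::uint64_t value = 0;
    unsigned shift = 0;
    for (std::byte b : in) {
        if (shift >= 64)
            break;
        const auto group = std::to_integer<std::uint64_t>(b);
        value |= (group & 0x7F) << shift;
        if ((group & 0x80) == 0)
            break;
        shift += 7;
    }
    return value;
}
```

For example, 624485 encodes to the three bytes E5 8E 26, while a
fixed-width 32-bit encoding would always take four.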

> stream.write(std::as_bytes(date));
> vs
> stream.write(date);
>
> The latter is arguably nicer but gives no control to applications over how
> things are serialized.

That's why you write custom classes if you want more control. Remember
that you can't add your own member functions to std classes, so you need
to alter your own classes or provide non-members. My library will
automatically pick up classes with .Read and .Write member functions or
Read and Write non-members. That's why I propose std::io::read and
std::io::write as customization points.
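
One way such a customization point could dispatch, sketched with the
C++17 detection idiom. This is an assumption about the mechanism, not the
library's actual code; byte_stream, the io namespace and the example
types Rgb and Flag are all hypothetical:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

struct byte_stream {  // hypothetical stream type
    std::vector<std::byte> data;
};

namespace io {
namespace detail {

// Detects value.Write(stream).
template <class T, class = void>
struct has_member_write : std::false_type {};
template <class T>
struct has_member_write<T, std::void_t<decltype(
    std::declval<const T&>().Write(std::declval<byte_stream&>()))>>
    : std::true_type {};

// Detects a non-member Write(stream, value) found by ADL.
template <class T, class = void>
struct has_free_write : std::false_type {};
template <class T>
struct has_free_write<T, std::void_t<decltype(
    Write(std::declval<byte_stream&>(), std::declval<const T&>()))>>
    : std::true_type {};

}  // namespace detail

// Customization point: prefers the member, falls back to the non-member.
template <class T>
void write(byte_stream& s, const T& value)
{
    if constexpr (detail::has_member_write<T>::value)
        value.Write(s);
    else if constexpr (detail::has_free_write<T>::value)
        Write(s, value);
    else
        static_assert(sizeof(T) == 0, "type is not writable");
}

}  // namespace io

// A user type that opts in through a member function.
struct Rgb {
    unsigned char r, g, b;
    void Write(byte_stream& s) const
    {
        s.data.push_back(static_cast<std::byte>(r));
        s.data.push_back(static_cast<std::byte>(g));
        s.data.push_back(static_cast<std::byte>(b));
    }
};

// A type that cannot be modified opts in through a non-member.
struct Flag {
    bool on;
};
inline void Write(byte_stream& s, const Flag& f)
{
    s.data.push_back(static_cast<std::byte>(f.on ? 1 : 0));
}
```

Calling io::write(stream, Rgb{1, 2, 3}) uses the member function, while
io::write(stream, Flag{true}) finds the non-member through
argument-dependent lookup.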

Received on 2019-02-07 15:04:44