Re: [SG16-Unicode] std::byte based I/O library

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 7 Feb 2019 11:45:48 -0500

Dear SG16,

     There should be no "text IO". Binding attributes to stream objects to
"handle text" is the mistake iostreams made. It muddled the concerns of
iostreams and turned them into stateful messes on the same level as
floating point, where you have to save and restore FP registers around each
call or specific use so that the "sticky" flags you set do not break
everything downstream.

     If you want IO in your text library, make it a single, atomic call
where all the information required to serialize the text properly is passed
to a single function. Combined state and mixed concerns make for leaky
abstractions, and they are the road to hell paved with good intentions. Put
all of the defaults for how you think formatting should be done in a single
structure, then pass the object and its configuration into the serializer
and save everyone the nightmare. Do not make the defaults invisible state
that is impossible to share properly.
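
A minimal sketch of what such a call could look like, assuming a UTF-16
serializer. Every name here (byte_order, text_write_options,
serialize_utf16) is hypothetical and made up for illustration; none of it is
a proposed std API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is an illustration, not a proposal.
enum class byte_order { little, big };

// Every formatting default lives in one visible, copyable value.
struct text_write_options {
    byte_order order = byte_order::big;
    bool emit_bom = false;
};

// One atomic call: the text and its full configuration go in, bytes come
// out. No state is left behind on any stream object.
inline std::vector<std::byte> serialize_utf16(const std::u16string& text,
                                              const text_write_options& opts = {}) {
    std::vector<std::byte> out;
    out.reserve((text.size() + (opts.emit_bom ? 1 : 0)) * 2);

    // Append one UTF-16 code unit in the requested byte order.
    auto put = [&](char16_t unit) {
        const auto hi = static_cast<std::byte>((unit >> 8) & 0xFF);
        const auto lo = static_cast<std::byte>(unit & 0xFF);
        if (opts.order == byte_order::big) {
            out.push_back(hi);
            out.push_back(lo);
        } else {
            out.push_back(lo);
            out.push_back(hi);
        }
    };

    if (opts.emit_bom) put(static_cast<char16_t>(0xFEFF));
    for (char16_t unit : text) put(unit);
    return out;
}
```

Because the options are an ordinary value, two callers can share, copy, or
tweak them without touching anything global or "sticky".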

Sincerely,
ThePhD


On Thu, Feb 7, 2019 at 9:04 AM Lyberta <lyberta_at_[hidden]> wrote:

> Corentin:
> > I think mixing text and binary in the same layer is the fatal flaw of
> > iostream.
>
> The problem is deeper. iostream doesn't have binary IO at all. You can
> do .read and .write but those are something half-baked. Also you have
> the madness of std::fpos which is std::ios::pos_type.
>
> > A binary stream that offers more than a bag of bytes would suffer the
> > same issue as iostream.
>
> No, there is great usefulness in reading and writing as customization
> points. Same as >> and << were for text, just not as operators.
>
> > Separation of text, locales and bytes at different layers is really
> > important
>
> Sure, you can probably do
>
> std::io::write(stream, std::format(...));
>
> to format text and write as bytes but you need a way to convert that
> text to bytes.
>
> > If you have a binary_stream object and you can somehow set a bom on it,
> > which not only is text specific, but worse, Unicode specific, something
> > has gone wrong
>
> Bom is not really different from endianness, floating point format and
> other properties of low level types. I consider strings to be low level
> because they are a link between binary layer and text layer.
>
> > When reading, we can implicitly handle bom and UTF-X -> UTF-Y conversion
> > and move the state from the stream to the view
> >
> > view = stream | asUtfX
>
> That feels weird, why would you suddenly convert stream to text? The
> usual use case is to read some text and keep reading binary data after
> that. Or... hmm, gonna recheck after libstdc++ gets ranges.
>
> > when writing, having transcoding handle the bom lets you handle the case
> > where the text already has a bom without introducing a state on the
> > stream.
> >
> > text | toUtf8BOM >> stream
> > text | toUtf16BOM(stream.endianness) >> stream
>
> I think that using >> and << is a bad idea. Also, why would you want to
> transcode text before serializing? I think std::text should already
> handle enough information. I think storing bytes of BOM inside std::text
> is a terrible idea.
>
> > But short of that stdin/out deal with bag of bytes and text abstractions
> > should be
> > layered on top rather than baked in.
>
> Sure, the usual case is reading some string and then doing text parsing on
> top of it. std::scan as inverse of std::format?
>
> >
> > Another solution would be to have both text_stream and binary_stream, but
> > as you say, cout can be either depending on use cases
>
> It should be trivial to construct text_stream from binary_stream. But,
> do we really need separate classes? Consider
>
> std::io::read
> std::io::write
>
> for binary io and
>
> std::io::format
> std::io::scan
>
> for text io. I know the names may not be the best. I think Isabella
> Muerte proposed those names. I don't see a reason why a single stream
> class can't implement both of those.
>
> > Note that everything that applies to text probably applies to everything
> > else
> > Serialization of a given arbitrary type on a stream should depend on the
> > specifics of the application rather than be a property of the stream.
>
> In practice, there are usually very few different ways to serialize
> built-in types. There's LEB128 for integers in WebAssembly and for that
> reason I have LEB128::Integer class there but in 90% of cases there
> exists an obvious way to serialize data and it should be default.
>
> > stream.write(std::as_bytes(date));
> > vs
> > stream.write(date);
> >
> > The latter is arguably nicer but gives no control to applications over how
> > things are serialized.
>
> That's why you write custom classes if you want more control. And
> remember that you can't add your own member functions to std classes, so
> you need to alter your classes or provide non-members. My library will
> automatically pick up classes with .Read and .Write member functions or
> Read and Write non-members. That's why I propose std::io::read and
> std::io::write as customization points.
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>
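
The quoted idea of a write customization point that picks up either a .Write
member or a Write non-member could be sketched roughly as below. This is not
Lyberta's library and not a proposed std::io API: the sketch namespace,
byte_sink, leb128_u32, point, and the big-endian integer default are all
assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not std::io

// Stand-in output stream: just a growable bag of bytes.
struct byte_sink {
    std::vector<std::byte> bytes;
    void put(std::byte b) { bytes.push_back(b); }
};

// Detects `value.Write(sink)`.
template <class T, class = void>
struct has_member_write : std::false_type {};
template <class T>
struct has_member_write<T, std::void_t<decltype(
    std::declval<const T&>().Write(std::declval<byte_sink&>()))>>
    : std::true_type {};

// Detects a non-member `Write(sink, value)` found by argument-dependent lookup.
template <class T, class = void>
struct has_free_write : std::false_type {};
template <class T>
struct has_free_write<T, std::void_t<decltype(
    Write(std::declval<byte_sink&>(), std::declval<const T&>()))>>
    : std::true_type {};

// Single entry point: member Write, then non-member Write, then a default.
template <class T>
void write(byte_sink& sink, const T& value) {
    if constexpr (has_member_write<T>::value) {
        value.Write(sink);
    } else if constexpr (has_free_write<T>::value) {
        Write(sink, value);
    } else {
        static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                      "no Write customization found for this type");
        // Assumed default for integers: fixed width, big-endian.
        const auto u = static_cast<std::make_unsigned_t<T>>(value);
        for (std::size_t i = sizeof(T); i-- > 0;) {
            sink.put(static_cast<std::byte>((u >> (i * 8)) & 0xFF));
        }
    }
}

// Opt-in via a member function: unsigned LEB128 instead of the default.
struct leb128_u32 {
    std::uint32_t value;
    void Write(byte_sink& sink) const {
        std::uint32_t v = value;
        do {
            auto b = static_cast<std::uint8_t>(v & 0x7F);
            v >>= 7;
            if (v != 0) b |= 0x80;  // more groups follow
            sink.put(static_cast<std::byte>(b));
        } while (v != 0);
    }
};

// Opt-in via a non-member function, for types you cannot modify.
struct point {
    std::int16_t x, y;
};
inline void Write(byte_sink& sink, const point& p) {
    write(sink, p.x);
    write(sink, p.y);
}

}  // namespace sketch
```

With this shape, the "obvious" serialization is the default and an
application that wants something else wraps the value in its own type.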

Received on 2019-02-07 17:46:05