Subject: Re: [SG16-Unicode] std::byte based I/O library
From: Corentin (corentin.jabot_at_[hidden])
Date: 2019-02-07 12:41:22
On Thu, 7 Feb 2019 at 17:46 JeanHeyd Meneide <phdofthehouse_at_[hidden]> wrote:
> Dear SG16,
> There should be no "text IO". Binding attributes to stream objects to
> "handle text" is the mistake IO streams made; it muddled the concerns of
> IO streams and turned them into stateful messes, on the same level as
> floating-point code that has to save and restore FP registers around each
> call so that a few calls with specific "sticky" flags do not break
> everything downstream.
> If you want IO in your text library, make it a single, atomic call
> where all the information required to serialize the text properly is
> passed to a single function. Combined state and mixed concerns make for leaky
> abstractions, and they are the road to hell built on good intentions. Put
> all of the defaults for how you think formatting should be done in a single
> structure, then pass it as an object and its configuration into the
> serializer and save everyone the nightmare. Do not make the defaults
> invisible state and impossible to properly share.
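To make the "single structure passed into the serializer" idea concrete, here is a minimal sketch. The names format_options and format_unsigned are hypothetical and not taken from any proposal; the point is only that every default is a visible, copyable value and nothing outlives the call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical options bundle: every formatting default lives here,
// visible and copyable, instead of as sticky flags on a stream.
struct format_options {
    int base = 10;          // only 10 and 16 are handled in this sketch
    std::size_t width = 0;  // minimum field width
    char fill = ' ';        // padding character
};

// One self-contained call: value and options go in, text comes out.
// No state survives the call, so nothing has to be saved or restored.
std::string format_unsigned(unsigned long long value,
                            const format_options& opts = {}) {
    const char digits[] = "0123456789abcdef";
    const unsigned base = (opts.base == 16) ? 16u : 10u;
    std::string out;
    do {
        out.insert(out.begin(), digits[value % base]);
        value /= base;
    } while (value != 0);
    if (out.size() < opts.width) {
        // Left-pad up to the requested width.
        out = std::string(opts.width - out.size(), opts.fill) + out;
    }
    return out;
}
```

A caller that wants zero-padded hex builds a format_options value with base 16, width 4 and fill '0' and passes it in; the next call with default options is unaffected.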
> On Thu, Feb 7, 2019 at 9:04 AM Lyberta <lyberta_at_[hidden]> wrote:
>> > I think mixing text and binary in the same layer is the fatal flaw of
>> > iostream.
>> The problem is deeper. iostream doesn't have binary IO at all. You can
>> do .read and .write but those are half-baked. Also you have
>> the madness of std::fpos which is std::ios::pos_type.
>> > A binary stream that offers more than a bag of bytes would suffer the
>> > same issue as iostream.
>> No, reading and writing are very useful as customization points, the
>> same way >> and << were done for text, just not as operators.
>> > Separation of text, locales and bytes at different layers is really
>> > important
>> Sure, you can probably do
>> std::io::write(stream, std::format(...));
>> to format text and write as bytes but you need a way to convert that
>> text to bytes.
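The "convert that text to bytes" step can be sketched as follows, assuming the text is already encoded (for example as UTF-8) and the stream is just a growable byte buffer. byte_buffer and write_text are hypothetical names; std::io::write itself is only proposed, so it is not used here.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical byte sink standing in for a binary stream.
using byte_buffer = std::vector<std::byte>;

// Append the code units of already-encoded text as raw bytes.
// No transcoding happens here; the caller decides the encoding.
void write_text(byte_buffer& out, std::string_view text) {
    for (char c : text) {
        out.push_back(static_cast<std::byte>(static_cast<unsigned char>(c)));
    }
}
```

The encoding decision stays with whoever produced the string, which is the layering being argued for in this thread.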
>> > If you have a binary_stream object and you can somehow set a BOM on it,
>> > which is not only text specific but, worse, Unicode specific, something
>> > has gone wrong
>> A BOM is not really different from endianness, floating-point format, and
>> other properties of low-level types. I consider strings to be low level
>> because they are a link between binary layer and text layer.
>> > When reading, we can implicitly handle the BOM and UTF-X -> UTF-Y
>> > conversion and move the state from the stream to the view
>> > view = stream | asUtfX
>> That feels weird, why would you suddenly convert stream to text? The
>> usual use case is to read some text and keep reading binary data after
>> that. Or... hmm, gonna recheck after libstdc++ gets ranges.
>> > when writing, having transcoding handle the BOM lets you handle the case
>> > where the text already has a BOM without introducing a state on the stream
>> > text | toUtf8BOM >> stream
>> > text | toUtf16BOM(stream.endianness) >> stream
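A rough sketch of what a toUtf8BOM-like step could do, written as a plain function rather than a range adaptor. with_utf8_bom is a hypothetical name, and the input is assumed to be UTF-8 already; the only job is to make sure exactly one leading BOM is present, so the stream never needs to know about it.

```cpp
#include <string>
#include <string_view>

// Returns UTF-8 text that starts with exactly one BOM (EF BB BF).
// If the input already starts with a BOM, it is returned unchanged.
std::string with_utf8_bom(std::string_view text) {
    constexpr std::string_view bom = "\xEF\xBB\xBF";
    if (text.substr(0, bom.size()) == bom) {
        return std::string(text);
    }
    std::string out(bom);
    out.append(text.data(), text.size());
    return out;
}
```

Because the function is idempotent, text that already carries a BOM passes through untouched, which is the case the quoted lines are concerned with.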
>> I think that using >> and << is a bad idea. Also, why would you want to
>> transcode text before serializing? I think std::text should already
>> handle enough information. I think storing bytes of BOM inside std::text
>> is a terrible idea.
>> > But short of that, stdin/stdout deal with bags of bytes, and text
>> > abstractions should be
>> > layered on top rather than baked in.
>> Sure, the usual case is reading some string and then doing text parsing on
>> top of it. std::scan as inverse of std::format?
>> > Another solution would be to have both text_stream and binary_stream,
>> > as you say, cout can be either depending on use cases
>> It should be trivial to construct text_stream from binary_stream. But,
>> do we really need separate classes? Consider
>> for binary io and
>> for text io. I know the names may not be the best. I think Isabella
>> Muerte proposed those names. I don't see a reason why a single stream
>> class can't implement both of those.
>> > Note that everything that applies to text probably applies to everything
>> > else
>> > Serialization of a given arbitrary type on a stream should depend on the
>> > specifics of the application rather than be a property of the stream.
>> In practice, there are usually very few different ways to serialize
>> built-in types. There's LEB128 for integers in WebAssembly and for that
>> reason I have LEB128::Integer class there but in 90% of cases there
>> exists an obvious way to serialize data and it should be default.
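For readers who have not met LEB128: the unsigned variant stores an integer in 7-bit groups, least significant group first, with the top bit of each byte set when more bytes follow. A minimal sketch is below; encode_uleb128 and decode_uleb128 are hypothetical free functions, not the LEB128::Integer class mentioned above, and the decoder does not validate malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned integer as ULEB128: 7 payload bits per byte,
// least significant group first, high bit set on all but the last byte.
std::vector<std::byte> encode_uleb128(std::uint64_t value) {
    std::vector<std::byte> out;
    do {
        auto group = static_cast<unsigned char>(value & 0x7F);
        value >>= 7;
        if (value != 0) {
            group |= 0x80;  // continuation bit: more groups follow
        }
        out.push_back(std::byte{group});
    } while (value != 0);
    return out;
}

// Decode a ULEB128 sequence. Stops at the first byte without the
// continuation bit; input that would overflow 64 bits is cut off.
std::uint64_t decode_uleb128(const std::vector<std::byte>& bytes) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    for (std::byte b : bytes) {
        if (shift >= 64) {
            break;
        }
        const auto group = std::to_integer<std::uint64_t>(b);
        result |= (group & 0x7F) << shift;
        if ((group & 0x80) == 0) {
            break;
        }
        shift += 7;
    }
    return result;
}
```

The value 624485, for instance, encodes to the three bytes E5 8E 26.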
>> > stream.write(std::as_bytes(date));
>> > vs
>> > stream.write(date);
>> > The latter is arguably nicer but gives no control to applications over
>> > how things are serialized.
>> That's why you write custom classes if you want more control. Remember
>> that you can't add your own member functions to std classes, so
>> you need to alter your classes or provide non-members. My library will
>> automatically pick up classes with .Read and .Write member functions or
>> Read and Write non-members. That's why I propose std::io::read and
>> std::io::write as customization points.
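One way such a customization point could pick up either a member or a non-member is sketched below in C++17. Everything in namespace sketch is hypothetical, including the argument order and the use of a byte vector as the sink; it is not the interface of the library described above or of any proposal.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical sink standing in for a binary stream.
using byte_sink = std::vector<std::byte>;

// Detects a member function callable as value.Write(sink).
template <class T, class = void>
struct has_member_write : std::false_type {};

template <class T>
struct has_member_write<
    T, std::void_t<decltype(std::declval<const T&>().Write(
           std::declval<byte_sink&>()))>> : std::true_type {};

// Entry point: prefer the member, otherwise fall back to a
// non-member Write found by argument-dependent lookup.
template <class T>
void write(byte_sink& sink, const T& value) {
    if constexpr (has_member_write<T>::value) {
        value.Write(sink);
    } else {
        Write(sink, value);  // unqualified on purpose, so ADL applies
    }
}

}  // namespace sketch

// A user type that opts in through a member function.
struct Point {
    unsigned char x;
    unsigned char y;
    void Write(sketch::byte_sink& sink) const {
        sink.push_back(std::byte{x});
        sink.push_back(std::byte{y});
    }
};

// A user type that opts in through a non-member in its own namespace.
namespace app {
struct Flag {
    bool on;
};
void Write(sketch::byte_sink& sink, const Flag& flag) {
    sink.push_back(static_cast<std::byte>(flag.on ? 1 : 0));
}
}  // namespace app
```

Calling sketch::write(sink, Point{1, 2}) goes through the member, while sketch::write(sink, app::Flag{true}) finds app::Write by ADL, so types you cannot modify can still be supported with a non-member.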
>> SG16 Unicode mailing list
SG16 list run by herb.sutter at gmail.com