sg16

Unicode streams

From: Lyberta <lyberta_at_[hidden]>
Date: Tue, 14 May 2019 07:26:00 +0000
So I'm slowly migrating my code base to char8_t, and one of the biggest
features I miss is text streams.

I noticed that the last meeting was mostly focused on streams too, so I
want to share a few thoughts.

I think the basic unit of text streams should be the Unicode scalar
value. For my first iteration I plan to use virtual functions so the
actual encoding would be completely hidden.

Since every scalar value needs to be encoded separately, in the future
I could just take an InputRange when writing and an OutputRange when
reading, then simply write or read the scalar values one by one.
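
A minimal sketch of this idea, assuming C++17. All names here
(scalar_sink, utf8_string_sink, write_all) are hypothetical and are not
taken from cpp-io or any proposal. The caller only ever passes scalar
values through a virtual function; the fact that this particular sink
encodes as UTF-8 is an implementation detail. A plain iterator pair
stands in for the InputRange.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface: the caller only sees Unicode scalar values.
class scalar_sink {
public:
    virtual ~scalar_sink() = default;
    virtual void write(char32_t scalar) = 0;
};

// One possible implementation; it happens to encode as UTF-8 into a
// std::string used as a byte buffer.
class utf8_string_sink final : public scalar_sink {
public:
    void write(char32_t sv) override {
        // Precondition: sv is a scalar value (no surrogates, <= U+10FFFF).
        assert(sv <= 0x10FFFF && (sv < 0xD800 || sv > 0xDFFF));
        if (sv < 0x80) {
            put(sv);
        } else if (sv < 0x800) {
            put(0xC0 | (sv >> 6));
            put(0x80 | (sv & 0x3F));
        } else if (sv < 0x10000) {
            put(0xE0 | (sv >> 12));
            put(0x80 | ((sv >> 6) & 0x3F));
            put(0x80 | (sv & 0x3F));
        } else {
            put(0xF0 | (sv >> 18));
            put(0x80 | ((sv >> 12) & 0x3F));
            put(0x80 | ((sv >> 6) & 0x3F));
            put(0x80 | (sv & 0x3F));
        }
    }

    const std::string& bytes() const { return bytes_; }

private:
    void put(char32_t byte) {
        bytes_.push_back(static_cast<char>(static_cast<unsigned char>(byte)));
    }

    std::string bytes_;
};

// Writes a sequence of scalar values one at a time through the
// virtual interface, without knowing the target encoding.
template <class InputIt>
void write_all(scalar_sink& out, InputIt first, InputIt last) {
    for (; first != last; ++first) {
        out.write(*first);
    }
}
```

Calling write_all on a char32_t array holding U+0041, U+00E9 and U+20AC
leaves six bytes in the sink: one for "A", two for U+00E9 and three for
U+20AC. Swapping in a UTF-16 or UTF-32 sink would not change the
calling code.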

File IO will be done using the byte streams proposed here:

https://github.com/Lyberta/cpp-io

The binary stream will be a private member, and a wrapper class will
implement text IO on top of it.

First, the scalar value will be converted to code units, then each
individual code unit will go through an optional endianness conversion.
The same sequence of operations will be done in reverse when reading.
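
The two steps can be sketched as follows for UTF-16, assuming C++17 and
well-formed input on the read side. The names (ByteOrder, encode_utf16,
write_scalar, read_scalar) are hypothetical, and a std::vector of bytes
stands in for the underlying binary stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteOrder { big, little };

// Step 1: scalar value -> one or two UTF-16 code units.
// Returns the number of code units written to `out`.
inline int encode_utf16(char32_t sv, char16_t (&out)[2]) {
    // Precondition: sv is a scalar value (no surrogates, <= U+10FFFF).
    assert(sv <= 0x10FFFF && (sv < 0xD800 || sv > 0xDFFF));
    if (sv < 0x10000) {
        out[0] = static_cast<char16_t>(sv);
        return 1;
    }
    sv -= 0x10000;
    out[0] = static_cast<char16_t>(0xD800 + (sv >> 10));    // high surrogate
    out[1] = static_cast<char16_t>(0xDC00 + (sv & 0x3FF));  // low surrogate
    return 2;
}

// Step 2: each code unit -> two bytes in the requested byte order.
inline void write_scalar(std::vector<std::uint8_t>& bytes, char32_t sv,
                         ByteOrder order) {
    char16_t units[2];
    const int count = encode_utf16(sv, units);
    for (int i = 0; i < count; ++i) {
        const auto hi = static_cast<std::uint8_t>(units[i] >> 8);
        const auto lo = static_cast<std::uint8_t>(units[i] & 0xFF);
        if (order == ByteOrder::big) {
            bytes.push_back(hi);
            bytes.push_back(lo);
        } else {
            bytes.push_back(lo);
            bytes.push_back(hi);
        }
    }
}

// Reading reverses the steps: bytes -> code units -> scalar value.
// `pos` is advanced past the bytes consumed. This sketch does not
// handle lone surrogates or truncated input beyond the asserts.
inline char32_t read_scalar(const std::vector<std::uint8_t>& bytes,
                            std::size_t& pos, ByteOrder order) {
    auto read_unit = [&]() -> char16_t {
        assert(pos + 2 <= bytes.size());
        const std::uint8_t a = bytes[pos];
        const std::uint8_t b = bytes[pos + 1];
        pos += 2;
        return order == ByteOrder::big
                   ? static_cast<char16_t>((a << 8) | b)
                   : static_cast<char16_t>((b << 8) | a);
    };
    const char16_t first = read_unit();
    if (first < 0xD800 || first > 0xDBFF) {
        return first;  // single code unit in the BMP
    }
    const char16_t second = read_unit();
    return char32_t(0x10000) + ((char32_t(first) - 0xD800) << 10) +
           (char32_t(second) - 0xDC00);
}
```

For example, U+1F600 becomes the code units D83D DE00, which are
written as the bytes D8 3D DE 00 in big-endian order and 3D D8 00 DE in
little-endian order. Reading with the same byte order gives back
U+1F600.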

Of course, endianness concerns apply only to byte IO. Non-byte IO
(like memory text streams) will conceptually be in terms of
std::vector<CodeUnit>.

I don't want to use operator<< and operator>> for IO since I think this
is a bad design. I plan to introduce a customization point, something
that can later be standardized as std::unicode::read? Hmm, I already use
"read" for binary IO... What about "format" and "scan"... Hmm,
std::format is proposed for legacy text handling...


Received on 2019-05-14 09:27:20