Subject: Re: [boost] [review] [text] Text formal review
From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2020-06-22 02:10:50
On Sun, Jun 21, 2020 at 9:16 AM JeanHeyd Meneide via SG16 wrote:
Thanks for the review, JeanHeyd.
> This is a review for the Boost.Text library, submitted a day late (but
> hopefully not a dollar short! (U.S. colloquialism, don't mind me!)).
> The library has 3 somewhat related but (somewhat?) separable
> sub-libraries. In "building block" order, these are:
> - A string layer (a new std::string)
> - A unicode layer (algorithms and data)
> - A text layer (string, but if it gave a single flying crap about Unicode)
> Layer 2
> This is the layer I am -- on a library design level and a personal
> philosophy level -- the most opposed to.
> But my answer is still to accept it (well, modulo it being based on
> the above string type. Please just use std::string).
That seems to be the consensus.
> [[ text ]] [[ rope ]]
> While these containers can be evaluated individually, other reviews
> have picked up a great deal of pickings at them and so I won't bother.
> There was some grumbling about how a rope-like data structure is not
> interesting enough to be included and I will just quietly wave that
> off as "my use case is the only use case that matters and therefore I
> don't care about other people's invariants or needs".
> There are many implicitly (and explicitly) stated and maintained
> opinions in this layer:
> - UTF-8 is the way, truth, and life.
> - Unicode is the only encoding that matters ever, for all time, in perpetuity.
> - Allocators are shit!
> - NFC is probably the best we can do here for varying reasons.
> - Who needs non-contiguous storage anyways?
> - Who needs non-SBO storage, anyways?
> These are all opinions, many of which are present in the design of the
> text container. And they allow this text container to ship. But that
> lack of flexibility -- while okay for Qt or Apple's CoreText or
> whatever other platform-specific hoo-ha you want to get involved with
> -- does not help. In fact, it cuts them off: more than one person
> during Meeting C++ spoke to me of Boost.Text and said it could not
> meet their needs because it maintained encoding or normalization
> invariants that did not interoperate with their existing system.
> Storage is also an issue: while "I use boost::text::string underneath"
> is fine and dandy, not many systems (next to none, maybe?) are going to
> speak in "text" or its related string type. They will want the
> underlying container to speak to. For duck-type purposes, it works.
> But for everyone else, it fails.
> Since the string layer uses an `int` for its size and capacity, it is
> lopsidedly incompatible with existing STL's implementations of string,
> to the point that a reinterpret_cast -- however evil -- is not
> suitable for transporting a reference-without-copy into these APIs.
When text::text changes to use std::string, the size_type will
naturally have to change to size_t. I intend to change the size_type
of the others to be internally consistent. In for a penny, in for a pound.
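
To make the size_type mismatch concrete, here is a minimal sketch. The
int_sized_string struct is a hypothetical stand-in, not the actual
Boost.Text layout; it only shows why an int-based size cannot be treated
as interchangeable with std::string's size_type, and why crossing from
one to the other needs a range check.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a string whose size and capacity are `int`.
// This is NOT the Boost.Text layout, only an illustration.
struct int_sized_string {
    using size_type = int;
    char* data_ = nullptr;
    size_type size_ = 0;
    size_type capacity_ = 0;
};

// With the default allocator, std::string::size_type is std::size_t.
static_assert(std::is_same_v<std::string::size_type, std::size_t>);
static_assert(!std::is_same_v<int_sized_string::size_type,
                              std::string::size_type>);

// True if the length of `s` is representable as an int.
inline bool fits_in_int(std::string const& s) {
    return s.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max());
}

// Narrowing a std::string length to int is only safe after the check.
inline int checked_int_size(std::string const& s) {
    assert(fits_in_int(s));
    return static_cast<int>(s.size());
}
```

Once every container uses size_t, both the check and the narrowing cast
go away.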
> God bless string_view and its friends, because it allows us to at
> least continue to talk to some APIs since the text type guarantees
> contiguous storage. This means that at the boundaries of an
> application -- or even as a plugin to a wider ecosystem -- I am paying
> a (sometimes obscene) cost to interoperate between
> std::string/llvm::SmallString/unicode_code_unit_sequence and all the
> other things people have developed to sit between them and what they
> believe their string needs are. And while it is whack that so many of
> these classes exist, they do.
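
As a rough illustration of the string_view point above: any type with
contiguous storage can hand a non-owning view to a boundary API without
copying. The names count_ascii and my_buffer are hypothetical and appear
in no library discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical boundary API: accepts any contiguous run of chars
// without taking ownership or copying.
inline std::size_t count_ascii(std::string_view bytes) {
    std::size_t n = 0;
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) < 0x80) ++n;
    }
    return n;
}

// Hypothetical in-house string type; only contiguity matters here.
struct my_buffer {
    std::vector<char> storage;

    // Non-owning view over the stored bytes; no copy is made.
    std::string_view view() const {
        return std::string_view(storage.data(), storage.size());
    }
};
```

The view is free; the cost described above shows up when the other side
of the boundary insists on an owning type, at which point the bytes have
to be copied into it.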
> That lack of interoperability -- and once again, the lack of an
> allocator template parameter -- hampers this library from COMPLETELY
> DOMINATING the string scene. It will always be used as a solution,
> maybe even 80% of the time. Those seeking more will have to figure out
> how to build their own UTF16 containers, or their own special-encoded
> containers, with very little support from the text library (save for
> some transcoding functions they can leverage, but only from specific
> Unicode encodings).
I hope and expect that the idea floated earlier in the review -- of
adding Unicode-invariant-maintaining free functions -- will make
implementing your own types all but trivial.
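
A minimal sketch of what such a free function could look like, under
loud assumptions: the names below (is_valid_utf8, is_code_point_boundary,
insert_utf8) are hypothetical and are not the proposed interface, and the
only invariant maintained here is UTF-8 well-formedness. Maintaining
normalization as well would need the library's normalization algorithms
and is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// True if `s` is well-formed UTF-8: rejects overlong forms, surrogates,
// and values above U+10FFFF.
inline bool is_valid_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char const b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) { ++i; continue; }
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range, second byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // excludes overlong encodings
            if (b0 == 0xED) hi = 0x9F;  // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // excludes overlong encodings
            if (b0 == 0xF4) hi = 0x8F;  // excludes values > U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (s.size() - i < len) return false;  // truncated sequence
        unsigned char const b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            unsigned char const b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// True if `pos` does not fall in the middle of an encoded code point.
inline bool is_code_point_boundary(std::string_view s, std::size_t pos) {
    if (pos > s.size()) return false;
    if (pos == s.size()) return true;
    return (static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80;
}

// Inserts `piece` into `s` only if the result stays well-formed UTF-8.
// Assumes `s` is already well-formed and `piece` does not alias `s`.
inline bool insert_utf8(std::string& s, std::size_t pos,
                        std::string_view piece) {
    if (!is_code_point_boundary(s, pos) || !is_valid_utf8(piece))
        return false;
    s.insert(pos, piece);
    return true;
}
```

A user-defined container wrapping std::string (or any other storage)
could then route its mutating members through functions of this shape
instead of reimplementing the checks.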