Thank you for the feedback.

On Thu, Aug 29, 2019 at 10:59 AM Niall Douglas <s_sourceforge@nedprod.com> wrote:

> 1) Submit pull requests to JeanHeyd for features that you need. His
> target is C++23 of course and I at least don't want to see him get
> distracted with supporting earlier standards. If we need that to
> evaluate the design, so be it, but please try and assist where
> possible. I see this work as SG16's #1 priority for C++23.

If he breaks the requisite parts out into a standalone git repo, I'll be
happy to submit issue requests to that.

In my opinion, every WG21 library proposal should come with a reference
implementation in a standalone, highly reusable, submodularisable git
repo. I know that I don't *quite* achieve that myself, but I think it
particularly important to avoid bundling the reference implementation in
with lots of other unrelated stuff.

People won't lock in dependencies on bundles, too ripe for breaking
change. They like self contained, single purpose, orthogonal git repos.

The implementation is not finished yet. It provides a bit more than what the paper describes, but it needs more work before it's ready to use. I plan to spend the next 4 months working on it, if Grant Proposals and University Class Proposals work out. Otherwise, it will be improved in my usual free time and will hopefully be production ready around midsummer 2020.

That being said, it is by no means going to be shippable alongside something like LLFIO in the next month or so. Apologies.

> 2) Write papers proposing changes to the interface that you would need
> (e.g., deterministic exceptions). Papers will help us more quickly
> evaluate and consider changes. Any such papers should include
> motivation, examples, and proposed changes (preferably wording if
> applicable as well). But you know all that of course ;)

Alas, my plate is overflowing. I have four new papers in progress; such
is my overload that none of those four will make Belfast.

Same.

>> 3. Anywhere where you might throw an exception, you need to use a
>> deterministic alternative like error_code, Outcome, etc.
>
> I think this needs a paper.

I gotta be blunt here, if text reencoding throws exceptions, it has the
wrong design.

Error handling in the text API is done with your choice of error handler (the default inserts the replacement character, U+FFFD, to encode errors into the text itself). Nothing throws unless:

A) you are working with a dynamic container (e.g., std::basic_string/std::vector), or

B) you ask for it by explicitly injecting the throw_error_handler.
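The handler-based policy described above can be sketched roughly as follows. Everything in it is hypothetical (`decode_one`, `replacement_handler` and `throwing_handler` are illustrative names, not the proposal's API): the decoder hands each ill-formed UTF-8 sequence to a caller-supplied handler, so the default substitutes U+FFFD and never throws, and throwing only happens when the caller injects a throwing handler.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Default policy (hypothetical): report the error in-band as U+FFFD.
struct replacement_handler {
    char32_t operator()(std::string_view /*ill_formed*/) const noexcept {
        return U'\uFFFD';
    }
};

// Opt-in policy (hypothetical): throws only because the caller asked for it.
struct throwing_handler {
    char32_t operator()(std::string_view /*ill_formed*/) const {
        throw std::runtime_error("ill-formed UTF-8 sequence");
    }
};

// Decodes the first code point of a non-empty `in` and advances `in` past
// the bytes it consumed. Ill-formed input is passed to `on_error`, whose
// return value is used as the decoded code point.
template <typename ErrorHandler>
char32_t decode_one(std::string_view& in, ErrorHandler&& on_error) {
    const auto b0 = static_cast<unsigned char>(in[0]);
    // Sequence length from the lead byte; 0 means "not a valid lead byte".
    const std::size_t len = b0 < 0x80 ? 1
                          : (b0 >> 5) == 0x06 ? 2
                          : (b0 >> 4) == 0x0E ? 3
                          : (b0 >> 3) == 0x1E ? 4
                          : 0;
    if (len == 0 || len > in.size()) {  // bad lead byte or truncated input
        const std::string_view bad = in.substr(0, 1);
        in.remove_prefix(1);
        return on_error(bad);
    }
    char32_t cp = (len == 1) ? b0 : (b0 & (0x7F >> len));
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(in[i]);
        if ((b & 0xC0) != 0x80) {  // expected a continuation byte
            const std::string_view bad = in.substr(0, i);
            in.remove_prefix(i);
            return on_error(bad);
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::string_view seq = in.substr(0, len);
    in.remove_prefix(len);
    // Reject overlong forms, surrogates, and values past U+10FFFF.
    if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return on_error(seq);
    }
    return cp;
}
```

Because the handler's return value stands in for the decoded code point, the same decoder serves both policies; the choice of whether errors ever become exceptions stays with the caller.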

We can make a `herbception_throw_handler` as a test drive once Herb's deterministic exceptions implementation makes progress.

The base decoding objects are also constexpr-capable. Unfortunately, this means that intrinsics are out of the question on MSVC: it has not implemented std::is_constant_evaluated, nor does it have any internal compiler trick for it, so there is no way to take an intrinsic path at run time while keeping a constexpr-friendly path for compile time. GCC and Clang should be okay here. (For MSVC, that just means I need a TEXT_LIB_CONSTEXPR macro that expands to nothing in VC++. I strongly prefer this approach, because WideCharToMultiByte is the easy way to get fast performance on Windows and it is not constexpr-blessed.)
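A minimal sketch of the TEXT_LIB_CONSTEXPR idea, under some assumptions: the macro here keys off the standard `__cpp_lib_is_constant_evaluated` feature-test macro rather than detecting VC++, and `ascii_prefix_length` with its `_portable`/`_fast` helpers is a hypothetical example. The "fast" helper only stands in for an intrinsic or WideCharToMultiByte-style path and reuses the portable loop so the sketch stays self-contained.

```cpp
#include <cstddef>
#include <type_traits>
#if defined(__has_include)
#if __has_include(<version>)
#include <version>  // C++20 home of the library feature-test macros
#endif
#endif

// Assumption: only mark the entry point constexpr when the library lets us
// tell compile-time evaluation apart from run-time evaluation.
#if defined(__cpp_lib_is_constant_evaluated)
#define TEXT_LIB_CONSTEXPR constexpr
#define TEXT_LIB_HAS_IS_CONSTANT_EVALUATED 1
#else
#define TEXT_LIB_CONSTEXPR  // expands to nothing: run-time only
#define TEXT_LIB_HAS_IS_CONSTANT_EVALUATED 0
#endif

// Portable path: a plain loop that is valid during constant evaluation.
// Returns how many leading bytes are ASCII (< 0x80).
constexpr std::size_t ascii_prefix_length_portable(const char* s, std::size_t n) {
    std::size_t i = 0;
    while (i < n && static_cast<unsigned char>(s[i]) < 0x80) ++i;
    return i;
}

// Hypothetical run-time path: where SIMD intrinsics or an OS API would go.
inline std::size_t ascii_prefix_length_fast(const char* s, std::size_t n) {
    return ascii_prefix_length_portable(s, n);
}

inline TEXT_LIB_CONSTEXPR std::size_t ascii_prefix_length(const char* s, std::size_t n) {
#if TEXT_LIB_HAS_IS_CONSTANT_EVALUATED
    if (std::is_constant_evaluated()) {
        return ascii_prefix_length_portable(s, n);  // compile time
    }
#endif
    return ascii_prefix_length_fast(s, n);  // run time
}
```

Without std::is_constant_evaluated the function simply loses constexpr and always takes the run-time path, which is the trade-off described above.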

>> 5. cmake build system suitable for reuse from a cmake superproject (i.e.
>> targets only modern cmake, exports the targets consumed by the
>> superproject)
>
> This seems like something that you or others could help out with.

I can give advice.

The text implementation is currently buried in my "bag of literally everything" library (ThePhD/phd), where I put incubating ideas. When they're ready, they get moved out into standalone repositories (e.g., https://github.com/ThePhD/out_ptr, https://github.com/ThePhD/itsy_bitsy). Text is going to be moved out into its own library soon, with its own CMake file that can be add_subdirectory'd in with little to no hassle. I haven't quite gotten the INSTALL targets right in either of those two libraries yet, so I still need to learn how to make the full CMake install-and-consume loop easy.

>> 6. Any public headers should not include anything "heavy" like Ranges,
>> Variant etc as some of my user base like to include LLFIO headers in
>> global headers. String is acceptable, as LLFIO includes <filesystem>.
>
> This library is fundamentally ranges based. I don't see any way to
> avoid dependencies on ranges in the interface; that is something we
> explicitly desire.

Unfortunately Ranges-based libraries would be a showstopper for me.

It is for most people. I use ranges mostly to avoid redefining concept checks, and for ranges::begin/ranges::end and the other ADL-based shenanigans. Personally, I think lowering the API's requirements is worth more than full concept checking; I'm going to implement the ADL extension points I need by hand and probably include a `text/span` polyfill to paper over the fact that most standard library implementations don't have a span yet. (Or let the user provide their own: every minute spent on polyfill garbage is a minute not spent writing good Unicode abstractions.) The goal will likely be C++17 to start, then down to C++14 later on if I can do so without compromising performance and algorithmic capabilities for most encodings.

The problem is their current hefty impact on build times. Most of my
clients, and consumers of my libraries, therefore ban Ranges in their
codebases.

Even long term, I can see outright bans on Ranges in header files
continuing until compilers bake Ranges into the language. So expect a
substantial subset of C++ users refusing to use *any* Ranges based
library, including standard libraries, for at least a decade or more to
come.

Once again, being blunt here, I can't see why text reencoding needs
anything more than span<T> and basic_string_view<T>. Ranges
auto-understands both of those. Again, if the design really needs
Ranges, it has to be the wrong design.

Iterators (and their child, Ranges) have been the backbone of libraries communicating with one another since Stepanov got it right. Hardcoding span/string_view as the communication medium is the wrong choice, because it excludes a wide range of use cases: Boost.Text's unencoded_rope, libstdc++'s __gnu_cxx::rope<T> storage (inherited from the SGI STL), gap buffers, ring buffers used in networking, and more.

My implementation of this library will most certainly specialize when it detects that you have handed it contiguous iterators/ranges and do "the fast thing" when it can, but by no means should pointers be the layer of abstraction that text encoding is built on. RandomAccessIterators are nice to have, but the base implementation of most Unicode algorithms only needs Forward or Bidirectional Iterators. We need an implementation that works for most things, and can lean on library implementers for quality of implementation (QoI).

Not that it will be left entirely to Standard Library maintainers: a critical part of the paper is allowing users to specialize encode, decode and transcode calls with their own customization points (potentially ADL-based, though I really loathe ADL), so they can be far more tailored to their encodings and use cases.

This does not necessitate std::ranges or range-v3: most of what is needed are simple adl_(c)begin, adl_(c)end, and maybe adl_size free functions. Checks like "is this a contiguous iterator" can just be backported hardcodings of what is specified as concepts and requirements in the C++20 Working Draft. It's work-around-able, but every workaround is another day of mucking around playing "how many features did this compiler not implement under /std:c++14 | -std=c++14".

Sincerely,

JeanHeyd