If you are *really* bored, I guess you could take a gander at
I didn't tag them SG-16 because... well, they are trying to tie up loose ends in lexing. Not very Unicode-y.
I do plan to bring forward a paper sometime before the end of the year covering a number of (non-tailored) Unicode algorithms and related facilities, and more generally how to design these things. The initial work would cover grapheme cluster segmentation, word segmentation, casing, and normalization; once we agree on the shape of these things, we can add more depending on needs and resources.
However, I would not feel comfortable presenting anything not backed by an implementation, and that is, as you can imagine, a lot of work.
But the good news is that I do have a design in my head, and I think it will work and be realistically standardizable, if and only if I get a lot of help with the specification once we agree on a direction.
I hope you'll like it :)
There are things I consider out of scope for my current effort but that need to get done if we want to land on a good story in the end:
* uN support in fmt: this is a critical bit of infrastructure, but it has some implementation and specification challenges, namely that it's not implementable on platforms that don't support Unicode in some form.
* An explicit cast between char* and char8_t*. This is outside my area of expertise, but a zero-cost, contiguous UTF view over a char* known to hold UTF-8 would be nice. Without it, the best we can do is a view that is random access rather than contiguous, and cheap but not free.
* Generally improve library support for u8. We won't see either love or adoption until we can offer usability.
* I'm not touching transcoding with a ten-foot pole; I trust that this will happen with JeanHeyd's work.
* Thinking about tailoring, Unicode/CLDR locales, and localized number and date formatting. Personally, I think this is out of scope for C++26, and I'm kinda hoping ICU4X matures.
But there are questions worth asking. Namely: can C++ mandate a dependency on a library like ICU/ICU4X, or share a common implementation across all vendors? Or are we not concerned about the implementation burden of that? Because I am. I think a Unicode-compliant locale object is necessary, but if implementers can't take dependencies, requiring each of them to implement the CLDR seems... like asking too much.
I'm already not really comfortable with the implementation burden for non-tailored things (and at the same time I'm unwilling to make the design amenable to ICU).
* If you folks think text processing needs some rope-like structure (I'm not sure it's text-related, but...), I guess you could talk about that!
* Some Unicode algorithms are unbounded and may require allocation; normalization, for example, has to buffer an arbitrarily long run of combining marks before it can reorder them. A small_vector would help the specification, but again, that doesn't seem very in scope for this group.
* We need *some* Unicode properties; I'm not sure which, to be honest. My current intent is to provide only those that would be generally useful outside of the algorithms that are otherwise provided.
Other random things on my mind:
* char_traits specialization deprecation?
* What the heck do we do with ctypes and std::locale?
* Can we mandate, in the meantime, that on UTF-8 platforms the locale at the start of main is "C.UTF-8"?
* What the heck do we do with regex?
* Is there interest in pursuing the work of getting rid of the term "character" in the core language where it designates code points and code units, or does that only annoy me? Can we refer to Unicode directly?
With some optimism, a bit of luck, and a lot of tears, we might get some usable Unicode in C++26?
I hope this inspires you.