This has been on my todo list for a couple of months. I've had some time to think about it, and I'd like some feedback on my direction before getting too deep. What I believe we need is the collection of classification functions that depend only on the form of code points or, for UTF-16 and UTF-8, on the form of code units. In ICU these are often macros, which in C++ should be inline functions. I believe these should have wide contracts and be noexcept. That implies, for example, that `is_high_surrogate` would return false for a char32_t above the code point range rather than having a precondition. ICU also has safe and unsafe versions of its macros (e.g. `U8_NEXT` vs `U8_NEXT_UNSAFE`), which I believe should be reversed today (see https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utf8_8h.html).
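
A minimal sketch of what wide-contract, `noexcept` classification over char32_t could look like. The namespace `uni` and all function names are hypothetical, not an existing std or ICU API. `constexpr` functions are implicitly inline, which covers the "macros become inline functions" point. Every input value, including those above U+10FFFF, has a defined result, so there is no precondition to violate.

```cpp
// Hypothetical namespace and names; a sketch, not a proposed wording.
namespace uni {

// Code points are the values 0..0x10FFFF.
constexpr bool is_code_point(char32_t c) noexcept {
    return c <= 0x10FFFF;
}

// High (lead) surrogates: U+D800..U+DBFF. Values above 0x10FFFF fall
// outside this range and simply yield false (wide contract).
constexpr bool is_high_surrogate(char32_t c) noexcept {
    return c >= 0xD800 && c <= 0xDBFF;
}

// Low (trail) surrogates: U+DC00..U+DFFF.
constexpr bool is_low_surrogate(char32_t c) noexcept {
    return c >= 0xDC00 && c <= 0xDFFF;
}

constexpr bool is_surrogate(char32_t c) noexcept {
    return c >= 0xD800 && c <= 0xDFFF;
}

// A scalar value is a code point that is not a surrogate.
constexpr bool is_scalar_value(char32_t c) noexcept {
    return is_code_point(c) && !is_surrogate(c);
}

} // namespace uni
```

Because the functions are `constexpr`, they can also be checked at compile time, e.g. `static_assert(!uni::is_high_surrogate(char32_t{0x110000}));`.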

I believe the functions should be in terms of char{8,16,32}_t, and that we should leave `std::byte` in particular out until we deal with I/O. wchar_t is at this point non-portable (it is 16 bits on Windows and 32 bits on most other platforms), so I don't think it's a good candidate either.
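
One way that restriction could be expressed is as an overload set, again with hypothetical names: the same classification is provided for char16_t and char32_t, and a deleted wchar_t overload makes the exclusion explicit. Without it, a wchar_t argument would either convert silently (if only a char32_t overload existed) or fail with an ambiguity error; with it, the diagnostic names the deleted overload. `std::byte` needs no such treatment, since as an enum class it does not implicitly convert to a character type.

```cpp
// Hypothetical names; a sketch of the char{16,32}_t-only overload set.
namespace uni {

// UTF-16 code unit in the high (lead) surrogate range.
constexpr bool is_high_surrogate(char16_t c) noexcept {
    return c >= 0xD800 && c <= 0xDBFF;
}

// Code point in the same range; values above 0x10FFFF are simply false.
constexpr bool is_high_surrogate(char32_t c) noexcept {
    return c >= 0xD800 && c <= 0xDBFF;
}

// wchar_t's width and encoding vary by platform, so reject it
// explicitly rather than guessing which overload was meant.
bool is_high_surrogate(wchar_t) = delete;

} // namespace uni
```

Note that the char32_t overload does not truncate: `char32_t{0x1D800}` is not a surrogate, even though its low 16 bits are 0xD800.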

Areas for functions:

- Code point classification (char32_t): scalar value, code point value, validity, encoding length, high/low surrogate, BOM classification
- char16_t: similar; nothing new beyond BOM handling for BE/LE UTF-16
- char8_t: lead byte, trail byte, trail byte counts, etc. (see ICU)