<div dir="ltr">This has been on my todo list for a couple of months. I had some time to think about it, and I want some feedback on what I&#39;m thinking before getting too deep. What I believe we need is a collection of classification functions that depend only on the form of code points or, for UTF-16 and UTF-8, code units. In ICU these are often macros, which for C++ should be inline functions. I believe these should have wide contracts and be noexcept, which implies that, for example, an `is_high_surrogate` would return false for a char32_t above the code point range. ICU also has `safe` vs `` versions of macros, which I believe should be reversed today ( <a href="https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utf8_8h.html">https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/utf8_8h.html</a> ).<br><br>I believe the functions should be in terms of char{8,16,32}_t, and that we should leave byte in particular out until we deal with IO. wchar_t is at this point unportable, so I think it&#39;s not a good candidate either.<br><br>Areas for functions:<br>Code point classification<br>    scalar value, code point value, validity, encoding length, high/low surrogate, BOM classification<br>char16_t<br>    similar; nothing other than a BOM miss for BE/LE UTF-16<br>char8_t<br>    lead byte, trail byte, counts, etc. (see ICU)<br><br></div>
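To make the shape of this concrete, here is a rough sketch of what a few of these functions might look like. Every name below (`sketch::is_code_point`, `sketch::utf8_length`, and so on) is a placeholder I made up for illustration, not ICU's API and not a proposed spelling. Returning 0 from the length functions for non-scalar values is just one way to keep the contract wide, and `is_swapped_bom` is only my guess at what the BE/LE BOM check would be. The char8_t part is guarded because char8_t needs C++20.

```cpp
// Hypothetical sketch: all names are placeholders, not ICU or standard APIs.
namespace sketch {

// Code point classification (char32_t). Wide contract: every input value
// is accepted and out-of-range values simply classify as false.
constexpr bool is_code_point(char32_t c) noexcept { return c <= 0x10FFFF; }
constexpr bool is_high_surrogate(char32_t c) noexcept { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool is_low_surrogate(char32_t c) noexcept { return c >= 0xDC00 && c <= 0xDFFF; }
constexpr bool is_surrogate(char32_t c) noexcept { return c >= 0xD800 && c <= 0xDFFF; }
constexpr bool is_scalar_value(char32_t c) noexcept {
  return is_code_point(c) && !is_surrogate(c);
}
constexpr bool is_bom(char32_t c) noexcept { return c == 0xFEFF; }

// Encoding length in code units. Assumption: 0 means "not encodable"
// (surrogates and values above U+10FFFF).
constexpr int utf8_length(char32_t c) noexcept {
  return !is_scalar_value(c) ? 0 : c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}
constexpr int utf16_length(char32_t c) noexcept {
  return !is_scalar_value(c) ? 0 : c < 0x10000 ? 1 : 2;
}

// UTF-16 code unit classification (char16_t).
constexpr bool is_high_surrogate(char16_t u) noexcept { return (u & 0xFC00) == 0xD800; }
constexpr bool is_low_surrogate(char16_t u) noexcept { return (u & 0xFC00) == 0xDC00; }
constexpr bool is_bom(char16_t u) noexcept { return u == 0xFEFF; }
// Assumption: a BOM read with the wrong byte order shows up as 0xFFFE.
constexpr bool is_swapped_bom(char16_t u) noexcept { return u == 0xFFFE; }

#ifdef __cpp_char8_t  // char8_t requires C++20
// UTF-8 code unit classification (char8_t).
constexpr bool is_utf8_single(char8_t b) noexcept { return b < 0x80; }
constexpr bool is_utf8_trail(char8_t b) noexcept { return (b & 0xC0) == 0x80; }
// Lead bytes of well-formed multi-byte sequences are 0xC2..0xF4.
constexpr bool is_utf8_lead(char8_t b) noexcept { return b >= 0xC2 && b <= 0xF4; }
#endif

}  // namespace sketch
```

With this shape, `sketch::is_high_surrogate(char32_t{0x110000})` is simply false rather than a precondition violation, which is the wide-contract behavior described above.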

