On 12/5/18 8:05 AM, Steve Downey wrote:
> `codepoint` also, which is probably "just" a char32_t?
No, I think a type that isn't convertible from code unit types is desirable. (I'm interpreting your response as implying that 'codepoint' would just be a type alias of 'char32_t' as opposed to a distinct strong type.)
Consider the std::isalnum example we discussed this week. The problem was that it was being called with code unit values, but its parameter type means something more like a code point. Code like the following is well-formed and follows current recommendations for correct use of std::isalnum, but is nevertheless incorrect for multibyte encodings that reuse valid leading code unit values as trailing code unit values (e.g., Shift-JIS).
void f(const char *s) {
  while (*s) {
    // Each call sees one code unit, not one decoded character.
    if (std::isalnum(static_cast<unsigned char>(*s++))) {
      ...
    }
  }
}
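To make the failure concrete, here is a minimal sketch of the same loop applied to a Shift-JIS string. The helper name count_alnum_code_units and the array name are hypothetical. The byte values are my assumption that U+3001 IDEOGRAPHIC COMMA is encoded in Shift-JIS as 0x81 0x41, and the sketch assumes the default "C" locale is in effect.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: counts the code units that std::isalnum accepts,
// using the same byte-at-a-time pattern as the loop above.
std::size_t count_alnum_code_units(const char *s) {
    std::size_t n = 0;
    while (*s) {
        if (std::isalnum(static_cast<unsigned char>(*s++))) {
            ++n;
        }
    }
    return n;
}

// Assumed Shift-JIS encoding of U+3001 IDEOGRAPHIC COMMA: lead byte 0x81,
// trail byte 0x41. The trail byte has the same value as ASCII 'A'.
const char ideographic_comma_sjis[] = "\x81\x41";
```

Under those assumptions, count_alnum_code_units(ideographic_comma_sjis) reports one alphanumeric code unit even though the string holds a single punctuation character, because the trailing 0x41 is classified as if it were 'A'.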
Use of a distinct type for code points that is not implicitly convertible from a code unit type prevents these kinds of problems.
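As a rough illustration of what I have in mind, something like the following would do. The names code_point and is_alnum are hypothetical and not a proposed interface, and the classifier only covers ASCII ranges to keep the sketch short.

```cpp
#include <type_traits>

// Hypothetical strong type for code points; not a proposed or standard interface.
class code_point {
public:
    // explicit: no implicit conversion from char32_t or from code unit types.
    constexpr explicit code_point(char32_t v) noexcept : value_(v) {}
    constexpr char32_t value() const noexcept { return value_; }
private:
    char32_t value_;
};

// Hypothetical classifier taking a code point; ASCII ranges only for brevity.
constexpr bool is_alnum(code_point cp) noexcept {
    return (cp.value() >= U'0' && cp.value() <= U'9') ||
           (cp.value() >= U'A' && cp.value() <= U'Z') ||
           (cp.value() >= U'a' && cp.value() <= U'z');
}

// Code unit values do not implicitly convert, so is_alnum(*s) is ill-formed.
static_assert(!std::is_convertible<char, code_point>::value, "");
static_assert(!std::is_convertible<unsigned char, code_point>::value, "");
static_assert(!std::is_convertible<char32_t, code_point>::value, "");
static_assert(std::is_constructible<code_point, char32_t>::value, "");
```

With a type like this, a call such as is_alnum(*s) on a const char * fails to compile, which forces the caller to decode first. Explicit construction from a code unit is still possible in this sketch; a real design might want to reject that as well.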
Tom.