Date: Sat, 25 Feb 2023 19:28:08 -0500
I think I agree that using the char32_t type as the Unicode scalar value
type that is the lingua franca for Unicode algorithms makes sense. The
advantages of a `scalar_value` would be a place to hang the contract off
of, to possibly provide a checked mode, and to signal that validation has
been done.
But it would probably just encourage people to write a range cast, like
views::transform([](char32_t c){return scalar_value{c};}), which would help
no one.
Writing a validation view for scalar values is fairly trivial.
Unicode algorithms, however, must not over-trust their input, then. We
should not have UB in the standard library. So that implies that any property
lookups for 32-bit values that aren't scalar values have to say NO, and the
algorithms should handle that. Many do naturally. But we should not repeat
the mistakes of the original C character classification APIs.
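As a rough sketch of what "say NO" could look like, here is a toy property lookup that is defined for every char32_t value. The function name and the table are hypothetical, and the table is only an illustrative subset of White_Space, not real property data. The contrast is with isspace and friends in <ctype.h>, where passing a value that is neither EOF nor representable as unsigned char is undefined behavior.

```cpp
#include <array>

struct code_point_range {
    char32_t first;
    char32_t last;
};

// Illustrative subset of White_Space code points, not the full property.
inline constexpr std::array<code_point_range, 4> toy_white_space{{
    {0x0009, 0x000D},
    {0x0020, 0x0020},
    {0x0085, 0x0085},
    {0x00A0, 0x00A0},
}};

// Hypothetical lookup: total over char32_t. Surrogates and values above
// U+10FFFF simply have no properties, so the answer is false, never UB.
constexpr bool toy_is_white_space(char32_t c) noexcept {
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return false;  // not a scalar value: say NO
    }
    for (const auto& r : toy_white_space) {
        if (c >= r.first && c <= r.last) return true;
    }
    return false;
}

static_assert(toy_is_white_space(U' '));
static_assert(!toy_is_white_space(char32_t{0xD800}));
```

With a lookup like this, an algorithm that only branches on property results treats unvalidated input as ordinary "no property" code points.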
Received on 2023-02-26 00:28:20