Re: [SG16-Unicode] D1628R0 (Unicode character properties)

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 28 Mar 2019 08:58:03 -0400
These are plumbing for writing Unicode algorithms, such as regex, portably.
I don't think they are something most users should interact with. As such,
I think the wide contract is appropriate. It's reasonable for the binary
properties to return false for values that are not code points.
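
As a rough illustration of what a wide contract means here, the sketch
below uses hypothetical names (sketch_wide, is_code_point, is_white_space)
that are not taken from D1628R0. The White_Space ranges are written from
memory of PropList.txt and should be checked against the UCD before anyone
relies on them.

```cpp
namespace sketch_wide {

// Hypothetical helper: true for any value in the Unicode code space
// U+0000..U+10FFFF (surrogates included, since they are code points).
constexpr bool is_code_point(char32_t c) noexcept {
    return c <= 0x10FFFF;
}

// Hypothetical binary property with a wide contract: every char32_t value
// is valid input, and anything outside the code space simply yields false.
constexpr bool is_white_space(char32_t c) noexcept {
    if (!is_code_point(c)) {
        return false;  // not a code point: no precondition, just "false"
    }
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085 ||
           c == 0x00A0 || c == 0x1680 || (c >= 0x2000 && c <= 0x200A) ||
           c == 0x2028 || c == 0x2029 || c == 0x202F || c == 0x205F ||
           c == 0x3000;
}

}  // namespace sketch_wide
```

A caller such as a regex engine can then pass whatever value it decoded
without validating it first; a narrow contract would instead make the
out-of-range call undefined or a contract violation.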

I also, on reflection, think excluding char and wchar_t is misguided. The
contract is that char8_t etc. are UTF-8 etc., but it's also frequently the
case that the execution narrow and wide encodings are Unicode too. The
possibility of GIGO errors isn't a good reason to block possibly correct
use. It just encourages casting.
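
To make the casting point concrete, here is a minimal sketch of a property
function that also accepts char and wchar_t. The names (sketch_chars,
to_code_point_value) are hypothetical and not from the paper, and the
sketch assumes the caller's narrow or wide encoding really is Unicode;
otherwise it is garbage in, garbage out. ASCII_Hex_Digit is used only
because its definition is small.

```cpp
#include <type_traits>

namespace sketch_chars {

// The property itself, defined on code point values.
constexpr bool is_ascii_hex_digit(char32_t c) noexcept {
    return (c >= U'0' && c <= U'9') || (c >= U'A' && c <= U'F') ||
           (c >= U'a' && c <= U'f');
}

// Hypothetical helper: widen any character type to char32_t by going
// through the unsigned type of the same size, so that a negative char
// (e.g. 0xE9 where char is signed) is not sign-extended.
template <class CharT>
constexpr char32_t to_code_point_value(CharT c) noexcept {
    static_assert(std::is_integral<CharT>::value,
                  "expects a character or integer type");
    return static_cast<char32_t>(
        static_cast<typename std::make_unsigned<CharT>::type>(c));
}

// Overload for char, wchar_t, char16_t, and so on. It assumes the value
// is a Unicode code point; no encoding check is possible at this level.
template <class CharT>
constexpr bool is_ascii_hex_digit(CharT c) noexcept {
    return is_ascii_hex_digit(to_code_point_value(c));
}

}  // namespace sketch_chars
```

Without such an overload, a caller holding a char writes
is_ascii_hex_digit(static_cast<char32_t>(ch)) by hand, which is no safer
and can sign-extend. A single char only denotes a whole code point in the
ASCII range of UTF-8; multi-byte sequences still need decoding first.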

On Thu, Mar 28, 2019, 05:10 Lyberta <lyberta_at_[hidden]> wrote:

> >> I guess, but do we really want our users to shove random integers in it
> >>
> >
> > Yes. I really want a wide contract there
>
> But... why?
>
> >> Yes, contract or invariant means strong type, not dumb char32_t
> >>
> >
> > TR 44 is purposefully dumb by design too.
>
> I guess it was written by people with more of a C mindset. I'm looking
> at std::chrono and love how I can never shove an integer there because
> it is ambiguous. Same with text - an integer is ambiguous without a
> character set or encoding. I know this API has Unicode in its name
> but... I think I gotta try to come up with a properties design that is
> compatible with my design and see if there are any bad points.
>
> Also, I know this is a bit obscure, but what about non-Unicode? I think
> having relatively universal free functions is fine and then if they get
> std::unicode_code_point as a template parameter, they will select the
> Unicode implementation. Hence again, strong types are important.
>
> Also, consider std::ascii_character, std::shift_jis_something.. I don't
> know Shift-JIS. :/

Received on 2019-03-28 13:58:16