Re: Core formal unicode classifications functions

From: Steve Downey <sdowney_at_[hidden]>
Date: Sun, 3 Sep 2023 20:59:21 -0400
On Sun, Sep 3, 2023 at 8:48 PM Thiago Macieira via SG16 <sg16_at_[hidden]> wrote:

> On Sunday, 3 September 2023 16:47:39 PDT Steve Downey via SG16 wrote:
> > > Where do the names come from? Does Unicode define them, somewhere?
> >
> > ICU4C has used these forever, and they've leaked out into other APIs.
>
> Unfortunately, that's not a good enough reason. Though that plus the fact
> that they are better names may be.

I know I don't mess up checking for lead before trail, whereas I have to
look up high vs low. But I don't really have a strong opinion, just an
appeal to authority.

I do think it's important that I could write a replacement for whatever
decoder we provide with mostly the same underlying primitives. The standard
shouldn't withhold stable underlying interfaces without a good reason, and
UTF-8 and -16 are very stable.
These are the functions you need to have to write things like "advance to
next start, starting at this byte" and we might as well standardise them
rather than leave attractive nuisances in the detail:: namespace.
Which is how I ended up taking this on.
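
As a rough illustration of the kind of primitives meant here, the sketch
below shows lead/trail classification for UTF-16 and an "advance to next
start, starting at this byte" helper for UTF-8. All function names are
hypothetical and chosen only for this example; they are neither the ICU4C
macros (which spell similar checks as U16_IS_LEAD, U16_IS_TRAIL and
U8_IS_TRAIL) nor any proposed standard wording. It assumes C++17 and does
no validation of the input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names, for illustration only.

// UTF-16 lead (Unicode "high") surrogates are U+D800..U+DBFF.
constexpr bool is_lead_surrogate(char32_t c) noexcept {
    return (c & 0xFFFFFC00u) == 0xD800u;
}

// UTF-16 trail (Unicode "low") surrogates are U+DC00..U+DFFF.
constexpr bool is_trail_surrogate(char32_t c) noexcept {
    return (c & 0xFFFFFC00u) == 0xDC00u;
}

// UTF-8 trail (continuation) bytes have the bit pattern 10xxxxxx.
constexpr bool is_utf8_trail(unsigned char b) noexcept {
    return (b & 0xC0u) == 0x80u;
}

// "Advance to next start, starting at this byte": step past the byte at
// pos, then skip any trail bytes. Returns s.size() if no further start
// exists. No well-formedness checking is attempted.
constexpr std::size_t next_utf8_start(std::string_view s,
                                      std::size_t pos) noexcept {
    if (pos >= s.size()) return s.size();
    ++pos;
    while (pos < s.size() &&
           is_utf8_trail(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}
```

With primitives like these, a replacement decoder can resynchronise after
an error without reaching into an implementation's detail:: namespace.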


> Still, the documentation for those will say something like
>
> "is_leading_surrogate
> Returns true if the codepoint is a Unicode High Surrogate, false otherwise"
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel DCAI Cloud Engineering

Received on 2023-09-04 00:59:33