std-proposals

Re: Enhancement of std::regex

From: Nozomu Katō <noz.ka_at_[hidden]>
Date: Mon, 29 Jul 2019 19:56:06 +0900
Hello,

On Mon, 29 Jul 2019 02:11:00 +0000, Lyberta via Std-Proposals wrote:
> In my experience, anything that was based on char or wchar_t can't
> really support Unicode as a drop-in.

The proposed syntax option does not support char or wchar_t.

> Your text only mentions code points yet well-formed UTF can only contain
> scalar values and I'm a strong opponent of code point interfaces.

Since its introduction, C++'s std::regex has referenced ECMAScript's
RegExp, and my proposal is intended to bring in RegExp's newer features.
As of now, RegExp does not support anything beyond code points, such as
Unicode normalization or grapheme clusters.

If RegExp supports such features someday, I or someone else might try to
enhance std::regex again in accordance with the revised RegExp spec. But
even if such features were supported, the need for code-point-based
searching would remain, because searching while performing normalization
and/or considering grapheme clusters is likely to be much slower.

> Do you know how regex handles grapheme clusters? Is it ever valid to
> break a grapheme cluster when matching or replacing?

I have not devoted much time to that topic.

> In any case, feel free to get acquainted with SG16:
>
> https://github.com/sg16-unicode/sg16

I have been following the group since it was still the
std-text-wg. I had been thinking that I would consider joining if any
topic concerning regex were raised.

--
Nozomu

Received on 2019-07-29 05:58:06