Date: Mon, 29 Jul 2019 02:11:00 +0000
Nozomu Katō via Std-Proposals:
> Hello,
>
> I am preparing a proposal document to enhance std::regex:
> http://www.akenotsuki.com/misc/srell/en/proposal/draft.html
>
> The document is almost done except the Technical Specifications section.
> But this proposal needs a presenter as I cannot attend a face-to-face
> meeting. If you like this proposal and can attend face-to-face committee
> meetings of C++, please contact me.
>
> Regards,
> Nozomu
>
In my experience, anything that was based on char or wchar_t can't
really support Unicode as a drop-in.
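
As a small illustration of that point (a sketch only, not taken from either proposal; the helper name matches_n_dots is hypothetical): with std::regex over char, "." consumes one char, which for UTF-8 input is one code unit, not one character.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, for illustration only: returns true if the whole
// string is matched by exactly n "." wildcards in a char-based std::regex.
bool matches_n_dots(const std::string& s, std::size_t n) {
    const std::string pattern(n, '.');
    return std::regex_match(s, std::regex(pattern));
}
```

For U+00E9 encoded in UTF-8 as the two bytes "\xC3\xA9", matches_n_dots(s, 1) is expected to be false and matches_n_dots(s, 2) true, because the regex sees two char elements.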
I'm working on my own Unicode proposal that ditches any "execution
character set" nonsense and provides a clean interface:
https://github.com/Lyberta/cpp-unicode
Regex is at the very end of the list: we don't even have the means to
iterate over scalar values or grapheme clusters, or to read or write
Unicode to files, so going straight to regex feels strange.
Your text only mentions code points, yet well-formed UTF can only contain
scalar values, and I'm a strong opponent of code point interfaces.
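
To make the distinction concrete (a minimal sketch; the function names are hypothetical and not part of any proposal): a code point is any value in U+0000..U+10FFFF, while a scalar value is a code point that is not a surrogate (U+D800..U+DFFF).

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Any value in U+0000..U+10FFFF is a Unicode code point.
constexpr bool is_code_point(std::uint32_t v) {
    return v <= 0x10FFFF;
}

// Surrogate code points occupy U+D800..U+DFFF.
constexpr bool is_surrogate(std::uint32_t v) {
    return v >= 0xD800 && v <= 0xDFFF;
}

// A Unicode scalar value is any code point except a surrogate.
constexpr bool is_scalar_value(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}

// U+D800 is a code point but not a scalar value.
static_assert(is_code_point(0xD800) && !is_scalar_value(0xD800),
              "surrogates are code points, not scalar values");
```

An interface typed on scalar values can rule out surrogates by construction, whereas a code point interface has to accept them.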
Do you know how regex handles grapheme clusters? Is it ever valid to
break a grapheme cluster when matching or replacing?
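
One way to see what is being asked (a sketch using today's char-based std::regex on UTF-8 bytes; the helper name is hypothetical): "e" followed by U+0301 COMBINING ACUTE ACCENT forms a single grapheme cluster, yet a pattern for "e" matches the base letter alone and a replacement splits the cluster.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, for illustration only: replaces every 'e' with 'a'
// using a char-based std::regex, with no awareness of grapheme clusters.
std::string replace_e_with_a(const std::string& s) {
    static const std::regex e_pattern("e");
    return std::regex_replace(s, e_pattern, "a");
}
```

Given "e\xCC\x81" (e + U+0301 in UTF-8), this returns "a\xCC\x81": the combining accent is left behind and now attaches to "a". Whether that is ever the desired behavior is exactly the open question.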
In any case, feel free to get acquainted with SG16:
https://github.com/sg16-unicode/sg16
Received on 2019-07-28 21:13:26