SG16

Subject: Re: Changes to Unicode UTS #18, Unicode Regular Expressions, level 3 (tailoring) dropped.
From: Peter Brett (pbrett_at_[hidden])
Date: 2021-01-25 10:59:37


Hi Tom,

Thanks for writing this up for the mailing list.

Another way of interpreting this change is:

A UTS-18 regex implementation is not locale-dependent.

Since many of the challenges with the current std::regex stem from locale-dependent behaviour, this represents an opportunity for a huge simplification in both specification and implementation.

Best regards,

                               Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: 22 January 2021 15:45
To: SG16 <sg16_at_[hidden]>
Cc: Tom Honermann <tom_at_[hidden]>; Hana Dusíková <hanicka_at_[hidden]>
Subject: [SG16] Changes to Unicode UTS #18, Unicode Regular Expressions, level 3 (tailoring) dropped.

Thanks to Peter Brett for pointing this out to me the other day. The latest revision of the Unicode UTS #18<https://www.unicode.org/reports/tr18/> covering regular expressions has some significant changes. In particular, the level 3 specification covering tailoring has been dropped.

The current version of the UTS is available at https://www.unicode.org/reports/tr18/tr18-21.html.

Diff marks relative to the prior version can be viewed at https://www.unicode.org/reports/tr18/tr18-20.html.

These changes appear to have been approved at UTC meeting #162 in May, 2020. Minutes are available at https://www.unicode.org/L2/L2020/20015.htm; search for "UTS #18". The relevant papers covering the changes appear to be:

  * L2/19-364R2<http://www.unicode.org/L2/L2019/19364r2-feedback-uts18.pdf>, Feedback on Proposed Update UTS #18, Markus Scherer, 2019-10-08.
  * L2/20-017<http://www.unicode.org/L2/L2020/20017-pubrev.html>, Comments on Public Review Issues.

     * See the link to issue 404<https://www.unicode.org/review/pri404/feedback.html>.

  * L2/20-032<http://www.unicode.org/L2/L2020/20032-uts18-20-draft.pdf>, Document for review, Mark Davis and Andy Heninger, 2019-11-08.
  * L2/20-051<http://www.unicode.org/L2/L2020/20051-uts18-fdbk-response.pdf>, Response to feedback on UTS18, Mark Davis, 2020-01-14.

The upshot for us is that, when we get back around to looking at std::regex replacements, we won't have to spend time worrying about support for tailoring.

Tom.



SG16 list run by sg16-owner@lists.isocpp.org