sg16: Re: [SG16] Draft of d2071 - Named escape sequences

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 20 Oct 2021 11:09:29 +0200

On Wed, Oct 20, 2021, 10:44 Peter Brett via SG16 <sg16_at_[hidden]>
wrote:

> Hi Steve,
>
>
>
> Does the paper need any updates now that the \u{xxxx} syntax for universal
> character names is likely to be in C++23?
>
>
>
I think only the wording in chapter 11 (modify lex.charset) is needed. This
is independent of \u, except that they modify the same place.

In addition, the grammar in lex.charset needs to be modified to reflect that
a named-escape-sequence is a universal-character-name (the paper doesn't do
that).
This is necessary and sufficient to allow \N to be used everywhere \u, \U
and \u{} can be.

While I am here, I would like to plead that the matching algorithm be
reconsidered. We have no implementation experience with strict,
case-sensitive matching, and I believe strict matching is user-hostile.
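
To illustrate what relaxed matching could look like, here is a minimal
sketch of loose name comparison along the lines of the Unicode
loose-matching rule UAX44-LM2 (ignore case, whitespace, underscores, and
medial hyphens). The helper names loose_key and loose_equal are
hypothetical; this is neither the wording proposed in the paper nor the
prototype implementation, and the one documented exception to the rule
(U+1180 HANGUL JUNGSEONG O-E) is deliberately not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: reduce a character name to a loose-matching key by
// upper-casing it and dropping spaces, underscores, and medial hyphens.
// Assumption: names are plain ASCII, as Unicode character names are.
std::string loose_key(std::string_view name) {
    std::string key;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c == ' ' || c == '_') {
            continue;  // whitespace and underscores are ignored
        }
        if (c == '-') {
            // A hyphen is "medial" when it sits between two letters/digits.
            const bool medial =
                i > 0 && i + 1 < name.size() &&
                std::isalnum(static_cast<unsigned char>(name[i - 1])) &&
                std::isalnum(static_cast<unsigned char>(name[i + 1]));
            if (medial) {
                continue;  // medial hyphens are ignored
            }
            // Non-medial hyphens are kept: they can distinguish names.
        }
        key.push_back(static_cast<char>(std::toupper(c)));
    }
    return key;
}

// Hypothetical helper: two spellings match if their keys are identical.
bool loose_equal(std::string_view a, std::string_view b) {
    return loose_key(a) == loose_key(b);
}
```

With this sketch, "zero-width space" and "ZERO WIDTH SPACE" compare equal,
while "TIBETAN LETTER -A" and "TIBETAN LETTER A" stay distinct because the
hyphen in the former is not medial.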

> Thanks,
>
>
>
> Peter
>
>
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Steve Downey
> via SG16
> *Sent:* 20 October 2021 03:47
> *To:* SG16 <sg16_at_[hidden]>
> *Cc:* Steve Downey <sdowney_at_[hidden]>
> *Subject:* [SG16] Draft of d2071 - Named escape sequences
>
>
>
> Find attached a draft of Named universal character escapes. It provides
> rebased wording based on EWG's last direction, and alternatively provides
> wording for UCNs rather than escapes, in light of P2290 Delimited Escape
> Sequences. It additionally emphasizes that we are choosing to ignore how
> the Unicode standard recommends that names be matched against the Unicode
> Database, and provides a wording alternative for that.
>
> Wording has not been reviewed, but doesn't change significantly from past
> versions, except that the changes to 5.13.5 [lex.string] paragraph 14
> <http://eel.is/c++draft/lex.string#14> no longer seem to be necessary, as
> the new machinery takes care of everything, I believe.
>
>
>
> We have a slot in EWG on the 7th, unless something has changed, for this
> paper. I can do more wording work after this, although the primary goal, in
> my opinion, is to get design agreement for '23.
>
>
>
>
>
> On Tue, Oct 19, 2021 at 3:32 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
> My apologies for being so late in communicating this agenda for tomorrow's
> SG16 telecon.
>
> *Please note that there has been a schedule change.* Tomorrow's telecon
> was originally scheduled for 2021-10-27 but was moved forward a week to
> avoid conflicts with CppCon. The shared calendar has been updated (which
> triggered the sending of new meeting invitations).
>
> SG16 will hold a telecon on Wednesday, October *20th* (not the 27th) at
> 19:30 UTC (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20211020T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
> ).
>
> The agenda is:
>
> - D2071R1: Named universal character escapes
>    - Add named escape sequences to *universal-character-name* so that
>      these escape sequences can be used everywhere, not just in string
>      literals.
>    - Use Unicode rules for matching names rather than requiring exact
>      case-sensitive names.
>
> - P1885R8: Naming Text Encodings to Demystify Them
>   <https://wg21.link/p1885r8>
>    - Continue discussions of issues raised on the LEWG and SG16 mailing
>      lists.
>    - Prohibit mapping to IANA encodings when CHAR_BIT is not 8?
>    - Address special cases for IANA mapping purposes:
>       - Is UTF-16 valid for ordinary strings when CHAR_BIT is >= 16?
>       - Is UTF-16 valid for wide strings when CHAR_BIT is >= 16 and
>         sizeof(wchar_t) is 1?
>       - Is the underlying representation of a wide string required to
>         match an encoding scheme for the encoding form when
>         sizeof(wchar_t) is not 1?
>       - Limit mapping of wide strings when sizeof(wchar_t) is not 1 to
>         other, unknown, and the UCS/UTF variants?
>
> A draft of D2071R1 is not yet available, but is expected to be sent to the
> SG16 mailing list later today. That draft will address EWG feedback from
> Prague
> <https://wiki.edg.com/bin/view/Wg21prague/P2071R0-EWG>
> except that it will specify relaxed checking of character names for
> implementation reasons (the prototype implementation that Corentin did
> demonstrated how relaxed checking enables a smaller database of valid
> character names).
>
> P1885 is back on the agenda to settle questions that have continued to be
> raised on the LEWG and SG16 mailing lists. Corentin indicated intent to
> change the encoding querying functions to always return unknown when
> CHAR_BIT is not 8; we'll discuss the ramifications of that intent.
> Concerns about intended behavior for wide strings continue to be raised, so
> we'll discuss and poll various approaches for dealing with them.
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-10-20 04:09:46