Re: [SG16] Draft of d2071 - Named escape sequences

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 20 Oct 2021 09:19:38 -0400
Paper is updated with both concerns, including quoting the *should*
direction from the relevant Unicode standard.
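
For concreteness, the looser name matching under discussion can be sketched roughly as below. This is only an illustration in the spirit of the Unicode loose-matching rule UAX44-LM2 (ignore case, whitespace, underscores, and medial hyphens), not the wording proposed in the paper. The names `loose_key` and `loosely_equal` are hypothetical, and the sketch deliberately omits the rule's special case for U+1180 HANGUL JUNGSEONG O-E.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: reduce a character name to a key for loose comparison.
// Assumption: names are plain ASCII, as Unicode character names are.
std::string loose_key(std::string_view name) {
    std::string key;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        // Whitespace and underscores never matter.
        if (std::isspace(c) || c == '_') continue;
        if (c == '-') {
            // A hyphen is "medial" when it sits directly between two
            // letters or digits; only those hyphens are ignored.
            const bool medial =
                i > 0 && i + 1 < name.size() &&
                std::isalnum(static_cast<unsigned char>(name[i - 1])) &&
                std::isalnum(static_cast<unsigned char>(name[i + 1]));
            if (medial) continue;
        }
        // Fold case so that comparison is case-insensitive.
        key.push_back(static_cast<char>(std::toupper(c)));
    }
    return key;
}

// Hypothetical helper: true when two names match under the simplified rule.
bool loosely_equal(std::string_view a, std::string_view b) {
    return loose_key(a) == loose_key(b);
}
```

Under this sketch, "zero-width space" matches "ZERO WIDTH SPACE", while "TIBETAN LETTER -A" stays distinct from "TIBETAN LETTER A" because its hyphen is not medial. Strict matching, by contrast, would accept only the exact upper-case spelling.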


On Wed, Oct 20, 2021, 05:10 Corentin Jabot via SG16 <sg16_at_[hidden]>
wrote:

>
>
> On Wed, Oct 20, 2021, 10:44 Peter Brett via SG16 <sg16_at_[hidden]>
> wrote:
>
>> Hi Steve,
>>
>>
>>
>> Does the paper need any updates now that the \u{xxxx} syntax for
>> universal character names is likely to be in C++23?
>>
>>
>>
> I think only the wording in chapter 11 (modify lex.charset) needs updating.
> This is independent of \u, except that both modify the same place.
>
> In addition, the grammar in lex.charset needs to be modified to reflect that
> a named-escape-sequence is a universal-character-name (the paper doesn't do
> that).
> This is necessary and sufficient to allow \N to be used everywhere \u, \U,
> and \u{} can be.
>
> While I am here, I would like to plead that the matching algorithm be
> reconsidered. We have no implementation experience with strict,
> case-sensitive matching, and I believe this strict matching is user-hostile.
>
>
>> Thanks,
>>
>>
>>
>> Peter
>>
>>
>>
>> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Steve Downey
>> via SG16
>> *Sent:* 20 October 2021 03:47
>> *To:* SG16 <sg16_at_[hidden]>
>> *Cc:* Steve Downey <sdowney_at_[hidden]>
>> *Subject:* [SG16] Draft of d2071 - Named escape sequences
>>
>>
>>
>>
>> Find attached a draft of Named universal character escapes. It provides
>> wording rebased on EWG's last direction; alternatively provides
>> wording for UCNs rather than escapes, in light of P2290 (Delimited Escape
>> Sequences); and additionally emphasizes that we are choosing to ignore how
>> the Unicode standard recommends that names be matched against the
>> Unicode Database, and provides a wording alternative for that.
>>
>> Wording has not been reviewed, but doesn't change significantly from past
>> versions, except that the changes to 5.13.5 [lex.string] paragraph 14
>> <http://eel.is/c++draft/lex.string#14>
>> no longer seem to be necessary, as the new machinery takes care of
>> everything, I believe.
>>
>>
>>
>> We have a slot in EWG on the 7th, unless something has changed, for this
>> paper. I can do more wording work after this, although the primary goal, in
>> my opinion, is to get design agreement for '23.
>>
>>
>>
>>
>>
>> On Tue, Oct 19, 2021 at 3:32 PM Tom Honermann via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>> My apologies for being so late in communicating this agenda for
>> tomorrow's SG16 telecon.
>>
>> *Please note that there has been a schedule change.* Tomorrow's telecon
>> was originally scheduled for 2021-10-27 but was moved forward a week to
>> avoid conflicts with CppCon. The shared calendar has been updated (which
>> triggered the sending of new meeting invitations).
>>
>> SG16 will hold a telecon on Wednesday, October *20th* (not the 27th) at
>> 19:30 UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20211020T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>> ).
>>
>> The agenda is:
>>
>> - D2071R1: Named universal character escapes
>>   - Add named escape sequences to *universal-character-name* so that
>>     these escape sequences can be used everywhere, not just in string
>>     literals.
>>   - Use Unicode rules for matching names rather than requiring exact
>>     case-sensitive names.
>> - P1885R8: Naming Text Encodings to Demystify Them
>>   <https://wg21.link/p1885r8>
>>   - Continue discussions of issues raised on the LEWG and SG16 mailing
>>     lists.
>>   - Prohibit mapping to IANA encodings when CHAR_BIT is not 8?
>>   - Address special cases for IANA mapping purposes:
>>     - Is UTF-16 valid for ordinary strings when CHAR_BIT is >= 16?
>>     - Is UTF-16 valid for wide strings when CHAR_BIT is >= 16 and
>>       sizeof(wchar_t) is 1?
>>     - Is the underlying representation of a wide string required to
>>       match an encoding scheme for the encoding form when
>>       sizeof(wchar_t) is not 1?
>>     - Limit mapping of wide strings when sizeof(wchar_t) is not 1
>>       to other, unknown, and the UCS/UTF variants?
>>
>> A draft of D2071R1 is not yet available, but is expected to be sent to
>> the SG16 mailing list later today. That draft will address EWG feedback
>> from Prague
>> <https://wiki.edg.com/bin/view/Wg21prague/P2071R0-EWG>
>> except that it will specify relaxed checking of character names for
>> implementation reasons (Corentin's prototype implementation
>> demonstrated that relaxed checking enables a smaller database of valid
>> character names).
>>
>> P1885 is back on the agenda to settle questions that have continued to be
>> raised on the LEWG and SG16 mailing lists. Corentin indicated intent to
>> change the encoding querying functions to always return unknown when
>> CHAR_BIT is not 8; we'll discuss the ramifications of that intent.
>> Concerns about intended behavior for wide strings continue to be raised, so
>> we'll discuss and poll various approaches for dealing with them.
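
As a rough illustration of the intent described for P1885 (always report unknown when CHAR_BIT is not 8), the sketch below shows the shape of such a check. Everything named here is an assumption made for illustration: `encoding_id` and `query_literal_encoding` are hypothetical stand-ins and not the interface proposed in P1885, and the detected encoding is passed in rather than actually determined.

```cpp
#include <climits>

// Hypothetical stand-in for an encoding identifier; not the P1885 API.
enum class encoding_id { unknown, other, utf8 };

// Sketch of the stated intent: whatever encoding would otherwise be
// reported, return unknown when a byte is not exactly 8 bits wide.
constexpr encoding_id query_literal_encoding(int char_bit,
                                             encoding_id detected) {
    return char_bit == 8 ? detected : encoding_id::unknown;
}

// On the current platform, using the real CHAR_BIT. The utf8 argument is
// an assumption for illustration, not a detected value.
constexpr encoding_id literal_encoding =
    query_literal_encoding(CHAR_BIT, encoding_id::utf8);
```

Passing `char_bit` as a parameter is only a convenience so that the non-8-bit case can be exercised on ordinary hardware; a real query function would consult CHAR_BIT directly.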
>>
>> Tom.
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2021-10-20 08:19:54