Re: [SG16] Draft of d2071 - Named escape sequences

From: Peter Brett <pbrett_at_[hidden]>
Date: Wed, 20 Oct 2021 08:44:33 +0000
Hi Steve,

Does the paper need any updates now that the \u{xxxx} syntax for universal character names is likely to be in C++23?



From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Steve Downey via SG16
Sent: 20 October 2021 03:47
To: SG16 <sg16_at_[hidden]>
Cc: Steve Downey <sdowney_at_gmail.com>
Subject: [SG16] Draft of d2071 - Named escape sequences


Attached is a draft of Named universal character escapes. It provides wording rebased on EWG's last direction and, in light of P2290 (Delimited Escape Sequences), alternative wording that specifies these as UCNs rather than escapes. It also emphasizes that we are choosing to ignore how the Unicode Standard recommends that names be matched against the Unicode Character Database, and provides a wording alternative for that.

The wording has not been reviewed, but it doesn't change significantly from past versions, except that the changes to 5.13.5 [lex.string] paragraph 14<http://eel.is/c++draft/lex.string#14> no longer seem to be necessary, as I believe the new machinery takes care of everything.

We have a slot in EWG on the 7th, unless something has changed, for this paper. I can do more wording work after this, although the primary goal, in my opinion, is to get design agreement for '23.

On Tue, Oct 19, 2021 at 3:32 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:

My apologies for being so late in communicating this agenda for tomorrow's SG16 telecon.

Please note that there has been a schedule change. Tomorrow's telecon was originally scheduled for 2021-10-27 but was moved forward a week to avoid conflicts with CppCon. The shared calendar has been updated (which triggered the sending of new meeting invitations).

SG16 will hold a telecon on Wednesday, October 20th (not the 27th) at 19:30 UTC (timezone conversion<https://www.timeanddate.com/worldclock/converter.html?iso=20211020T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).

The agenda is:

  * D2071R1: Named universal character escapes

     * Add named escape sequences to universal-character-name so that these escape sequences can be used everywhere, not just in string literals.
     * Use Unicode rules for matching names rather than requiring exact case-sensitive names.

  * P1885R8: Naming Text Encodings to Demystify Them<https://wg21.link/p1885r8>

     * Continue discussions of issues raised on the LEWG and SG16 mailing lists.
     * Prohibit mapping to IANA encodings when CHAR_BIT is not 8?
     * Address special cases for IANA mapping purposes:

        * Is UTF-16 valid for ordinary strings when CHAR_BIT is >= 16?
        * Is UTF-16 valid for wide strings when CHAR_BIT is >= 16 and sizeof(wchar_t) is 1?
        * Is the underlying representation of a wide string required to match an encoding scheme for the encoding form when sizeof(wchar_t) is not 1?
        * Limit mapping of wide strings when sizeof(wchar_t) is not 1 to other, unknown, and the UCS/UTF variants?

A draft of D2071R1 is not yet available, but is expected to be sent to the SG16 mailing list later today. That draft will address EWG feedback from Prague<https://wiki.edg.com/bin/view/Wg21prague/P2071R0-EWG>, except that it will specify relaxed checking of character names for implementation reasons (Corentin's prototype implementation demonstrated that relaxed checking enables a smaller database of valid character names).
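
To make "relaxed checking" concrete, here is a minimal sketch of loose name matching, assuming the Unicode rule in question is UAX44-LM2 (ignore case, whitespace, underscores, and medial hyphens). The helper names fold_name and loosely_equal are hypothetical and do not come from the paper or from any implementation. The sketch deliberately omits the rule's one exception, the hyphen in U+1180 HANGUL JUNGSEONG O-E, and treats a hyphen as medial only when it has an ASCII letter or digit on both sides.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: fold a character name to a canonical key using a
// simplified form of UAX44-LM2. Assumption: the U+1180 exception is NOT handled.
std::string fold_name(std::string_view name) {
    auto is_alnum = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        // Whitespace and underscores are ignored.
        if (c == ' ' || c == '_') continue;
        // A hyphen directly between two letters or digits is medial: ignore it.
        const bool medial = c == '-' && i > 0 && i + 1 < name.size() &&
                            is_alnum(name[i - 1]) && is_alnum(name[i + 1]);
        if (medial) continue;
        // Case is ignored, so store everything else in upper case.
        out.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return out;
}

// Hypothetical helper: two names match loosely if their folded keys are equal.
bool loosely_equal(std::string_view a, std::string_view b) {
    return fold_name(a) == fold_name(b);
}
```

Under this sketch, "zero-width space" matches "ZERO WIDTH SPACE", while "TIBETAN LETTER -A" stays distinct from "TIBETAN LETTER A" because its hyphen follows a space and so is not medial. Folding both the database and the queried name to one key is one plausible reason loose matching could permit a more compact name table, though the prototype's actual approach may differ.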

P1885 is back on the agenda to settle questions that have continued to be raised on the LEWG and SG16 mailing lists. Corentin indicated intent to change the encoding querying functions to always return unknown when CHAR_BIT is not 8; we'll discuss the ramifications of that intent. Concerns about intended behavior for wide strings continue to be raised, so we'll discuss and poll various approaches for dealing with them.

Received on 2021-10-20 03:44:44