ISOCPP sg16 List: Re: [isocpp-sg16] Agenda for the 2025-05-14 SG16 meeting

From: Alisdair Meredith <alisdairm_at_[hidden]>
Date: Wed, 14 May 2025 13:46:11 -0400

I was looking at the concern you raise as I was re-reading my paper to present today :)

The intent of my paper was to be as close to an editorial fix as possible, so I wanted
to minimize the level of wording changes. If I were to action your feedback, I would
be concerned that we do not normatively apply the UTF-8 handling of carriage returns
to the "source text" produced by the implementation-defined mappings.

My ideal form would have source text be the output of translation phase 1, guaranteed
to be UTF-8 encoded, and with the carriage return hack for line endings applied, so that
new-line is synonymous with the new-line glyph.
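
For readers unfamiliar with the carriage return handling mentioned above, here is a
minimal sketch of the idea, assuming the input has already been validated as UTF-8.
The function name normalize_line_endings is hypothetical; it is not wording from
P3556R0 and not taken from any implementation. It replaces each CR LF pair, and each
CR not followed by LF, with a single new-line.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only.
// Assumption: `input` is well-formed UTF-8. CR (0x0D) and LF (0x0A) are
// single code units that never occur inside a multi-byte UTF-8 sequence,
// so scanning byte by byte does not split any encoded character.
std::string normalize_line_endings(std::string_view input) {
    std::string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\r') {
            // A CR LF pair and a lone CR both become one new-line.
            if (i + 1 < input.size() && input[i + 1] == '\n') {
                ++i;  // skip the LF of the pair
            }
            out.push_back('\n');
        } else {
            out.push_back(input[i]);
        }
    }
    return out;
}
```

With this shape, every line ending in the resulting text is a lone new-line, which is
the property the "ideal form" above relies on.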

That would be a little more work wording phase 1, and I really do not want to make
changes that seem bigger than might land as an NB editorial comment for C++26.

That said, if the group agrees, I will work on providing that wording as an option so
that it is available for consideration in subsequent reviews.

AlisdairM

> On May 14, 2025, at 1:37 PM, Corentin Jabot <corentinjabot_at_[hidden]> wrote:
>
> I said I'd give feedback to Alisdair on P3556R0 before the meeting.
>
> So briefly:
> - Using "source file" is fine, ship it.
> - For "source text", if we want to distinguish phase 1 and post-phase-1 by using that term (I don't love it, but it seems adequate), I think we are missing a definition for it.
>
> Maybe you could improve by adding a paragraph at the end of phase 1:
>
> > This sequence of translation character set elements is termed the _source text_.
>
> (There are probably less awkward ways to do that, but we mention "sequence of translation character set elements" twice in the last two paragraphs of phase 1)
>
> This paper, for me, does not resolve the confusion of the use of the term "header file" (https://github.com/cplusplus/CWG/issues/665) - but I don't think we necessarily want to do that in this paper.
>
>
> Nit: top of page 9, "source text of the source file" seems redundant
>
> --
> P3657R0
>
> I wish we had a grammar for comments. Other than that, ship it.
>
>
> Thanks for working on this, Alisdair!
>
> On Wed, May 14, 2025 at 5:32 AM Tom Honermann via SG16 <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>> SG16 will hold a meeting today/tomorrow, Wednesday, May 14th, at 19:30 UTC (timezone conversion <https://www.timeanddate.com/worldclock/converter.html?iso=20250514T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>
>> If you need a .ics file to import into your calendar, you can download it here <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/94A3D3A0-70B9-4847-935F-9453DB2BB216.ics?export>.
>>
>> The agenda follows.
>>
>> - P3658R0: Adjust identifier following new Unicode recommendations <https://wg21.link/p3658r0>.
>> - P3556R0: Input files are source files <https://wg21.link/p3556r0>.
>> - P3657R0: A Grammar for Whitespace Characters <https://wg21.link/p3657r0>.
>>
>> P3658R0, by our good friend Robin Leroy, seeks to adjust the character allowances for identifiers to include a more consistent set of mathematical symbols. This recommendation comes from the UTC in the wake of the adoption of P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) <https://wg21.link/p1949r7> for C++23, a paper I'm sure you all remember well. Deployment of P1949 was found to break some existing code that used identifiers containing mathematical symbols that were made invalid by the adoption of P1949R7, but that seemed quite reasonable considering similar identifiers that were not made invalid. The UTC investigated and produced a recommendation for general purpose programming languages as published in UTS #55 (Unicode Source Code Handling) <https://www.unicode.org/reports/tr55/>. The Unicode stability policy prohibited directly changing the XID_Start and XID_Continue properties, so a Mathematical Compatibility Notation Profile <https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile> was defined with corresponding ID_Compat_Math_Start <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3A%5D&g=&i=> and ID_Compat_Math_Continue <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3A%5D&g=&i=> properties to identify the member characters. The proposed changes are rather straightforward; modify the identifier-start <https://eel.is/c++draft/lex.name#nt:identifier-start> and identifier-continue <https://eel.is/c++draft/lex.name#nt:identifier-continue> grammar productions to include characters identified by the new properties.
>>
>> P3556R0 and P3657R0 come to us courtesy of Alisdair Meredith. These papers are intended to clarify core language wording related to input/source file terminology and the specification of whitespace characters. Both papers are near editorial in nature, but sufficiently complicated to warrant CWG review; SG16 was requested to review since these touch topics near and dear to us. P3556R0 does not include any intended impact to existing implementations. P3657R0 includes two normative changes; it addresses CWG 1655 (Line endings in raw string literals) <https://wg21.link/cwg1655> and it removes a case of IFNDR from [lex.comment]p1 <https://eel.is/c++draft/lex.comment#1.sentence-4> as previously proposed by Corentin in P2348R3 (Whitespaces Wording Revamp) <https://wg21.link/p2348r3>.
>>
>>
>>
>> Tom.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>> Link to this post: http://lists.isocpp.org/sg16/2025/05/4571.php

Received on 2025-05-14 17:46:38