ISOCPP sg16 List: Re: [isocpp-sg16] Agenda for the 2025-05-14 SG16 meeting

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 14 May 2025 20:06:05 +0200

On Wed, May 14, 2025 at 7:46 PM Alisdair Meredith <alisdairm_at_[hidden]> wrote:

> I was looking at the concern you raise as I was re-reading my paper to
> present today :)
>
> The intent of my paper was to be as close to an editorial fix as possible, so
> I wanted
> to minimize the level of wording changes. If I were to action your
> feedback, I would
> be concerned that we do not normatively apply the UTF-8 handling of
> carriage returns
> to the "source text" produced by the implementation-defined mappings.
>
> My ideal form would have source text be the output of translation phase 1,
> guaranteed
> to be UTF-8 encoded, and with the carriage return hack for line endings
> applied, so that
> new-line is synonymous with the new-line glyph.
>

After phase 1, what you call the source text is a "sequence of translation
character set elements".
Which is isomorphic to "a sequence of Unicode code points/scalar values",
which are not encoded
(and for which an implementation may use whatever representation it
chooses).

The two paths in phase 1 are
UTF-8 -> Unicode code points
Implementation-defined input -> Unicode code points.

Either way, at the start of that sentence, we have a sequence of Unicode
code points, which we call translation character set elements
https://eel.is/c++draft/lex.phases#1.2 .
And everything from that point on applies to that sequence.
I'm suggesting that to avoid confusion, we should introduce a term to refer
to that sequence, which is what "source text" is in your paper.
Except you are introducing the "source text" term without defining it.
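
To make the shape of that concrete, here is a rough sketch of the UTF-8 path
through phase 1 followed by the carriage return handling. It is only an
illustration under my own assumptions, not how any implementation does it:
the names decode_utf8 and normalize_line_endings are hypothetical, and
char32_t is just one possible representation of translation character set
elements. The point is that the second function operates on code points and
does not care which of the two paths produced them.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: decode a UTF-8 code unit sequence into Unicode
// scalar values. Returns std::nullopt for ill-formed input.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    // Smallest scalar value that legitimately needs a sequence of each length.
    static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else return std::nullopt;                  // invalid lead byte
        if (i + len > bytes.size()) return std::nullopt;  // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, surrogates, and values past U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: replace each CR LF pair, and each CR not followed
// by LF, with a single LF. Works on code points, whatever their origin.
std::vector<char32_t> normalize_line_endings(const std::vector<char32_t>& in) {
    std::vector<char32_t> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'\r') {
            if (i + 1 < in.size() && in[i + 1] == U'\n') ++i;  // swallow LF
            out.push_back(U'\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

An implementation-defined input path would replace decode_utf8 with some
other mapping to code points; normalize_line_endings would stay the same.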

(Note that we cannot define "source file"/"input file" or whatever we want
to call it.
And we specifically don't want to. That's agreed upon already; anything
from which we can extract things that represent C++ code qualifies, and we
are happy with that term being nebulous.)

>
> That would be a little more work wording phase 1, and I really do not want
> to make
> changes that seem bigger than might land as an NB editorial comment for
> C++26.
>
> That said, if the group agrees, I will work on providing that wording as
> an option so
> that it is available for consideration in subsequent reviews.
>

>
> AlisdairM
>
> On May 14, 2025, at 1:37 PM, Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>
> I said I'd give feedback to Alisdair on P3556R0 before the meeting.
>
> So briefly:
> - Using "source file" is fine, ship it.
> - For "source text", if we want to distinguish phase 1 and post-phase-1
> by using that term (I don't love it, but it seems adequate), I think we are
> missing a definition for it.
>
> Maybe you could improve by adding a paragraph at the end of phase 1:
>
> > This sequence of translation character set elements is termed
> the _source text_.
>
> (There are probably less awkward ways to do that, but we mention "sequence
> of translation character set elements" twice in the last two paragraphs of
> phase 1)
>
> This paper, for me, does not resolve the confusion of the use of the term
> "header file" (https://github.com/cplusplus/CWG/issues/665) - but I
> don't think we necessarily want to do that in this paper.
>
>
> Nit: top of page 9, "source text of the source file" seems redundant
>
> --
> P3657R0
>
> I wish we had a grammar for comments. Other than that, ship it.
>
>
> Thanks for working on this, Alisdair!
>
> On Wed, May 14, 2025 at 5:32 AM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> SG16 will hold a meeting *today/tomorrow*, Wednesday, May 14th, at 19:30
>> UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20250514T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>> ).
>>
>> If you need a .ics file to import into your calendar, you can download it
>> here
>> <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/94A3D3A0-70B9-4847-935F-9453DB2BB216.ics?export>
>> .
>>
>> The agenda follows.
>>
>> - P3658R0: Adjust *identifier* following new Unicode recommendations
>> <https://wg21.link/p3658r0>.
>> - P3556R0: Input files are source files <https://wg21.link/p3556r0>.
>> - P3657R0: A Grammar for Whitespace Characters
>> <https://wg21.link/p3657r0>.
>>
>> *P3658R0*, by our good friend Robin Leroy, seeks to adjust the character
>> allowances for identifiers to include a more consistent set of mathematical
>> symbols. This recommendation comes from the UTC in the wake of the adoption
>> of P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31)
>> <https://wg21.link/p1949r7> for C++23, a paper I'm sure you all remember
>> well. Deployment of P1949 was found to break some existing code that used
>> identifiers containing mathematical symbols that were made invalid by the
>> adoption of P1949R7, but that seemed quite reasonable considering similar
>> identifiers that were not made invalid. The UTC investigated and produced a
>> recommendation for general purpose programming languages as published in UTS
>> #55 (Unicode Source Code Handling)
>> <https://www.unicode.org/reports/tr55/>. The Unicode stability policy
>> prohibited directly changing the XID_Start and XID_Continue properties,
>> so a Mathematical Compatibility Notation Profile
>> <https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile>
>> was defined with corresponding ID_Compat_Math_Start
>> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3A%5D&g=&i=>
>> and ID_Compat_Math_Continue
>> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3A%5D&g=&i=>
>> properties to identify the member characters. The proposed changes are
>> rather straightforward; modify the *identifier-start*
>> <https://eel.is/c++draft/lex.name#nt:identifier-start> and
>> *identifier-continue*
>> <https://eel.is/c++draft/lex.name#nt:identifier-continue> grammar
>> productions to include characters identified by the new properties.
>>
>> *P3556R0* and *P3657R0* come to us courtesy of Alisdair Meredith. These
>> papers are intended to clarify core language wording related to
>> input/source file terminology and the specification of whitespace
>> characters. Both papers are near editorial in nature, but sufficiently
>> complicated to warrant CWG review; SG16 was requested to review since these
>> touch topics near and dear to us. P3556R0 does not include any intended
>> impact to existing implementations. P3657R0 includes two normative changes;
>> it addresses CWG 1655 (Line endings in raw string literals)
>> <https://wg21.link/cwg1655> and it removes a case of IFNDR from
>> [lex.comment]p1 <https://eel.is/c++draft/lex.comment#1.sentence-4> as
>> previously proposed by Corentin in P2348R3 (Whitespaces Wording Revamp)
>> <https://wg21.link/p2348r3>.
>>
>>
>> Tom.
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>> Link to this post: http://lists.isocpp.org/sg16/2025/05/4571.php
>>
>
>

Received on 2025-05-14 18:06:30