Date: Wed, 14 May 2025 14:45:29 -0400
On 5/14/25 2:06 PM, Corentin Jabot via SG16 wrote:
>
>
> On Wed, May 14, 2025 at 7:46 PM Alisdair Meredith <alisdairm_at_[hidden]>
> wrote:
>
> I was looking at the concern you raise as I was re-reading my
> paper to present today :)
>
> The intent of my paper was to be as close to an editorial fix as
> possible, so I wanted to minimize the level of wording changes.
> If I were to action your feedback, I would be concerned that we do
> not normatively apply the UTF-8 handling of carriage returns to the
> "source text" produced by the implementation-defined mappings.
>
> My ideal form would have source text be the output of translation
> phase 1, guaranteed to be UTF-8 encoded, and with the carriage
> return hack for line endings applied, so that new-line is synonymous
> with the new-line glyph.
>
>
> After phase 1, what you call the source text is a "sequence of
> translation character set elements", which is isomorphic to "a
> sequence of Unicode code points/scalar values". These are not encoded
> (an implementation may use whatever representation it chooses).
>
> The two paths in phase 1 are
> UTF-8 -> Unicode code points
> Implementation-defined input -> Unicode code points.
>
> Either way, at the start of that sentence, we have a sequence of
> Unicode code points, which we call translation character set elements
> (https://eel.is/c++draft/lex.phases#1.2).
> And everything from that point on applies to that sequence.
> I'm suggesting that to avoid confusion, we should introduce a term to
> refer to that sequence, which is what "source text" is in your paper.
> Except you are introducing the "source text" term without defining it.
>
> (Note that we cannot define "source file"/"input file" or whatever we
> want to call it.
> And we specifically don't want to. That's agreed upon already;
> anything from which we can extract things that represent C++ code
> qualifies, and we are happy with that term being nebulous.)
+1 to Corentin's above reply.
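
For readers following along, the UTF-8 path of phase 1 described above
can be sketched roughly as follows. This is only an illustration, not
wording and not any implementation's actual code: the function names
(decode_utf8, normalize_new_lines, phase1_from_utf8) are hypothetical,
and std::vector<char32_t> is just one assumed representation of the
"sequence of translation character set elements".

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 code units into Unicode scalar values.
// Returns std::nullopt if the input is not well-formed UTF-8.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;       // scalar value being assembled
        std::size_t len = 0;   // number of code units in this sequence
        char32_t min = 0;      // smallest value allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;     // stray continuation or invalid lead byte
        if (i + len > bytes.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, out-of-range values, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: replace each CR LF pair, and each CR not followed
// by LF, with a single new-line (represented here as U+000A).
std::vector<char32_t> normalize_new_lines(const std::vector<char32_t>& in) {
    std::vector<char32_t> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'\r') {
            if (i + 1 < in.size() && in[i + 1] == U'\n') ++i;  // consume LF
            out.push_back(U'\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Sketch of the UTF-8 path: bytes -> scalar values -> new-line handling.
std::optional<std::vector<char32_t>> phase1_from_utf8(std::string_view bytes) {
    auto decoded = decode_utf8(bytes);
    if (!decoded) return std::nullopt;
    return normalize_new_lines(*decoded);
}
```

In this sketch, normalize_new_lines operates on code points, so it could
equally be applied to the result of an implementation-defined mapping;
whether the wording requires that is exactly the question raised above.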
Tom.
>
> That would be a little more work wording phase 1, and I really do
> not want to make changes that seem bigger than might land as an NB
> editorial comment for C++26.
>
> That said, if the group agrees, I will work on providing that
> wording as an option so that it is available for consideration in
> subsequent reviews.
>
>
> AlisdairM
>
>> On May 14, 2025, at 1:37 PM, Corentin Jabot
>> <corentinjabot_at_[hidden]> wrote:
>>
>> I said I'd give feedback to Alisdair on P3556R0 before the meeting.
>>
>> So briefly:
>> - Using "source file" is fine, ship it.
>> - For "source text", if we want to distinguish phase 1 and
>> post-phase-1 by using that term (I don't love it, but it seems
>> adequate), I think we are missing a definition for it.
>>
>> Maybe you could improve by adding a paragraph at the end of phase 1:
>>
>> > This sequence of translation character set elements is termed
>> the _source text_.
>>
>> (There are probably less awkward ways to do that, but we mention
>> "sequence of translation character set elements" twice in the
>> last two paragraphs of phase 1)
>>
>> This paper, for me, does not resolve the confusion of the use of
>> the term "header file"
>> (https://github.com/cplusplus/CWG/issues/665) - but I don't think
>> we necessarily want to do that in this paper.
>>
>>
>> Nit: top of page 9, "source text of the source file" seems redundant
>>
>> --
>> P3657R0
>>
>> I wish we had a grammar for comments. Other than that, ship it.
>>
>>
>> Thanks for working on this, Alisdair!
>>
>> On Wed, May 14, 2025 at 5:32 AM Tom Honermann via SG16
>> <sg16_at_[hidden]> wrote:
>>
>> SG16 will hold a meeting *today/tomorrow*, Wednesday, May
>> 14th, at 19:30 UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20250514T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>
>> If you need a .ics file to import into your calendar, you can
>> download it here
>> <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/94A3D3A0-70B9-4847-935F-9453DB2BB216.ics?export>.
>>
>> The agenda follows.
>>
>> * P3658R0: Adjust /identifier/ following new Unicode
>> recommendations <https://wg21.link/p3658r0>.
>> * P3556R0: Input files are source files
>> <https://wg21.link/p3556r0>.
>> * P3657R0: A Grammar for Whitespace Characters
>> <https://wg21.link/p3657r0>.
>>
>> *P3658R0*, by our good friend Robin Leroy, seeks to adjust
>> the character allowances for identifiers to include a more
>> consistent set of mathematical symbols. This recommendation
>> comes from the UTC in the wake of the adoption of P1949R7
>> (C++ Identifier Syntax using Unicode Standard Annex 31)
>> <https://wg21.link/p1949r7> for C++23, a paper I'm sure you
>> all remember well. Deployment of P1949R7 was found to break
>> some existing code whose identifiers contained mathematical
>> symbols that the paper made invalid, even though those
>> identifiers seemed quite reasonable given similar identifiers
>> that remained valid. The UTC
>> investigated and produced a recommendation for general
>> purpose programming languages as published in UTS #55
>> (Unicode Source Code Handling)
>> <https://www.unicode.org/reports/tr55/>. The Unicode
>> stability policy prohibited directly changing the XID_Start
>> and XID_Continue properties, so a Mathematical Compatibility
>> Notation Profile
>> <https://www.unicode.org/reports/tr31/#Mathematical_Compatibility_Notation_Profile>
>> was defined with corresponding ID_Compat_Math_Start
>> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3A%5D&g=&i=>
>> and ID_Compat_Math_Continue
>> <https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3A%5D&g=&i=>
>> properties to identify the member characters. The proposed
>> changes are rather straightforward; modify the
>> /identifier-start/
>> <https://eel.is/c++draft/lex.name#nt:identifier-start> and
>> /identifier-continue/
>> <https://eel.is/c++draft/lex.name#nt:identifier-continue>
>> grammar productions to include characters identified by the
>> new properties.
>>
>> *P3556R0* and *P3657R0* come to us courtesy of Alisdair
>> Meredith. These papers are intended to clarify core language
>> wording related to input/source file terminology and the
>> specification of whitespace characters. Both papers are near
>> editorial in nature, but sufficiently complicated to warrant
>> CWG review; SG16 was requested to review since these touch
>> topics near and dear to us. P3556R0 does not include any
>> intended impact to existing implementations. P3657R0 includes
>> two normative changes; it addresses CWG 1655 (Line endings in
>> raw string literals) <https://wg21.link/cwg1655> and it
>> removes a case of IFNDR from [lex.comment]p1
>> <https://eel.is/c++draft/lex.comment#1.sentence-4> as
>> previously proposed by Corentin in P2348R3 (Whitespaces
>> Wording Revamp) <https://wg21.link/p2348r3>.
>>
>>
>>
>> Tom.
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>> Link to this post: http://lists.isocpp.org/sg16/2025/05/4571.php
>>
Received on 2025-05-14 18:45:34