Re: [isocpp-sg16] Agenda for the 2025-07-16 SG16 meeting

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 16 Jul 2025 09:03:38 -0400
This is your friendly reminder that this meeting is happening *today*.

Tom.

On 7/13/25 1:29 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a meeting on Wednesday, July 16th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20250716T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> If you need a .ics file to import into your calendar, you can download
> it here
> <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/94A3D3A0-70B9-4847-935F-9453DB2BB216.ics?export>.
>
> The agenda follows.
>
> * P2843R2: Preprocessing is never undefined <https://wg21.link/p2843r2>.
> * LWG issue 4070: Transcoding by
> std::formatter<std::filesystem::path> <https://wg21.link/lwg4070>.
> * LWG issue 4090: Underspecified use of locale facets for
> locale-dependent std::format <https://wg21.link/lwg4090>.
>
> *P2843R2* was approved for C++26 in Sofia. Though this paper has been
> around for a while, it wasn't on my radar for SG16 review until
> recently, and scheduling conflicts prevented SG16 review in advance of
> the Sofia meeting. The paper amends [lex.comment]p1
> <https://eel.is/c++draft/lex.comment#1> to strike the highlighted
> wording below.
>
> The characters /* start a comment, which terminates with the
> characters */. These comments do not nest. The characters // start
> a comment, which terminates immediately before the next new-line
> character. *If there is a form-feed or a vertical-tab character in
> such a comment, only whitespace characters shall appear between it
> and the new-line that terminates the comment; no diagnostic is
> required.*
>
> The status quo is thus that the following vertical whitespace
> characters may now appear anywhere in both block and line comments
> (note that U+000A {LINE FEED (LF)} and U+000D {CARRIAGE RETURN (CR)}
> form new-line characters for UTF-8 input files; for other input files,
> new-line is implementation-defined ([lex.phases]p1
> <https://eel.is/c++draft/lex.phases#1.1>)).
>
> * U+000B LINE TABULATION (VT)
> * U+000C FORM FEED (FF)
> * U+0085 NEXT LINE (NEL)
> * U+2028 LINE SEPARATOR
> * U+2029 PARAGRAPH SEPARATOR
>
> These characters likely don't pose a concern in block comments, but
> permitting them in line comments enables line break spoofing. For
> this reason, the Unicode standard recommends that computer languages
> meet the UAX31-R3a <https://www.unicode.org/reports/tr31/#R3a>
> requirement described in UAX #31 section 4.1, "Whitespace"
> <https://www.unicode.org/reports/tr31/#Whitespace>. See also:
>
> * UTS #55 section 3.2, "Whitespace and Syntax"
> <https://www.unicode.org/reports/tr55/#Whitespace-Syntax>.
> * UTS #55 section 1.2.1, "Line Break Spoofing"
> <https://www.unicode.org/reports/tr55/#Spoofing-LB>, for the
> description of line break spoofing.
> * UTS #55 section 4.1.1, "Atoms"
> <https://www.unicode.org/reports/tr55/#Atoms>, for presentation
> recommendations.
> * Unicode 16.0 section 5.8, "Newline Guidelines"
> <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G10213>.
> * UAX #14 section 6, "Line Breaking Algorithm"
> <https://unicode.org/reports/tr14/#Algorithm>.
>
> C++ does not currently claim conformance with the UAX31-R3a
> requirement ([uaxid.pattern]
> <https://eel.is/c++draft/uaxid.pattern>).
>
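
To make the spoofing concern concrete, here is a minimal sketch (not taken from P2843R2 or from any compiler) that splits the same UTF-8 text two ways: once breaking only at LF, CR, and CR LF, and once also breaking at the five characters listed above, as a viewer following the Unicode newline guidelines might. The names break_length and split_lines are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Length in bytes of a line break starting at s[i], or 0 if there is none.
// With unicode_view == false only LF, CR, and CR LF count as breaks.
// With unicode_view == true, VT, FF, NEL, LS, and PS (UTF-8 encoded) count too.
std::size_t break_length(std::string_view s, std::size_t i, bool unicode_view) {
    auto at = [&](std::size_t k) -> unsigned char {
        return k < s.size() ? static_cast<unsigned char>(s[k]) : 0;
    };
    unsigned char c = at(i);
    if (c == '\r') return at(i + 1) == '\n' ? 2 : 1;
    if (c == '\n') return 1;
    if (!unicode_view) return 0;
    if (c == 0x0B || c == 0x0C) return 1;              // VT, FF
    if (c == 0xC2 && at(i + 1) == 0x85) return 2;      // U+0085 NEL
    if (c == 0xE2 && at(i + 1) == 0x80 &&
        (at(i + 2) == 0xA8 || at(i + 2) == 0xA9)) return 3;  // U+2028, U+2029
    return 0;
}

// Splits s into lines under the chosen notion of "line break".
std::vector<std::string> split_lines(std::string_view s, bool unicode_view) {
    std::vector<std::string> lines;
    std::size_t start = 0, i = 0;
    while (i < s.size()) {
        if (std::size_t n = break_length(s, i, unicode_view)) {
            lines.emplace_back(s.substr(start, i - start));
            i += n;
            start = i;
        } else {
            ++i;
        }
    }
    if (start < s.size()) lines.emplace_back(s.substr(start));
    return lines;
}
```

For the text "// harmless note" followed by U+2028 and "launch();", the first view yields a single line that is entirely a comment, while the second yields two lines, the second of which looks like a live statement.
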
> There are at least a couple of ways in which the line break spoofing
> concerns can be mitigated:
>
> 1. By mapping all vertical whitespace characters to new-line and
> thereby conforming to UAX31-R3a-1
> <https://www.unicode.org/reports/tr31/#R3a-1>.
> 2. By prohibiting vertical whitespace characters that are not mapped
> to new-line in line comments thereby conforming to UAX31-R3a-2
> <https://www.unicode.org/reports/tr31/#R3a-2> (with the
> declaration of a profile).
>
> Alternatively, this concern can be deemed a presentation issue that
> does not warrant language restrictions.
>
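
The two mitigations can be sketched roughly as follows, assuming UTF-8 input that has already been validated. This is an illustration only, not proposed wording or an existing implementation; vertical_ws_length, map_to_newline, and first_stray_in_line_comment are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Byte length of a UTF-8 encoded VT, FF, NEL, LS, or PS starting at s[i];
// 0 if s[i] does not start one of them. (Hypothetical helper.)
std::size_t vertical_ws_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) {
        return k < s.size() ? static_cast<unsigned char>(s[k]) : 0u;
    };
    unsigned c = byte(i);
    if (c == 0x0B || c == 0x0C) return 1;               // VT, FF
    if (c == 0xC2 && byte(i + 1) == 0x85) return 2;     // U+0085 NEL
    if (c == 0xE2 && byte(i + 1) == 0x80 &&
        (byte(i + 2) == 0xA8 || byte(i + 2) == 0xA9)) return 3;  // U+2028, U+2029
    return 0;
}

// Option 1 (UAX31-R3a-1 style): map each such character to a new-line.
std::string map_to_newline(std::string_view s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        if (std::size_t n = vertical_ws_length(s, i)) {
            out += '\n';
            i += n;
        } else {
            out += s[i++];
        }
    }
    return out;
}

// Option 2 (UAX31-R3a-2 style): report the offset of the first such
// character in the body of a line comment so that it can be diagnosed.
std::optional<std::size_t> first_stray_in_line_comment(std::string_view body) {
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (vertical_ws_length(body, i)) return i;
    }
    return std::nullopt;
}
```

Note that the first function changes where lines end, and therefore where a line comment ends, while the second leaves the text alone and only identifies what would be rejected.
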
> This issue was discussed in EWG in Sofia, though the minutes
> <https://wiki.edg.com/bin/view/Wg21sofia2025/NotesEWGP2843> are very
> terse. EWG was apparently reluctant to adopt the Unicode
> recommendation, but did not want to preserve the previous status quo
> of (some) vertical whitespace characters rendering the program
> ill-formed, no diagnostic required (IFNDR). Some implementors
> expressed concern that existing implementations use vector
> instructions to quickly scan and discard line comments, and that a
> requirement to diagnose would prevent such techniques. I extended an
> invitation to the people who made those comments to attend the SG16
> meeting so that this concern can be clearly discussed.
>
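
As a rough illustration of the scanning concern, the sketch below contrasts a search for a single byte with a search for any of several candidate bytes. It assumes, as is common but not guaranteed, that the C library implements std::memchr with vector instructions; it does not reproduce any particular compiler's lexer, it treats only LF as the end of a line comment, and the names skip_to_newline and next_candidate are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Fast path: find the end of a line comment by searching for LF only.
// Requires pos <= s.size(). Returns s.size() if no LF is found.
std::size_t skip_to_newline(std::string_view s, std::size_t pos) {
    const void* p = std::memchr(s.data() + pos, '\n', s.size() - pos);
    return p ? static_cast<std::size_t>(static_cast<const char*>(p) - s.data())
             : s.size();
}

// Diagnosing path: stop at LF or at any byte that could start one of the
// other vertical whitespace characters (VT, FF, or the UTF-8 lead bytes
// 0xC2 and 0xE2). The caller must then inspect the following bytes to
// tell NEL, LS, and PS apart from unrelated characters.
std::size_t next_candidate(std::string_view s, std::size_t pos) {
    static constexpr std::string_view candidates = "\n\v\f\xC2\xE2";
    std::size_t i = s.find_first_of(candidates, pos);
    return i == std::string_view::npos ? s.size() : i;
}
```

The second function has to stop far more often on text that contains non-ASCII characters, since 0xC2 and 0xE2 begin many ordinary code points. Whether that extra work is significant in practice is the kind of question the invited implementors could answer.
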
> An NB comment is anticipated; this discussion is intended to prepare
> SG16 to offer a recommendation.
>
> *LWG 4070* and *LWG 4090* were last discussed by SG16 during the
> 2024-06-12 SG16 meeting
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2024.md#june-12th-2024>.
> Proposed resolutions guided by the previous SG16 discussion were
> recently drafted. We'll review and poll forwarding to LWG.
>
> Tom.
>
>

Received on 2025-07-16 13:03:49