[isocpp-sg16] Agenda for the 2025-07-16 SG16 meeting

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 13 Jul 2025 01:29:15 -0400
SG16 will hold a meeting on Wednesday, July 16th, at 19:30 UTC (timezone
conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20250716T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).

If you need a .ics file to import into your calendar, you can download
it here
<https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/94A3D3A0-70B9-4847-935F-9453DB2BB216.ics?export>.

The agenda follows.

  * P2843R2: Preprocessing is never undefined <https://wg21.link/p2843r2>.
  * LWG issue 4070: Transcoding by std::formatter<std::filesystem::path>
    <https://wg21.link/lwg4070>.
  * LWG issue 4090: Underspecified use of locale facets for
    locale-dependent std::format <https://wg21.link/lwg4090>.

*P2843R2* was approved for C++26 in Sofia. Though this paper has been
around for a while, it wasn't on my radar for SG16 review until recently
and scheduling conflicts prevented SG16 review in advance of the Sofia
meeting. The paper amends [lex.comment]p1
<https://eel.is/c++draft/lex.comment#1> to strike the highlighted
wording below.

    The characters /* start a comment, which terminates with the
    characters */. These comments do not nest. The characters // start a
    comment, which terminates immediately before the next new-line
    character. *If there is a form-feed or a vertical-tab character in
    such a comment, only whitespace characters shall appear between it
    and the new-line that terminates the comment; no diagnostic is
    required.*

The status quo is thus that the following vertical whitespace characters
may now appear anywhere in both block and line comments (note that
U+000A {LINE FEED (LF)} and U+000D {CARRIAGE RETURN (CR)} form new-line
characters for UTF-8 input files; for other input files, new-line is
implementation-defined ([lex.phases]p1
<https://eel.is/c++draft/lex.phases#1.1>)).

  * U+000B LINE TABULATION (VT)
  * U+000C FORM FEED (FF)
  * U+0085 NEXT LINE (NEL)
  * U+2028 LINE SEPARATOR
  * U+2029 PARAGRAPH SEPARATOR

These characters likely don't pose a concern in block comments, but
permitting them in line comments enables line break spoofing: an editor
that renders one of them as a line break can display the rest of the
comment as though it were code on a new line. For this reason, the
Unicode standard recommends that computer languages meet the UAX31-R3a
<https://www.unicode.org/reports/tr31/#R3a> requirement described in
UAX #31 section 4.1, "Whitespace"
<https://www.unicode.org/reports/tr31/#Whitespace>. Related material:

  * UTS #55 section 3.2, "Whitespace and Syntax"
    <https://www.unicode.org/reports/tr55/#Whitespace-Syntax>.
  * The description of line break spoofing in UTS #55 section 1.2.1,
    "Line Break Spoofing"
    <https://www.unicode.org/reports/tr55/#Spoofing-LB>.
  * Presentation recommendations in UTS #55 section 4.1.1, "Atoms"
    <https://www.unicode.org/reports/tr55/#Atoms>.
  * Unicode 16.0 section 5.8, "Newline Guidelines"
    <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G10213>.
  * UAX #14 section 6, "Line Breaking Algorithm"
    <https://unicode.org/reports/tr14/#Algorithm>.

C++ does not currently claim conformance with the UAX31-R3a requirement
([uaxid.pattern] <https://eel.is/c++draft/uaxid.pattern>).
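
As an illustration of the spoofing concern (a sketch only, not taken
from the paper or from any implementation), the hypothetical helper
split_lines below models two views of the same UTF-8 text: a compiler
that ends lines only at LF, and an editor that also breaks lines at
U+2028. The function name and the choice to model only LF and U+2028
are assumptions made for brevity.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split UTF-8 text into lines. With unicode_breaks == false only LF ends
// a line (the compiler's view in this sketch); with unicode_breaks == true
// U+2028 LINE SEPARATOR also ends a line (a Unicode-aware editor's view).
std::vector<std::string> split_lines(std::string_view text, bool unicode_breaks) {
    constexpr std::string_view ls = "\xE2\x80\xA8"; // UTF-8 encoding of U+2028
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size();) {
        if (text[i] == '\n') {
            lines.push_back(current);
            current.clear();
            ++i;
        } else if (unicode_breaks && text.substr(i, ls.size()) == ls) {
            lines.push_back(current);
            current.clear();
            i += ls.size();
        } else {
            current += text[i++];
        }
    }
    if (!current.empty()) lines.push_back(current); // unterminated last line
    return lines;
}
```

For the input "// note", U+2028, "grant_access();", LF, the LF-only view
yields a single line that is entirely a comment, while the editor view
yields two lines, the second of which reads as the statement
grant_access();.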

There are at least a couple of ways in which the line break spoofing
concerns can be mitigated:

 1. By mapping all vertical whitespace characters to new-line and
    thereby conforming to UAX31-R3a-1
    <https://www.unicode.org/reports/tr31/#R3a-1>.
 2. By prohibiting vertical whitespace characters that are not mapped to
    new-line in line comments thereby conforming to UAX31-R3a-2
    <https://www.unicode.org/reports/tr31/#R3a-2> (with the declaration
    of a profile).

Alternatively, this concern can be deemed a presentation issue that does
not warrant language restrictions.
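
The second mitigation listed above could be sketched roughly as follows.
This is a minimal illustration, assuming well-formed UTF-8 input and
assuming that the prohibited set is exactly the five characters listed
earlier; find_vertical_whitespace is a hypothetical name, and no
existing compiler is claimed to work this way.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Return the byte offset of the first VT, FF, NEL, LS, or PS in the body
// of a line comment, or std::nullopt if none is present.
// Assumes `comment` is well-formed UTF-8, so a byte-wise scan cannot
// mistake a continuation byte for a lead byte.
std::optional<std::size_t> find_vertical_whitespace(std::string_view comment) {
    for (std::size_t i = 0; i < comment.size(); ++i) {
        const auto b0 = static_cast<unsigned char>(comment[i]);
        if (b0 == 0x0B || b0 == 0x0C) return i;          // U+000B VT, U+000C FF
        if (b0 == 0xC2 && i + 1 < comment.size() &&
            static_cast<unsigned char>(comment[i + 1]) == 0x85) {
            return i;                                     // U+0085 NEL (C2 85)
        }
        if (b0 == 0xE2 && i + 2 < comment.size() &&
            static_cast<unsigned char>(comment[i + 1]) == 0x80) {
            const auto b2 = static_cast<unsigned char>(comment[i + 2]);
            if (b2 == 0xA8 || b2 == 0xA9) return i;       // U+2028 LS, U+2029 PS
        }
    }
    return std::nullopt;
}
```

A front end adopting this approach would issue a diagnostic when the
function returns an offset. The first mitigation would instead treat
each of these sequences as a new-line during line splitting.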

This issue was discussed in EWG in Sofia, though the minutes
<https://wiki.edg.com/bin/view/Wg21sofia2025/NotesEWGP2843> are very
terse. EWG was apparently reluctant to adopt the Unicode
recommendation, but did not want to preserve the previous status quo of
(some) vertical whitespace characters rendering the program IFNDR. Some
implementors expressed concerns that existing implementations take
advantage of vector instructions to quickly scan and discard line
comments and that a requirement to diagnose would prevent such
techniques. I extended an invitation to the people who made those
comments to attend the SG16 meeting so that this concern can be clearly
discussed.
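
To make the performance concern concrete, here is a minimal sketch of
the kind of fast path at stake. It is not the code of any particular
compiler; skip_line_comment is a hypothetical helper, and the use of
std::memchr (which C library implementations commonly vectorize) stands
in for hand-written vector code. The point is that discarding a line
comment only requires searching for a single byte value.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Given the offset `pos` just past the "//" of a line comment, return the
// offset of the LF that terminates the comment, or text.size() if the
// comment runs to the end of the input.
std::size_t skip_line_comment(std::string_view text, std::size_t pos) {
    if (pos >= text.size()) return text.size();
    // Single-byte search: nothing inside the comment needs to be inspected.
    const void* hit = std::memchr(text.data() + pos, '\n', text.size() - pos);
    if (hit == nullptr) return text.size();
    return static_cast<std::size_t>(static_cast<const char*>(hit) - text.data());
}
```

A requirement to diagnose VT, FF, NEL, LS, or PS inside the comment
would mean that the scan has to look for several byte patterns rather
than one.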

An NB comment is anticipated; this discussion is intended to prepare
SG16 to offer a recommendation.

*LWG 4070* and *LWG 4090* were last discussed by SG16 during the
2024-06-12 SG16 meeting
<https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2024.md#june-12th-2024>.
Proposed resolutions guided by the previous SG16 discussion were
recently drafted. We'll review them and poll on forwarding them to LWG.

Tom.

Received on 2025-07-13 05:29:21