ISOCPP sg16 List: Re: [isocpp-sg16] Agenda for the 2026-02-25 SG16 meeting

From: <bengt.gustafsson_at_[hidden]>
Date: Sat, 28 Feb 2026 18:01:40 +0000

This thread is getting messy, so I will try to summarize answers to (hopefully) all of the questions here.

1. The question about whether \u003A counts as a colon and starts a format specifier:

In P3412 the generated pp-tokens are inspected, so \u003A definitely counts as a colon and starts a format specifier. In the P3951 proposal I think that the source text is scanned for colons, but it seems unclear whether this happens before or after converting universal-character-names, so I won't speculate on this.

2. The lambda examples:

No, those lambdas don't need to be enclosed in parentheses; I think this was settled in an earlier reply.

It is worth noting that the comma in the first lambda does not end the expression field: we model the field as an expression, not an assignment-expression, so that template argument/parameter lists like the one in this example can be written. In P3412 the result of each expression field is enclosed in an extra pair of parentheses to avoid treating a, b in f"{a, b} {c}" as two std::format arguments. Unfortunately, if no parentheses are added, this outputs a and b and ignores c rather than outputting b and c. I'd like this to be an error, but by the definition of std::format extra arguments are simply ignored. The semantics of P3951 are the same.

3. The #define examples:

As macro expansion occurs in phase 4, the : in the macro COLON does not affect the lexing in phase 3. This is the same in both proposals.

The #defines for <: and :> work the same way: the alternate tokens are substituted into the expression in phase 4.
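For comparison, this is how the same macros behave in ordinary code today, outside any f-literal. It is a sketch of existing language behavior only; the function names are invented, and it says nothing about how either paper words the f-literal case.

```cpp
#define COLON :
#define LEFT_SQUARE_BRACKET <:
#define RIGHT_SQUARE_BRACKET :>

// The ':' of the conditional operator arrives through macro expansion (phase 4).
int pick(bool b) {
    return b ? 17 COLON 42;
}

// The digraphs <: and :> arrive through macro expansion and act as [ and ].
int third() {
    int a LEFT_SQUARE_BRACKET 3 RIGHT_SQUARE_BRACKET = {1, 2, 3};
    return a LEFT_SQUARE_BRACKET 2 RIGHT_SQUARE_BRACKET;
}
```

Here pick(true) returns 17, pick(false) returns 42, and third() returns 3.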

4. Regarding alternate tokens in the expression field (no macros involved)

Here the two proposals work differently, and this relates to item 1 above. In P3412 the pp-token stream is inspected for the pp-token :, and since :>, :] and :: are distinct pp-tokens they do not start a format specifier. In P3951 (as I understand it) a colon in the input character stream starts a format specifier, so at least the first colon in each of :> and :: starts a format specifier. The colon in :] does not, because at the character level it is inside a bracket (assuming that [: and :] are properly nested).

Neither proposal discusses the use of alternate tokens for brackets and braces in the context of counting parenthesis levels. This needs to be added, and the reasonable solution is to follow the rule in [lex.digraph] §2: "In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling". This is easier to accomplish with the P3412 approach of inspecting the generated pp-tokens; with P3951 you can't get a :> token, so if ] is not available in the source character set you have to use a UCN.

P3412 has another problem, which is that :> does not start a format specifier even though the format specifier for a right-aligned field with space as the fill character can start with a > character. This can, however, be fixed by explicitly stating the fill character, as in : >. A more sophisticated idea was used in an earlier revision of P3412: treat :> as a right bracket when an unmatched [ (possibly spelled as <:) has been seen, and as a colon starting a format specifier otherwise.
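The difference between the two scanning models in item 4 can be sketched with a toy scanner. This is not the wording of either paper: the function names are hypothetical, and the sketch ignores the conditional operator, nested literals, UCNs, and the special lexing rule for <::. It only contrasts "first top-level ':' character" with "first top-level ':' token".

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Character-level model (my reading of the P3951 description above):
// the first ':' character at bracket depth 0 starts the format specifier.
std::optional<std::size_t> colon_by_chars(std::string_view field) {
    int depth = 0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        const char c = field[i];
        if (c == '(' || c == '[' || c == '{') ++depth;
        else if (c == ')' || c == ']' || c == '}') --depth;
        else if (c == ':' && depth == 0) return i;
    }
    return std::nullopt;
}

// Token-level model (in the spirit of P3412): two-character tokens are
// consumed whole, so a ':' inside ::, :>, <:, [: or :] is never a lone colon.
std::optional<std::size_t> colon_by_tokens(std::string_view field) {
    int depth = 0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        const std::string_view two = field.substr(i, 2);
        if (two == "::") { ++i; continue; }
        if (two == "<:" || two == "[:") { ++depth; ++i; continue; }
        if (two == ":>" || two == ":]") { --depth; ++i; continue; }
        const char c = field[i];
        if (c == '(' || c == '[' || c == '{') ++depth;
        else if (c == ')' || c == ']' || c == '}') --depth;
        else if (c == ':' && depth == 0) return i;
    }
    return std::nullopt;
}
```

For the field text x:>5 the character model reports a specifier starting at index 1, while the token model finds none. Writing x: >5 makes both models agree on index 1. For N::v the character model stops at the first colon, while the token model finds no specifier.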

5. Starting expression fields when the character set lacks braces.

I don't think either proposal discusses whether the { that starts an expression-field can be spelled as a UCN, but I think it should be allowed. This is primarily because it definitely works for std::format format strings, and because it provides a way to use f-literals even if the source character set doesn't have braces. This aligns with the reasoning that the string-literal parts of the f-literal should work as a string literal and the expression-field parts should work as a pp-token sequence.

As it works on the pp-token level, P3412 needs to address the new parenthesis type [: :], but this is an easy fix of course.

In conclusion, I think the differences in these areas between the proposals are fairly small. The main one is that P3951 requires parentheses to be able to use the :: operator, while P3412 requires a space between : and > when right alignment is intended, and a :^: combination when a default range formatter with an element format specifier is to be used.

Either proposal going forward has a few fairly simple issues to clarify in this area.

Bengt

________________________________
From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Sent: Thursday, February 26, 2026 3:57 PM
To: Corentin Jabot <corentinjabot_at_[hidden]>
Cc: sg16_at_[hidden] <sg16_at_[hidden]>; Tom Honermann <tom_at_[hidden]>; Barry Revzin <barry.revzin_at_[hidden]>; Bengt Gustafsson <bengt.gustafsson_at_[hidden]>
Subject: Re: [isocpp-sg16] Agenda for the 2026-02-25 SG16 meeting

On Thu, Feb 26, 2026 at 3:02 AM Corentin Jabot <corentinjabot_at_[hidden]> wrote:

On Wed, Feb 25, 2026 at 11:15 PM Hubert Tong via SG16 <sg16_at_[hidden]> wrote:
On Wed, Feb 25, 2026 at 4:53 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
On 2/25/26 3:10 PM, Corentin Jabot via SG16 wrote:

On Wed, Feb 25, 2026, 20:31 Tom Honermann via SG16 <sg16_at_[hidden]> wrote:

While re-reading the papers today, I encountered a couple of questions related to lexing of interpolated literals and handling of digraphs and UCNs. If time permits today, we can discuss these examples. I'm using the syntax from P3412R3 below, but I think the questions apply to P3951R0 as well.

P3412R3 section 7.1 prompted me to think of these lambda examples concerning lexical scanning for ',' and ':'. Note that angle brackets are not used for bracket matching. These might be worth adding as examples.

  * f"{[]<int,int>{}}" // lambda must be enclosed in parenthesis.
  * f"{[] post (r:r>0) { return 1; }}" // lambda must be enclosed in parenthesis.

Ignore the examples above. They are nonsense that I put together too quickly without thinking things through. I was trying to find counter examples to the following claim from section 7.1 of P3412R3<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3412r3.pdf>. I failed. Thanks to Barry for correcting me offline.

Note that lambdas, which may contain any type of code including for instance goto labels, always contain this code inside matched braces, so any colons will be ignored when detecting the expression-field end. The same goes for statement-expressions of gcc and blocks of Clang.

It makes sense for digraphs to not be recognized as part of an f-literal; that is consistent with string literals. Within string literals, digraphs aren't needed because UCNs can be used to specify characters that are in the basic character set but not necessarily available in the input source file encoding. Normally, UCNs are not permitted to match members of the basic character set in language syntax. What about in extraction fields? Should the following be well-formed?

  * f"The answer is {boolVar ? 17 \u003A 42}"; // U+003A is ':'

There appears to be a tension with regard to lexical scanning of f-literals and parsing of extraction fields. Can macros allow for use of digraphs and UCNs in extraction fields? Should these be well-formed?

  * #define COLON :
f"The answer is {boolVar ? 17 COLON 42}";
  * #define LEFT_SQUARE_BRACKET <:
#define RIGHT_SQUARE_BRACKET :>
f"The size is {sizeof int LEFT_SQUARE_BRACKET 42 RIGHT_SQUARE_BRACKET}";

Tom.

When parsing an embedded expression, it should be parsed as an expression (digraphs, no ASCII UCNs, etc.). When parsing the string literal outside of embedded expression fragments, the rules of string literals should apply (no digraphs, etc.).

That matches what I was thinking and what led me to ask the question.

Anything else would be a great implementation burden and weird from a user standpoint.

Whether macros expand seems like an LEWG question (if we follow the model of these being nested expressions, then macro expansion should happen for consistency, but it depends on when an implementation can actually parse these fragments).

Yes, that is a LEWG question. But it is relevant to SG16 with regard to the ability to specify the '[', ']', '{', '}', '#', and '##' tokens in extraction fields in source files that have an encoding that doesn't support them. If macro expansion doesn't occur, then we presumably need to support use of digraphs or UCNs in some way.

I think you guys meant EWG (not LEWG). If there is no '{' or '}' in the source file encoding, then I think the only answer is that the feature cannot be used with such source files.

How would such an implementation deal with std::format today?

I am not convinced this arises in practice; however:
For std::format usage, it is possible to handle this by using escape sequences in the string.
Trigraphs also exist (as an extension in newer versions of C++) as a solution for non-raw strings.

I suppose the trigraph solution also works for the new feature, but I don't want to promote it in any way since it affects code portability.

For EBCDIC, it is not the case that the character does not exist (AFAIK): It is merely the case that the character is encoded in different positions across EBCDIC variants.
In an EBCDIC scenario, the IBM compiler provides the ability for a user to declare which EBCDIC encoding is used. Both in-file and out-of-file solutions exist, including filesystem metadata support in certain environments.

-- HT

Received on 2026-02-28 18:01:48