Date: Wed, 25 Feb 2026 20:05:28 -0500
On Feb 25, 2026, at 5:17 PM, Hubert Tong <hubert.reinterpretcast_at_[hidden]> wrote:
On Wed, Feb 25, 2026 at 4:53 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
I think you guys meant EWG (not LEWG). If there is no '{' or '}' in the source file encoding, then I think the only answer is that the feature cannot be used with such source files.
On 2/25/26 3:10 PM, Corentin Jabot via SG16 wrote:
On Wed, Feb 25, 2026, 20:31 Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
While re-reading the papers today, I encountered a couple of questions related to lexing of interpolated literals and handling of digraphs and UCNs. If time permits today, we can discuss these examples. I'm using the syntax from P3412R3 below, but I think the questions apply to P2951R0 as well.
P3412R3 section 7.1 prompted me to think of these lambda examples concerning lexical scanning for ',' and ':'. Note that angle brackets are not used for bracket matching. These might be worth adding as examples.
- f"{[]<int,int>{}}" // lambda must be enclosed in parentheses.
- f"{[] post (r:r>0) { return 1; }}" // lambda must be enclosed in parentheses.
Ignore the examples above. They are nonsense that I put together too quickly without thinking things through. I was trying to find counterexamples to the following claim from section 7.1 of P3412R3. I failed. Thanks to Barry for correcting me offline.
Note that lambdas, which may contain any type of code including for instance goto labels, always contain this code inside matched braces, so any colons will be ignored when detecting the expression-field end. The same goes for statement-expressions of gcc and blocks of Clang.
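The bracket-matching idea in the quoted claim can be sketched as a small scanner. This is only an illustration under stated assumptions, not the algorithm specified in P3412R3: the name find_expression_end is hypothetical, the handling of '::' and of '?' ... ':' pairs is an assumption of this sketch, and nested string or character literals, comments, digraphs, and UCNs are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from P3412R3). `s` starts just after the opening
// '{' of a field. Returns the index of the first top-level ':' or '}' that
// ends the expression, or npos if there is none.
// Assumptions of this sketch: only (), [] and {} are matched; '<' and '>'
// are not; '::' and the ':' of a top-level '?:' do not end the expression.
inline std::size_t find_expression_end(std::string_view s) {
    int depth = 0;      // current nesting of (), [] and {}
    int ternaries = 0;  // top-level '?' still waiting for a ':'
    for (std::size_t i = 0; i < s.size(); ++i) {
        switch (s[i]) {
        case '(': case '[': case '{':
            ++depth;
            break;
        case ')': case ']':
            --depth;
            break;
        case '}':
            if (depth == 0) return i;  // closes the field itself
            --depth;
            break;
        case '?':
            if (depth == 0) ++ternaries;
            break;
        case ':':
            if (depth != 0) break;  // e.g. a goto label inside a lambda body
            if (i + 1 < s.size() && s[i + 1] == ':') { ++i; break; }  // '::'
            if (ternaries > 0) { --ternaries; break; }  // pairs with a '?'
            return i;  // top-level ':' ends the expression
        default:
            break;
        }
    }
    return std::string_view::npos;
}
```

With this sketch, the colon of a label inside a lambda body such as "[] { x: return 1; }()}" is skipped because it sits inside matched braces, and the scan stops at the final '}'. For "x:>5}" it stops at index 1, so the ':' is taken as the end of the expression rather than as the start of the digraph ':>'.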
It makes sense for digraphs to not be recognized as part of an f-literal; that is consistent with string literals. Within string literals, digraphs aren't needed because UCNs can be used to specify characters that are in the basic character set but not necessarily available in the input source file encoding. Normally, UCNs are not permitted to match members of the basic character set in language syntax. What about in extraction fields? Should the following be well-formed?
- f"The answer is {boolVar ? 17 \u003A 42}"; // U+003A is ':'
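For comparison, the existing rules that the question builds on can be checked with ordinary C++ (C++11 or later), without any f-literal support. The sketch below only shows how digraphs, UCNs, and macros behave today in code and in ordinary string literals; it says nothing about what P3412R3 or P2951R0 specify for fields. The names pick, values, and check_string_literal_rules are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

#define COLON :

// In ordinary code, a macro that expands to ':' is just the ':' token.
constexpr int pick(bool b) { return b ? 17 COLON 42; }

// In ordinary code, digraphs are alternative tokens:
// <: and :> spell [ and ], <% and %> spell { and }.
constexpr int values<:3:> = <%1, 2, 3%>;
static_assert(values<:1:> == 2, "digraphs spell [ ] and { } in code");

inline void check_string_literal_rules() {
    // Inside a string literal, "<:" is two characters, not '['.
    assert(std::strlen("<:") == 2);
    assert(std::strcmp("<:", "[") != 0);

    // Inside a string literal, a UCN may name a basic character:
    // "\u003A" is the same string as ":".
    assert(std::strcmp("\u003A", ":") == 0);

    // Macros are not expanded inside string literals.
    assert(std::strcmp("COLON", ":") != 0);

    // The macro-produced colon works as a token in code.
    assert(pick(true) == 17 && pick(false) == 42);
}
```

Calling check_string_literal_rules() from main runs the checks. The open question in this thread is which of these two sets of rules, if either, applies between the braces of a field.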
There appears to be a tension with regard to lexical scanning of f-literals and parsing of extraction fields. Can macros allow for use of digraphs and UCNs in extraction fields? Should these be well-formed?
- #define COLON :
  f"The answer is {boolVar ? 17 COLON 42}";
- #define LEFT_SQUARE_BRACKET <:
  #define RIGHT_SQUARE_BRACKET :>
  f"The size is {sizeof int LEFT_SQUARE_BRACKET 42 RIGHT_SQUARE_BRACKET}";
Tom.
Embedded expressions should be parsed as expressions (digraphs recognized, no UCNs for ASCII characters, etc.). When parsing the string literal outside of embedded expression fragments, the rules of string literals should apply (no digraphs, etc.).
That matches what I was thinking and what led me to ask the question.
Anything else would be a great implementation burden and weird from a user standpoint.
Whether macros expand seems like an LEWG question (if we follow the model of these being nested expressions, then macro expansion should happen for consistency, but it depends on when an implementation can actually parse these fragments).
Yes, that is a LEWG question. But it is relevant to SG16 with regard to the ability to specify the '[', ']', '{', '}', '#', and '##' tokens in extraction fields in source files that have an encoding that doesn't support them. If macro expansion doesn't occur, then we presumably need to support use of digraphs or UCNs in some way.
-- HT
Received on 2026-02-26 01:05:44
