Date: Thu, 26 Feb 2026 09:02:09 +0100
On Wed, Feb 25, 2026 at 11:15 PM Hubert Tong via SG16 <sg16_at_[hidden]>
wrote:
> On Wed, Feb 25, 2026 at 4:53 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>
>> On 2/25/26 3:10 PM, Corentin Jabot via SG16 wrote:
>>
>>
>>
>> On Wed, Feb 25, 2026, 20:31 Tom Honermann via SG16 <sg16_at_[hidden]>
>> wrote:
>>
>>> While re-reading the papers today, I encountered a couple of questions
>>> related to lexing of interpolated literals and handling of digraphs and
>>> UCNs. If time permits today, we can discuss these examples. I'm using the
>>> syntax from P3412R3 below, but I think the questions apply to P2951R0 as
>>> well.
>>>
>>> P3412R3 section 7.1 prompted me to think of these lambda examples
>>> concerning lexical scanning for ',' and ':'. Note that angle brackets are
>>> not used for bracket matching. These might be worth adding as examples.
>>>
>>> - f"{[]<int,int>{}}" // lambda must be enclosed in parentheses.
>>> - f"{[] post (r:r>0) { return 1; }}" // lambda must be enclosed in
>>> parentheses.
>>>
>> Ignore the examples above. They are nonsense that I put together too
>> quickly without thinking things through. I was trying to find
>> counterexamples to the following claim from section 7.1 of P3412R3
>> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3412r3.pdf>.
>> I failed. Thanks to Barry for correcting me offline.
>>
>> Note that lambdas, which may contain any type of code including for
>> instance goto labels, always contain this code inside matched braces, so
>> any colons will be ignored when detecting the expression-field end. The
>> same goes for *statement-expressions* of gcc and *blocks* of Clang.
>>
>>
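
The scanning rule quoted above can be sketched roughly as follows. This is a
minimal illustration, not the algorithm from P3412R3: the function name
find_expression_field_end is hypothetical, and the sketch deliberately ignores
nested string and character literals, comments, '::', and '?:'. It tracks
nesting of (), [], and {} only; angle brackets are not treated as brackets.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not from P3412R3).
// 'pos' is the index just past the '{' that opens an expression field.
// Returns the index of the top-level ':' or '}' that ends the field,
// or std::string::npos if no such character is found.
std::size_t find_expression_field_end(const std::string& text, std::size_t pos) {
    int depth = 0;  // nesting depth of (), [] and {}; '<' and '>' are ignored
    for (; pos < text.size(); ++pos) {
        const char c = text[pos];
        if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth == 0) {
                // Only '}' can close the field; a stray ')' or ']' is an error.
                return c == '}' ? pos : std::string::npos;
            }
            --depth;
        } else if (c == ':' && depth == 0) {
            return pos;  // a top-level colon ends the expression field
        }
    }
    return std::string::npos;
}
```

With this sketch, the label colon in "{[]{ done: return 1; }():x}" is skipped
because it sits inside the lambda's braces, and the field ends at the colon
that follows the call parentheses.
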
>>>
>>> It makes sense for digraphs to not be recognized as part of an
>>> f-literal; that is consistent with string literals. Within string literals,
>>> digraphs aren't needed because UCNs can be used to specify characters that
>>> are in the basic character set but not necessarily available in the input
>>> source file encoding. Normally, UCNs are not permitted to match members of
>>> the basic character set in language syntax. What about in extraction
>>> fields? Should the following be well-formed?
>>>
>>> - f"The answer is {boolVar ? 17 \u003A 42}"; // U+003A is ':'
>>>
>>> There appears to be a tension with regard to lexical scanning of
>>> f-literals and parsing of extraction fields. Can macros allow for use of
>>> digraphs and UCNs in extraction fields? Should these be well-formed?
>>>
>>> - #define COLON :
>>> f"The answer is {boolVar ? 17 COLON 42}";
>>> - #define LEFT_SQUARE_BRACKET <:
>>> #define RIGHT_SQUARE_BRACKET :>
>>> f"The size is {sizeof int LEFT_SQUARE_BRACKET 42
>>> RIGHT_SQUARE_BRACKET}";
>>>
>>> Tom.
>>>
>> Embedded expressions should be parsed as expressions (digraphs are
>> recognized, UCNs naming ASCII characters are not, etc.). When parsing the
>> string literal outside of embedded expression fragments, the rules of
>> string literals should apply (no digraphs, etc.).
>>
>> That matches what I was thinking and what led me to ask the question.
>>
>>
>> Anything else would be a great implementation burden and weird from a
>> user standpoint.
>>
>> Whether macros expand seems like an LEWG question (if we follow the model
>> of these being nested expressions, then macro expansion should happen for
>> consistency, but it depends on when an implementation can actually parse
>> these fragments).
>>
>> Yes, that is a LEWG question. But it is relevant to SG16 with regard to
>> the ability to specify the '[', ']', '{', '}', '#', and '##' tokens in
>> extraction fields in source files that have an encoding that doesn't
>> support them. If macro expansion doesn't occur, then we presumably need to
>> support use of digraphs or UCNs in some way.
>>
> I think you guys meant EWG (not LEWG). If there is no '{' or '}' in the
> source file encoding, then I think the only answer is that the feature
> cannot be used with such source files.
>
How would such an implementation deal with std::format today?
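
For context on that question, a small sketch, assuming C++17 and the C++ rule
that UCNs naming basic characters are permitted inside string literals: a
format string can spell its braces as \u007B and \u007D and still contain the
same characters as one written with '{' and '}'. The function names are
hypothetical, and the sketch does not call std::format itself.

```cpp
#include <string_view>

// Hypothetical helper: a format string whose braces are spelled with UCNs.
constexpr std::string_view ucn_format_string() {
    return "The answer is \u007B\u007D";  // U+007B is '{', U+007D is '}'
}

// The same format string written with the brace characters directly.
constexpr std::string_view plain_format_string() {
    return "The answer is {}";
}

// Both literals are expected to contain identical characters.
static_assert(ucn_format_string() == plain_format_string(),
              "UCN-spelled braces should match '{' and '}'");
```

This only covers the string-literal side; it says nothing about how braces
would be written in an extraction field.
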
>
> -- HT
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
> Link to this post: http://lists.isocpp.org/sg16/2026/02/4684.php
>
Received on 2026-02-26 08:02:33
