Date: Thu, 26 Feb 2026 09:57:25 -0500
On Thu, Feb 26, 2026 at 3:02 AM Corentin Jabot <corentinjabot_at_[hidden]>
wrote:
>
>
> On Wed, Feb 25, 2026 at 11:15 PM Hubert Tong via SG16 <
> sg16_at_[hidden]> wrote:
>
>> On Wed, Feb 25, 2026 at 4:53 PM Tom Honermann via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> On 2/25/26 3:10 PM, Corentin Jabot via SG16 wrote:
>>>
>>>
>>>
>>> On Wed, Feb 25, 2026, 20:31 Tom Honermann via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> While re-reading the papers today, I encountered a couple of questions
>>>> related to lexing of interpolated literals and handling of digraphs and
>>>> UCNs. If time permits today, we can discuss these examples. I'm using the
>>>> syntax from P3412R3 below, but I think the questions apply to P2951R0 as
>>>> well.
>>>>
>>>> P3412R3 section 7.1 prompted me to think of these lambda examples
>>>> concerning lexical scanning for ',' and ':'. Note that angle brackets are
>>>> not used for bracket matching. These might be worth adding as examples.
>>>>
>>>> - f"{[]<int,int>{}}" // lambda must be enclosed in
>>>> parentheses.
>>>> - f"{[] post (r:r>0) { return 1; }}" // lambda must be enclosed in
>>>> parentheses.
>>>>
>>>> Ignore the examples above. They are nonsense that I put together too
>>> quickly without thinking things through. I was trying to find counter
>>> examples to the following claim from section 7.1 of P3412R3
>>> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3412r3.pdf>.
>>> I failed. Thanks to Barry for correcting me offline.
>>>
>>> Note that lambdas, which may contain any type of code including for
>>> instance goto labels, always contain this code inside matched braces, so
>>> any colons will be ignored when detecting the expression-field end. The
>>> same goes for *statement-expressions* of gcc and *blocks* of Clang.
>>>
>>>
>>>>
>>>> It makes sense for digraphs to not be recognized as part of an
>>>> f-literal; that is consistent with string literals. Within string literals,
>>>> digraphs aren't needed because UCNs can be used to specify characters that
>>>> are in the basic character set but not necessarily available in the input
>>>> source file encoding. Normally, UCNs are not permitted to match members of
>>>> the basic character set in language syntax. What about in extraction
>>>> fields? Should the following be well-formed?
>>>>
>>>> - f"The answer is {boolVar ? 17 \u003A 42}"; // U+003A is ':'
>>>>
>>>> There appears to be a tension with regard to lexical scanning of
>>>> f-literals and parsing of extraction fields. Can macros allow for use of
>>>> digraphs and UCNs in extraction fields? Should these be well-formed?
>>>>
>>>> - #define COLON :
>>>> f"The answer is {boolVar ? 17 COLON 42}";
>>>> - #define LEFT_SQUARE_BRACKET <:
>>>> #define RIGHT_SQUARE_BRACKET :>
>>>> f"The size is {sizeof int LEFT_SQUARE_BRACKET 42
>>>> RIGHT_SQUARE_BRACKET}";
>>>>
>>>> Tom.
>>>>
>>> When parsing an embedded expression, the contents should be parsed as
>>> an expression (digraphs recognized, no UCNs for ASCII characters, etc.).
>>> When parsing the string literal outside of embedded expression fragments,
>>> the rules of string literals should apply (no digraphs, etc.).
>>>
>>> That matches what I was thinking and what led me to ask the question.
>>>
>>>
>>> Anything else would be a great implementation burden and weird from a
>>> user standpoint.
>>>
>>> Whether macros expand seems like an LEWG question (if we follow the
>>> model of these being nested expressions, then macro expansion should happen
>>> for consistency, but it depends on when an implementation can actually
>>> parse these fragments).
>>>
>>> Yes, that is a LEWG question. But it is relevant to SG16 with regard to
>>> the ability to specify the '[', ']', '{', '}', '#', and '##' tokens in
>>> extraction fields in source files that have an encoding that doesn't
>>> support them. If macro expansion doesn't occur, then we presumably need to
>>> support use of digraphs or UCNs in some way.
>>>
>> I think you guys meant EWG (not LEWG). If there is no '{' or '}' in the
>> source file encoding, then I think the only answer is that the feature
>> cannot be used with such source files.
>>
>
> How would such an implementation deal with std::format today?
>
I am not convinced this arises in practice; however:
For std::format usage, it is possible to handle this by using escape
sequences in the string.
Trigraphs also exist (only as an implementation extension since their
removal in C++17) as a solution for non-raw strings.
I suppose the trigraph solution also works for the new feature, but I don't
want to promote it in any way since it affects code portability.
For EBCDIC, it is not the case that the character does not exist (AFAIK):
It is merely the case that the character is encoded in different positions
across EBCDIC variants.
In an EBCDIC scenario, the IBM compiler provides the ability for a user to
declare which EBCDIC encoding is used. Both in-file and out-of-file
solutions exist, including filesystem metadata support in certain
environments.
-- HT
Received on 2026-02-26 14:57:55
