sg16: Re: [SG16] Whitespaces again

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 22 Sep 2021 10:44:14 -0400

On Wed, Sep 22, 2021 at 7:21 AM Corentin <corentin.jabot_at_[hidden]> wrote:

> Thanks for the feedback.
> New draft https://isocpp.org/files/papers/D2348R2.pdf
>

Thanks.

>
> On Wed, Sep 22, 2021 at 6:54 AM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
>>
>> In [lex.whitespaces]:
>> A disambiguation rule is required to prefer matching CRLF as a
>> *line-break* instead of two *line-break*s.
>>
>
> See Jens' reply, which I agree with
>

Although the original formulation I gave for the "each non-whitespace
character" preprocessor token case covered this for everything other than
the raw string literal case, you still need a disambiguation rule to manage
the raw string literal case.
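
To make the disambiguation concrete, here is a minimal sketch of a scanner
that prefers CR LF as a single *line-break*. It assumes that *line-break*
covers only LF, CR, and CR followed by LF; the function names are
hypothetical and are not wording from the paper.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of the line-break starting at src[i],
// or 0 if no line-break starts there. The CR LF alternative is tried
// before lone CR, so CR LF is matched as one line-break, not two.
constexpr std::size_t line_break_length(std::string_view src, std::size_t i) {
    if (i >= src.size()) return 0;
    if (src[i] == '\r') {
        return (i + 1 < src.size() && src[i + 1] == '\n') ? 2 : 1;
    }
    return src[i] == '\n' ? 1 : 0;
}

// Counts line-breaks in src under the "prefer CR LF" rule.
constexpr std::size_t count_line_breaks(std::string_view src) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < src.size();) {
        const std::size_t len = line_break_length(src, i);
        if (len != 0) {
            ++count;
            i += len;  // skip the whole line-break, including the LF of CR LF
        } else {
            ++i;
        }
    }
    return count;
}
```

Without the preference, the contents of a raw string literal containing
CR LF could be read as holding either one or two line-breaks.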

>
>>
>> In [lex.string], the "*line-break*" in a raw string literal wording
>> could be more explicit about scanning for line-breaks (a sequence matching
>> *line-break* is not a *line-break* "for free"; it is a *line-break* if,
>> for example, the grammar asks for a *line-break*).
>> This can be done by adding *line-break* under the *r-char* grammar and
>> adjusting the other *r-char* case with the formula from
>> *single-line-comment-elem*.
>>
>
> I am not sure I agree with all of that, but I do agree that there is no
> line-break as such in a raw string literal.
> I've replaced it with " A sequence of characters that matches the grammar
> of line-break ..."
>

That approach is fine. We still need to watch out for the CRLF versus
CR + LF ambiguity.

>
>
>> In [lex.pptoken]:
>> The instances of "non-whitespace character" with respect to the "cannot
>> be one of the above" case are problematic if the interpretation leaves us
>> with cases where there are Unicode whitespace characters that are a part of
>> neither a preprocessing token nor a *whitespace*. That's a new
>> situation, which the surrounding wording could not be relied upon to handle
>> in a straightforward manner.
>>
>
> Changed to "each character that is not part of a whitespace and that
> cannot be one of the above"
> Might need further massaging when merging
>

The "is not part of a *whitespace*" would only be okay if there was a rule
elsewhere to prefer interpreting comments as comments. It seems that rule
is missing. The "cannot be" formulation works the same way as the existing
"cannot be one of the above": it indicates the preference in interpretation
and avoids the question of whether a character has a certain property while
the determination of whether it has that property is in play.
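
As a rough illustration of the "cannot be" reading, the sketch below tries
the alternatives in order of preference and reaches the fallback category
only when nothing earlier can match. It assumes, as the discussion above
suggests, that comments count as part of *whitespace*, and it looks only at
a position where a new token or whitespace could start (not inside a
literal). The names `Kind`, `is_plain_whitespace`, and `classify_at` are
hypothetical, and only ASCII whitespace is handled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical categories for whatever begins at a given position.
enum class Kind { Comment, Whitespace, Other };

// ASCII-only stand-in for a whitespace character (an assumption of this sketch).
constexpr bool is_plain_whitespace(char c) {
    return c == ' ' || c == '\t' || c == '\v' || c == '\f' ||
           c == '\n' || c == '\r';
}

// Classifies what starts at src[i]. The comment check comes first, so the
// '/' of "//" or "/*" is never handed to the fallback category.
constexpr Kind classify_at(std::string_view src, std::size_t i) {
    if (i >= src.size()) return Kind::Other;
    const bool has_next = i + 1 < src.size();
    if (src[i] == '/' && has_next &&
        (src[i + 1] == '/' || src[i + 1] == '*')) {
        return Kind::Comment;
    }
    if (is_plain_whitespace(src[i])) return Kind::Whitespace;
    // Reached only when the character cannot be part of the above.
    return Kind::Other;
}
```

The ordering is the point: the fallback never has to ask whether a character
"is" part of a comment, because it is consulted only after the comment
interpretation has been ruled out.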

>
>
>>
>> This could be fixed by replacing:
>> each non-whitespace character that cannot be one of the above
>> =>
>> each character that cannot be considered part of a *whitespace* and
>> cannot be one of the above
>>
>> This also happens to fix a pre-existing issue that the wording is rather
>> weak on preferring to interpret comments as comments.
>>
>

Received on 2021-09-22 09:44:45