Date: Mon, 12 Apr 2021 14:51:19 +0200
Aaron,
on Mon, 12 Apr 2021 07:59:03 -0400 you (Aaron Ballman
<aaron_at_[hidden]>) wrote:
> > Then the token sequence
> >
> > `[` `something` `]`
> >
> > can be two things, namely either a designator or the start of a
> > lambda. Both of these can appear in the same context in an
> > initialization of an array.
> >
> > This already needs a lookahead that is 2 or 3 tokens to
> > disambiguate, and integrating that into the parser already needs
> > some lifting. (And if `something` is not an ICE for an
> > implementation, they still must add a disambiguation rule, here.)
> >
> > The attempted introduction of designated initializers into C++
> > should produce the same ambiguity. But here `something` is
> > definitively an ICE.
> >
> > If we add attributes to the picture, things really become
> > "interesting". In the LALR grammar this adds about 20 shift/reduce
> > conflicts. The token sequence
> >
> > `int` `A` `[` `[` `HAL`
> >
> > could be introducing a declaration of
> >
> > - a VLA where the bound will be given by a lambda
> > (disambiguated by a following `]`, `=` or `,` token)
> >
> > - an `int` object with a vendor attribute for vendor "HAL" applied to
> > the identifier `A` (disambiguated by a following `::` token)
> >
>
> I agree that introducing lambda syntax to C would cause a parsing
> ambiguity there. C++ has the same one: it could be a regular array where
> the bound is given by a constexpr lambda or an int object with an
> attribute.
>
> C++ makes this unambiguously an attribute per [dcl.attr.grammar]p7 and
> we do not have any such disambiguation rule yet for C (I seem to
> recall bringing this up in one of our many discussions about the
> syntax, and I *think* the rationale was because we didn't know of any
> current ambiguities in C that would require the rule).
>
> FWIW, in C++ users can disambiguate themselves using parentheses.
> e.g., int a [([HAL]() constexpr { return 12; }())]; // array, not
> attribute

Indeed, that would be a way for applications to clearly mark their
intent.

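To make the parenthesized form concrete, here is a minimal C++17 sketch. The function name `parenthesized_bound_size` and the local constant `HAL` are made up for this illustration (only the identifier `HAL` is borrowed from the example above), and the lambda returns `HAL` instead of a literal so that the capture is actually used.

```cpp
#include <cstddef>

// Hypothetical helper: returns the element count of an array whose bound
// is computed by an immediately invoked constexpr lambda.
std::size_t parenthesized_bound_size() {
    // Assumption for this sketch: HAL is an ordinary local constant.
    constexpr int HAL = 12;

    // The parentheses ensure that the two `[` tokens are not consecutive,
    // so this declares an array and does not start an attribute.
    int a[([HAL]() constexpr { return HAL; }())] = {};

    // The bound is an integer constant expression, checked at compile time.
    static_assert(sizeof a / sizeof a[0] == 12, "bound comes from the lambda");
    return sizeof a / sizeof a[0];
}
```

Without the parentheses, a C++ compiler is expected to treat the `[` `[` pair as the start of an attribute, per the rule cited above.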
> > The high number of shift/reduce conflicts comes from the fact that
> > there are already so many different possibilities for VLA, and even
> > in two places (regular declarators and abstract declarators), and
> > that the attribute also has two possibilities, namely also to start
> > with a standard attribute. The worst I think is
> >
> > `int` `A` `[` `[` `deprecated`
> >
> > It could be introducing
> >
> > - a VLA where the bound will be given by a lambda
> > (disambiguated by a following `=` or `,` token)
> >
> > - the sequence
> >
> > `int` `A` `[` `[` `deprecated` `]`
> >
> > which in turn could be introducing
> >
> > - a VLA where the bound will be given by a lambda
> > (disambiguated by a following `(`, `[` or `{` token)
> >
> > - a deprecated `int` object `A` (disambiguated by a
> > following `]` token)
> >
> > It is nowhere enshrined that C has to stay with a LALR grammar, but
> > I think if we abandon that possibility we should at least make such
> > a decision knowingly. The examples above show that
> >
> > - making attribute names keywords does not help much
> > because of vendor specific attributes
> >
> > - the real culprit is the token sequence `[` `[` which
> > introduces all of these conflicts
> >
> > For the latter, I tested introducing `[[` as a token for the start
> > of attributes, and all the ambiguity disappears nicely. It has to be
> > noted that this sequence cannot appear in a valid C17 program, so
> > any change that we make for `[` `[` in a row does not impact
> > existing C code. The only impact for users of C23 would be that
> > when they want to use a lambda in an array bound (which is a new
> > feature) they'd have to put spaces between the `[` `[`.
>
> C's maximal munch rule (6.4p4) would cause problems for
> implementations that also support C-derivative languages like
> Objective-C, where the [[ tokens appear *very* frequently due to the
> message passing syntax that they use. We'd effectively have to "undo"
> the formation of that token, similar to the mess we already have to go
> through for splitting >> back into > and > in some circumstances in
> C++. In C++, this was pretty reasonable because splitting >> into two >
> tokens only occurs in very specific contexts with declarations,
> whereas [[ in Objective-C appears naturally as part of expressions
> that get used much more frequently and so it's less clear to me how
> palatable such a change would be.

If we go that way, implementations that don't have these problems
because they don't implement other languages with these double
brackets could still use a `[[` token and map the token pair `[` `[`
to that special token.

It is a bit user-unfriendly, because from basic experience with C++
people would probably assume that separating the two `[` should
suffice to disambiguate.

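For illustration, the effect of a dedicated `[[` token under maximal munch can be sketched with a toy lexer. This is only a sketch under my own assumptions: the function `lex` and its string-based token representation are invented for this mail and are not taken from any real implementation, and only `[[` is formed as a combined token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer: splits `src` into identifier/number tokens and
// single-character punctuators, except that two adjacent `[` characters
// are combined into one `[[` token (maximal munch).
std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {            // white space only separates tokens
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t j = i;            // scan a whole identifier or number
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_')) {
                ++j;
            }
            out.push_back(src.substr(i, j - i));
            i = j;
        } else if (src.compare(i, 2, "[[") == 0) {
            out.push_back("[[");          // longest match wins
            i += 2;
        } else {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}
```

With this sketch, `int A[[deprecated]]` yields a single `[[` token after `A`, whereas `int A[ [HAL]{ return 12; }() ]` yields two separate `[` tokens. That is the behavior the token-based proposal relies on, and it is also the token formation that an Objective-C capable front end would have to undo.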
> > Doing so would introduce a surface incompatibility with C++. On the
> > other hand, my guess would be that C++ better have the same sort of
> > disambiguation strategy, because now a called lambda can be an
> > integer constant expression for them. So for C++ you could replace
> > VLA above by array, and you'd be in the same sort of mess.
>
> C++ disambiguates differently and rather than using a new token that
> C++ doesn't have, I'd hope that we could explore using the same
> disambiguation strategy as C++ has already used because there's
> significant implementation experience with the C++ formulation and
> some known implementation concerns with the introduction of a new [[
> token (at least for some C implementations).

In this particular case C++ experience for the syntax is not so
convincing, because the grammar concerning `[` is, in the end, a bit
different. We have different constructs with different properties
(VLA, designators).

But if that is wanted I can add such a rule to the basic lambda paper.

Jens
--
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
Received on 2021-04-12 07:51:24