Date: Wed, 20 Mar 2024 15:07:20 -0700
On Wed, 20 Mar 2024 at 14:49, Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
> On 20/03/2024 21.18, Richard Smith via Core wrote:
> > The resolution of CWG232 and CWG2823 means that code such as &p[0] is no
> > longer valid[*] when p is a null pointer.
>
> > [*] Formally, such code has always had undefined behavior in ISO C++,
>
> Exactly. All we did was document that undefined behavior properly.
>
> > but we've had an agreed-upon suggested resolution for CWG232 to allow
> > it for over 20 years,
>
> That suggested resolution also had a note that supporting the facility
> properly would require introduction of novel concepts such as "empty
> lvalues" into the specification. The slight change in viewpoint here is
> that introducing such novelty was felt to be evolutionary.
>
> Quote from CWG232:
>
> "There is no consensus to pursue the introduction of empty lvalues,
> without prejudice to a potential future paper addressed to EWG."
>
> In general, I think CWG should be more aggressive in closing core issues
> that clearly ask for evolutionary changes to the language, and instead
> defer to the paper process, which allows for more, and more detailed,
> accompanying rationale.
>
> > all implementations have allowed it in practice
> > <https://godbolt.org/z/8obK5vMzj>,
>
> Implementations are at liberty to provide functionality for undefined
> behavior as they see fit.
>
> > and real world code relies on it.
>
> Real code also relied on the absence of strict aliasing analysis, 30 years
> ago.
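For code that relies on `&p[0]` with a possibly-null `p`, one C++ rewrite is to spell the pointer arithmetic directly. A minimal sketch, assuming a hypothetical helper `element_ptr` (not something from the thread): `p + i` involves no unary `*`, and in C++ adding 0 to a null pointer is well-defined and yields a null pointer ([expr.add]). This rewrite is C++-specific; as noted later in the thread, `p + 0` on a null pointer is not permitted in C.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: pointer to element i of a buffer that is null when empty.
// &p[i] means &*(p + i); with p == nullptr it applies unary * to a null
// pointer, which after CWG232/CWG2823 is explicitly undefined behavior in C++.
// p + i is plain pointer arithmetic: nullptr + 0 is well-defined in C++
// and yields nullptr ([expr.add]).
template <typename T>
T* element_ptr(T* p, std::size_t i) {
    assert(p != nullptr || i == 0);  // a null buffer only admits offset 0
    return p + i;
}
```

With `int* p = nullptr;`, `element_ptr(p, 0)` returns `nullptr` without ever forming an lvalue from `*p`.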
>
> > This choice comes as a surprise to me,
>
> Please resume joining CWG meetings and/or watch the core issues list to
> avoid the surprise portion of this.
>
> > and breaks the longstanding model that C++ behaves as if there is a
> > T[0] array at nullptr for every object type T.
>
> I've never heard about such a model. Also, we don't even know what kind of
> pointer value you'd get from such an array; see CWG2532.
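Part of that model is visible in [expr.add]: adding 0 to a null pointer yields a null pointer, and subtracting two null pointers yields 0, so a (pointer, length) pair of (nullptr, 0) can be walked like an empty array as long as nothing is dereferenced. A sketch using hypothetical helpers `sum_range` and `range_length`; it says nothing about what pointer value `&p[0]` itself would produce, which is the CWG2532 question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums a (pointer, length) range. Called with
// (nullptr, 0), every operation is well-defined in C++: nullptr + 0 yields
// nullptr ([expr.add]), and the loop only compares the two null pointers,
// so the body never runs and nothing is dereferenced.
long sum_range(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);  // null is only valid for an empty range
    const int* last = p + n;
    long total = 0;
    for (const int* it = p; it != last; ++it) {
        total += *it;
    }
    return total;
}

// Hypothetical helper: null - null is defined to be 0 ([expr.add]).
std::ptrdiff_t range_length(const int* first, const int* last) {
    return last - first;
}
```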
>
> > It also breaks C compatibility -- C has an explicit rule for the `&`
> > operator:
> >
> >> If the operand is the result of a unary * operator,
>
> Is it also the "result" of such an operator if it then travels through
> the ternary or comma operator, before taking the address?
>
Clang thinks this looks through parentheses and _Generic, but not commas or
ternaries. I am not certain what WG14 intended; maybe our liaison could
find out?
> >> neither that operator nor the & operator is evaluated and the result is
> >> as if both were omitted, except that the constraints on the operators
> >> still apply and the result is not an lvalue. Similarly, if the operand is
> >> the result of a [] operator, neither the & operator nor the unary * that
> >> is implied by the [] is evaluated and the result is as if the & operator
> >> were removed and the [] operator were changed to a + operator.
>
> This just means that the common subset of C and C++ is a little smaller
> than people appear to have been assuming incorrectly.
>
> > If we're not adding empty lvalues in general, should we adopt the C rule
> > for the & operator in particular, for compatibility with C and existing
> > C++ code?
>
> Please direct this concern to EWG, where evolutionary matters of C++ are
> handled.
>
> From a specification standpoint, if we go for the narrow approach here,
> I'd appreciate a syntactic formulation (analogous to the change in CWG1954
> for typeid of nullptr).
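For reference, a sketch of the observable behavior of the existing typeid rule, which keys on the operand being obtained by applying unary `*` to a pointer: when that pointer is null and the class is polymorphic, `typeid` throws `std::bad_typeid` instead of having undefined behavior ([expr.typeid]). The class `Base` and the function `typeid_of_null_throws` are hypothetical, and the sketch makes no claim about the exact wording CWG1954 adopted.

```cpp
#include <typeinfo>

// Hypothetical polymorphic class: typeid evaluates its operand (to find the
// dynamic type) only for glvalues of polymorphic class type.
struct Base {
    virtual ~Base() = default;
};

// Returns true if typeid(*p) throws std::bad_typeid. Per [expr.typeid], when
// the operand is obtained by applying unary * to a null pointer, typeid
// throws rather than having undefined behavior.
bool typeid_of_null_throws(Base* p) {
    try {
        (void)typeid(*p);
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```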
>
> > If we don't adopt the rule from C, I think we should at least add an
> > Annex C entry.
>
> That is, indeed, a CWG concern with the status quo, so let's do that:
>
> https://cplusplus.github.io/CWG/issues/2875.html
I think the second part of the example there,

    char *p3 = &p[0]; // well-defined in C, undefined behavior in C++

is UB in C too, because it's equivalent to `p + 0`, which is permitted in
C++ but not in C.

> Jens
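Taking the point above together with the C rule quoted earlier, `&p[0]` on a null `p` is undefined in both languages for different reasons (C++ rejects applying `*` to a null pointer, C rejects null `+ 0`), while `&*p` is fine in C and `p + 0` is fine in C++. A sketch of a spelling that avoids both problems, using a hypothetical helper `portable_element_ptr`: the conditional expression in its body uses only the common subset of C and C++, though the template wrapper itself is C++-only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: pointer to element i of a buffer that may be null when
// empty. For i == 0 it returns p itself, so neither null + 0 (undefined in C)
// nor *nullptr (undefined in C++) is ever evaluated. The &p[i] branch runs
// only for i != 0, where p must point into an actual array in either language.
template <typename T>
T* portable_element_ptr(T* p, std::size_t i) {
    assert(p != nullptr || i == 0);  // a null buffer only admits offset 0
    return i == 0 ? p : &p[i];
}
```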
Received on 2024-03-20 22:07:33