On 20/03/2024 21.18, Richard Smith via Core wrote:
> The resolution of CWG232 and CWG2823 means that code such as &p[0] is no longer valid[*] when p is a null pointer.
> [*] Formally, such code has always had undefined behavior in ISO C++,
Exactly. All we did was document that undefined behavior properly.
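To make the affected pattern concrete, here is a minimal C++ sketch. `sum` is a hypothetical helper, not from any library. For a (pointer, length) range that may be (nullptr, 0), computing the end as `p + n` is well-defined in C++ when `p` is null and `n` is 0. Spelling it `&p[n]` first forms the lvalue `*(p + n)`, which is undefined for a null `p` after the CWG2823 resolution.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums the range [p, p + n).
// Assumption: a null p is only ever paired with n == 0 (an empty buffer).
int sum(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    // p + n: for a null p and n == 0 this yields a null pointer value in C++.
    // &p[n] would instead apply unary * to p + n, which is undefined for a null p.
    const int* end = p + n;
    int total = 0;
    for (const int* it = p; it != end; ++it) {
        total += *it;
    }
    return total;
}
```

Usage: `sum(nullptr, 0)` returns 0 without ever forming an lvalue through the null pointer.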
> but we've had an agreed-upon suggested resolution for CWG232 to allow it for over 20 years,
That suggested resolution also had a note that supporting the facility
properly would require introducing novel concepts such as "empty lvalues"
into the specification. The slight change in viewpoint here is that introducing
such novelty was judged to be an evolutionary change.
Quote from CWG232:
"There is no consensus to pursue the introduction of empty lvalues, without prejudice to a potential future paper addressed to EWG."
In general, I think CWG should be more aggressive in closing core issues that
clearly ask for evolutionary changes to the language, and should instead defer
to the paper process, which allows for more extensive and more detailed
accompanying rationale.
> all implementations have allowed it in practice <https://godbolt.org/z/8obK5vMzj>,
Implementations are at liberty to provide functionality for undefined behavior
as they see fit.
> and real world code relies on it.
Thirty years ago, real code also relied on the absence of strict aliasing analysis.
> This choice comes as a surprise to me,
Please resume attending CWG meetings and/or watch the core issues list, so that
such changes no longer come as a surprise.
> and breaks the longstanding model that C++ behaves as if there is a T[0] array at nullptr for every object type T.
I've never heard of such a model. Also, we don't even know what kind of
pointer value you'd get from such an array; see CWG2532.
> It also breaks C compatibility -- C has an explicit rule for the `&` operator:
>
>> If the operand is the result of a unary * operator,
Is it also the "result" of such an operator if it then travels through
the ternary or comma operator, before taking the address?
Clang treats the rule as looking through parentheses and _Generic, but not through commas or ternaries. I am not certain what WG14 intended; maybe our liaison could find out?
>> neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.
This just means that the common subset of C and C++ is a little smaller
than people appear to have (incorrectly) assumed.
> If we're not adding empty lvalues in general, should we adopt the C rule for the & operator in particular, for compatibility with C and existing C++ code?
Please direct this concern to EWG, where evolutionary matters of C++ are handled.
From a specification standpoint, if we go for the narrow approach here,
I'd appreciate a syntactic formulation (analogous to the change in CWG1954
for typeid applied to a dereferenced null pointer).
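For reference, a sketch of the typeid case that such a formulation would be modeled on; `Base` and `typeid_of_null_throws` are hypothetical names. When the operand of typeid is `*p` for a null pointer `p` to a polymorphic class type, the typeid expression throws std::bad_typeid instead of having undefined behavior. Parenthesized spellings such as `(*p)` also qualify; spellings such as `(0, *p)` are exactly the kind of case a syntactic rule has to settle, so the sketch sticks to plain `*p`.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic class: typeid(*p) must inspect the dynamic type.
struct Base {
    virtual ~Base() = default;
};

// Returns true if typeid applied to *p, with p null, throws std::bad_typeid.
bool typeid_of_null_throws() {
    Base* p = nullptr;
    try {
        (void)typeid(*p);  // operand is syntactically unary * on a null pointer
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```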
> If we don't adopt the rule from C, I think we should at least add an Annex C entry.
That is, indeed, a CWG concern with the status quo, so let's do that:
https://cplusplus.github.io/CWG/issues/2875.html
I think the second part of that example,
    char *p3 = &p[0]; // well-defined in C, undefined behavior in C++
is actually UB in C as well, because C defines it as equivalent to `p + 0`,
and adding 0 to a null pointer is permitted in C++ but not in C.
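A sketch of the spelling that stays well-defined in C++ for a null `p`, under the reading above; `first_element` is a hypothetical helper. The comments record the C/C++ split: C++ gives null + 0 a defined result, while C (through C23), which rewrites `&p[0]` to `p + 0`, does not.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: address of the first element of a possibly-empty
// array that may be represented by a null pointer.
char* first_element(char* p, std::size_t n) {
    assert(p != nullptr || n == 0);  // null only stands for an empty array
    // C++: a null pointer plus 0 yields a null pointer value ([expr.add]).
    // C:   &p[0] means p + 0, and null + 0 is not given defined behavior,
    //      so the C spelling is undefined for a null p too.
    // C++: &p[0] means &*(p + 0), which applies unary * to a null pointer.
    return p + 0;
}
```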
Jens