Date: Wed, 2 Jul 2025 10:34:21 +0200
On 01/07/2025 17:26, Henning Meyer via Std-Proposals wrote:
> I think there was an (unavoidable) missed opportunity in C++98 when
> new[] and delete[] were introduced.
>
> What we have currently is unchanged since 1998 and very C:
>
> new std::string[8] returns an object of type std::string*. It is
> indistinguishable in its type from the result of new std::string.
>
> When you free it, you must remember that it was allocated via new[] and
> pass it into delete[] instead of delete.
> Calling delete instead of delete[] is undefined behavior and may lead to
> crashes in practice, not just memory leaks.
>
> Instead, we could have the following:
>
> new T[n] returns an object of type T[],
>
> we can declare variables of type T[], they have a representation
> identical to T*
>
> objects of type T[] decay to T* similar to array decay,
>
> delete p has the behavior of delete[] when p is of type T[].
>
> This would represent the difference in the type system and not just in
> the logic within functions.
>
>
> I think the state of T[] is very odd in the current language:
>
> variables cannot be declared:
>
> int p[]; // will not compile
>
> struct members can be declared, but this is C (flexible array members)
> and not allowed in strict C++
>
> struct S {
>
> int p[];
>
> };
>
> There are headers written in C that use this syntax, and they won't be
> changed, so as not to break existing code.
>
> Function parameters can be declared, but that is no different from
> declaring a pointer
>
> void fun(int p[]); is the same as void fun(int* p);
>
> As far as I can tell, T[] in C++ is mostly used in specializations of
> templates like std::unique_ptr<T[]> which is essentially syntactic sugar
> over std::unique_ptr<T,array_deleter<T>>, as an array without bound T[]
> cannot be meaningfully used in the current language.
>
> I think the C++ language rules can be amended to allow T[] to represent
> T* allocated by new[], and backwards compatibility with C headers can be
> preserved by disallowing this construct within extern "C" blocks.
>
> Of course, it is easy to imagine generic C++ code that breaks when the
> expression new[] returns a type that decays to T* instead of T*.
> Whether that is relevant in practice can only be determined by
> implementing the proposed changes in a compiler.
>
> I just thought I'd ask whether I am the only one who thinks this might be
> a good idea before (asking for help) implementing this in a branch of
> GCC or LLVM.
>
> Regards,
> Henning
>
As others have pointed out, making T[] distinct from T* would be a
/massive/ change to the way the fundamental types in C++ work. It is
not something that can be shoehorned into the language now. It is not
something that could be changed just for improving delete (especially
since we are now not supposed to use naked new and delete much, and of
preference use containers and smart pointers).
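For comparison, a short sketch of the "containers and smart pointers" style (the function names are made up for illustration). std::make_unique<T[]> returns a std::unique_ptr<T[]> whose deleter uses delete[], and std::vector additionally keeps the element count:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// std::unique_ptr<T[]> carries "array" in its type: its default deleter
// (std::default_delete<T[]>) uses delete[], so no manual delete[] is written.
std::unique_ptr<std::string[]> make_names(std::size_t n) {
    return std::make_unique<std::string[]>(n);  // n value-initialized strings
}

// std::vector<T> also manages the lifetime and, unlike unique_ptr<T[]>,
// remembers how many elements it holds.
std::vector<std::string> make_name_list(std::size_t n) {
    return std::vector<std::string>(n);
}
```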
I have a suggestion of an alternative idea that would be much less
intrusive, and might be feasible.
When you use "new T" or "new T[10]", the low-level allocation functions
make space on the heap for the type or array, and also somehow record
information about the size of the actual allocation (which might be
rounded up, such as for cache line alignment) and the number of elements
in an array new allocation. Traditionally, C malloc/free systems did
this by allocating a size_t worth of space more than you asked for,
storing the allocation size in that size_t, and returning a pointer just
after that size_t for the user data. Current C++ implementations can do
something similar, or they can store the information elsewhere in some
form. And they don't need to store information if they can calculate it
later or will never need it. (The count of a new array is only really
needed if the type has a non-trivial destructor.)
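A minimal sketch of the traditional size-prefix scheme, written in C++ on top of std::malloc/std::free. The names sized_alloc, sized_alloc_size and sized_free are hypothetical. The header is rounded up to alignof(std::max_align_t) so the user pointer keeps malloc's alignment, and it records the requested size rather than the (possibly rounded-up) real block size a production allocator would track:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace sketch {

// Header large enough for one size_t, rounded up to alignof(std::max_align_t)
// so the pointer handed to the user keeps malloc's alignment guarantee.
constexpr std::size_t header_size =
    (sizeof(std::size_t) + alignof(std::max_align_t) - 1) /
    alignof(std::max_align_t) * alignof(std::max_align_t);

// Allocate n bytes plus the header, record n, return the address after the header.
void* sized_alloc(std::size_t n) {
    if (n > SIZE_MAX - header_size) return nullptr;  // total size would overflow
    auto* base = static_cast<unsigned char*>(std::malloc(header_size + n));
    if (base == nullptr) return nullptr;
    std::memcpy(base, &n, sizeof n);
    return base + header_size;
}

// Recover the recorded size from a pointer returned by sized_alloc.
std::size_t sized_alloc_size(const void* p) {
    std::size_t n;
    std::memcpy(&n, static_cast<const unsigned char*>(p) - header_size, sizeof n);
    return n;
}

// Step back over the header and release the whole block.
void sized_free(void* p) {
    if (p != nullptr) std::free(static_cast<unsigned char*>(p) - header_size);
}

}  // namespace sketch
```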
So we can pretend that when you write "auto p = new T[10];", as well as
getting back a T* pointer in p, the compiler has "magic" functions:
size_t __real_allocation_size(T* p);
size_t __array_count(T* p);
How these "magic" functions are implemented is entirely
implementation-dependent, but logical equivalents of them must exist
for the current "delete" mechanism to work.
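The element count can be kept the same way. The Itanium C++ ABI, for instance, specifies an "array cookie" holding the element count in front of the array when one is needed (such as for element types with non-trivial destructors). The sketch below only illustrates that idea and is not how any particular compiler emits new[]; cookie_new_array, cookie_array_count (playing the role of __array_count) and cookie_delete_array are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>
#include <limits>
#include <memory>
#include <new>

namespace sketch {

// Space reserved in front of the elements for the count, rounded up to
// alignof(T) so the first element stays correctly aligned.
template <typename T>
constexpr std::size_t cookie_size =
    (sizeof(std::size_t) + alignof(T) - 1) / alignof(T) * alignof(T);

// Hypothetical stand-in for new T[n]: store n, then value-initialize n elements.
template <typename T>
T* cookie_new_array(std::size_t n) {
    static_assert(alignof(T) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__,
                  "sketch does not handle over-aligned element types");
    if (n > (std::numeric_limits<std::size_t>::max() - cookie_size<T>) / sizeof(T))
        throw std::bad_array_new_length();
    auto* base = static_cast<std::byte*>(::operator new(cookie_size<T> + n * sizeof(T)));
    std::memcpy(base, &n, sizeof n);  // the "cookie"
    T* elems = reinterpret_cast<T*>(base + cookie_size<T>);
    try {
        std::uninitialized_value_construct_n(elems, n);
    } catch (...) {
        ::operator delete(base);  // constructed elements were already destroyed
        throw;
    }
    return elems;
}

// Hypothetical equivalent of the "magic" __array_count: read the cookie back.
template <typename T>
std::size_t cookie_array_count(const T* p) {
    std::size_t n;
    std::memcpy(&n, reinterpret_cast<const std::byte*>(p) - cookie_size<T>, sizeof n);
    return n;
}

// Hypothetical stand-in for delete[]: destroy the counted elements, free the block.
template <typename T>
void cookie_delete_array(T* p) {
    if (p == nullptr) return;
    std::destroy_n(p, cookie_array_count(p));
    ::operator delete(reinterpret_cast<std::byte*>(p) - cookie_size<T>);
}

}  // namespace sketch
```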
My suggestion then is to introduce a new container type,
std::dynarray<T>. This will always be an incomplete type, so you cannot
have local or statically allocated instances of it, or return it from a
function - mostly you will use pointers to the type. It will have the
same interface as std::array<T, N> (including, crucially, the "data"
member). But the size of the array is no longer a constant part of the
type - it is now returned by __array_count(p) where "p" is a pointer to
the dynarray<>.
Now instead of using "auto p = new T[10];" then "delete[] p;", you can
write "auto p = new std::dynarray<T>(10);", then "delete p;". The
pointer to the dynarray can be safely passed around to functions, and
used like a pointer to a container - it will be much like a
non-resizable vector but would have the same efficiency and overhead as
a C-style array allocated on the heap with "new T[n]".
Implementation could not be pure C++, as it needs the magic
"__array_count" function.
I don't know if that idea would suit your needs, but it might be a
compromise between what you want, and something that has at least a
vague hope of being implementable!
David
Received on 2025-07-02 08:34:26