Date: Wed, 11 Dec 2024 14:41:39 -0300
On Wednesday 11 December 2024 13:00:03 Brasilia Standard Time Tiago Freire
wrote:
> (**and surely there are no systems that have more memory available than what
> the max signed integer can represent, so that shouldn't be a problem......
> right?** 👀)
Yes and no.
This situation does not exist on 64-bit, and on all the current architectures
it can't exist either: pointer ranges there are limited to well below 63 bits.
32-bit architectures did have more than 4 GB of RAM for a while, before
everyone decided that they should use 64-bit ones. 32-bit applications are
still a valid target for 64-bit OSes, so applications still exist for which
more RAM is available than they could conceivably allocate.
That said, some language limitations do apply that limit the allocation to the
range of the signed type:
* no single memory block allocated by malloc() can be bigger than
PTRDIFF_MAX - 1 bytes, even on platforms with a 32-bit ptrdiff_t, because of
the next point. The -1 comes from the fact that the one-past-the-end address
must still be reachable in the same arithmetic.
(this does not apply to mmap() or other OS-specific calls)
* the C and C++ languages require that pointer arithmetic on arrays be done
with ptrdiff_t, so the maximum size any array can have is PTRDIFF_MAX - 1
elements anyway, even if you could allocate more with an OS-specific call. If
you do that, you get very strange situations where
end > begin
but
end - begin < 0
(see the sketch after this list)
* the C++ containers do require a difference_type that can address the entire
range in both positive and negative sides, again limiting the size
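To illustrate that second point, here's a minimal, deliberately non-portable
sketch; the function name is made up, and it assumes that begin came from an
OS-specific call and that huge exceeds PTRDIFF_MAX, which is exactly the case
the language rules out:

#include <cstddef>

// Hypothetical illustration only: 'begin' points to a block obtained from an
// OS-specific call (not malloc), and 'huge' is larger than PTRDIFF_MAX.
void pointer_paradox(char* begin, std::size_t huge)
{
    char* end = begin + huge;            // one past the end of the block

    bool ordered = end > begin;          // true: end is the higher address
    std::ptrdiff_t diff = end - begin;   // overflows ptrdiff_t: undefined
                                         // behaviour, typically observed as a
                                         // negative value
    (void)ordered;
    (void)diff;
}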
That means it is possible to use a 32-bit size_type and a 64-bit
difference_type in Standard Library containers and hold up to UINT_MAX elements
in a container. In practice, no one does that, because either the 64-bit
computations would be marginally but noticeably slower (given how frequently
they would happen), or the size of the container would be artificially limited
to 4 billion elements when it could go higher. And this is not even addressing
the plethora of bugs that would turn up due to the 32/64 mismatch.
Likewise, it would be possible to use a 32-bit size_t and a 64-bit ptrdiff_t,
but no one does that for similar reasons.
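For concreteness, a minimal sketch of what such an allocator could look like;
the class name is made up, and while allocator_traits is required to pick up
these nested typedefs, whether a given Standard Library implementation then
uses them for the container's own size_type and difference_type is up to that
implementation:

#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical allocator pairing a 32-bit size_type with a 64-bit
// difference_type, as described above.
template <typename T>
struct Narrow32Allocator
{
    using value_type      = T;
    using size_type       = std::uint32_t;   // sizes capped at UINT_MAX
    using difference_type = std::int64_t;    // full-range differences

    T* allocate(size_type n)
    { return static_cast<T*>(::operator new(std::size_t(n) * sizeof(T))); }

    void deallocate(T* p, size_type) noexcept
    { ::operator delete(p); }

    friend bool operator==(const Narrow32Allocator&, const Narrow32Allocator&)
    { return true; }
    friend bool operator!=(const Narrow32Allocator&, const Narrow32Allocator&)
    { return false; }
};

// allocator_traits propagates the nested typedefs:
static_assert(std::is_same_v<
    std::allocator_traits<Narrow32Allocator<int>>::size_type,
    std::uint32_t>);
static_assert(std::is_same_v<
    std::allocator_traits<Narrow32Allocator<int>>::difference_type,
    std::int64_t>);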
A hypothetical architecture could have an N-bit size_t and an (N+1)-bit
ptrdiff_t, but no such architecture exists today. And since, as you've pointed
out, hardware arithmetic is implemented in two's complement and therefore
doesn't need to know whether it's operating on signed or unsigned values for
most operations, such an architecture is unlikely to ever exist.
Therefore, for all intents and purposes, the size of any C or C++ container is
at most PTRDIFF_MAX - 1, which means half of the range of the unsigned type
used to represent sizes and indices is off limits.
> That's incorrect. Signed integers do absolutely overflow (and underflow), by
> design constraints alone it can't represent as many positive values as an
> unsigned integer of the same size. What it means is when it does overflow
> it's UB.
The point is that, being UB, the compiler is allowed to assume it *didn't*
overflow. So long as your code is correct and it really didn't, there's a minor
but real performance gain in not having code emitted to deal with an allowed
overflow.
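As a minimal sketch of that gain (the function names are made up): because
overflow in the signed version is UB, compilers typically fold the whole
comparison to a constant, whereas the unsigned version must actually be
evaluated since wrap-around is well defined there:

// Signed: overflow is UB, so the compiler may assume i + 1 did not wrap
// around and typically folds this function to 'return true'.
bool next_is_greater(int i)
{
    return i + 1 > i;
}

// Unsigned: wrap-around is well defined, so the comparison must really be
// evaluated -- it yields false when u == UINT_MAX.
bool next_is_greater(unsigned u)
{
    return u + 1 > u;
}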
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel DCAI Platform & System Engineering