std-discussion

Re: Is it valid use reinterpret_cast to form pointer to object?

From: Thiago Macieira <thiago_at_[hidden]>
Date: Mon, 31 Jul 2023 18:42:46 -0700
On Monday, 31 July 2023 17:21:06 PDT Ville Voutilainen wrote:
> I don't understand why start_lifetime_as is brought up in this
> discussion, and where that paper shows
> the example in the beginning of the thread being UB.

I think it was incorrectly brought up. The answer to the OP is that
std::launder() is required because a std::byte array and T aren't
pointer-interconvertible, presuming a placement new or std::construct_at has
been done to actually start the lifetime of a T.
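
A minimal sketch of that answer, assuming a hypothetical trivial type Point
and a local buffer (neither comes from the OP's message): the placement new
starts the lifetime, and std::launder() then yields a pointer to the Point
rather than to the byte array that provides its storage.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical T, used only for illustration.
struct Point { int x; int y; };

int launder_demo()
{
    // Storage sized and aligned for a Point, but declared as std::byte[].
    alignas(Point) std::byte storage[sizeof(Point)];

    // Start the lifetime of a Point inside the byte array.
    ::new (static_cast<void *>(storage)) Point{3, 4};

    // The array and the Point are not pointer-interconvertible, so the
    // reinterpret_cast alone still points at the array's first byte.
    // std::launder() produces a pointer to the Point at that address.
    Point *p = std::launder(reinterpret_cast<Point *>(storage));
    assert(p->x == 3);
    return p->x + p->y;
}
```

Point is trivially destructible, so the sketch does not need an explicit
destructor call before the buffer goes out of scope.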

However, it was brought up and I read the paper, then I pointed out my objections.

> start_lifetime_as
> convinces the language that a bag of representation bits
> are a valid object of trivial type.

The paper specifically says that it convinces the language that the lifetime
starts now, not that it had already started and you're just forming a pointer
to it.

> But the example in the
> thread-starting message is not like that at all, it does a placement
> new of a T object into a storage location, and then separately obtains
> a pointer to T into that same storage location.
> If the status quo wording somewhere says that's UB, then we should
> have an almost trivial issue submission saying "surely not".

I agree with you, but the paper implies it is UB. The language for
[basic.compound]/4 seems to confirm that.

"If two objects are pointer-interconvertible, then they have the same address,
and it is possible to obtain a pointer to one from a pointer to the other via
a reinterpret_cast ([expr.reinterpret.cast])."

Strictly speaking, this is an "if" (sufficient condition), not "if and only if"
(sufficient and necessary). That means there could be other conditions under
which reinterpret_cast can be used on pointers without causing UB, but the
language here strongly implies it is a requirement, though nothing in
[expr.reinterpret.cast] corroborates that.
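
For contrast, here is a sketch of a case that the quoted sentence clearly
covers: a standard-layout class object and its first non-static data member
are pointer-interconvertible. The types Header and Packet and both function
names are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types, used only for illustration.
struct Header { int tag; };
struct Packet { Header hdr; int payload; };

static_assert(std::is_standard_layout_v<Packet>,
              "the first-member rule applies to standard-layout classes");

// Packet -> its first member: pointer-interconvertible, so a plain
// reinterpret_cast gives a usable pointer to the Header.
int first_member_tag(Packet *pkt)
{
    Header *h = reinterpret_cast<Header *>(pkt);
    return h->tag;
}

// First member -> enclosing Packet. Only valid if h really points to the
// hdr member of a live Packet; that precondition is the caller's job.
Packet *enclosing_packet(Header *h)
{
    return reinterpret_cast<Packet *>(h);
}
```

A std::byte array and an object nested within it do not fall under any of
the pointer-interconvertibility cases, which is why the byte-buffer case is
the contested one.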

> That should also have nothing to do with start_lifetime_as, because
> that placement-new plus the later address-obtaining and use
> should work even for non trivial types, for any T, regardless of
> whether trivial or complex, so start_lifetime_as doesn't solve
> any problems here anyway.

Agreed, but in this case start_lifetime_as wouldn't be the solution either. If
the object's lifetime has already started, then it should be std::launder
instead or (as you and I seem to agree) reinterpret_cast.

> > But the standard also says that memcpy() is allowed to implicitly create
> > objects, something this paper even reminds us of. Since any read() implies
> > a memcpy() from some other storage, I argue that the object's lifetime
> > was already started.
>
> Well, there's a limited bunch of special functions that the standard
> recognizes as implicit-lifetime-starting.
> malloc is one of them, memcpy is another, but not all operations that
> copy memory are in that bunch.

True, but I argue that this is the very case where it is implicitly starting a
lifetime, because it is copying a full object whose lifetime was previously
started elsewhere onto this buffer.
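
A sketch of that argument, assuming a hypothetical trivially copyable type
Message and with std::memcpy() standing in for the copy that read() would
perform. The std::launder() call is the conservative spelling; whether a bare
reinterpret_cast would be enough is exactly what is being debated here.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical wire type, used only for illustration.
struct Message { int id; int length; };

static_assert(std::is_trivially_copyable_v<Message>,
              "memcpy of the object representation requires this");

int read_id_from_bytes(const Message &source)
{
    // Stand-in for a receive buffer: raw bytes, suitably aligned.
    alignas(Message) unsigned char buffer[sizeof(Message)];

    // Stand-in for read(): copies the full object representation of an
    // object whose lifetime was started elsewhere.
    std::memcpy(buffer, &source, sizeof(Message));

    // memcpy() is among the operations that may implicitly create objects
    // in the destination, so under the reading argued above a Message now
    // lives in the buffer. std::launder() obtains a pointer to it.
    const Message *m = std::launder(reinterpret_cast<const Message *>(buffer));
    return m->id;
}
```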

> That's why we need start_lifetime_as, to be able to provide an escape
> hatch to communicate to the language
> that such a special operation is in play, even though the standard
> doesn't explicitly recognize it as such,
> and that also makes that bunch extensible.

I understand what you're saying, but I only agree in very limited
circumstances, for automatic or static storage buffers. For dynamic storage,
you obtained the storage using malloc() or an equivalent platform-specific
blessed function (mmap, brk) and/or copied an extant object there with memcpy().

Even for automatic and static storage buffers, I am arguing that read(),
recv(), recvfrom(), recvmsg() perform a memcpy() from an unseen storage of an
extant object that was previously write()ten. I also argue that mmap() does
such a memcpy() too, like a lot of other functions do. Therefore, this should
not be UB:

  struct sockaddr_storage buf;
  socklen_t len = sizeof(buf);
  getsockname(sockfd, reinterpret_cast<sockaddr *>(&buf), &len);
  if (buf.ss_family == AF_INET)
    do_inet(reinterpret_cast<sockaddr_in *>(&buf));
  else if (buf.ss_family == AF_INET6)
    do_inet6(reinterpret_cast<sockaddr_in6 *>(&buf));

I advise people to just use a union here and benefit from the common
initial sequence rule, but the language of pointer-interconvertibility and the
fact that getsockname() performed a memcpy() from such a union type means the
code above must not be UB.
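
A rough sketch of the union approach, using hypothetical stand-ins
(addr_any, addr_v4, addr_v6 and the FAMILY_* values) instead of the real
sockaddr types so that it builds without any platform headers. The family
field is read through the generic member, which the common initial sequence
rule permits while one of the other struct members is active.

```cpp
#include <cassert>

// Hypothetical stand-ins for sockaddr, sockaddr_in and sockaddr_in6.
// All three are standard-layout and start with the same member type.
struct addr_any { unsigned short family; };
struct addr_v4  { unsigned short family; unsigned short port; unsigned char bytes[4]; };
struct addr_v6  { unsigned short family; unsigned short port; unsigned char bytes[16]; };

union addr_union { addr_any any; addr_v4 v4; addr_v6 v6; };

// Arbitrary tag values chosen for this sketch.
constexpr unsigned short FAMILY_V4 = 2;
constexpr unsigned short FAMILY_V6 = 10;

addr_union make_v4(unsigned short port)
{
    addr_union u;
    u.v4 = addr_v4{FAMILY_V4, port, {127, 0, 0, 1}};  // v4 becomes active
    return u;
}

addr_union make_v6(unsigned short port)
{
    addr_union u;
    u.v6 = addr_v6{FAMILY_V6, port, {}};              // v6 becomes active
    return u;
}

int port_of(const addr_union &u)
{
    // Inspecting the common initial sequence through the inactive member
    // 'any' is allowed for standard-layout structs in a union.
    switch (u.any.family) {
    case FAMILY_V4: return u.v4.port;
    case FAMILY_V6: return u.v6.port;
    }
    return -1;  // unknown family
}
```

In this sketch the active member is set by ordinary assignment. With
getsockname() the bytes are written by the kernel instead, so the question of
which member is active comes back to the memcpy() argument.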

The actual implementation inside the Linux kernel does not have the union. It
does however start the lifetime of a sockaddr_in or sockaddr_in6 object (using
the C language rules for that) then memcpy()s onto your buffer, so
std::launder() would be correct. The FreeBSD implementation is even more
explicit by using malloc():
https://github.com/freebsd/freebsd-src/blob/main/sys/netinet/in_pcb.c#L1832
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DCAI Cloud Engineering

Received on 2023-08-01 01:42:49