sg12

[ub] Is memmove a general way to change the type of memory?

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Tue, 5 Nov 2013 15:49:16 -0800

Adam (cc'ed) found an interesting result yesterday:

Say we have a region of memory written by an unknown process:
  int* content = new int[sizeof(Foo)/sizeof(int)];
  initialize(content);

We want to interpret the bytes in that region of memory as the object
representation of a 'Foo'. We can't just reinterpret_cast them to a
Foo, since then we're violating [basic.lval]. However, we know how to
reinterpret the bytes of a float as the bytes of an int:

  float f = 3.14f;
  uint32_t i;
  static_assert(sizeof(i) == sizeof(f), "...");
  memcpy(&i, &f, sizeof(i));
  use(i); // Unspecified value, but implementations often define it.
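For reference, the float-to-integer step can be packaged as a small function. This is only a sketch: the name float_bits is made up for illustration, and the value it returns is only predictable if float uses the IEEE-754 binary32 format, which the language does not require.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not from the original post): returns the object
// representation of a float as a 32-bit unsigned integer. The bytes are
// copied with memcpy; no pointer to float is ever dereferenced as an int.
inline std::uint32_t float_bits(float f) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}
```

On an implementation where std::numeric_limits<float>::is_iec559 is true, float_bits(1.0f) yields 0x3F800000; elsewhere the result depends on the platform's float format.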

So:

  Foo f;
  memcpy(&f, content, sizeof(Foo));
  use(f); // Similar caveat.
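Written out in full, that step might look like the sketch below. The two-int Foo is a stand-in invented for illustration (the real Foo is unspecified), and load_foo is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for 'Foo'; assumed trivially copyable and free of
// padding so that its bytes map one-to-one onto two ints.
struct Foo {
  int a;
  int b;
};

static_assert(std::is_trivially_copyable<Foo>::value,
              "memcpy into Foo requires a trivially copyable type");
static_assert(sizeof(Foo) == 2 * sizeof(int),
              "this sketch assumes Foo has no padding");

// Copies sizeof(Foo) bytes from an int buffer into a genuine Foo object.
// The caller must guarantee 'content' refers to at least sizeof(Foo) bytes.
inline Foo load_foo(const int* content) {
  Foo f;
  std::memcpy(&f, content, sizeof(Foo));
  return f;
}
```

Here the destination is a real Foo object and the source is only ever read as bytes, which is why this form is considered uncontroversial; the steps that follow start from the three-line snippet above.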

We can certainly transform this to:
  char tmp[sizeof(Foo)];
  memcpy(tmp, content, sizeof(Foo));
  memcpy(&f, tmp, sizeof(Foo));
  use(f);

But in between the memcpy()s there, we're no longer using 'content',
so to save memory let's reuse its storage as the destination:

  char tmp[sizeof(Foo)];
  memcpy(tmp, content, sizeof(Foo));
  Foo* foo = reinterpret_cast<Foo*>(content);
  memcpy(foo, tmp, sizeof(Foo));
  use(*foo);

But memmove() is defined as "The memmove function copies n characters
from the object pointed to by s2 into the object pointed to by s1.
Copying takes place as if the n characters from the object pointed to
by s2 are first copied into a temporary array of n characters that does
not overlap the objects pointed to by s1 and s2, and then the n
characters from the temporary array are copied into the object pointed
to by s1." in C99, and C++14 delegates to C99 for memmove()'s
definition. So we can optimize our implementation to:

  Foo* foo = reinterpret_cast<Foo*>(content);
  memmove(foo, content, sizeof(Foo));
  use(*foo);
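The "as if" wording quoted from C99 can be modelled directly. The sketch below is a reference model only, not how any real library implements memmove(); memmove_via_temp is a hypothetical name, and std::vector stands in for the "temporary array of n characters".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical reference model of the C99 description of memmove: copy the
// n source bytes into a temporary array that overlaps neither operand, then
// copy from the temporary array into the destination.
inline void* memmove_via_temp(void* dst, const void* src, std::size_t n) {
  std::vector<unsigned char> tmp(n);
  if (n != 0) {
    std::memcpy(tmp.data(), src, n);  // source -> temporary
    std::memcpy(dst, tmp.data(), n);  // temporary -> destination
  }
  return dst;
}
```

With dst == src this model performs exactly the two memcpy() calls of the previous step, which is the equivalence the argument relies on; for overlapping ranges it leaves the same bytes behind as std::memmove.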

The declared type of a pointer variable isn't generally thought to
matter for aliasing violations (only the type used for an access does),
so we should be fine transforming this to:

  memmove(content, content, sizeof(Foo));
  use(*reinterpret_cast<Foo*>(content));

Leading to the conclusion that memmove() is the explicit way to reset
the type of any block of memory. Clang and gcc successfully optimize
this self-copy away, so it's even a free reinterpretation.

Now, this is a little crazy, and Clang's TBAA annotations on the
program below appear to indicate an aliasing violation, so I'd like to
invite this list to point out where my logic's broken.
Jeffrey

--------

// Compiled with `clang++ -O1 test.cc -o - -S -emit-llvm`

#include <stdio.h>
#include <string.h>

void foo(void* p, size_t len) {
  memmove(p, p, len);
}

__attribute__((noinline)) int convert(float* f) {
  memmove(f, f, sizeof(*f));
  return *reinterpret_cast<int*>(f);
}

int main() {
  float f = 3.14f;
  printf("%d\n", convert(&f));
}

Received on 2013-11-06 00:49:39