[ub] Is memmove a general way to change the type of memory?

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Tue, 5 Nov 2013 15:49:16 -0800

Adam (cc'ed) found an interesting result yesterday:

Say we have a region of memory written by an unknown process:
  int* content = new int[sizeof(Foo)/sizeof(int)];

We want to interpret the bytes in that region of memory as the object
representation of a 'Foo'. We can't just reinterpret_cast them to a
Foo, since then we're violating [basic.lval]. However, we know how to
reinterpret the bytes of a float as the bytes of an int:

  float f = 3.14f;
  uint32_t i;
  static_assert(sizeof(i) == sizeof(f), "...");
  memcpy(&i, &f, sizeof(i));
  use(i); // Unspecified value, but implementations often define it.

So the same approach should work for a 'Foo':

  Foo f;
  memcpy(&f, content, sizeof(Foo));
  use(f); // Similar caveat.
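
For reference, a self-contained sketch of that last snippet. The post never says what 'Foo' is, so the two-int struct below is a hypothetical stand-in, and 'foo_from_bytes' is an illustrative name rather than anything from the original. The static_asserts spell out the assumptions the copy relies on:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for 'Foo'; the original post does not define it.
struct Foo {
  int a;
  int b;
};

// Assumption: Foo is trivially copyable, so copying its bytes is meaningful.
static_assert(std::is_trivially_copyable<Foo>::value,
              "Foo must be trivially copyable to be filled via memcpy");
// Assumption: an int array of sizeof(Foo)/sizeof(int) elements covers a Foo.
static_assert(sizeof(Foo) % sizeof(int) == 0,
              "an int buffer must cover a whole Foo");

// Copies sizeof(Foo) bytes from 'content' into a fresh Foo object.
// 'content' must point to at least sizeof(Foo) readable bytes.
Foo foo_from_bytes(const void* content) {
  assert(content != nullptr);
  Foo f;
  std::memcpy(&f, content, sizeof(Foo));
  return f;
}
```

Called as 'foo_from_bytes(content)', this reads the buffer only through memcpy, so no lvalue of type Foo ever refers to the int storage.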

We can certainly transform this to:
  char tmp[sizeof(Foo)];
  memcpy(tmp, content, sizeof(Foo));
  memcpy(&f, tmp, sizeof(Foo));

But in between the memcpy()s there, we're no longer using 'content',
so to save memory let's transform that to:

  char tmp[sizeof(Foo)];
  memcpy(tmp, content, sizeof(Foo));
  Foo* foo = reinterpret_cast<Foo*>(content);
  memcpy(foo, tmp, sizeof(Foo));

But memmove() is defined as "The memmove function copies n characters
from the object pointed to by s2 into the object pointed to by s1.
Copying takes place as if the n characters from the object pointed to
by s2 are first copied into a temporary array of n characters that does
not overlap the objects pointed to by s1 and s2, and then the n
characters from the temporary array are copied into the object pointed
to by s1." in C99, and C++14 delegates to C99 for memmove()'s
definition. So we can optimize our implementation to:

  Foo* foo = reinterpret_cast<Foo*>(content);
  memmove(foo, content, sizeof(Foo));
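
To make the "as if" wording concrete, here is a rough sketch of memmove() written literally as the two copies through a temporary array. 'memmove_via_temp' is a hypothetical helper for illustration, not a real library function, and real implementations do not allocate like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: follows the C99 description of memmove() step by
// step, using an explicit temporary array of n characters.
void* memmove_via_temp(void* s1, const void* s2, std::size_t n) {
  if (n == 0) return s1;              // nothing to copy
  assert(s1 != nullptr && s2 != nullptr);
  std::vector<unsigned char> tmp(n);  // temporary array, overlaps neither s1 nor s2
  std::memcpy(tmp.data(), s2, n);     // first: s2 -> temporary
  std::memcpy(s1, tmp.data(), n);     // then:  temporary -> s1
  return s1;
}
```

With s1 and s2 both equal to 'content', this is exactly the pair of memcpy() calls from the previous step, which is what the argument relies on.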

The type of a variable isn't generally thought to help with aliasing
violations, so we should be fine transforming this to:

  memmove(content, content, sizeof(Foo));

Leading to the conclusion that memmove() is the explicit way to reset
the type of any block of memory. Clang and gcc successfully optimize
this self-copy away, so it's even a free reinterpretation.

Now, this is a little crazy, and Clang's TBAA annotations on the
program below appear to indicate an aliasing violation, so I'd like to
invite this list to point out where my logic's broken.


// Compiled with `clang++ -O1 test.cc -o - -S -emit-llvm`

#include <stdio.h>
#include <string.h>

void foo(void* p, size_t len) {
  memmove(p, p, len);
}

__attribute__((noinline)) int convert(float* f) {
  memmove(f, f, sizeof(*f));
  // The read in question: an int lvalue accessing storage declared as float.
  return *reinterpret_cast<int*>(f);
}

int main() {
  float f = 3.14;
  printf("%d\n", convert(&f));
}

Received on 2013-11-06 00:49:39