Re: [wg14/wg21 liaison] (SC22WG14.19262) C memory object model study group - uninitialised reads and padding

From: Uecker, Martin <Martin.Uecker_at_[hidden]>
Date: Wed, 14 Apr 2021 21:06:17 +0000
On Wednesday, 14.04.2021, at 13:46 -0700, Richard Smith wrote:
> On Wed, Apr 14, 2021 at 12:19 PM Uecker, Martin via Liaison <
> liaison_at_[hidden]> wrote:
>
> > On Wednesday, 14.04.2021, at 15:12 -0400, Aaron Ballman wrote:
> > > On Wed, Apr 14, 2021 at 3:10 PM Uecker, Martin
> > > <Martin.Uecker_at_[hidden]> wrote:
> > > > On Wednesday, 14.04.2021, at 21:51 +0300, Ville Voutilainen wrote:
> > > > > On Wed, 14 Apr 2021 at 21:47, Jens Gustedt via Liaison
> > > > > <liaison_at_lists.isocpp.org> wrote:
> > > > > > > On 14 April 2021 at 20:07:18 CEST, JF Bastien <cxx_at_[hidden]> wrote:
> > > > > > > On Wed, Apr 14, 2021 at 11:00 AM Uecker, Martin <
> > Martin.Uecker_at_[hidden]>
> > > > > > > wrote:
> > > > > > > > On Wednesday, 14.04.2021, at 08:54 -0700, JF Bastien via Liaison wrote:
> > > > > > > > > On Tue, Apr 13, 2021 at 11:40 AM Peter Sewell <
> > Peter.Sewell_at_[hidden]>
> > > > > > > > > wrote:
> > > > > > > > > > - reading uninitialised representation bytes and padding bytes is also
> > > > > > > > > > necessary for other bytewise polymorphic operations: memcmp, marshalling,
> > > > > > > > > > encryption, and hashing (deferring what one knows about the results of
> > > > > > > > > > such reads for a moment). It's not clear how generally these operations
> > > > > > > > > > have to be supported, and we would like more data. Atomic cmpxchg on large
> > > > > > > > > > structs, implemented with locks, would do a memcmp/memcpy combination (in
> > > > > > > > > > fact is described as such in the standard).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > For atomics with padding, C++20 adopted the following change (and I expect
> > > > > > > > > that compilers will implement it in previous versions as well):
> > > > > > > > > http://wg21.link/P0528
> > > > > > > >
> > > > > > > > I am not terribly excited about this solution.
> > > > > > > >
> > > > > > > > I think C should stick to the memcmp/memcpy semantics of cmpxchg
> > > > > > > > which operate on the representation including padding. This fits
> > > > > > > > to the hardware instructions, simplifies compiler design (no
> > > > > > > > need to look into each type), is easy to explain, handles all
> > > > > > > > cases consistently including unions, and is what most
> > > > > > > > C programmers would expect.
> > > > > > >
> > > > > > > OK, but it doesn't work, as explained in the paper.
> > > > > >
> > > > > > Well, there is actually not much of an explanation in the paper.
> > > > > >
> > > > > > And "doesn't work" isn't much of a description.
> > > > >
> > > > > You need to look at the R0 revision
> > > > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0528r0.html
> > > >
> > > > The closest thing which could apply to C is the following:
> > > >
> > > > Padded infloop_maybe(Atomic* atomic) {
> > > >   Padded desired;  // Padding unknown.
> > > >   Padded expected; // Could be different.
> > > >   peek("desired before", &desired);
> > > >   peek("expected before", &expected);
> > > >   peek("atomic before", atomic);
> > > >   while (
> > > >       !atomic->compare_exchange_strong(
> > > >           expected,
> > > >           desired // Padding bits added and removed here ˙ ͜ʟ˙
> > > >       ));
> > > >   peek("expected after", &expected);
> > > >   peek("atomic after", atomic);
> > > >   return expected; // Maybe changed here as well.
> > > > }
> > > >
> > > > But the claim that this can loop indefinitely seems wrong.
> > > >
> > > > The padding of desired is irrelevant.
> > > >
> > > > If the padding of 'expected' is different from
> > > > the padding of 'atomic', then there is one additional
> > > > execution of the loop where 'expected' is
> > > > updated to a version with the right padding. Then
> > > > in the next round the compare exchange succeeds.
> > >
> > > Is that guaranteed? I believe padding can take on arbitrary bit
> > > patterns at any point in time, so I think it could loop indefinitely
> > > in theory (but likely wouldn't in practice).
> >
> > According to the standard text, padding takes unspecified
> > values when a struct member is written - not at any time.
> >
>
> This is probably the relevant difference, then. I don't think the C++ model
> includes an idea that padding bits have a value, let alone a stable one; a
> C++ implementation is at liberty to represent a `struct { char c; int n; }`
> as a `char` plus an `int`, and not explicitly model any padding between the
> char and the int.

The idea that padding should not have stable values was discussed
before and some people argue that this opens up optimization
opportunities. Some people also argue that the standard is unclear
but I do not see this. The discussion we have now is essentially
about what it should be.

At least memcpy/memcmp on representation bytes is something
I believe C programmers expect to work (I think we will
cause a lot of frustration if we break this). If memcpy/memcmp
works as expected, atomic_compare_exchange does as well.

I wonder why C++ made the change to indeterminate values in C++14?
(As I understand it, this is where it diverged from C).
This seems to cause a lot of trouble now and, quite frankly,
I do not see a real benefit in C++'s rules for indeterminate
values that is worth all the trouble.

Best,
Martin


>
> > But this is not even relevant here, because we do not
> > write to a struct member, we copy the full representation
> > of 'expected' as if by memcpy. This memcpy then sets
> > the representation bytes to the right values for this to work.
> >
> > Best,
> > Martin
> >
> > _______________________________________________
> > Liaison mailing list
> > Liaison_at_lists.isocpp.org
> > Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/liaison
> > Link to this post: http://lists.isocpp.org/liaison/2021/04/0423.php
> >

Received on 2021-04-14 16:06:26