Date: Thu, 6 Apr 2023 17:54:22 +0200
On 06/04/2023 17.10, Andy via Std-Proposals wrote:
> P1478 suggests that the added atomic_{load,store}_per_byte_memcpy can be implemented without accessing each byte individually:
>
>> Note that on standard hardware, it should be OK to actually perform the copy at larger than byte granularity. Copying multiple bytes as part of one operation is indistinguishable from running them so quickly that the intermediate state is not observed. In fact, we expect that existing assembly memcpy implementations will suffice when suffixed with the required fence.
>
> A Rust RFC was made to mirror the same primitives in Rust, which also suggested that the implementation is not restricted to byte-sized accesses. The question of mixed-size accesses was raised there (https://github.com/rust-lang/rfcs/pull/3301#pullrequestreview-1082913781); it appears to apply to P1478 as well but was not discussed in the proposal.
>
> Since the actual access granularity of atomic memcpy is implementation-defined and opaque to the user, the user cannot guarantee that the same granularity is used when they access the same address using something other than atomic_{load,store}_per_byte_memcpy, or with a different count argument. It might be possible to guarantee a consistent granularity for each type, but since the proposed API takes void*, no type information is available to do so.
>
> For instance, it's valid to have:
>
> #include <atomic>
>
> struct Foo {
>     std::atomic_int32_t a;
> };
>
> void thread1(Foo& foo) {
>     foo.a.load(std::memory_order_acquire);
> }
>
> void thread2(Foo& foo) {
>     Foo foo_copy;
>     std::atomic_load_per_byte_memcpy(&foo_copy, &foo, sizeof(Foo), std::memory_order_acquire);
> }
>
> This results in the same address being accessed by two threads with potentially differently sized atomic operations, with no ordering between them.
>
> The proposal seems to suggest that this is fine,
No. "Foo" is not trivially copyable (because std::atomic_int32_t is not
trivially copyable), so you can't apply memcpy to it. Note the comment in
the example in the paper:

    Foo data; // Trivially copyable.

This requirement seems to be missing from the wording, though.
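
To illustrate the kind of precondition I mean, here is a minimal sketch of a copy helper that rejects non-trivially-copyable types at compile time. Everything in it is hypothetical and not taken from P1478: the names checked_copy, PlainFoo and AtomicFoo are made up for this example, and plain std::memcpy stands in for the proposed atomic_load_per_byte_memcpy, which no shipping library provides. What the trait reports for AtomicFoo is only queried, not asserted, since I have not checked it against every toolchain.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of P1478): refuses to compile unless T is
// trivially copyable, then performs an ordinary byte-wise copy.
template <typename T>
void checked_copy(T& dest, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy requires a trivially copyable type");
    std::memcpy(&dest, &src, sizeof(T));
}

// Plain aggregate: trivially copyable, so a byte-wise copy is well-defined.
struct PlainFoo {
    std::int32_t a;
};

// Aggregate with an atomic member, as in the quoted example.
struct AtomicFoo {
    std::atomic<std::int32_t> a;
};

static_assert(std::is_trivially_copyable<PlainFoo>::value,
              "PlainFoo is expected to be trivially copyable");

// Copies a PlainFoo through the checked helper and returns the copied member.
std::int32_t copy_plain(std::int32_t value) {
    PlainFoo src{value};
    PlainFoo dest{0};
    checked_copy(dest, src);
    return dest.a;
}

// Reports what the implementation in use says about the atomic-bearing struct.
// Deliberately not asserted; see the note above the code.
bool atomic_foo_reported_trivially_copyable() {
    return std::is_trivially_copyable<AtomicFoo>::value;
}
```

If the trait reports false for AtomicFoo, a call such as checked_copy(dest, src) on two AtomicFoo objects fails to compile, which is the effect a "trivially copyable" requirement in the wording would have on the quoted example.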
Jens
Received on 2023-04-06 15:54:27