Re: [SG14] Call tomorrow Wednesday Oct 16 2-4 ET

From: Domagoj Šarić <domagoj.saric_at_[hidden]>
Date: Thu, 10 Oct 2019 18:53:44 +0200
On Wed, Oct 9, 2019 at 4:38 AM Ben Craig <ben.craig_at_[hidden]> wrote:

> I'm pretty sure about my non-trivial vs. trivial numbers. Here's an older
> set of runs that I had in a draft version of the paper:
> https://raw.githack.com/ben-craig/error_bench/f24c0c1e9e133b68b548f1a6265de7adf8a43f59/error_speed.html#happy_write_to_a_global_histogram
> You can see that the general shape of the graph is the same, but the means
> and medians are slightly different.
> Take a look at the difference in assembly between trivial and non-trivial
> with the version of gcc I used:
> https://godbolt.org/z/CPgHg6
> The general idea is that a local destructor really makes moving things in
> registers miserable. You get things in registers, but you immediately need
> to move them somewhere else because the destructor you are about to call
> would clobber those values. Then you have to move them back so that you
> can return them. If you have a non-trivial object, then it gets RVO'd on
> the stack, and you can just pass that pointer along, and there's very
> little shuffling of registers or data. Lots of code will need to read that
> top element of the stack to check if it is an error, but that value is
> going to be super hot in the cache.

FWIW, I finally managed to take a look at this. Even with the above in mind,
the result is still surprising:
- The stack can be as hot as the sun, but it is still in L1 at best, which
means 4-5 cycles of load latency on x86 and ARM (true, x86 may be able to
circumvent this with store-to-load forwarding in this simple scenario; I
don't know whether ARM has this feature), while the trivial/in-register
version involves only one extra register-to-register move.
- On MSVC (due to its broken x64 ABI) both versions go through the stack,
yet the difference is still there (albeit smaller).
I'd guess that (despite the really laudable effort you put into this!)
coming up with a 'real world' happy-path benchmark is really difficult
(unlike the sad path, which pretty much gives us the expected results)...
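
For anyone following along, the trivial vs. non-trivial distinction being
discussed can be sketched roughly as below. This is only a minimal
illustration, not the code from error_bench: the type and function names
(TrivialResult, NonTrivialResult, compute_trivial, compute_non_trivial,
demo) are made up for this example, and the comments about register vs.
memory return assume the Itanium C++ ABI on x86-64 System V, not MSVC.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical result types for illustration only (not from error_bench).
struct TrivialResult {
    int value;
    int error;  // 0 means success
};

struct NonTrivialResult {
    int value;
    int error;  // 0 means success
    ~NonTrivialResult() {}  // user-provided destructor: type is no longer trivial
};

static_assert(std::is_trivially_destructible<TrivialResult>::value,
              "TrivialResult should be trivially destructible");
static_assert(std::is_trivially_copyable<TrivialResult>::value,
              "TrivialResult should be trivially copyable");
static_assert(!std::is_trivially_destructible<NonTrivialResult>::value,
              "NonTrivialResult should not be trivially destructible");

// Assumption (Itanium C++ ABI, x86-64 System V): this 8-byte trivial struct
// is returned in a register.
TrivialResult compute_trivial(int x) {
    if (x < 0) return {0, 1};  // sad path
    return {x * 2, 0};         // happy path
}

// Assumption (same ABI): because of the non-trivial destructor, the caller
// passes a hidden pointer and the result is constructed in memory.
NonTrivialResult compute_non_trivial(int x) {
    if (x < 0) return {0, 1};  // sad path
    return {x * 2, 0};         // happy path
}

// Both versions behave identically at the source level; only the way the
// return value travels between caller and callee differs.
inline void demo() {
    assert(compute_trivial(21).value == 42);
    assert(compute_non_trivial(21).value == 42);
}
```

Compiling both functions with optimizations and comparing the assembly (as
in the godbolt link above) is the easiest way to see where each version
puts its return value on a given compiler and target.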

Domagoj Šarić
Tech lead
C++ engineer
Microblink LTD
www.microblink.com <https://microblink.com/>

Received on 2019-10-10 11:56:09