Date: Fri, 10 Jun 2022 21:43:33 +0100
On Fri, 10 Jun 2022 12:27:46 -0400
"Arthur O'Dwyer" <arthur.j.odwyer_at_[hidden]> wrote:
> On Fri, Jun 10, 2022 at 12:00 PM Lénárd Szolnoki
> <cpp_at_[hidden]> wrote:
>
> > So what do I do if I want to microbenchmark a function with LTO on?
> > Maybe because that's the configuration relevant for my application.
> >
>
> I don't think I understand the notion of "microbenchmarking" "with
> LTO on." Isn't the whole point of LTO to mash all your code together
> so that it's *not* "micro" anymore, and its performance will end up
> depending very heavily on how it's actually used in practice?
I think it's sensible to benchmark a function that calls other
functions in other TUs. Note that those functions might not be mine,
but may come from third-party dependencies that I statically link with
LTO enabled. I would still call this a microbenchmark; it's still not
benchmarking a whole application. It makes sense to use the same
optimization options for a microbenchmark as the ones you use for your
application, and that includes LTO.
> At that point, you need a "macrobenchmark" so that you're testing the
> performance of the actual code, because its "micro" performance won't
> necessarily bear any relationship to its "macro" (real-world)
> performance. Maybe that means linking your final executable and then
> running it on some real-world input via a script; or *maybe*
> (although this seems very "clever") it means linking your "micro"
> benchmark function into a .dll and then wrapping a single call to
> that .dll into your top-level Google Benchmark program.
>
> #include <benchmark/benchmark.h>
> extern void runMyOptimizedMicrobenchmark(int*); // implemented in a .so/DLL somewhere else
> static void BM_MyThing(benchmark::State& state) {
>   int i = 0;
>   for (auto _ : state) {
>     runMyOptimizedMicrobenchmark(&i);
>   }
>   benchmark::DoNotOptimize(i);
> }
> BENCHMARK(BM_MyThing);
The thing is, if I have a library function called "DoNotOptimize" but
it fails to be an optimization barrier in some circumstances, then it's
not a very good abstraction. Wouldn't it be nice if it acted as an
optimization barrier whether LTO is enabled or not? Anyway,
[[gnu::noipa]] would be a better attribute here for making a no-op
function "opaque", but clang does not recognize it and I don't know of
an equivalent (noinline is not it).
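To illustrate, here is a rough sketch of the kind of barrier I mean
(GCC-specific, and the function names are made up for the example).
[[gnu::noipa]] tells GCC not to inline the function and not to apply
any interprocedural analysis to it, so the call stays opaque even when
everything is statically linked with LTO:

// GCC only: no inlining and no IPA, so the call site cannot be
// "seen through" even under LTO.
[[gnu::noipa]] void escape(void*) {}

void microbenchmark_body() {
    int i = compute_something(); // hypothetical function under test
    escape(&i); // compiler must assume i is read and written here
}

As said, clang doesn't recognize the attribute, so this only helps with
GCC.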
> Either way, I've also lost the thread of what we're trying to
> accomplish here. Are we still trying to support someone stepping
> through in the debugger?
That's how the discussion started, but I'm interested in optimization
barriers in general, or maybe even more generally in fine-grained
control of optimization hints/options within the code.
> because people *definitely* don't do that
> with microbenchmark code. All I'm saying is, if your goal is simply
> to mystify the optimizing compiler as to whether a particular
> variable is dead or whether a particular write to it can be hoisted,
> literally all you have to do is escape that variable's address into a
> different translation unit (which is exactly what
> benchmark::DoNotOptimize does).
Again, that only works with LTO disabled. In general, assumptions about
what the compiler can or cannot see can break. Sure, you are fairly
safe with dynamic linking, but then we head into implementation-defined
territory in a different way.
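For what it's worth, the escape can also be expressed without relying
on TU boundaries at all, with an empty inline asm statement (a sketch
in the spirit of what recent Google Benchmark versions do for
DoNotOptimize, as far as I know; GCC/clang asm syntax):

template <class T>
inline void do_not_optimize(T& value) {
    // The empty asm claims to read and modify `value` and to clobber
    // memory, so the preceding computation cannot be deleted or
    // hoisted, whether LTO is enabled or not.
    asm volatile("" : "+m"(value) : : "memory");
}

That sidesteps the visibility question, but it's still vendor-specific
inline assembly rather than anything portable.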
> The `volatile` keyword is both
> insufficient and unnecessary to achieve that goal.
I'm not advocating for volatile. I think dedicated attributes make
sense, either vendor-specific or standard.
> –Arthur
>
> >
"Arthur O'Dwyer" <arthur.j.odwyer_at_[hidden]> wrote:
> On Fri, Jun 10, 2022 at 12:00 PM Lénárd Szolnoki
> <cpp_at_[hidden]> wrote:
>
> > So what do I do if I want to microbenchmark a function with LTO on?
> > Maybe because that's the configuration relevant for my application.
> >
>
> I don't think I understand the notion of "microbenchmarking" "with
> LTO on." Isn't the whole point of LTO to mash all your code together
> so that it's *not* "micro" anymore, and its performance will end up
> depending very heavily on how it's actually used in practice?
I think it's sensible to benchmark a function that calls other
functions in other TUs. Note that those functions might not be mine,
but come from 3rd party dependencies that I statically link with LTO
enabled. I would still call this a microbenchmark, it's still not
benchmarking a whole applicaiton. It makes sense to use the same
optimization options for a microbenchmark as the one you use for your
applicaiton, and that includes LTO.
> At that point, you need a "macrobenchmark" so that you're testing the
> performance of the actual code, because its "micro" performance won't
> necessarily bear any relationship to its "macro" (real-world)
> performance. Maybe that means linking your final executable and then
> running it on some real-world input via a script; or *maybe*
> (although this seems very "clever") it means linking your "micro"
> benchmark function into a .dll and then wrapping a single call to
> that .dll into your top-level Google Benchmark program.
>
> #include <benchmark/benchmark.h>
> extern void runMyOptimizedMicrobenchmark(int*); // implemented in a
> .so/DLL somewhere else
> static void BM_MyThing(benchmark::State& state) {
> int i = 0;
> for (auto _ : state) {
> runMyOptimizedMicrobenchmark(&i);
> }
> benchmark::DoNotOptimize(i);
> }
> BENCHMARK(BM_MyThing);
The thing is, if I have a library funciton called "DoNotOptimize", but
it fails to be an optimization barrier in some circumstances, then it's
not a very good abstraction. Wouldn't it be nice if this could be a
similar optimization barrier, LTO enabled or not? Anyway,
[[gnu::noipa]] would be a better attribute here for making a noop
function "opaque", but clang does not recognize it and I don't know an
equivalent (noinline is not that).
> Either way, I've also lost the thread of what we're trying to
> accomplish here. Are we still trying to support someone stepping
> through in the debugger?
That's how the discussion started, but I'm interested in optimization
barriers in general, or maybe even more generally fine grained control
of optimization hints/options within the code.
> because people *definitely* don't do that
> with microbenchmark code. All I'm saying is, if your goal is simply
> to mystify the optimizing compiler as to whether a particular
> variable is dead or whether a particular write to it can be hoisted,
> literally all you have to do is escape that variable's address into a
> different translation unit (which is exactly what
> benchmark::DoNotOptimize does).
Again, with LTO disabled. Anyway, various assumptions about what the
compiler sees/doesn't see can break. Sure, you are pretty safe with
dynamic linking though, but we go into implementation-defined territory
in a different way then.
> The `volatile` keyword is both
> insufficient and unnecessary to achieve that goal.
I'm not advocating for volatile. I think dedicated attributes make
sense, either vendor-specific or standard.
> –Arthur
>
> >
Received on 2022-06-10 20:43:42