Date: Mon, 11 Oct 2021 19:13:17 -0700
On Monday, 11 October 2021 18:33:52 PDT Ivan Matek wrote:
> It was timed on a real codebase, not toy examples. I did not benchmark the
> per-TU overhead, but when a file is used 100 or 1000 times, even tiny deltas
> add up.
> I made the measurement on a system that was not doing anything else and did
> not have turbo enabled, so timings were repeatable (20-30 second oscillations
> for a 25-minute build). And removing chrono from a common header reduced
> build times by around 10%.
Reproducible results are a good thing. I measured with turbo enabled, which
means that while my results are comparable to one another, they do not scale
up to compilations using all cores.
For 1000 files, given the heat dissipation issue and the turbo bins, I expect
it to add up to more than 30 CPU-seconds total. But I wouldn't expect 30
wall-clock seconds, because you surely have more than one logical CPU.
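As a quick sketch of how the per-TU delta can be probed in isolation (not
taken from either of our benchmarks; the file name and the exact compile
commands below are purely illustrative):

    // bench_chrono.cpp -- probe TU; the baseline is the same file with the
    // #include (and the body) removed.
    #include <chrono>

    int main()
    {
        // Use the header minimally so the include is not trivially unused.
        auto t = std::chrono::steady_clock::now();
        return t.time_since_epoch().count() > 0 ? 0 : 1;
    }

    // Timed e.g. with:  time g++ -std=c++20 -c bench_chrono.cpp
    // (or cl /std:c++20 /c on MSVC). Multiplying the per-TU delta by the
    // number of TUs that transitively include the common header gives a
    // rough estimate of the total cost.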
> Also, it was a VS solution; it is possible that the MSVC implementation of
> chrono is heavier or that they organize their header usage differently. For
> example, I noticed I get a stringstream with chrono in VS.
> https://www.godbolt.org/z/nb73sxr35
Indeed, and that's why I tried libc++ too, because I know libc++ organises its
headers very well, with as little overlap as they can get. I was pleasantly
surprised that GCC 11's libstdc++ had comparable (and even better) results.
But that's QoI. If one implementation can do better, so can all. Whether that
implementation is close to the optimum possible, that's another story.
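For anyone who wants to see exactly what a given standard library drags in,
the transitive include list can be dumped directly (a sketch; probe.cpp is
just a hypothetical one-line TU):

    // probe.cpp -- one-line TU for inspecting what <chrono> pulls in.
    //
    // GCC/Clang:  g++ -std=c++20 -H -fsyntax-only probe.cpp
    //             (prints every header opened, indented by include depth)
    // MSVC:       cl /std:c++20 /Zs /showIncludes probe.cpp
    #include <chrono>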
> So if you want to benchmark it on something like a large codebase, I suggest
> you add it to a few widely used headers and try to benchmark then. Just make
> sure the gcc/clang version (more precisely, their standard library) you use
> implements a lot of C++20 stuff, since my intuition is that the new stuff is
> quite heavy.
It is. Ranges adds a lot of compilation time to <algorithm>. That's also why I
showed both C++17 and C++20 times.
That's why I'd like to remove #include <algorithm> from qglobal.h, but have so
far been unable to. If your suggestion gets adopted, I'd welcome it.
> Now again, you could say: just don't be stupid, don't put heavy headers in
> often-used headers. But like I said, the problem is that chrono is not really
> granular, so for example if I want to have something as trivial as nanos in
> my header file, I must include the entire chrono header.
I'm just saying that <chrono> is not that expensive. It's definitely not the
most expensive of them all, and the implementations of both libc++ and
libstdc++ are fairly self-contained. For the C++17 functionality, there isn't
much that can be removed. At best, we'd separate the duration types from the
clocks, with time points somewhere in the middle.
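To make the granularity point concrete (a sketch; the split header named in
the comment is purely hypothetical, no such header exists today):

    // Today, a header that only wants to store a duration still pays for all
    // of <chrono>, clocks (and, in C++20, the calendar types) included:
    #include <chrono>

    struct Timeout {
        std::chrono::nanoseconds interval{0};
    };

    // The separation sketched above would let such a header include only the
    // duration machinery -- say, a hypothetical <chrono_durations> -- and
    // leave the clocks, and most of the cost, to the TUs that need them.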
> tl;dr: not saying your benchmarks are wrong, but for my use case / my
> compiler, <chrono> is quite heavy.
I'm saying mine are not wrong. If you choose a compiler that is not the best,
then you get to complain about QoI to the vendor, whichever that is. We start
any benchmarking from the best option available, to see what the standard
needs to do to improve further upon that.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Software Architect - Intel DPG Cloud Engineering