Yeah, P3161 provides the low-level operations used to implement higher-level algorithms such as Karatsuba. Whichever high-level algorithm you choose, you cut the instruction count significantly if you can compute the high half of a multiplication at the same time as the low half. If you didn't have a standard operation such as std::mul_wide, you would have to implement it yourself in terms of __int128 multiplication or something similar, so there's no way around it: that operation ends up somewhere in the implementation of std::big_int.
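Here's a rough sketch of what "implement it yourself" looks like (my own illustration, not taken from P3161). `mul_wide_u64` is a hypothetical stand-in for something like std::mul_wide, and its name and signature are assumptions rather than the proposal's API. It uses `unsigned __int128` (a GCC/Clang extension) when available and falls back to four 32-bit partial products otherwise. `mul_limbs` is a plain schoolbook limb multiply, the kind of base case that Karatsuba-style algorithms bottom out in. Every inner step needs both halves of a 64x64 product.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a wide multiply such as std::mul_wide (name and
// signature are assumptions). Returns {low 64 bits, high 64 bits} of a * b.
inline std::pair<std::uint64_t, std::uint64_t> mul_wide_u64(std::uint64_t a, std::uint64_t b) {
#if defined(__SIZEOF_INT128__)
    // GCC/Clang extension: one widening multiply yields both halves.
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    return {static_cast<std::uint64_t>(p), static_cast<std::uint64_t>(p >> 64)};
#else
    // Portable fallback: split each operand into 32-bit halves.
    std::uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    std::uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    std::uint64_t ll = a_lo * b_lo;
    std::uint64_t lh = a_lo * b_hi;
    std::uint64_t hl = a_hi * b_lo;
    std::uint64_t hh = a_hi * b_hi;
    // Middle column: carry out of ll plus the low halves of the cross terms.
    std::uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);
    std::uint64_t lo = (mid << 32) | (ll & 0xFFFFFFFFu);
    std::uint64_t hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return {lo, hi};
#endif
}

// Schoolbook multiplication of little-endian 64-bit limb vectors.
std::vector<std::uint64_t> mul_limbs(const std::vector<std::uint64_t>& a,
                                     const std::vector<std::uint64_t>& b) {
    std::vector<std::uint64_t> out(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            auto [lo, hi] = mul_wide_u64(a[i], b[j]);
            // out[i+j] + lo + carry may wrap twice; fold each wrap into hi.
            // hi cannot overflow: a*b + out + carry <= 2^128 - 1.
            std::uint64_t sum = out[i + j] + lo;
            hi += (sum < lo);
            std::uint64_t sum2 = sum + carry;
            hi += (sum2 < carry);
            out[i + j] = sum2;
            carry = hi;
        }
        // This limb has not been written by earlier rows yet.
        out[i + b.size()] = carry;
    }
    return out;
}
```

Without a wide multiply you'd be doing the four-partial-product dance above for every limb pair, which is exactly the instruction-count cost a standard operation lets the compiler avoid.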