And as a solution:
-> Provide your code as an int_mod<m> C++ template library under a license that does not hinder reuse.
-> The compiler vendors will then (hopefully) optimize their _BitInt implementations to generate the same output as your library, using the algorithm from your library but implemented directly in the compiler.
=> The optimized _BitInt will then be available for both C and C++, with fully inlined and optimized code.
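
To make the first step concrete, here is a minimal sketch of what the public face of such an int_mod<M> type could look like. Everything in it is an assumption for illustration: the class layout, the value() accessor and the restriction to 32-bit moduli are made up here, and the reduction is a plain % on a 64-bit intermediate, not your algorithm. With a compile-time constant modulus, compilers typically replace that % with a multiply-and-shift sequence already, so a real library would swap in its own reduction at the marked spot.

```cpp
#include <cstdint>

// Hypothetical sketch of an int_mod<M> value type (interface is an assumption,
// not taken from any existing library). Values are kept reduced in [0, M).
template <std::uint32_t M>
class int_mod {
    static_assert(M > 0, "modulus must be positive");
    std::uint32_t v_;

public:
    // Reduction point: a real library would plug its own algorithm in here.
    // Only non-negative inputs are supported in this sketch.
    constexpr explicit int_mod(std::uint64_t x = 0)
        : v_(static_cast<std::uint32_t>(x % M)) {}

    constexpr std::uint32_t value() const { return v_; }

    // Operands are below 2^32, so the 64-bit intermediates cannot overflow.
    friend constexpr int_mod operator+(int_mod a, int_mod b) {
        return int_mod(std::uint64_t{a.v_} + b.v_);
    }
    friend constexpr int_mod operator-(int_mod a, int_mod b) {
        // Adding M first keeps the intermediate non-negative, since b.v_ < M.
        return int_mod(std::uint64_t{a.v_} + M - b.v_);
    }
    friend constexpr int_mod operator*(int_mod a, int_mod b) {
        return int_mod(std::uint64_t{a.v_} * b.v_);
    }
    friend constexpr bool operator==(int_mod a, int_mod b) {
        return a.v_ == b.v_;
    }
};
```

Because every operation is constexpr and defined in the header, a call such as int_mod<7>(3) * int_mod<7>(5) can be fully inlined or even folded at compile time, which is the behaviour one would hope a compiler-native implementation reproduces.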