And as a solution:

 -> Provide your code as an int_mod<m> C++ template library under a permissive license, so nothing stops vendors from reusing it.

 

 -> Compiler vendors would then (hopefully) optimize their _BitInt implementations to produce the same output as your library, using your library's algorithm but implemented directly in the compiler.

 

 => The optimized _BitInt would then be available in both C and C++, with fully inlined and optimized code.