Even if compilers optimize the unused code out, all users will have to pay the cost of slower compilation (because the compiler still needs to instantiate the serialization code for both big and little endian).
Basically, with the current design this is a premature pessimization.
You should focus on providing low-level primitives first.
Maybe something like a Codec concept that serializes primitive types (instead of the enum format member in the context concept), with two provided implementations, BigEndianCodec and LittleEndianCodec, while allowing users to supply their own.
With your current design I see no easy way to plug in a custom serialization codec (e.g. Type-Length-Value).