I wonder how it plays with guaranteed RVO. If the return value conceptually initializes the result directly, shouldn't that technically include padding? (I guess that's mostly academic, since compilers don't do it in practice.)
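
To make the premise concrete, here is a rough sketch that checks whether a returned prvalue is constructed directly in the caller's object. The Tracked type and the helper names are hypothetical, purely for illustration. It uses a user-provided copy constructor on purpose: for types whose copy/move constructors and destructor are trivial, the standard allows the implementation to create an extra temporary, so the check would prove nothing there. Note that the constructor only initializes members; the padding between c and i gets no specified value either way.

```cpp
#include <cassert>  // for the usage check below

// Hypothetical type used only to observe where the object is constructed.
// The user-provided copy constructor makes it non-trivially copyable, which
// rules out the extra temporary permitted for types with trivial copy/move
// constructors and a trivial destructor.
struct Tracked {
    const Tracked* self;  // address of the object the constructor ran on
    char c;
    int i;                // on typical ABIs, padding bytes separate c and i

    Tracked(char c_, int i_) : self(this), c(c_), i(i_) {}

    // Deliberately copies the *source's* recorded address, so if a copy
    // were made, self would no longer equal this and we could detect it.
    Tracked(const Tracked& other) : self(other.self), c(other.c), i(other.i) {}
};

// Returns a prvalue: under C++17 guaranteed copy elision this initializes
// the caller's object directly, with no copy or move.
Tracked make_tracked() {
    return Tracked('a', 1);
}

// True when the constructor ran on the caller's object itself.
bool constructed_in_place() {
    Tracked t = make_tracked();
    return t.self == &t;
}
```

Usage: assert(constructed_in_place()); should hold under C++17. Whatever ends up in the padding bytes of t is still unspecified, since no member initializer touches them.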

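The problem is that it's not observable whether you create one or two trivially copyable objects (the standard even permits an implementation to introduce a temporary when returning a type whose copy/move constructors and destructor are trivial). In the case of long double on x86, function results are returned on the x87 floating-point stack, and there is no padding there: the 80-bit register holds only value bits, while the same value in memory is padded out to sizeof(long double).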
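
For reference, a small sketch of where that padding lives. It assumes that a 64-bit significand means the x87 80-bit extended format (1 sign + 15 exponent + 64 significand bits, so 10 bytes of value) and that every other long double format has no padding; the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <limits>

// Assumption: digits == 64 means the x87 80-bit extended format, i.e.
// 10 bytes of value bits; the rest of sizeof(long double) is padding.
// For other formats (binary64, binary128, double-double) this sketch
// assumes the whole object is value bits.
constexpr std::size_t long_double_value_bytes() {
    return std::numeric_limits<long double>::digits == 64
               ? 10
               : sizeof(long double);
}

// Bytes of the in-memory representation that carry no part of the value.
constexpr std::size_t long_double_padding_bytes() {
    return sizeof(long double) - long_double_value_bytes();
}
```

With GCC on x86-64 Linux, sizeof(long double) is 16, so this reports 6 padding bytes; a value returned in an x87 register carries none of them, which is why there is nothing for the callee to "initialize".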

And what exactly is the problem? If I follow this discussion, the problem is that when a padded type is cast to a type without padding, the result can contain "poison" bits (indeterminate bits where the source had padding).
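
To make "poison" concrete: the bytes in question are the ones in the object representation that belong to no member. A sketch with a hypothetical Padded struct (a char followed by an int, so three padding bytes on typical ABIs) that identifies them at compile time:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trivially copyable type with internal padding on typical
// ABIs: a char followed by a 4-byte-aligned int leaves 3 unused bytes.
struct Padded {
    char c;
    int i;
};

static_assert(std::is_trivially_copyable<Padded>::value,
              "bit_cast requires trivially copyable types");

// True if byte n of Padded's object representation belongs to no member,
// i.e. it is a padding byte whose value bit_cast leaves indeterminate.
constexpr bool is_padding_byte(std::size_t n) {
    return !(n >= offsetof(Padded, c) && n < offsetof(Padded, c) + sizeof(char)) &&
           !(n >= offsetof(Padded, i) && n < offsetof(Padded, i) + sizeof(int));
}

// Number of padding bytes in Padded (C++14 constexpr loop).
constexpr std::size_t padding_byte_count() {
    std::size_t count = 0;
    for (std::size_t n = 0; n < sizeof(Padded); ++n)
        if (is_padding_byte(n)) ++count;
    return count;
}
```

If you bit_cast a Padded to an unsigned char array of the same size, the result bytes at the positions where is_padding_byte is true are the "poison" ones: they may be copied around, but they have no determinate value.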

The problem is that you can't use a function whose result carries no padding (such as a long double returned on the floating-point stack) to put the padding into a specified state, so just bit-casting back and forth does nothing to clear it.
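
If you actually need the padding in a known state (for hashing or memcmp, say), one workaround is to build the byte image member by member into a zeroed buffer instead of copying the whole object. A sketch with a hypothetical Padded type and canonical_bytes helper; it handles only this one struct and is not meant as a general solution:

```cpp
#include <array>
#include <cassert>  // for the usage check below
#include <cstddef>
#include <cstring>

// Hypothetical padded type: a char followed by an int, leaving padding
// bytes between them on typical ABIs.
struct Padded {
    char c;
    int i;
};

using Bytes = std::array<unsigned char, sizeof(Padded)>;

// Builds a byte image with every padding byte explicitly zeroed: start from
// an all-zero buffer and copy each member's bytes to its own offset.
// Copying the whole object (memcpy or bit_cast) would instead leave the
// padding bytes without a specified value.
Bytes canonical_bytes(const Padded& p) {
    Bytes out{};  // value-initialized: all bytes zero
    std::memcpy(out.data() + offsetof(Padded, c), &p.c, sizeof p.c);
    std::memcpy(out.data() + offsetof(Padded, i), &p.i, sizeof p.i);
    return out;
}
```

Two Padded objects with equal members then produce equal byte images, regardless of what their padding happened to contain.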