Sorry for the change of mind about the container design, but my last answer is based on a deeper reading of your source code and partially on your first comment.
So my idea is that the container should embrace option (A). This imposes less restrictive requirements than option (B), especially since the container does not necessarily have to comply with STL requirements. The point was to avoid straying too far from existing designs, which give a clearer view of the interface and implementation choices for new containers.
As you have already said, this looks like dr_tensor. It is as generic as possible, since it allows both dr_vector and dr_matrix to be expressed in one class (in the previous answer I wrongly called it a multidimensional matrix).
I think its design should be something like this:
template <class T, class Layout, class AccessPolicy, class Alloc> class dr_tensor;
However, for the moment I do not want to place restrictions on the container's memory layout: while the design of a contiguous buffer that is interpreted as a multidimensional array is my favourite, an implementation similar to the std::deque container might be a great option. In this sense, I prefer to test the different implementations first.
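To make the contiguous-buffer variant concrete, here is a minimal sketch, not a proposal for the final interface. The policy names row_major, column_major and checked_access are hypothetical, the rank is fixed at 2 only to keep the example short, and std::vector stands in for whatever storage is eventually chosen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical layout policy: row-major mapping of (i, j) to a flat offset.
struct row_major {
    static std::size_t offset(const std::array<std::size_t, 2>& extents,
                              std::size_t i, std::size_t j) {
        return i * extents[1] + j;
    }
};

// Hypothetical layout policy: column-major mapping of (i, j) to a flat offset.
struct column_major {
    static std::size_t offset(const std::array<std::size_t, 2>& extents,
                              std::size_t i, std::size_t j) {
        return j * extents[0] + i;
    }
};

// Hypothetical access policy: bounds are checked via assert (debug builds only).
struct checked_access {
    static void check(const std::array<std::size_t, 2>& extents,
                      std::size_t i, std::size_t j) {
        assert(i < extents[0] && j < extents[1]);
        (void)extents; (void)i; (void)j;  // avoid unused warnings under NDEBUG
    }
};

// Sketch only: rank 2, one contiguous buffer, layout and access as policies.
template <class T, class Layout = row_major,
          class AccessPolicy = checked_access,
          class Alloc = std::allocator<T>>
class dr_tensor {
public:
    dr_tensor(std::size_t rows, std::size_t cols)
        : extents_{{rows, cols}}, data_(rows * cols) {}

    T& operator()(std::size_t i, std::size_t j) {
        AccessPolicy::check(extents_, i, j);
        return data_[Layout::offset(extents_, i, j)];
    }

    const T& operator()(std::size_t i, std::size_t j) const {
        AccessPolicy::check(extents_, i, j);
        return data_[Layout::offset(extents_, i, j)];
    }

    std::size_t extent(std::size_t d) const { return extents_[d]; }
    const T* data() const { return data_.data(); }

private:
    std::array<std::size_t, 2> extents_;  // sizes along each dimension
    std::vector<T, Alloc> data_;          // single contiguous buffer
};
```

With this shape, dr_tensor<int> a(2, 3) and dr_tensor<int, column_major> b(2, 3) share the same interface and differ only in how a(i, j) is mapped onto the buffer. A deque-like block storage could in principle be tried by swapping the storage member without touching the policies.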
The only thing I would absolutely avoid is the split between static and dynamic memory, as mentioned in my last answer, and I also do not like the SBO approach.
About the best features of std::valarray, your sentence is spot on: "we should be optimize subsequent operations on a matrix".
My opinion is that the container can become a completion of std::valarray, which is very limiting for multidimensional operations and leaves the user with the burden of implementing efficient functions and layouts. In part, it is also a conjunction between std::valarray - std::span (one-dimensional) and [tensor] - std::mdspan (multi-dimensional).
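As a small illustration of that burden, here is a sketch of what a user has to write today to treat a flat std::valarray as a row-major matrix. The helper names row_sum and col_sum are hypothetical; only std::valarray and std::slice come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Sum of row r of a matrix with `cols` columns, stored row-major in `m`.
// The caller must carry the shape around and get the slice arithmetic right.
inline double row_sum(const std::valarray<double>& m,
                      std::size_t cols, std::size_t r) {
    assert(cols > 0 && (r + 1) * cols <= m.size());
    // std::slice(start, size, stride): `cols` consecutive elements.
    const std::valarray<double> row = m[std::slice(r * cols, cols, 1)];
    return row.sum();
}

// Sum of column c of a rows x cols matrix stored row-major in `m`.
inline double col_sum(const std::valarray<double>& m,
                      std::size_t rows, std::size_t cols, std::size_t c) {
    assert(c < cols && rows * cols <= m.size());
    // One element every `cols` positions, starting at offset c.
    const std::valarray<double> col = m[std::slice(c, rows, cols)];
    return col.sum();
}
```

Nothing in std::valarray knows that m is a 2 x 3 matrix: the shape, the stride and the layout convention all live in user code, which is exactly the part a tensor container could take over.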