I'm not sure that cache blocking is really an execution policy at all. It's a question of iteration pattern and memory layout, and we don't yet have a multi-dimensional operation in P2300 that would immediately need it. How cache blocking should work for the bulk operation is highly algorithm-dependent, so maybe the right place for it is in the mapping of a higher-level algorithm onto the flat (or later n-dimensional) bulk algorithm rather than in bulk itself.
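
To make that concrete, here is a rough sketch of what I mean by putting the blocking in the mapping. Everything in it is an assumption for illustration: `flat_bulk` is a made-up sequential stand-in for a flat bulk operation (it is not P2300's `bulk`), and `blocked_transpose` and its `tile` parameter are hypothetical names. The point is only that the tiling is decided by the algorithm, while the bulk call just sees a flat index space of tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flat bulk operation: calls f(i) for i in [0, n).
// This version runs the calls in order; a scheduler could distribute them.
template <class F>
void flat_bulk(std::size_t n, F f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Cache-blocked transpose of a rows x cols row-major matrix.
// The blocking lives in the mapping from tiles to flat indices,
// not in flat_bulk itself.
std::vector<double> blocked_transpose(const std::vector<double>& in,
                                      std::size_t rows, std::size_t cols,
                                      std::size_t tile) {
    assert(tile > 0 && in.size() == rows * cols);
    std::vector<double> out(in.size());
    const std::size_t tiles_r = (rows + tile - 1) / tile;  // tiles per column
    const std::size_t tiles_c = (cols + tile - 1) / tile;  // tiles per row
    // One flat index per tile; each call handles one tile x tile block.
    flat_bulk(tiles_r * tiles_c, [&](std::size_t t) {
        const std::size_t r0 = (t / tiles_c) * tile;
        const std::size_t c0 = (t % tiles_c) * tile;
        const std::size_t r1 = std::min(r0 + tile, rows);  // clamp edge tiles
        const std::size_t c1 = std::min(c0 + tile, cols);
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                out[c * rows + r] = in[r * cols + c];
    });
    return out;
}
```

A different algorithm (a stencil, say, or a matrix multiply) would want a different tile shape and traversal order, which is why I'd expect this choice to sit above bulk rather than in a policy passed to it.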

How do you envisage cache blocking fitting in as an execution policy?