Olivier Giroux posted a comment on one of the other reflectors a few months ago, asking whether we are focusing only on yesterday's error handling problems and not looking enough at tomorrow's.

With that in mind, what is the status quo for error handling in various unusual environments? I think I've got a good handle on the status quo for kernel and embedded systems, but I don't know what people do in other environments (e.g. GPUs, FPGAs, HPC clusters, etc.).

Do these systems all use some variation of return codes? Exceptions? Perhaps the only error conditions are ones that can be expressed and propagated with floating-point NaNs? Maybe applications in these environments almost never run into errors that need to be propagated up the stack?

Please share your experiences.