From an FPGA perspective: in our case we sit at the end of a DMA-driven message queue. Inasmuch as we ever pass up errors (as opposed to a plain hardware crash; undefined behaviour in hardware is not fun), each error is a specific message on that queue. That probably mirrors something outcome-like in the CPU realm. A hardware crash causes a watchdog callback that triggers extraordinary shutdown mechanisms in the CPU code, so it is akin to exception handling, but using our own triggers and handlers.
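
To make the two paths concrete, here is a rough C++ sketch of how the CPU side of such a setup might look. It is only an illustration under assumptions: the names (Message, MessageQueue, DeviceError, Watchdog), the message layout, and the tick-based timeout are all hypothetical, and a std::deque stands in for the real DMA queue. In-band errors come back as an outcome-like std::variant, while a missed watchdog deadline invokes a shutdown callback.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <utility>
#include <variant>

// Hypothetical message layout; the real queue format is an assumption.
enum class MsgKind : std::uint8_t { Data, Error };

struct Message {
    MsgKind kind;
    std::uint32_t payload;  // data word, or an error code when kind == Error
};

struct DeviceError {
    std::uint32_t code;
};

// Outcome-like result: either a data word or an error reported in-band.
using Result = std::variant<std::uint32_t, DeviceError>;

// Stand-in for the DMA-driven queue (a plain std::deque in this sketch).
class MessageQueue {
public:
    void push(Message m) { q_.push_back(m); }

    // Returns std::nullopt when the queue is empty, otherwise one decoded message.
    std::optional<Result> poll() {
        if (q_.empty()) return std::nullopt;
        Message m = q_.front();
        q_.pop_front();
        if (m.kind == MsgKind::Error) return Result{DeviceError{m.payload}};
        return Result{m.payload};
    }

private:
    std::deque<Message> q_;
};

// Hypothetical software watchdog: if nobody calls kick() within
// timeout_ticks calls to tick(), the shutdown callback runs exactly once.
class Watchdog {
public:
    Watchdog(unsigned timeout_ticks, std::function<void()> on_expire)
        : timeout_(timeout_ticks), on_expire_(std::move(on_expire)) {
        assert(on_expire_);  // a watchdog without a handler is a bug
    }

    void kick() { elapsed_ = 0; }  // the hardware showed signs of life

    void tick() {
        if (fired_) return;
        if (++elapsed_ >= timeout_) {
            fired_ = true;
            on_expire_();  // the "extraordinary shutdown" path
        }
    }

    bool fired() const { return fired_; }

private:
    unsigned timeout_;
    unsigned elapsed_ = 0;
    bool fired_ = false;
    std::function<void()> on_expire_;
};
```

The caller would inspect each polled Result with std::holds_alternative or std::visit, and call kick() whenever a message arrives. Only the watchdog path bypasses the normal return-value flow, which is what makes it feel like exception handling.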

From: SG14 [mailto:sg14-bounces@lists.isocpp.org] On Behalf Of Ben Craig via SG14
Sent: Tuesday, 03 December, 2019 14:47
To: sg14@lists.isocpp.org
Cc: Ben Craig <ben.craig@ni.com>; Olivier Giroux <OGiroux@nvidia.com>
Subject: [SG14] Unusual environment error handling

Olivier Giroux had a comment on one of the other reflectors a few months ago, asking whether we are only focusing on yesterday's error handling problems, and not looking enough at tomorrow's error handling problems.

With that in mind, what is the status quo for error handling in various unusual environments? I think I've got a good handle on the status quo for kernel and embedded systems, but I don't know what people do in other environments (e.g. GPUs, FPGAs, HPC clusters, etc.).

Do these systems all use some variation of return codes? Exceptions? Perhaps error conditions can only be expressed and propagated with floating-point NaNs? Maybe applications in these environments almost never run into errors that need to be propagated up the stack?

Please share your experiences.