Date: Tue, 3 Dec 2019 20:06:37 +0000
From an FPGA perspective, in our case we sit at the end of a DMA-driven message queue. Inasmuch as we ever pass up errors (as opposed to a plain hardware crash - undefined behaviour in hardware is not fun), it's a specific message on that queue. That probably mirrors something outcome-like in the CPU realm. A hardware crash causes a watchdog callback that triggers extraordinary shutdown mechanisms in the CPU code - so akin to exception handling, but using our own triggers / handlers.
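To make that concrete, here is a rough host-side sketch of the two paths: an error arriving as a specific message on the queue and being surfaced as an outcome-like value, and a watchdog callback for the crash case. Every name here (Message, MsgKind, DeviceError, MessageQueue, Watchdog) is hypothetical and made up for illustration; an in-memory deque stands in for the DMA queue, and a real descriptor layout and watchdog would look different.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <utility>
#include <variant>

// Hypothetical message layout; a real DMA descriptor format will differ.
enum class MsgKind : std::uint8_t { Data, Error };

struct Message {
    MsgKind kind;
    std::uint32_t payload;  // data word, or an error code when kind == Error
};

struct DeviceError { std::uint32_t code; };

// Outcome-like result: either a data word or an error reported by the device.
using Result = std::variant<std::uint32_t, DeviceError>;

// Stand-in for the DMA-driven queue (assumption: plain in-memory deque).
class MessageQueue {
public:
    void push(Message m) { q_.push_back(m); }

    // Returns std::nullopt when nothing is pending.
    std::optional<Result> poll() {
        if (q_.empty()) return std::nullopt;
        Message m = q_.front();
        q_.pop_front();
        if (m.kind == MsgKind::Error) return Result{DeviceError{m.payload}};
        return Result{m.payload};
    }

private:
    std::deque<Message> q_;
};

// Software watchdog: the callback fires once `limit` ticks pass without a kick.
class Watchdog {
public:
    Watchdog(int limit, std::function<void()> on_expire)
        : limit_(limit), on_expire_(std::move(on_expire)) {}

    void kick() { missed_ = 0; }  // device showed signs of life
    void tick() {                 // called periodically by the host
        if (++missed_ == limit_) on_expire_();
    }

private:
    int limit_;
    int missed_ = 0;
    std::function<void()> on_expire_;
};
```

The caller inspects the variant much as it would an outcome/expected type, while the watchdog callback is where the extraordinary shutdown logic would hang off; neither path uses C++ exceptions.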
From: SG14 [mailto:sg14-bounces_at_[hidden]] On Behalf Of Ben Craig via SG14
Sent: Tuesday, 03 December, 2019 14:47
To: sg14_at_[hidden]
Cc: Ben Craig <ben.craig_at_[hidden]>; Olivier Giroux <OGiroux_at_[hidden]>
Subject: [SG14] Unusual environment error handling
Olivier Giroux had a comment on one of the other reflectors a few months ago, asking whether we are only focusing on yesterday's error handling problems, and not looking enough at tomorrow's error handling problems.
With that in mind, what is the status quo for error handling in various unusual environments? I think I've got a good handle on the status quo for kernel and embedded systems, but I don't know what people do for GPUs, FPGAs, or clusters (e.g. HPC clusters).
Do these systems all use some variation of return codes? Exceptions? Perhaps the only error conditions can be expressed and propagated with floating point NaNs? Maybe applications in these environments almost never run into errors that need to be propagated up the stack?
Please, share your experiences.
Received on 2019-12-03 14:09:01