0
0

Hi there,

From some initial tests, it seems that if cFE encounters a runtime error, the whole kernel dies. That’s somewhat expected for embedded devices, but it can be highly problematic for flight software. Other embedded OSes and frameworks such as ROS allow for both software and hardware watchdogs with a set timeout, and even watchdogs of watchdogs.

Does cFE provide such a feature?

Thanks

  • the_other_james
    What platform are you running on? I think runtime exception handling may require changes to the OSAL / platform support package depending on your hardware.
  • crabotin
    I’m currently running this on a Linux box with OSAL.
-1
0

Yes, I agree that a hardware timer would require another piece of hardware and related drivers.

I’m not sure I asked the question correctly: is there a way to prevent a buggy app from killing the main kernel execution?

  • crabotin
    I’d be grateful if someone could explain the negative vote here… I ask because on general purpose operating systems like Linux, even some drivers can run in user space instead of kernel space, so that a crash doesn’t kill the kernel at the same time. I do, of course, understand that cFE/cFS is not as complex as Linux.
0
0

With respect to watchdog timers, my guess is that a hardware timer will require a specific driver and will not be generalizable. As for detecting frozen applications, I don’t know whether cFE has a specific mechanism for it, but eventually the frozen application’s message pipes will overflow, generating error events.
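For what it’s worth, a software watchdog of the kind mentioned in the question can be sketched with per-application heartbeat counters that a monitor task polls. This is only a rough illustration in plain C11: `app_kick`, `monitor_check`, `watch_table` and `MAX_APPS` are made-up names, not cFE or OSAL APIs, and what to do once an application is flagged (restart it, raise an event, and so on) is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_APPS 4 /* hypothetical table size, not a cFE constant */

typedef struct {
    atomic_uint heartbeat;  /* incremented by the application on every loop */
    unsigned    last_seen;  /* heartbeat value at the previous monitor check */
    uint32_t    missed;     /* consecutive checks with no progress */
} app_watch_t;

/* Static storage, so every field starts at zero. */
static app_watch_t watch_table[MAX_APPS];

/* Called by an application from its main loop to show it is still alive. */
void app_kick(int app_id)
{
    if (app_id >= 0 && app_id < MAX_APPS) {
        atomic_fetch_add(&watch_table[app_id].heartbeat, 1u);
    }
}

/* Called periodically by a monitor task. Returns true once the application
 * has made no progress for max_missed consecutive checks. */
bool monitor_check(int app_id, uint32_t max_missed)
{
    if (app_id < 0 || app_id >= MAX_APPS) {
        return false;
    }

    app_watch_t *w = &watch_table[app_id];
    unsigned now = atomic_load(&w->heartbeat);

    if (now != w->last_seen) {
        /* Progress since the last check: reset the miss counter. */
        w->last_seen = now;
        w->missed = 0;
        return false;
    }

    w->missed++;
    return w->missed >= max_missed;
}
```

Since all applications share one address space, a misbehaving application could still corrupt this table, so this would only catch hangs, not memory corruption.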

0
0

I’m going to answer this with regard to the POSIX OSAL implementation, since that is what you are using. Other OSes and hardware will provide different exception handling capabilities. Please note: this implementation was not originally intended for flight use.

Some background: OSAL’s POSIX implementation uses threads instead of processes. So each cFE application is a thread, and they all share a memory space. This matches what you see in many embedded operating systems / hardware (for example, RTEMS doesn’t support POSIX processes, and not all microcontrollers have MMUs to allow memory isolation). This means that cFE applications can directly read and write each other’s memory. cFE applications can (if I’m not mistaken) also read and write to the memory of the core services. There isn’t really a “kernel execution” because no part of cFE is running in privileged mode (well, in your case the kernel is the Linux kernel, which is continuing to run just fine).

The relevant code for exception handling is in osapi.c in OSAL. For POSIX, this file is: https://github.com/nasa/osal/blob/master/src/os/posix/osapi.c. Here you will see a bunch of signal handling code, such as “sigdelset(&mask, SIGSEGV); /* Segfault */”, which removes SIGSEGV from the list of signals that are blocked. Right now the default behavior is to let cFE crash when an exception occurs because, as the comments mention, this makes debugging much easier. You could, if you wanted, catch that signal and run your own error recovery routine. cFE does have ways to restart applications, if you can figure out which application caused the issue.
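To give an idea of what catching the signal could look like, here is a minimal sketch using plain POSIX calls (`sigaction`, `sigsetjmp`, `siglongjmp`). It is not taken from OSAL: `run_guarded`, `fault_handler`, `healthy_task` and `faulty_task` are hypothetical names, and the fault is simulated with `raise(SIGSEGV)` rather than a real bad memory access. After a genuine segfault the shared memory may already be corrupted, so jumping back like this should be treated as a last-resort hook for logging or restarting, not as a safe recovery.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names throughout: none of these are OSAL or cFE APIs. */
static sigjmp_buf recovery_point;
static volatile sig_atomic_t last_signal = 0;

/* Records which signal fired and jumps back to the recovery point. */
static void fault_handler(int signo)
{
    last_signal = signo;
    siglongjmp(recovery_point, 1);
}

/* Runs task() with a temporary SIGSEGV handler installed.
 * Returns 0 if task() returned normally, the signal number if a fault
 * was caught, or -1 if the handler could not be installed. */
int run_guarded(void (*task)(void))
{
    struct sigaction sa;
    struct sigaction old;
    int result = 0;

    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        return -1;
    }

    /* Passing 1 saves the signal mask, so siglongjmp restores it and
     * SIGSEGV is not left blocked after the handler is abandoned. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        task();
    } else {
        result = last_signal; /* we arrived here from fault_handler */
    }

    sigaction(SIGSEGV, &old, NULL); /* put the previous handler back */
    return result;
}

/* Example tasks: one that behaves, one that simulates a segfault. */
void healthy_task(void) { }
void faulty_task(void) { raise(SIGSEGV); }
```

With this sketch, `run_guarded(healthy_task)` returns 0 and `run_guarded(faulty_task)` returns `SIGSEGV`. Note that a single static recovery point only works for one guarded thread at a time; with one thread per application you would need one per thread, plus some way to map the faulting thread back to its application before asking cFE to restart it.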

So I guess the short answer is no, cFE cannot prevent buggy behavior in an application from crashing the rest of cFE. This isn’t really possible on the hardware it was designed for.

 
