Short Introduction to This Paper
The paper is well written and introduces several interesting self-healing strategies for an operating system (rather than for a specific application). However, there is a plethora of related work in hardware and software fault tolerance.
The contribution of this paper is mainly a survey of techniques that can be applied to provide self-healing functionality to an OS. It discusses the concepts, implementation, and evaluation of exception handling, code reloading, operating system component isolation, micro-rebooting, automatic system service restarts, watchdog-timer-based recovery, and transactional components.
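To make one of these techniques concrete for myself, here is a minimal sketch of watchdog-timer-based recovery in Python. It is my own illustration and not the paper's implementation: the `Watchdog` class, its `heartbeat` and `check` methods, and the `restart` callback are all hypothetical names, and a real OS-level watchdog would typically rely on a hardware or kernel timer rather than a polled check.

```python
import time
from typing import Callable


class Watchdog:
    """Restart a component when it stops sending heartbeats (illustrative only)."""

    def __init__(
        self,
        timeout: float,
        restart: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout      # seconds of silence tolerated before recovery
        self.restart = restart      # hypothetical recovery action, e.g. restart a service
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_beat = clock()
        self.restarts = 0

    def heartbeat(self) -> None:
        """Called by the monitored component to signal that it is still alive."""
        self.last_beat = self.clock()

    def check(self) -> bool:
        """Trigger recovery if the last heartbeat is older than the timeout.

        Returns True when a restart was triggered, False otherwise.
        """
        if self.clock() - self.last_beat > self.timeout:
            self.restarts += 1
            self.restart()
            # Give the restarted component a fresh timeout window.
            self.last_beat = self.clock()
            return True
        return False
```

The monitored component would call `heartbeat()` periodically, while a supervisor calls `check()` on its own schedule. Passing the clock in as a parameter is only a design choice to keep the sketch testable.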
Connections with My Thesis
It gives me more inspiration. So far I have done chaos engineering and self-healing at the application level, but I could also work at the OS level (or at the Docker container level, which may be more interesting). The paper discusses plenty of different techniques; I could investigate them further and come up with some novel strategies for designing a self-healing system.