Posts tagged with Fault Injection

Short Introduction to This Paper

This paper introduces how Etsy uses "GameDay" exercises to build confidence in the behavior of its systems. Specifically, it discusses 1) why fault injection should be applied in the production environment, 2) how to inject faults during a GameDay exercise, 3) the business justification, and 4) a case study, along with limitations and fears.

Highlights of This Paper

  • An introduction to provisioning a server or cloud instance from zero to production
  • An explanation of why many complex systems are largely intractable
  • The GameDay exercise pattern, which introduces the methodology for doing fault injection in a real company

Short Introduction to This Paper

This paper introduces and explores the idea of data poisoning, a light-weight peer-architecture technique to inject faults into Python programs. The method requires only small modifications to the original program, which makes it easier to evaluate the sensitivity of systems that are prototyped or modeled in Python. The paper does not give much detail about the implementation, but the types of data poisoning it defines are very interesting.

Highlights of This Paper

  • Data poisoning's symbolic expression
  • Different types of data poisoning

Key Information

  • Types of data poisoning: deterministic effect poisoning, intermittent effect poisoning (which requires defining the lifetime of the poisoned data), and infectious/non-infectious poisoning
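
The summary above does not reproduce the paper's implementation, so the following is only a minimal sketch of how the three kinds of poisoning could be modeled. The `Poisoned` class, its parameters, and the reading of "infectious" as "the result of an operation is itself poisoned" are all assumptions made for illustration, not the paper's API or definitions.

```python
class Poisoned:
    """A numeric value carrying an injected fault (hypothetical helper)."""

    def __init__(self, value, error, lifetime=None, infectious=False):
        self.value = value            # the clean underlying value
        self.error = error            # additive perturbation while the poison is active
        self.lifetime = lifetime      # reads the poison survives; None means forever
        self.infectious = infectious  # assumption: results of operations inherit the poison

    def active(self):
        """True while the poison still affects reads."""
        return self.lifetime is None or self.lifetime > 0

    def read(self):
        """Return the value as the program observes it, consuming one unit of lifetime."""
        if not self.active():
            return self.value
        if self.lifetime is not None:
            self.lifetime -= 1
        return self.value + self.error

    def __add__(self, other):
        was_active = self.active()
        total = self.read() + other
        if self.infectious and was_active:
            # Infectious: the sum is itself a poisoned value.
            return Poisoned(total, self.error, self.lifetime, infectious=True)
        # Non-infectious: the fault shows up once, the result is a plain number.
        return total

    __radd__ = __add__


deterministic = Poisoned(10.0, error=1.0)              # perturbed on every read
intermittent = Poisoned(10.0, error=1.0, lifetime=2)   # perturbed on the first two reads only
infectious = Poisoned(10.0, error=1.0, infectious=True)
```

With this model, `intermittent.read()` returns the perturbed value twice and the clean value afterwards, while `infectious + 5` yields another `Poisoned` object rather than a plain float.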

Relevant Future Works

  • Data poisoning alone is not enough; we should also analyze the system's behaviour under different types of perturbation

URL

Data Poisoning: Lightweight Soft Fault Injection for Python

Short Introduction to This Paper

This paper describes the motivation, innovation, design, running example and future development of a Fault Injection Tool (FIT). The tool causes cloud platform issues in a controlled way, such as resource stress and service or VM outages, in order to observe the subsequent effect on deployed applications.

Highlights of This Paper

  • The DICE FIT addresses the need to generate various cloud-agnostic faults at the VM Admin and Cloud Admin levels. It offers greater flexibility and the ability to generate multiple faults while remaining relatively lightweight

Key Information

  • Design: To access the VM level, the DICE FIT uses SSH to connect to the virtual machines and issue commands. By using JSch, the tool can connect to any VM that has SSH enabled and issue commands as a pre-defined user. This allows greater flexibility of commands as well as the installation of tools and dependencies.
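
To make the SSH-based design concrete, here is a rough sketch that builds the command line for running a fault-inducing command on a remote VM. It swaps in the OpenSSH command-line client instead of JSch, and the function name, user, host address, and the choice of the `stress` utility as the fault are all hypothetical; the actual DICE FIT may work differently.

```python
import shlex


def build_fault_command(user, host, remote_command):
    """Build the argv for running `remote_command` on `host` through the ssh client.

    `remote_command` is a list of arguments; it is quoted into a single
    string because ssh passes the command to the remote shell.
    """
    return ["ssh", f"{user}@{host}", shlex.join(remote_command)]


# Hypothetical fault: stress two CPU workers for 60 seconds on the target VM.
cmd = build_fault_command(
    "ubuntu", "10.0.0.5", ["stress", "--cpu", "2", "--timeout", "60"]
)
print(cmd)
# Passing `cmd` to subprocess.run(cmd, check=True) would actually inject the fault.
```

Building the argument list separately from executing it keeps the fault definition easy to log and review before anything is run against a real machine.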

Relevant Future Works

  • Containerised environments will also be considered as future FIT targets to help understand the effect on microservices when injecting faults to the underlying host as well as the integrity of the containerised deployment
  • The CACTOS project will expand the tool functionality by initiating a specific application level fault to trigger optimisation algorithms

URL

DICE Fault Injection Tool (Paper)
DICE-Fault-Injection-Tool (GitHub Project)

Short Introduction to This Paper

This paper aims at analyzing and improving how software handles unanticipated exceptions. The first objective is to set up contracts about exception handling and a way to assess them automatically. The second is to improve the resilience capabilities of software by transforming the source code. The authors devise an algorithm, called short-circuit testing, which injects exceptions during test suite execution so as to simulate unanticipated errors. It is a kind of fault-injection technique dedicated to exception handling. The algorithm collects data that is used to verify two formal contracts that capture two resilience properties w.r.t. exceptions: the source-independence and pure-resilience contracts. The team then proposes a code modification technique, called "catch stretching", which allows error-recovery code (in the form of catch blocks) to be more resilient.

Highlights of This Paper

  • This work shows that it is possible to reason about software resilience by injecting exceptions during test suite execution
  • Definition of two contracts for exception handling: source independence contract, pure resilience contract
  • An algorithm and four predicates to verify whether a try-catch satisfies those contracts
  • A source code transformation to improve the resilience against exceptions
  • An empirical evaluation on 9 open-source applications, with one test suite each, showing that resilient try-catch blocks exist in practice

Key Information

  • Source-independent: A try-catch is source-independent if the catch block proceeds equivalently, whatever the source of the caught exception is in the try block
  • Pure Resilience: A try-catch is purely resilient if the system state is equivalent at the end of the try-catch execution whether or not an exception occurs in the try block
  • Short-circuit testing consists of dynamically injecting exceptions during the test suite execution in order to analyze the resilience of try-catch blocks
  • Catch Stretching: Replacing the type of the caught exceptions so that the catch blocks catch more exceptions than before. For instance, replacing catch(FileNotFoundException e) by catch(IOException e). The extreme of catch stretching is to parametrize the catch with the most generic type of exception (e.g. Throwable in Java, Exception in .NET)
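
The definitions above can be illustrated with a small sketch. It is written in Python rather than Java, so `ValueError` stretched to `Exception` stands in for the FileNotFoundException to IOException example. The function names, the `inject` hook, and the comparison of a single state dictionary are assumptions made for illustration; the paper's actual algorithm and predicates are more detailed.

```python
def parse_retries(text, caught=ValueError, inject=None):
    """A try-except under analysis; `inject` is a hypothetical injection hook."""
    state = {"retries": 3}
    try:
        if inject is not None:
            # Short-circuit: fail before any statement of the try body runs.
            raise inject
        state["retries"] = int(text)
    except caught:
        # Recovery code: fall back to the default.
        state["retries"] = 3
    return state


def is_purely_resilient(text, caught, injected):
    """Compare the end state of a normal run with a run where an exception is injected."""
    normal = parse_retries(text, caught)
    try:
        faulty = parse_retries(text, caught, inject=injected)
    except Exception:
        return False  # the injected exception escaped the try-except
    return normal == faulty


# Same end state with or without the exception for this input.
same_state = is_purely_resilient("3", ValueError, ValueError("injected"))
# The recovery code loses the parsed value, so the end states differ.
lost_value = is_purely_resilient("5", ValueError, ValueError("injected"))
# An unanticipated exception type escapes the original handler...
escaped = is_purely_resilient("3", ValueError, TypeError("injected"))
# ...but is recovered once the handler is stretched to Exception.
stretched = is_purely_resilient("3", Exception, TypeError("injected"))
```

The second case shows why the verdict is relative to the test inputs: the same try-except looks purely resilient for one input and not for another.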

Relevant Future Works

  • Further exploring how to improve the resilience of software applications: the scope of try blocks can be automatically adapted while still satisfying the test suite
  • The purely resilient catch blocks could probably be used elsewhere because they have a real recovery power
  • The resilience oracle need not be only a test suite; it could also be, for example, metamorphic relations or production traces
  • Automated refactoring of the relevant test suite

Questions

  • How should catch stretching be done when a try has multiple catch blocks? Also, the original test suites may not be enough to verify the new catch blocks

URL

Exception Handling Analysis and Transformation Using Fault Injection - Study of Resilience Against Unanticipated Exceptions