Silverchair Insights: Incident Management at Silverchair: Root Cause Analysis

Over the last few months, Silverchair has improved and strengthened our incident management processes. One piece of those processes that we identified for improvement was our ability to determine a root cause through root cause analysis, a problem-solving method used to investigate and categorize the root causes of an event.

Often, a root cause is not something that can be easily pointed to. Depending on the problem and the various teams involved, finding a root cause can be like finding a needle in a haystack. Determining a root cause is key, however, to identifying what went wrong by answering the who, what, where, when, why, and how. Our ability to understand the root causes of an issue directly correlates to our ability to implement changes that would reduce or eliminate the risk of the root cause occurring again. Answering these questions gives us a full understanding of the incident, as well as the opportunity to learn and grow as a company.

 

Causal Factor Analysis

There are numerous methods available to determine a root cause. Silverchair has recently employed a technique called Causal Factor Analysis. This technique uses a causal factor chart, which is simply a sequence diagram with logic tests that describes the events leading up to an occurrence, as well as the conditions surrounding these events. Using this technique, we are able to provide structure around evidence gathering and effectively chart out the facts of a situation. We are also able to identify our weaknesses and uncover flaws that might otherwise never surface.
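For readers who think in code, the structure of a causal factor chart can be sketched roughly as follows. This is a minimal illustration and not a tool Silverchair uses: the `Event` class, its field names, the helper functions, and all of the sample events and dates are hypothetical, invented only to show the idea of a timeline of events, their surrounding conditions, and a logic test applied to each one.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Event:
    """One note on the chart: something that happened, and when."""
    when: date
    description: str
    # Circumstances surrounding the event (hypothetical field).
    conditions: List[str] = field(default_factory=list)
    # The team's answer to the logic test: "if this had not happened,
    # would the incident have been prevented or reduced?"
    would_removal_prevent_incident: bool = False


def build_chart(events: List[Event]) -> List[Event]:
    """Order the events chronologically, as on a timeline."""
    return sorted(events, key=lambda e: e.when)


def causal_factors(chart: List[Event]) -> List[Event]:
    """Keep only the events that passed the logic test."""
    return [e for e in chart if e.would_removal_prevent_incident]


def render(chart: List[Event]) -> str:
    """Plain-text view of the chart; '*' marks a causal factor."""
    lines = []
    for e in chart:
        marker = "*" if e.would_removal_prevent_incident else " "
        lines.append(f"{marker} {e.when.isoformat()}  {e.description}")
        for c in e.conditions:
            lines.append(f"      condition: {c}")
    return "\n".join(lines)


# Invented sample data, entered out of order the way evidence often arrives.
events = [
    Event(date(2017, 3, 9), "Customers report errors"),
    Event(date(2017, 3, 1), "Configuration change deployed",
          ["change was not peer reviewed"], True),
    Event(date(2017, 3, 6), "Alert fires but is not escalated",
          ["gap in the on-call rotation"], True),
]

chart = build_chart(events)
print(render(chart))
```

Keeping the logic test as an explicit yes/no answer on each event mirrors the way the team decides, note by note, which events actually contributed to the incident.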

We recently tested the effectiveness of the Causal Factor Analysis technique by using it to determine the root cause of an event and potential contributing factors. We conducted this exercise using sticky notes on a whiteboard, which allowed the team the flexibility to move items around as we uncovered new evidence. The resulting timeline illustrates specific events, spread over the course of a few weeks, that we knew to correlate with the incident. Through timeline iterations and systematic communication with the team members involved in the event, we were able to create a detailed account of the event timeline and its breakdowns, along with additional context to further clarify the chart.

Here is an example of the first fully developed display:

Causal Factor Analysis, as charted on a white board with post-it notes

 

As we continue to solidify this process, we plan to share the findings of incidents more broadly to strengthen our knowledge base company-wide. Silverchair finds tremendous value in using processes such as this, as they allow teams to jointly identify improvement opportunities in our macro and micro workflows.

—Daniel Persico, Manager of Technical Operations
