Quickly Investigate a Production Incident [Root Cause Analysis]
Single Incident Success Story: Using TapRooT® Root Cause Analysis to Quickly Investigate a Production Incident
Submitted by Don Guidry, Huntsman
This success story was written some time ago. It has been updated to our current blog article format and is reposted here. The example shows how TapRooT® Root Cause Analysis (RCA) can be used to investigate an expensive process incident quickly and effectively and to prevent the incident’s recurrence. The lessons learned from this success story still apply today.
Challenge
To investigate the collapse of a process tank, which severely damaged the tank and led to significant replacement costs, and to learn how to prevent a recurrence. The investigation had to be completed quickly (within two weeks, while repairs were being made) and without requiring excessive support from plant personnel.
Incident Summary
During the T&I period for a large petrochemical manufacturing process, a large tank collapsed due to the rapid condensation of steam that was being used to steam out the tank for maintenance work.
The rapid condensation occurred when the deluge system was accidentally activated after an electrician removed a faulty bulb from one of the relays in the deluge system’s electrical power supply.
Investigation
The investigation was performed at a facility that was licensed to use the TapRooT® System. Although the license included the right to use the TapRooT® System training materials, none of the people involved in the incident had received the training before the incident. Therefore, the investigation started by providing the participants a brief introduction to the TapRooT® System and the tools they would use.
A TapRooT® Trained Facilitator was chosen to lead the investigation. He had attended the 5-Day TapRooT® Advanced Root Cause Analysis Team Leader Course. To save the time of the plant personnel participating in the investigation and to avoid interrupting the turnaround, he decided to hold the meetings during the lunch hour (lunch was being catered). It took four sessions, held one to three days apart (a total of nine hours), to complete the investigation and develop corrective actions that will help prevent this type of incident from recurring.
In the first session, the team members learned about TapRooT® and drew their first SnapCharT® Diagram to better understand what happened.
In the second session, the team reviewed the SnapCharT® and the Causal Factors for the incident.
In the third session, the team reviewed a root cause analysis that had been performed by the facilitator using the Root Cause Tree® Diagram. All team members agreed to the analysis.
In the fourth session, the team members used the Corrective Action Helper® Guide to develop SMARTER Corrective Actions.
Results
In this short period of time, interesting problems were uncovered and difficult issues were addressed. TapRooT® RCA helped us logically and quickly lay out what happened and understand the specific root causes.
The two most impressive aspects of this investigation were:
- The generic problem that was uncovered, which we are convinced we would not have found without TapRooT® RCA. We found a reluctance to remove any safeguard (like the deluge system) from service when the system it protects is removed from service for maintenance. The issues uncovered included how to decide whether a safeguard should be disabled and when to disable it.
- The efficiency of the investigation process and the ability of the team to quickly adapt to using TapRooT® RCA. Just nine hours of team time were used to investigate a fairly complicated production problem. This investigation was conducted during hectic “turnaround” tempo operations with minimal impact on the operations and maintenance organization.
As a result of this investigation, we instituted a new checklist to be used during preparation for taking systems out of service. This checklist addresses the effect of safeguards that will be left in service, the hazards posed by safeguards left in service, and, if a safeguard is to be taken out of service, the process and timing for removing the safeguard from service.
We believe that this new checklist will fill an important gap in our maintenance planning process. After reviewing our past experience, we estimated that the lessons learned from this one incident could save Huntsman over a million dollars a year by eliminating the unplanned events, some of them expensive and dangerous, that happen during maintenance.
Performance improvement is never really finished. This is just one example of how we will continue to use the TapRooT® System to improve safety, production, and maintenance. But I think it clearly demonstrates that not all investigations have to be long, drawn-out affairs to yield lessons of great value.