Step one in Incident Management is to find out what is actually going on, and make sure the people who need to know are informed. Most software servers have built-in reporting, generating vast numbers of records for expected and innocuous events. There are then applications running on those servers: their reporting may be good or indifferent. The trick is to ensure that the events that are important and/or unexpected, the incidents, are reported to the right human(s), and that any reporting is accurate and comprehensive.
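As a rough illustration of separating incidents from routine records and getting them to the right person, here is a minimal Python sketch. The `Event` fields, the severity names, the `ROUTING` table and the recipient names are all assumptions made up for this example, not part of any particular logging product.

```python
from dataclasses import dataclass

# Hypothetical routing table: which human hears about which source.
ROUTING = {"furnace": "plant-supervisor", "database": "dba-on-call"}
DEFAULT_RECIPIENT = "duty-manager"  # assumed fallback when no owner is known


@dataclass(frozen=True)
class Event:
    source: str      # system that produced the record
    severity: str    # assumed levels: "info", "warning" or "error"
    expected: bool   # True for routine, anticipated events
    message: str


def is_incident(event):
    """Treat an event as an incident if it is an error or was not expected."""
    return event.severity == "error" or not event.expected


def triage(events):
    """Return (recipient, message) pairs for the events that are incidents."""
    notifications = []
    for event in events:
        if is_incident(event):
            recipient = ROUTING.get(event.source, DEFAULT_RECIPIENT)
            notifications.append((recipient, event.message))
    return notifications
```

Routine records are dropped silently, while anything unexpected without a known owner still reaches a named fallback rather than nobody.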
While individual incidents can be resolved as "exceptions", the cost in hours begins to accrue when the exception becomes a daily occurrence. At that point, the problem that is the root cause of these incidents should be investigated more thoroughly, and ways found to prevent it from happening again.
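One way to notice when an "exception" has become a regular occurrence is to count the distinct days on which the same kind of incident appears. The sketch below assumes incidents are logged as `(date, signature)` pairs; the function name, the seven-day window and the threshold of five days are arbitrary choices for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def recurring_problems(incidents, window_days=7, threshold=5):
    """Return signatures seen on at least `threshold` distinct days
    within the `window_days` ending at the most recent incident date."""
    incidents = list(incidents)
    if not incidents:
        return []
    latest = max(day for day, _ in incidents)
    cutoff = latest - timedelta(days=window_days - 1)
    days_seen = defaultdict(set)
    for day, signature in incidents:
        if day >= cutoff:
            days_seen[signature].add(day)
    return sorted(s for s, days in days_seen.items() if len(days) >= threshold)
```

Anything the function returns is a candidate for a problem review rather than another one-off fix.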
An incident occurred at a manufacturing plant where a night-shift operator was loading a computer-controlled furnace with work. As they were performing their task, they suddenly became aware of heat unexpectedly coming down on their neck, indicating that the furnace heater was moving above them. Despite the operator's best efforts to stop the motor, the heater collided with the containment vessel cradle, shearing the nylon gearing.
As a result of this incident, the furnace was out of operation for two days during a busy period: there would not have been a night-shift running otherwise. This caused major disruptions to the manufacturing schedule.
After the incident, a review was conducted to determine the cause. The review eventually exonerated the operator, finding that the incident was not due to human error but rather to a flaw in the furnace programming. It was discovered that if a switch was moved through two positions quickly at a certain stage in the sequence, the program did not register the intermediate position. Registering it would have put the heater into a "halt" state in a safe neutral position, and the heater would then not have started moving into position until a confirming "go" button was pressed.
The incident review team, having identified the root cause, implemented changes to the furnace programming to prevent similar incidents from occurring in the future.
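The kind of flaw described above can be illustrated with a small Python sketch. It is not the furnace's actual program, and the fix shown is not necessarily the one the review team chose: it assumes the controller polls a three-position switch (positions 0, 1 and 2), that the middle position requests a halt, and that a fast change can fall between two polls. All names here are hypothetical.

```python
HALT_POSITION = 1  # assumption: the intermediate switch position requests a halt


def polled_controller(samples):
    """Flawed version: halts only if a poll happens to see HALT_POSITION."""
    state = "running"
    for position in samples:
        if position == HALT_POSITION:
            state = "halted"  # latched until an operator presses "go"
    return state


def fail_safe_controller(samples):
    """Also halts when two consecutive polls differ by more than one
    position, because an intermediate position must have been skipped."""
    state = "running"
    previous = None
    for position in samples:
        skipped = previous is not None and abs(position - previous) > 1
        if position == HALT_POSITION or skipped:
            state = "halted"
        previous = position
    return state
```

With polled samples `[0, 0, 2, 2]`, where the switch passed through position 1 between polls, the first version keeps running while the second halts. The design choice is to treat an impossible-looking jump as evidence of a missed state and fall back to the safe condition.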
An incident or problem review must focus on potential issues with the process, and look to find ways to strengthen the processes, to make them more robust. It's all too easy to blame the operator. A longer article, "Incident Reviews", is published on LinkedIn.
Why hire an external consultant to carry out incident and problem reviews?
Being neutral in an incident review is a tacit skill.
Fixes for the underlying problem are not always obvious to those closely involved.
An outsider can ask penetrating questions that may not otherwise get asked.
For more information on how we can help you with improving incident & problem management, including carrying out an ad hoc review, please write to email@example.com.
For a fixed-price initial review of the problem as you see it, go to Fiverr. After that, if agreeable, we will quote for further work.
The image was generated using DALL·E 2 from the prompt "A image of a industrial furnace with a person in front of it, with a look of concern on their face, next to a computer console displaying error messages".