In the script for the movie Apollo 13, one of the fundamental views of the people working on the program was distilled into the line “Failure is Not an Option”, spoken by Ed Harris, playing Gene Kranz. Kranz later used it as the title of his autobiographical history of the events.
But the lesson of Apollo has been lost over the years by various parts of the engineering world.
The reason why “Failure is Not an Option” was a philosophy during the Apollo program was the knowledge that even the smallest failures could lead to the (very public) death of astronauts, possibly in space. This was deemed a risk for which the consequences were unacceptable.
In the world of risk assessment, two parameters are usually combined to decide on mitigation action. One is the probability of an event; the second is the consequences of that event, in terms of economic loss or danger to human life or the environment. In most cases, there is a category of events for which the consequences are so dire that the probability is irrelevant: even if the event were to happen only once in a thousand years, it would still be unacceptable. These are the “Failure is Not an Option” situations.
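To make that rule concrete, here is a minimal sketch in Python. The four-level consequence scale, the `tolerable_score` threshold and all of the names are my own illustrative assumptions, not taken from any standard or from Apollo practice. The point it shows is only that the top consequence category is checked before probability is ever looked at.

```python
from dataclasses import dataclass
from enum import IntEnum


class Consequence(IntEnum):
    """Hypothetical severity scale; the levels are assumptions for illustration."""
    MINOR = 1
    SERIOUS = 2
    MAJOR = 3
    UNACCEPTABLE = 4  # loss of life, catastrophic environmental or economic damage


@dataclass(frozen=True)
class Risk:
    name: str
    annual_probability: float  # chance of the event in a given year, 0.0 to 1.0
    consequence: Consequence


def required_action(risk: Risk, tolerable_score: float = 0.05) -> str:
    """Decide what to do about a risk.

    tolerable_score is an arbitrary illustrative threshold, not a real-world value.
    """
    if not 0.0 <= risk.annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")

    # Dire consequences override everything: probability is not consulted.
    if risk.consequence is Consequence.UNACCEPTABLE:
        return "failure is not an option"

    # Otherwise combine the two parameters in the usual way.
    score = risk.annual_probability * int(risk.consequence)
    return "mitigate" if score > tolerable_score else "accept"


if __name__ == "__main__":
    once_in_a_thousand_years = Risk("well blowout", 0.001, Consequence.UNACCEPTABLE)
    print(required_action(once_in_a_thousand_years))
```

A once-in-a-thousand-years event would score very low if probability and consequence were simply multiplied, which is exactly why the unacceptable category has to be handled separately.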
Which brings me to current events. In the Gulf of Mexico, human activity has directly resulted in catastrophic damage to the environment and catastrophic economic loss. This clearly should have been treated as a “Failure is Not an Option” risk.
A good way to determine if an event is one of these is to ask two questions:
- If it goes wrong, can we fix it in a reasonable amount of time with limited damage? If not, then failure should not be an option.
- If it goes wrong, do you want to be on TV or testifying before Congress/Parliament about why you let it happen? If not, then failure should not be an option.
Pretty straightforward. The problem is that many people have forgotten to ask these questions.
The secondary problem is that pressures of time and money often cause people to superficially assess risk and underestimate the consequences of things going wrong. Gene Kranz said it after the Apollo 1 fire:
Spaceflight will never tolerate carelessness, incapacity, and neglect. Somewhere, somehow, we screwed up. It could have been in design, build, or test. Whatever it was, we should have caught it. We were too gung ho about the schedule and we locked out all of the problems we saw each day in our work. Every element of the program was in trouble and so were we. The simulators were not working, Mission Control was behind in virtually every area, and the flight and test procedures changed daily. Nothing we did had any shelf life. Not one of us stood up and said, ‘Dammit, stop!’ I don’t know what Thompson’s committee will find as the cause, but I know what I find. We are the cause! We were not ready! We did not do our job. We were rolling the dice, hoping that things would come together by launch day, when in our hearts we knew it would take a miracle. We were pushing the schedule and betting that the Cape would slip before we did. From this day forward, Flight Control will be known by two words: ‘Tough’ and ‘Competent.’ Tough means we are forever accountable for what we do or what we fail to do. We will never again compromise our responsibilities. Every time we walk into Mission Control we will know what we stand for. Competent means we will never take anything for granted. We will never be found short in our knowledge and in our skills. Mission Control will be perfect. When you leave this meeting today you will go to your office and the first thing you will do there is to write ‘Tough and Competent’ on your blackboards. It will never be erased. Each day when you enter the room these words will remind you of the price paid by Grissom, White, and Chaffee. These words are the price of admission to the ranks of Mission Control.