That Looks About Right: An Alaska Airlines Software Failure
Two 'tail strikes' in quick succession on Alaska Airlines flights highlight the subtle impacts of software bugs, and the importance of retaining human checks.
Alaska Airlines tail strikes
While visiting Alaska - via Seattle/Tacoma International Airport - in February, I heard about two tail strikes that had just occurred, only minutes apart, on two Alaska Airlines Boeing aircraft bound for Hawaii. The incidents grounded both flights and forced a temporary nationwide shutdown of Alaska's flight activity.
A tail strike occurs when the tail of an aircraft contacts the runway during take-off or landing. It can happen when the aircraft is overloaded, or when it is rotated too early during take-off. Planes are designed to cope with tail strikes, and these incidents reportedly caused no major concern, but in certain circumstances a tail strike can pose a serious safety risk to passengers and crew.
A software error in the ‘critical weight’ calculation
The Seattle Times reported that the root cause was found to be a software bug in a program that holds the crucial weight and balance data in each plane’s flight computer.
The critical take-off calculation takes into account several factors, including the aircraft's weight and balance, its performance characteristics, the length and condition of the runway and the prevailing weather conditions. Specialised software programs are used to do this, and the system then provides the flight crew with an estimate of the maximum weight that the aircraft can safely carry.
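To make the shape of that calculation concrete, here is a hedged sketch in Python. It is not the actual airline system, and every name, figure and formula in it is an assumption invented for illustration; the point is simply that weight, runway and weather inputs all flow into one set of take-off figures, so a single bad input can skew everything downstream.

```python
# A deliberately simplified sketch of a take-off performance calculation.
# This is NOT the airline's tool: every name, number and formula below is an
# invented assumption, shown only to illustrate how weight, runway and
# weather inputs feed a single set of take-off figures.

def takeoff_figures(zero_fuel_weight_kg: float,
                    fuel_kg: float,
                    runway_length_m: float,
                    headwind_kts: float,
                    temperature_c: float) -> dict:
    takeoff_weight = zero_fuel_weight_kg + fuel_kg

    # Hypothetical limit: hot days and short runways reduce the maximum
    # weight at which the aircraft can safely get airborne.
    max_weight = (79_000
                  - 150 * max(temperature_c - 15, 0)
                  - 20 * max(2_500 - runway_length_m, 0)
                  + 100 * headwind_kts)

    if takeoff_weight > max_weight:
        raise ValueError("Take-off weight exceeds the calculated maximum")

    # Heavier aircraft need a higher thrust setting (invented scaling).
    thrust_fraction = 0.80 + 0.20 * (takeoff_weight / max_weight)

    return {"takeoff_weight_kg": takeoff_weight,
            "max_weight_kg": round(max_weight),
            "thrust_fraction": round(thrust_fraction, 2)}


print(takeoff_figures(zero_fuel_weight_kg=58_000, fuel_kg=18_000,
                      runway_length_m=3_000, headwind_kts=10,
                      temperature_c=8))
```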
However, in this case the system - provided by a company called Dynamic Source - failed. The tool delivered faulty data that underestimated the weights of the airplanes. Each aircraft was therefore heavier than its take-off settings assumed, and this in turn caused the tail strikes during take-off.
That looks about right…
The Seattle Times reported that:
...the error was enough to skew the engine thrust and speed settings. Both planes headed down the runway with less power and at lower speed than they should have. And with the jets judged lighter than they actually were, the pilots rotated too early.
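To see why a weight judged too light leads to early rotation, consider a rough, simplified illustration (the numbers below are invented, not taken from the incident): at lift-off the wings must generate lift roughly equal to the aircraft's weight, and lift grows with the square of airspeed, so the rotation speed scales roughly with the square root of weight.

```python
# Simplified illustration with invented numbers: rotation speed (VR) scales
# roughly with the square root of weight, so an underestimated weight gives
# an underestimated VR and the aircraft is rotated too early.

def rotation_speed_kts(weight_kg: float,
                       reference_weight_kg: float = 60_000,
                       reference_vr_kts: float = 140) -> float:
    return reference_vr_kts * (weight_kg / reference_weight_kg) ** 0.5

actual_weight_kg = 72_000    # what the aircraft really weighed (assumed)
reported_weight_kg = 65_000  # what the faulty tool reported (assumed)

print(round(rotation_speed_kts(actual_weight_kg)))    # ~153 kts really needed
print(round(rotation_speed_kts(reported_weight_kg)))  # ~146 kts computed
```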
The bug was identified quickly in part because some flight crews noticed the weights didn’t seem right. They’re trained to be aware of the importance of accurate weight and balance calculations. During pre-flight checks the first officer reads the calculated weight data aloud and the captain verbally verifies it. Pilots routinely use an acronym when they do the pre-take-off check: TLAR, which means “That Looks About Right.” Soon after the tail strikes that day, Alaska issued a message to all of its pilots to “take a second and conduct a sanity check of the data.”
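A crude automated equivalent of that plausibility check might look something like the hypothetical sketch below. The function, the standard passenger weight and the tolerance are all assumptions made for illustration; in reality this judgement rests with trained crew rather than a few lines of code.

```python
# Hypothetical 'TLAR'-style check: compare the tool's computed take-off
# weight against a rough independent estimate and flag large discrepancies.
# All names and figures are assumptions made for illustration.

def looks_about_right(computed_weight_kg: float,
                      operating_empty_kg: float,
                      passengers: int,
                      fuel_kg: float,
                      standard_pax_kg: float = 100,
                      tolerance: float = 0.05) -> bool:
    rough_estimate = operating_empty_kg + passengers * standard_pax_kg + fuel_kg
    deviation = abs(computed_weight_kg - rough_estimate) / rough_estimate
    if deviation > tolerance:
        print(f"Re-check the data: computed {computed_weight_kg:,.0f} kg vs "
              f"rough estimate {rough_estimate:,.0f} kg ({deviation:.0%} apart)")
        return False
    return True

# A computed weight several tonnes lighter than the rough estimate is flagged.
looks_about_right(computed_weight_kg=74_000, operating_empty_kg=45_000,
                  passengers=170, fuel_kg=18_000)
```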
Subtle effects on functional safety
This incident highlights a couple of distinct learning points. The first is that the safe function of engineered systems can be complex. The standards used to ensure software is safe are based on the concept of 'functional safety' - i.e. that the safety functions delivered by software are understood and consciously engineered. But in practice, in the real world, subtle deviations from the intended behaviour can still occur.
The second is that, as we continue to embrace automation, it becomes increasingly apparent that human input is never fully removed - it is just shifted from one activity to another. Whereas in days gone by crew would have had to estimate the weight of the plane themselves, now they are required to check that the calculated values look plausible. That shift of focus will persist and expand as automation grows, along with the manual configuration of system data that it implies.
So in summary, as software continues to be used to drive automation, we need to retain that human touch in at least two distinct ways:
ensuring that configured data delivers the right outcome and functions in practice;
understanding the sometimes subtle impacts of the 'real world' on safe performance and system function.
Developing competence in software safety assurance
If you’re interested in software safety assurance (and if you’re reading this blog you probably are), you’ll also be interested to know that Libusa has just launched an e-learning course on ‘Rail Software Safety Assurance as a Client.’ I developed the course content to meet the requirements of the UK rail industry standard RIS-0745-CCS. The course is designed specifically to bring anyone who is not a software development expert up to speed on the basics of software safety assurance when acting in a client role. The feedback from industry experts has been universally positive, so please do take a look, particularly if you’re in a role where you need to apply the standard.
I’m keen to build the network for Tech Safe Transport. If you know anyone who is interested in the safety of modern transport technology, and who likes a thought-provoking read every few weeks, please do share a link with them.
Thanks for reading
All views here are my own. Please feel free to share any thoughts or comments, and do drop me an e-mail at george.bearfield@ntlworld.com.
My particular area of professional interest is practical risk management and assurance of new technology. I’m always keen to engage on interesting projects or research in this area.