Friday, December 5, 2008

Precision, Speed and Mistakes

One of the benefits of being part of the Twitter craze is that you see, in real time, what interests the people you find interesting. The principle is similar to that of older services but, for some reason I still cannot figure out, it is more effective.

My attention was drawn to a fascinating article written for Vanity Fair by William Langewiesche. Langewiesche is an interesting writer and journalist. I had previously read a few of his books: "The Outlaw Sea" (a chilling description of the ocean as the last frontier beyond the reach of the law - something the pirates in Somalia and the seaborne Mumbai attacks remind us of), "Sahara Unveiled" (an interesting journey through a region of the world I spent quite some time in, and that he describes and analyses quite well) and "Inside the Sky" (a beautiful account of the experience of flying).

The article [http://www.vanityfair.com/magazine/2009/01/air_crash200901?printable=true&currentPage=all] focuses on the circumstances that led to the mid-air collision between a private Embraer business jet and a GOL 737 over the Amazon a few years ago.
The key conclusion of the article: the speed and precision that modern technology enables magnify the effects of mistakes.
In this case, both jets were flying on intersecting flight paths, flown by ultra-precise autopilots. Mistakes were made in communication between the pilots and air traffic control (language was a barrier), and the transponder and TCAS on the business jet were accidentally turned off.
In the past, the natural imprecision of human-controlled flight would have reduced the risk of a collision. In this case, the jets met with surgical precision: the business jet survived the collision, but the 737 was lost.

This is an incredibly vivid illustration of the need to perform an overall systems analysis for any application that embeds highly precise technology and significantly changes the way operations are conducted. Such an analysis must focus on:
- where in the system are the points at which changes of protocol, translation, etc., present opportunities for miscommunication or failures?
- how does the system cope with these failures?
- how does the system provide effective monitoring so that these failures are detected as soon as possible, at a speed compatible with the speed at which the application is proceeding (you need to move fast to prevent an issue that is unfolding fast)?
- how quickly can the application change operation mode to cope with the failure?
- etc.

This is directly related to the preoccupations of Enterprise Decision Management (EDM). EDM-centric applications need to monitor the decisions being made, the outcomes of those decisions, and the business value of the decisions (safety, in the case above), and to detect deviations. These applications need to identify the decision metrics to monitor, the conditions under which these metrics indicate a problem, and the mechanisms for changing the operation mode (moving to human processing, for example) to cope with some of those problems.
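
To make the idea concrete, here is a minimal sketch in Python of what such monitoring could look like. It is not taken from any EDM product: the class name `DecisionMonitor`, the single metric (the share of bad outcomes among recent decisions), and the window and threshold values are all assumptions chosen for illustration. The monitor records each decision outcome and moves the application from automatic to manual processing once the metric crosses the threshold.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DecisionMonitor:
    """Hypothetical monitor: watches recent outcomes, switches mode on deviation."""

    window: int = 50            # how many recent decisions to look at (assumed)
    max_bad_rate: float = 0.1   # tolerated share of bad outcomes (assumed)
    min_samples: int = 10       # do not judge on too little evidence (assumed)
    mode: str = "automatic"
    outcomes: deque = field(default_factory=deque)

    def bad_rate(self) -> float:
        """Decision metric: fraction of bad outcomes in the current window."""
        if not self.outcomes:
            return 0.0
        bad = sum(1 for ok in self.outcomes if not ok)
        return bad / len(self.outcomes)

    def record(self, outcome_ok: bool) -> str:
        """Record one decision outcome and return the resulting operation mode."""
        self.outcomes.append(outcome_ok)
        if len(self.outcomes) > self.window:
            self.outcomes.popleft()  # keep only the most recent decisions
        enough_data = len(self.outcomes) >= self.min_samples
        if enough_data and self.bad_rate() > self.max_bad_rate:
            # Deviation detected: hand over to human processing.
            # Switching back is deliberately left as a human decision.
            self.mode = "manual"
        return self.mode


if __name__ == "__main__":
    monitor = DecisionMonitor(window=20, max_bad_rate=0.2, min_samples=5)
    for ok in [True] * 10 + [False] * 3:
        print(ok, monitor.record(ok), round(monitor.bad_rate(), 2))
```

The check runs on every recorded decision, so detection keeps pace with the application itself. A real system would likely track several metrics and define explicit rules for returning to automatic operation.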

The technologies and approaches used are similar to those involved in the "improvement loop" side of decision management.
