Process Theatre
Feb. 29th, 2020 03:04 pm
Book Review: Normal Accidents - Living with High-Risk Technologies, by Charles Perrow
I came across this book via a blog post on my feed. It chimed with problems at work, both technical and cultural: the idea that adding more safety measures does not always make a system safer. Of course, I'm a software engineer and the system isn't a hospital or a nuclear power station, but I was interested to see whether any principles could be transferred.
It's quite a dry, academic read, but not impenetrably so. Perrow begins with an example of a simple mistake one might make in the morning, and the series of incidents that may follow from it on the way to a job interview. The sequence of events sounds unlikely, but that is rather the point. There follows a more thorough discussion of the Three Mile Island accident of 1979, of which I have a vague childhood memory. Establishing cause is one thing; establishing responsibility is another. A catalogue of errors and neglect emerges. The point is that most of these failures, on their own, would not have led to the accident: it was their unlikely coincidence that made it inevitable. Further chapters examine other industries, such as the chemical industry, air and marine transport, dams, mining, space travel and genetic engineering. The results are rarely reassuring; the nuclear industry gets singled out only because it is relatively new and, in the case of a leak, can readily damage a much wider environment. Only air travel emerges with a reasonably clean bill of health, based on a combination of self-interest (no airline wants a bad safety reputation) and a culture of "blameless" reporting systems (though one never knows how blameless it actually is) for minor inconsistencies and infringements.
This is a social science book, and Perrow probes the convenient tendency for investigations to assign "operator error" as the cause of accidents. While this can't be excluded, he argues there are other factors at work. His thesis involves two axes, one of system complexity, and the other of system coupling, giving a diagram of four quadrants.
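To make the chart concrete for myself, here is my own sketch of it, not Perrow's wording: the quadrant labels and the `perrow_quadrant` function below are just my paraphrase of the argument as I read it.

```python
from enum import Enum

class Coupling(Enum):
    LOOSE = "loose"
    TIGHT = "tight"

class Interactions(Enum):
    LINEAR = "linear"
    COMPLEX = "complex"

def perrow_quadrant(coupling: Coupling, interactions: Interactions) -> str:
    """My rough paraphrase of where a system lands on Perrow's chart."""
    if coupling is Coupling.TIGHT and interactions is Interactions.COMPLEX:
        return "normal-accident territory: no management style really fits"
    if coupling is Coupling.TIGHT and interactions is Interactions.LINEAR:
        return "suits centralised, by-the-book control"
    if coupling is Coupling.LOOSE and interactions is Interactions.COMPLEX:
        return "suits decentralised operator discretion"
    return "either style can work; there is slack to recover"

# e.g. a sprawling estate of services joined by synchronous call chains:
print(perrow_quadrant(Coupling.TIGHT, Interactions.COMPLEX))
```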
In the high-complexity, tightly coupled quadrant, he argues, accidents are inevitable (hence "normal"), because there is no appropriate management strategy: a strong command-and-control structure isn't flexible enough to allow operators the discretion to tackle an incident creatively, but high coupling between system parts and a lack of slack in the system reduce the scope for autonomy. This is where the arguments in the book become relevant to a discipline such as software engineering. For me, it's not because our systems land in a particular quadrant on Perrow's chart, but rather that we have the opportunity to choose (at least to some extent) where on the chart to build our systems. Reducing complexity and coupling is not always easy; unfortunately it sometimes doesn't even get considered, and it can be a difficult "sell" to management when it means retrofitting an existing product, even after an accident. But piling on more and more rules and "process theatre" may not make a system safer (and is guaranteed to make it less responsive). Perrow documents several cases where safety systems failed or made an accident worse.
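As a toy illustration of the kind of coupling choice I mean (my own sketch, nothing from the book; the record format and queue size are made up), contrast a stage that fails its caller directly with one that sits behind a bounded queue and so has a little slack:

```python
import queue

def parse(record: str) -> dict:
    """Toy parser: raises if the record is malformed."""
    key, _, value = record.partition("=")
    if not value:
        raise ValueError(f"malformed record: {record!r}")
    return {key: value}

# Tightly coupled: stages call each other directly, so any failure
# propagates straight back up the chain with no slack to absorb it.
def handle_tight(record: str) -> dict:
    return parse(record)  # an exception here is immediately the caller's problem

# Loosely coupled: a bounded queue between stages adds slack; a bad or
# slow record affects only the consumer, not whoever produced it.
buffer = queue.Queue(maxsize=100)

def produce(record: str) -> None:
    buffer.put(record, timeout=1)  # back-pressure rather than an instant cascade

def consume_once() -> None:
    record = buffer.get()
    try:
        print(parse(record))
    except ValueError as err:
        print(f"skipped: {err}")   # the failure is contained at this stage

if __name__ == "__main__":
    for rec in ["colour=red", "oops", "shape=square"]:
        produce(rec)
    for _ in range(3):
        consume_once()
```

The queue doesn't remove the failure, it just stops it propagating instantly, which is roughly the slack that Perrow says tightly coupled systems lack.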
I read the 1999 edition, which concludes with an additional chapter on some major accidents since the first edition, including the Bhopal chemical plant, Chernobyl and the Challenger space shuttle disasters. Fortunately, the predictions in his final chapter, on the Y2K bug, turned out to be far too pessimistic, but even he acknowledges this was an event the industry knew was coming (even if it only dealt with it at the last minute), which sets it apart from most of the other events described in the book.