Tag Archives: Soviet Union

Goodhart’s Law and Test Cases

I’d like to share a story about a glass factory in the Soviet Union. Being Soviet, the factory didn’t have to worry about the pesky things that a typical glass manufacturer has to pay attention to, like profits or appealing to customers. This factory was in the workers’ Paradise, after all! Their only charge was to “make glass”. Thus, the factory’s managers were left free to define exactly what that meant. Going through it in their minds, their solution was to take the factory’s output, and weigh it.

Over time–and, mind you, not a long time–the “product” became larger and heavier, until what was coming off the factory floor were giant cubes of glass. Very heavy, of course, but useful to no one.  The managers were forced to admit that their definition of success was flawed.

Thinking it over, management decided it would be better to measure the area of the glass produced.  They announced this change to the workers. Soon, the giant cubes were gone, replaced by enormous sheets, nearly paper-thin.  Lots of surface area per volume, but again, utterly useless outside the factory gates.

Now, I don’t remember when or where I first heard this story, and it may be apocryphal. However, even as a fable it contains an important lesson about the potential consequences of ignoring what has come to be known as Goodhart’s Law. Stated succinctly, it is this: When a measure becomes a target, it ceases to be a good measure.

What does any of this have to do with software testing, and test cases? I hope the answer is fairly obvious, but I’ll spell it out anyway. I’ve seen too many testing teams who think that it’s a QA “best practice” to focus on the test case as the sole unit of measure of “testing productivity”. The conventional wisdom in the industry appears to be: the more test cases, the better. The scope, quality, or risk of each test case taken individually, if considered at all, is of secondary importance.

I’ve seen situations where this myopia got so bad that all that mattered was that the team completed 130 test cases in a day. If that didn’t happen then the team was seen as not being as productive as they could have been. Never mind how many bugs had been found, or which test cases were actually executed.

I hope you can see how this sort of incentive structure can lead to perverse outcomes. Test cases will be written so as to maximize their number instead of their quality or importance. The testers will scour the test case repository for those items that can be done the fastest, regardless of the risk-level of the product area being tested. They’re likely to put blinders on against any tangential quality issues that may surface in the course of executing their test case. They’ll start seeing bugs as annoyances instead of things to be proud of having found. In other words, the test team’s output will quickly begin to resemble, metaphorically, that of the Soviet glass factory.

Joel Spolsky makes the same point here.