Category Archives: Planning

Estimating Testing Times: Glorified Fortune-Telling?


Hofstadter’s Law:
It always takes longer than you
expect, even when you take
into account Hofstadter’s Law.

Douglas Hofstadter

A good friend of mine is a trainer for CrossFit, and has been for years. For a long time he trained clients out of his house, but his practice started outgrowing the space. His neighbors were complaining about the noise (if you’ve ever been in a CrossFit gym you can easily imagine that they had a point). Parking was becoming a problem, too.

So, in September, 2009, he rented a suite for a gym, in a building with an excellent location and a gutted interior–perfect for setting up the space exactly how he wanted it. It needed new flooring, plumbing, framing, drywall, venting, insulation, dropped ceiling, electricity, and a few other minor things. At the time, he told me they’d be putting the finishing touches on the build-out by mid-December. I remember thinking, “Wow. Three months. That’s a long time.”

As it turned out, construction wasn’t completed until late June, 2010, Seven months later than originally estimated.

Let’s think about that. Here’s a well-defined problem, with detailed plans (with drawings and precise measurements, even!) and a known scope, not prone to “scope creep.” The technology requirements for this kind of project are, arguably, on the low side–and certainly standardized and familiar. The job was implemented by skilled, experienced professionals, using specialized, efficiency-maximizing tools. And yet, it still took more than 3 times longer than estimated.

Contrast that with a software project. Often the requirements are incomplete, but even when they’re not, they’re still written in words, which are inherently ambiguous. What about tools? Sometimes even those have to be built, or existing tools need to be customized. And the analogy breaks down completely when you try to compare writing a line of code (or testing it) with, for example, hanging a sheet of drywall. Programmers are, by definition, attempting something that has never been done before. How do you come up with reasonable estimates in this situation?

This exact question was asked in an online discussion forum recently. A number of self-described “QA experts” chimed in with their answers. These all involved complex models, assumptions, and calculations based on things like “productivity factors,” “data-driven procedures,” “Markov chains,” etc. My eyes glazed over as I read them. If they weren’t all committing the Platonic fallacy then I don’t know what it is.

Firstly, at the start of any software project you are, as Jeffrey Friedman puts it, radically ignorant. You do not know what you do not know. The requirements are ambiguous and the code hasn’t even been written yet. This is still true for updates to existing products. You can’t be certain what effect the new features will have on the existing ones, or how many bugs will be introduced by re-factoring the existing features. How can you possibly know how many test cases you’re going to need to run? Are you sure you’re not committing the Ludic Fallacy when you estimate the “average time” per test case? Even if you’ve found the perfect estimation model (and how would you know this?), your inputs for it are bound to be wrong.

To attempt an estimate in that situation is to claim knowledge that you do not possess. Is that even ethical?

Secondly, your radical ignorance goes well beyond what the model’s inputs should be. What model takes into account events like the following (all of which actually happened, on projects I’ve been a part of)?

  1. The database containing the company’s live customer data–all of it–is inadvertently deleted by a programmer who thought at the time that he was working in the developer sandbox.
  2. The Director of Development, chief architect of the project, with much of the system design and requirements kept only in his head, fails to appear at work one day. Calls to his home go unanswered for two weeks. When someone finally gets in touch with him he says he won’t be coming back to work.
  3. A disgruntled programmer spends most of his time putting derogatory easter eggs in the program instead of actually working. When found by a particularly alert tester (sadly I can’t claim it was me) the programmer is fired.
  4. A version of the product is released containing an egregious bug, forcing the company to completely reassess  its approach to development (and blame the testers for missing the “obvious” bug, which then destroys morale and prompts a tester to quit).
  5. The company’s primary investor is indicted for running a ponzi scheme. The majority of the employees are simply let go, as there is not enough revenue from sales to continue to pay them.

The typical response from the “experts” has been, “Well, that’s where the ‘fudge factor’ comes in, along with the constant need to adjust the estimate while the project is underway.”

To that I ask, “Isn’t that just an implicit admission that estimates are no better than fortune-telling?”

I heard from Lynn McKee recently that Michael Bolton has a ready answer when asked to estimate testing time: “Tell me what all the bugs will be, first, then I can tell you how long it will take to test.”

I can’t wait to use that!


Goodhart’s Law and Test Cases

I’d like to share a story about a glass factory in the Soviet Union. Being Soviet, the factory didn’t have to worry about the pesky things that a typical glass manufacturer has to pay attention to, like profits or appealing to customers. This factory was in the workers’ Paradise, after all! Their only charge was to “make glass”. Thus, the factory’s managers were left free to define exactly what that meant. Going through it in their minds, their solution was to take the factory’s output, and weigh it.

Over time–and, mind you, not a long time–the “product” became larger and heavier, until what was coming off the factory floor were giant cubes of glass. Very heavy, of course, but useful to no one.  The managers were forced to admit that their definition of success was flawed.

Thinking it over, management decided it would be better to measure the area of the glass produced.  They announced this change to the workers. Soon, the giant cubes were gone, replaced by enormous sheets, nearly paper-thin.  Lots of surface area per volume, but again, utterly useless outside the factory gates.

Now, I don’t remember when or where I first heard this story, and it may be apocryphal. However, even as a fable it contains an important lesson about the potential consequences of ignoring what has come to be known as Goodhart’s Law. Stated succinctly, it is this: When a measure becomes a target, it ceases to be a good measure.

What does any of this have to do with software testing, and test cases? I hope the answer is fairly obvious, but I’ll spell it out anyway. I’ve seen too many testing teams who think that it’s a QA “best practice” to focus on the test case as the sole unit of measure of “testing productivity”. The conventional wisdom in the industry appears to be: the more test cases, the better. The scope, quality, or risk of each test case taken individually, if considered at all, is of secondary importance.

I’ve seen situations where this myopia got so bad that all that mattered was that the team completed 130 test cases in a day. If that didn’t happen then the team was seen as not being as productive as they could have been. Never mind how many bugs had been found, or which test cases were actually executed.

I hope you can see how this sort of incentive structure can lead to perverse outcomes. Test cases will be written so as to maximize their number instead of their quality or importance. The testers will scour the test case repository for those items that can be done the fastest, regardless of the risk-level of the product area being tested. They’re likely to put blinders on against any tangential quality issues that may surface in the course of executing their test case. They’ll start seeing bugs as annoyances instead of things to be proud of having found. In other words, the test team’s output will quickly begin to resemble, metaphorically, that of the Soviet glass factory.

Joel Spolsky makes the same point here.