Tag Archives: Hofstadter’s Law

Requirements: Placebo or Panacea?

Perhaps the definitive commentary on "requirements"I vividly remember the days when I matured as a tester. I was the fledgling manager of a small department of folks who were equal parts tester and customer support. The company itself was staffed by primarily young, enthusiastic but inexperienced people, perhaps all given a touch of arrogance by the company’s enormous profitability.

We had just released a major update to our software–my first release ever as a test manager–and we all felt pretty good about it. For a couple days. Then the complaints started rolling in.

“Why didn’t QA find this bug?” was a common refrain. I hated not having an answer.

“Well… uh… No one told me the software was supposed to be able to do that. So how could we know to test for it? We need more detailed requirements!” (I was mindful of the tree cartoon, which had recently been shared around the office, to everyone’s knowing amusement.)

The programmers didn’t escape the inquisition unscathed, either. Their solution–and I concurred–was, “We need a dedicated Project Manager!”

Soon we had one. In no time, the walls were papered with PERT charts. “Critical path” was the new buzzword–and, boy, did you want to stay off that thing!

You couldn’t help but notice that the PERT charts got frequent revisions. They were pretty, and they gave the impression that things were well-in-hand; our path forward was clear. But they were pretty much obsolete the day after they got taped to the wall. “Scope creep” and “feature creep” were new buzzwords heard muttered around the office–usually after a meeting with the PM. I also found it odd that the contents of the chart would change, but somehow the target release date didn’t move.

As for requirements, soon we had technical specs, design specs, functional specs, specs-this, specs-that… Convinced that everything was going to be roses, I was off and running, creating test plans, test cases, test scripts, bla bla…

The original target release date came and went, and was six months gone before the next update was finally shipped. Two days later? You guessed it! Customers calling and complaining about bugs with things we’d never thought to test.

Aside from concluding that target release dates and PERT charts are fantasies, the result of all this painful experience was that I came to really appreciate a couple things. First, it’s impossible for requirements documents to be anything close to “complete” (yes, a heavily loaded word if I’ve ever seen one. Let’s provisionally define it as: “Nothing of any significance to anyone important has been left out”). Second, having document completeness as a goal means spending time away from other things that are ultimately more important.

Requirements–as well as the team’s understanding of those requirements–grow and evolve throughout the project. This is unavoidable, and most importantly it’s okay.

Apparently, though, this is not the only possible conclusion one can reach after having such experiences.

Robin F. Goldsmith, JD, is a guy who has been a software consultant since 1982, so I hope he’s seen his fair share of software releases. Interestingly, he asserts here that “[i]nadequate requirements overwhelmingly cause most project difficulties, including ineffective ROI and many of the other factors commonly blamed for project problems.” Here, he claims “The main reason traditional QA testing overlooks risks is because those risks aren’t addressed in the system design… The most common reason something is missing in the design is that it’s missing in the requirements too.”

My reaction to these claims is: “Oh, really?”

How do you define “inadequate” in a way that doesn’t involve question begging?
How do you know it’s the “main” reason?
What do you mean by “traditional QA testing”?

Goldsmith addresses that last question with a bit of a swipe at the context-driven school:

Many testers just start running spontaneous tests of whatever occurs to them. Exploratory testing is a somewhat more structured form of such ad hoc test execution, which still avoids writing things down but does encourage using more conscious ways of thinking about test design to enhance identification of tests during the course of test execution. Ad hoc testing frequently combines experimentation to find out how the system works along with trying things that experience has shown are likely to prompt common types of errors.

Spontaneous tests often reveal defects, partly because testers tend to gravitate toward tests that surface commonly occurring errors and partly because developers generally make so many errors that one can’t help but find some of them. Even ad hoc testing advocates sometimes acknowledge the inherent risks of relying on memory rather than on writing, but they tend not to realize the approach’s other critical limitations.

By definition, ad hoc testing doesn’t begin until after the code has been written, so it can only catch — but not help prevent — defects. Also, ad hoc testing mainly identifies low-level design and coding errors. Despite often being referred to as “contextual” testing, ad hoc methods seldom have suitable context to identify code that is “working” but in the service of erroneous designs, and they have even less context to detect what’s been omitted due to incorrect or missing requirements.

I’m not sure where Goldsmith got his information about the Context-driven approach to testing, but what he’s describing ain’t it! See here and here for much better descriptions.

Goldsmith contrasts “traditional QA testing” with something he calls “proactive testing.” Aside from “starting early by identifying and analyzing the biggest risks,” the proactive tester

…enlists special risk identification techniques to reveal many large risks that are ordinarily overlooked, as well as the ones that aren’t. These test design techniques are so powerful because they don’t merely react to what’s been stated in the design. Instead, these methods come at the situation from a variety of testing orientations. A testing orientation generally spots issues that a typical development orientation misses; the more orientations we use, the more we tend to spot. [my emphasis]

What are these “special risk identification techniques”? Goldsmith doesn’t say. To me, this is an enormous red flag that we’re probably dealing with a charlatan, here. Is he hoping that desperate people will pay him to learn what these apparently amazing techniques are?

His advice for ensuring a project’s requirements are “adequate” is similarly unhelpful. As near as I can figure it, reading his article, his solution amounts to making sure that you know what the “REAL” [emphasis Goldsmith] requirements are at the start of the project, so you don’t waste time on the requirements that aren’t “REAL”.

Is “REAL” an acronym for something illuminating? No. Goldsmith says he’s capitalizing to avoid the possibility of page formatting errors. He defines it as “real and right” and “needed” and “the requirements we end up with” and “business requirements.” Apparently, then, “most” projects that have difficulties are focusing on “product requirements” instead of “business requirements.”

Let’s say that again: To ensure your requirements are adequate you must ensure you’ve defined the right requirements.

I see stuff like this and start to wonder if perhaps my reading comprehension has taken a sudden nose dive.


Estimating Testing Times: Glorified Fortune-Telling?


Hofstadter’s Law:
It always takes longer than you
expect, even when you take
into account Hofstadter’s Law.

Douglas Hofstadter

A good friend of mine is a trainer for CrossFit, and has been for years. For a long time he trained clients out of his house, but his practice started outgrowing the space. His neighbors were complaining about the noise (if you’ve ever been in a CrossFit gym you can easily imagine that they had a point). Parking was becoming a problem, too.

So, in September, 2009, he rented a suite for a gym, in a building with an excellent location and a gutted interior–perfect for setting up the space exactly how he wanted it. It needed new flooring, plumbing, framing, drywall, venting, insulation, dropped ceiling, electricity, and a few other minor things. At the time, he told me they’d be putting the finishing touches on the build-out by mid-December. I remember thinking, “Wow. Three months. That’s a long time.”

As it turned out, construction wasn’t completed until late June, 2010, Seven months later than originally estimated.

Let’s think about that. Here’s a well-defined problem, with detailed plans (with drawings and precise measurements, even!) and a known scope, not prone to “scope creep.” The technology requirements for this kind of project are, arguably, on the low side–and certainly standardized and familiar. The job was implemented by skilled, experienced professionals, using specialized, efficiency-maximizing tools. And yet, it still took more than 3 times longer than estimated.

Contrast that with a software project. Often the requirements are incomplete, but even when they’re not, they’re still written in words, which are inherently ambiguous. What about tools? Sometimes even those have to be built, or existing tools need to be customized. And the analogy breaks down completely when you try to compare writing a line of code (or testing it) with, for example, hanging a sheet of drywall. Programmers are, by definition, attempting something that has never been done before. How do you come up with reasonable estimates in this situation?

This exact question was asked in an online discussion forum recently. A number of self-described “QA experts” chimed in with their answers. These all involved complex models, assumptions, and calculations based on things like “productivity factors,” “data-driven procedures,” “Markov chains,” etc. My eyes glazed over as I read them. If they weren’t all committing the Platonic fallacy then I don’t know what it is.

Firstly, at the start of any software project you are, as Jeffrey Friedman puts it, radically ignorant. You do not know what you do not know. The requirements are ambiguous and the code hasn’t even been written yet. This is still true for updates to existing products. You can’t be certain what effect the new features will have on the existing ones, or how many bugs will be introduced by re-factoring the existing features. How can you possibly know how many test cases you’re going to need to run? Are you sure you’re not committing the Ludic Fallacy when you estimate the “average time” per test case? Even if you’ve found the perfect estimation model (and how would you know this?), your inputs for it are bound to be wrong.

To attempt an estimate in that situation is to claim knowledge that you do not possess. Is that even ethical?

Secondly, your radical ignorance goes well beyond what the model’s inputs should be. What model takes into account events like the following (all of which actually happened, on projects I’ve been a part of)?

  1. The database containing the company’s live customer data–all of it–is inadvertently deleted by a programmer who thought at the time that he was working in the developer sandbox.
  2. The Director of Development, chief architect of the project, with much of the system design and requirements kept only in his head, fails to appear at work one day. Calls to his home go unanswered for two weeks. When someone finally gets in touch with him he says he won’t be coming back to work.
  3. A disgruntled programmer spends most of his time putting derogatory easter eggs in the program instead of actually working. When found by a particularly alert tester (sadly I can’t claim it was me) the programmer is fired.
  4. A version of the product is released containing an egregious bug, forcing the company to completely reassess  its approach to development (and blame the testers for missing the “obvious” bug, which then destroys morale and prompts a tester to quit).
  5. The company’s primary investor is indicted for running a ponzi scheme. The majority of the employees are simply let go, as there is not enough revenue from sales to continue to pay them.

The typical response from the “experts” has been, “Well, that’s where the ‘fudge factor’ comes in, along with the constant need to adjust the estimate while the project is underway.”

To that I ask, “Isn’t that just an implicit admission that estimates are no better than fortune-telling?”

I heard from Lynn McKee recently that Michael Bolton has a ready answer when asked to estimate testing time: “Tell me what all the bugs will be, first, then I can tell you how long it will take to test.”

I can’t wait to use that!


Opportunity Cost

I’ve always had an abiding love of economics – which, contrary to what seems to be popular belief, is not about how to balance your checkbook, but about what it means to make choices in the face of scarcity. Once you’re even just a little familiar with the economic way of thinking then you never see the world in quite the same way again.

A fundamental component of economic thought is the notion of “opportunity cost.” The idea was probably most famously and elegantly described by the French economist Frédéric Bastiat in his essay “What Is Seen and What Is Not Seen”, in 1848. Its basics are this: The real cost of a thing is not the time or the money you spend on it, but the alternate choice that you’ve given up. For example, the opportunity cost of the omelet you had for breakfast is the pancakes you didn’t have.

Testers – ever familiar with tight schedules, under-staffing, and a potentially infinite list of tests to perform – probably understand opportunity cost at a more visceral level than your typical academic economist. When testers keep opportunity cost in mind, they’ll constantly be asking themselves, “Is this the most valuable test I could be doing right now?”

This brings us to the perennial question: Should testing be automated?

Setting aside the deeper question of “Can it?”, the answer, obviously, should involve weighing the benefits against the costs – including, most importantly, the opportunity cost. The opportunity cost of automated testing is the manual testing you have to give up, because – let’s not kid ourselves – it’s the rare software company that will hire a whole new team whose sole job is planning, writing, and maintaining the test scripts.

And let’s say the company does hire specialist automation testers. Well, it seems there’s always an irresistible inclination to, in a pinch (and Hofstadter’s Law ensures there’s always a pinch), have the automation team help with the manual testing efforts – “only to get us through this tough time”, of course. Funny how that “tough time” has a tendency to get longer and longer. Meanwhile, the automation scripts are not being written, or else they’re breaking or becoming obsolete. (Note that, of course, when a company puts automation on hold in favor of doing manual testing, it’s an implicit recognition of the opportunity cost of the automation.)

There’s another factor to consider: How valuable are these automated tests going to be?

I’ve already made the argument that good testers are not robots. Do I need to point out what automated testing is? Scripts are the pinnacle of inflexible and narrowly focused testing. A human performing the same test is going to see issues that the machine will miss. This is why James Bach prefers to call manual testing “sapient”, instead. He has a point!

Furthermore, except in particular domains (such as data-driven testing) any single test will lose value each time it is performed, thanks to something else from economic theory: the law of diminishing returns. Sure, sometimes developers break stuff that was working before, but unless there’s no chance the manual – sorry sapient – testers were going to timely find the problem in the course of their – ahem! – sapient tests, the value of any given automated test will asymptotically approach zero. Meanwhile its cost of maintenance and execution will only increase.

I won’t go so far as to say that test automation is never worth it. I’m sure there are situations where it is. However, given the precious opportunity cost involved, I think it’s called for less frequently than is generally believed.