Estimating Testing Times: Glorified Fortune-Telling?


Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.

Douglas Hofstadter

A good friend of mine is a trainer for CrossFit, and has been for years. For a long time he trained clients out of his house, but his practice started outgrowing the space. His neighbors were complaining about the noise (if you’ve ever been in a CrossFit gym you can easily imagine that they had a point). Parking was becoming a problem, too.

So, in September 2009, he rented a suite for a gym, in a building with an excellent location and a gutted interior–perfect for setting up the space exactly how he wanted it. It needed new flooring, plumbing, framing, drywall, venting, insulation, a dropped ceiling, electricity, and a few other minor things. At the time, he told me they’d be putting the finishing touches on the build-out by mid-December. I remember thinking, “Wow. Three months. That’s a long time.”

As it turned out, construction wasn’t completed until late June 2010, seven months later than originally estimated.

Let’s think about that. Here’s a well-defined problem, with detailed plans (with drawings and precise measurements, even!) and a known scope, not prone to “scope creep.” The technology requirements for this kind of project are, arguably, on the low side–and certainly standardized and familiar. The job was implemented by skilled, experienced professionals, using specialized, efficiency-maximizing tools. And yet, it still took more than 3 times longer than estimated.

Contrast that with a software project. Often the requirements are incomplete, but even when they’re not, they’re still written in words, which are inherently ambiguous. What about tools? Sometimes even those have to be built, or existing tools need to be customized. And the analogy breaks down completely when you try to compare writing a line of code (or testing it) with, for example, hanging a sheet of drywall. Programmers are, by definition, attempting something that has never been done before. How do you come up with reasonable estimates in this situation?

This exact question was asked in an online discussion forum recently. A number of self-described “QA experts” chimed in with their answers. These all involved complex models, assumptions, and calculations based on things like “productivity factors,” “data-driven procedures,” “Markov chains,” etc. My eyes glazed over as I read them. If that isn’t the Platonic fallacy, I don’t know what is.

Firstly, at the start of any software project you are, as Jeffrey Friedman puts it, radically ignorant. You do not know what you do not know. The requirements are ambiguous and the code hasn’t even been written yet. This is still true for updates to existing products. You can’t be certain what effect the new features will have on the existing ones, or how many bugs will be introduced by refactoring the existing features. How can you possibly know how many test cases you’re going to need to run? Are you sure you’re not committing the Ludic Fallacy when you estimate the “average time” per test case? Even if you’ve found the perfect estimation model (and how would you know this?), your inputs for it are bound to be wrong.

To attempt an estimate in that situation is to claim knowledge that you do not possess. Is that even ethical?

Secondly, your radical ignorance goes well beyond what the model’s inputs should be. What model takes into account events like the following (all of which actually happened, on projects I’ve been a part of)?

  1. The database containing the company’s live customer data–all of it–is inadvertently deleted by a programmer who thought at the time that he was working in the developer sandbox.
  2. The Director of Development, chief architect of the project, with much of the system design and requirements kept only in his head, fails to appear at work one day. Calls to his home go unanswered for two weeks. When someone finally gets in touch with him he says he won’t be coming back to work.
  3. A disgruntled programmer spends most of his time putting derogatory easter eggs in the program instead of actually working. When found by a particularly alert tester (sadly I can’t claim it was me) the programmer is fired.
  4. A version of the product is released containing an egregious bug, forcing the company to completely reassess its approach to development (and blame the testers for missing the “obvious” bug, which then destroys morale and prompts a tester to quit).
  5. The company’s primary investor is indicted for running a Ponzi scheme. The majority of the employees are simply let go, as there is not enough revenue from sales to continue to pay them.

The typical response from the “experts” has been, “Well, that’s where the ‘fudge factor’ comes in, along with the constant need to adjust the estimate while the project is underway.”

To that I ask, “Isn’t that just an implicit admission that estimates are no better than fortune-telling?”

I heard from Lynn McKee recently that Michael Bolton has a ready answer when asked to estimate testing time: “Tell me what all the bugs will be, first, then I can tell you how long it will take to test.”

I can’t wait to use that!


13 Comments.

  1. As I am too lazy to read your previous posts yet again, I’m depending on my likely faulty recollection when I offer up this question: what exactly is an example of a Black Swan in QA land?

    Not being very familiar with your field (except through your blog and your incessant palaver on the topic, fascinating of course), it would seem to me that software testing belongs in “Mediocristan,” not “Extremistan.” (I won’t bother to explain those terms as I think we both agree everyone should read Taleb.) It would take one massively destructive bug to have effects severe enough to garner Black Swan status, and I can’t imagine what that may be. But I’m sure you can?!

    A final thought: Taleb at least manages to make money from his Black Swan ideas (or, it seems, afford an immense amount of leisure time for intellectual pursuits, punctuated with forays to the gym, where he can remind himself how much more brilliant he is than even his filthy rich fellow treadmill mates). I wonder how you will apply your insights into the limitations of software testing as you so cogently outlined in another post? In other words, are you offering up any solutions? Mere awareness won’t make these problems go away. (Taleb, remember, notes we all suffer from a lack of imagination, which I suppose is what it takes to come up with answers.)

    Yours affectionately. With lots of hugs and kisses.
    PW

    PS: I no longer like bell curves and it makes me sad.

  2. Pink,

    Thanks for your comments and questions! No doubt your tongue was planted firmly in your cheek as you were writing them, so take my reply in that spirit, as well.

    What exactly is an example of a Black Swan in QA land?

    Examples are legion.

    Aside from the Toyota Brake System failure, there have been bugs that allowed hackers to gain access to customer financial data, crashed airplanes…

    Google “bad software bugs” and see what I mean.

    …it would seem to me that software testing belongs in “Mediocristan,” not “Extremistan.”

    What gives you that impression?

    In other words, are you offering up any solutions?

    You must not have gotten to the part of The Black Swan where Taleb complains about people not liking his “negative advice”. I touched on it in the post you’re too lazy to re-read.

    My solution in this case: Don’t determine a software project’s release deadline based on predictions made by a model. We can’t predict!

    Beyond that, I fear, is “the province of the charlatan.”

  3. I think that more than any other lesson I learned in business school, the singlemost emphatic rule I’ve kept with me is this:

    “Halve your expected revenues; double your expected costs.”

    …as a guide to realistic expectations. And in fact when I learned it and as I’ve considered it a thousand times since, I am ultra-liberal on the interpretations of “revenues” and “costs”. Our most fundamental scarcity, always, is time (or attention, which is the same thing), and any economist or applied economist worth his salt will always interpret costs in terms of time.

    I interviewed a dozen general contractors, and worked on very detailed plans and lengthy discussions of the buildout for Arrowhead CrossFit, prior to finally hiring a contractor. For “contractor” we can neatly substitute “project manager”.

    I finally found a contractor whose qualifications trumped the others’, and whose estimate (proposed price) was the 2nd best we’d seen ($2500 over the lowest estimate). More importantly, he clearly showed the best understanding of the plans, and an attention to detail far beyond what the other contractors demonstrated.

    He agreed with me that we could finish in the week after Christmas… by this time, it was the last week of November, and I wanted a target opening date of December 27th. He supported his target date with very detailed estimates of the time to complete each sub-step and inspection.

    After weeks of research, I took my recommendation to my business partners. They cursorily, casually said no because of the $2500 difference, and went with the lowest bidder, a contractor I had said decisively would not work and was a bad choice. I tried to protest. I pointed out that a couple of weeks of not being open would exceed the difference in cost. The lowest bidder had humbly said it would take him 4-5 months to do the buildout. Based on his experience, and on his ability, he was right.

    The rest of the buildout process was a calamitous farce of the contractor screwing things up, and the partners being unable to pay to keep the buildout on schedule.

    I believe that if we had hired the contractor I chose and recommended, we would have opened the gym by February 1st… allowing for an 8-week completion, where a 3.5-week completion was estimated.

    I think you were right throughout your article above, by the way. I agree with your assessments.

  4. Thanks for the comment, Cash!

  5. I think the issue is more of a perfect solution fallacy supported by two behaviors: the self-fulfilling prophecy and a Utopian adherence to an ideal. I’ve worked in a lot of companies where the advance estimates were nearly always met. Of course, the estimates were either grossly over-inflated in the first place or the team began robbing Peter to pay Paul in order to make the project fit. When that didn’t work, it was assumed the estimates themselves were imperfect due to miscalculations in the inputs, rather than the entire process itself possibly being at fault.

  6. I read with much interest your discussion on LinkedIn on this topic. I wondered throughout that debate just exactly where the people worked who argued so passionately that you can get precise and accurate estimates by doing X, Y, and Z. I’ve never worked in an environment like that in the last 27 years. In truth, I think you made your case very well. There comes a time when the metal meets the meat, and that is where most high-minded ideas and theories fall down.

    Invoking Michael Bolton, brilliant guy that he is, I was in a workshop he conducted with Fiona Charles on estimates. The long and short of it was “You’ll know precisely how long it will take for you to finish a project when you’re done.” Estimates are made with glaring holes in knowledge and understanding. As you fill those holes, you will know if you need to do more or less work. I can’t recall any project I worked on where the initial estimate exceeded the work we actually did – simply because our presumptions were wrong.

    One aside, and I leave you – Several years ago, a newly minted Project Management Professional came to my cube with a paper, half of one side typed up with a description of a project. He needed a “QA and Testing” estimate right then. I asked him a couple of questions, stroked my beard (an excellent technique to look thoughtful) and said “9 months.” He asked if I was serious, I explained my answer, and he informed me that the normal “Industry Standard” was to use 1/3 of the development estimate for testing. Since the development estimate was only 60 hours, why was the test estimate so big? My response was “I did not realize you had a development estimate. That changes things a bit. I’d say 14 months then.” This change entailed not only a couple of “minor” software changes; it entailed a complete revision of the business rules and processes, and that needed to be included since the “QA Team” also had the accountability to assist in Acceptance Testing. I was told by certain manager types that I was using a flawed approach and that no way would it take either 9 or 14 months. They were right – it took 18 months before the bugs were resolved and the departments impacted all agreed the change was “ready.”

  7. Thanks for the comment, Cowboy Tester. My favorite (and I wish there were a “sarcasm” font) is when the deadline looms and the team is expected to pull 20+ hours/week of overtime to meet it. There’s often an implication (if not outright declaration) that the “lateness” is the team’s fault.

  8. Pete, thanks for sharing and giving me a good laugh!

  9. Having spent some years in the trenches with you, I know how annoying estimation, and its effect on the schedule, can be.
    More importantly, I was trying to figure out which example belonged to PM/Buzzsaw, etc. I thought number 2, but number 4 sounded good too…

  10. Hey Bry!

    Actually, none of the items listed happened at Buzzsaw, though I’m sure we could easily come up with several things worthy of inclusion.

  11. Really? None? I don’t know whether to laugh or cry.

  12. Cash: The rule of thumb I once heard for improving a development time estimate is: double the number and increase the units. When I hear 2 weeks, I think four months.

  13. I should add that using that rule of thumb I have never found the corrected estimate to be less accurate than the original.
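The rule of thumb in the last two comments is mechanical enough to sketch in code. Here is a minimal illustration in Python; the function name and the ladder of units are my own inventions for the example, not anything from the post:

```python
# "Double the number and increase the units": a toy sketch of the
# estimate-correction rule of thumb quoted in comment 12.
# The unit ladder below is an assumption made purely for illustration.
UNIT_LADDER = ["hours", "days", "weeks", "months", "years"]

def pessimize(number, unit):
    """Double the number and move one step up the unit ladder."""
    i = UNIT_LADDER.index(unit)
    bigger_unit = UNIT_LADDER[min(i + 1, len(UNIT_LADDER) - 1)]
    return 2 * number, bigger_unit

print(pessimize(2, "weeks"))  # a "2 weeks" estimate becomes (4, 'months')
```

Whether the corrected figure is any more defensible than the original is, of course, exactly the question the post raises.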
