Monthly Archives: July 2010

Interview with a CEO

“So, tell me: How are you going to guarantee the accuracy and integrity of the data?” he asked.

I glanced at the clock on the wall: 2:25 p.m. The CEO and I had been talking since 2:00, and he had to be at his next meeting in 5 minutes.

I felt frozen, like a tilted pinball machine. For a moment I wasn’t even sure I’d heard the question right. He couldn’t seriously be asking a tester for… whaaa??? I could feel my adrenal glands dumping their contents into my bloodstream.

“This is the moment,” I thought. “The point when this interview goes south.”

Part of me wanted to simply stand up, shake the CEO’s hand, thank him for the opportunity, and walk out. I could still salvage a nice afternoon before I had to be back at the airport.

Time seemed to slow to an agonizing crawl. Involuntarily, I pondered the previous 12 hours…

2:45AM Wake up, shower… 3AM Dress up in suit and tie (20 minutes devoted to fighting with tie)… 3:45AM Drive to airport… 5AM Sit in terminal… 6AM Board flight to San Francisco… 9AM Arrive SFO… 9:15AM Sit in (completely stationary) BART train… 10:00AM Miss Caltrain connection… 10:30AM Arrive at office, thanks to a ride from their helpful administrative assistant… 10:45AM Interview with the head of products… 11:30AM Interview with the head of development… 12:15PM Lunch… 2:00PM Interview with CEO…

As the epinephrine circulated through my body, creating a sensation akin to somersaulting backwards, I began to feel resentful. I’d flown there on my own dime, after having already talked with these guys by phone for several hours. I was under the impression that the trip would be more of a “meet & greet the team” social hour. Not a repeat of the entire interview process, from square one. The Head of Products had given me several assurances that I was his top choice and that they’d only be asking me to fly out if the position were essentially mine to refuse.

So, there I was. The CEO sat across the table from me, expecting an answer.

What I wanted to say was that I was in no position to guarantee anything of the sort, given my radical ignorance of the data domain, the data’s source(s), the sources’ track record(s) for accuracy, or how the data get manipulated by the in-house systems.

What I wanted to say was that his question was prima facie absurd. That I, as a tester, couldn’t “guarantee” anything other than that I would use my skills and experience to find as many of the highest risk issues as quickly as possible in the given time frame. However, when you’re dealing with any black box, you can’t guarantee that you’ve found all the problems. Certainty is not in the cards.

What I wanted to say was that anyone who sat in front of the CEO claiming that they could guarantee the data’s accuracy and integrity was clearly a liar and should be drummed out of the profession of software testing.

I wanted to say all that and more, but I didn’t. Given the day’s exhausting schedule, all these thoughts were little more than fleeting, inchoate, nebulous impressions. Plus, it seemed highly unlikely that the CEO, who struck me as an impatient man (your typical “Type A” personality), would be interested in spending the remaining 4 or 5 minutes discussing epistemology with me. Honestly, I’m not sure what I said, exactly. The question, and the CEO’s demeanor while asking it, had drained away any enthusiasm I had for the position. In all likelihood, my response was along the lines of “I have no idea how to answer that question.”

Whatever I said, it was obviously not how to impress an MBA from Wharton. I didn’t get offered the job.


Irreverence Versus Arrogance

Everything sacred is a tie, a fetter.
— Max Stirner

I am an irreverent guy. I’m a fan of South Park and QA Hates You, for example. Furthermore, I think it’s important–nay, essential–for software testers to cultivate a healthy irreverence. Nothing should be beyond question or scrutiny. “Respecting” something as “off limits” (also known as dogmatism) is bound to lead to unexamined assumptions, which in turn can lead to missed bugs and lower quality software. If anything, I think testers should consider themselves akin to the licensed fools of the royal court: Able–and encouraged–to call things as they see them and, especially, to question authority.

Contrast that with arrogance–an attitude often confused with irreverence. The distinction between them may be subtle, but it is key. Irreverence and humility are not mutually exclusive, whereas arrogance involves a feeling of smug superiority; a sense that one is “right.” Arrogance thus contains a healthy dose of dogmatism. The irreverent, on the other hand, are comfortable with the possibility that they’re wrong. They question all beliefs, including their own. The arrogant only question the beliefs of others.

I pride myself (yes, I am being intentionally ironic, here) on knowing this difference. So, it pains me to share the following email with you. It’s an embarrassing example of a moment when I completely failed to keep the distinction in mind. Worse, I had to re-read it several times before I could finally see that my tone was indeed arrogant, not irreverent, as I intended it. I’ll spare you my explanations and rationalizations about how and why this happened (though I have a bunch, believe me!).

The email–reproduced here unmodified except for some re-arranging, to improve clarity–was meant only for the QA team, not the 3rd-party developer of the system. In a comedy of errors and laziness it ended up being sent to them anyway. Sadly, I think its tone ensured that none of the ideas for improvements were implemented.

After you’ve read the email, I invite you to share any thoughts you have about why it crosses the line from irreverence into arrogance. Naked taunts are probably appropriate, too. On the other hand, maybe you’ll want to tell me I’m wrong. It really isn’t arrogant! I won’t hold my breath.

Do you have any stories of your own where you crossed the line and regretted it later?

The user interface for OEP has lots of room for improvement (I’m trying to be kind).

Below are some of my immediate thoughts while looking at the OEP UI for the front page. (I’ll save thoughts on the other pages for later)

1. Why does the Order Reference Number field not allow wildcards? I think it should, especially since ORNs are such long numbers.

2. Why can you not simply click a date and see the orders created on that date? The search requires additional parameters. Why? (Especially if the ORN field doesn’t allow wildcards!)

3. Why, when I click a date in the calendar, does the entire screen refresh, but a search doesn’t actually happen? I have to click the Search button. This is inconsistent with the way the Process Queue drop down works. There, when I select a new queue, it shows me that instantly. I don’t have to click the “Get Orders” button.

5. What does “Contact Name” refer to? When is anyone going to search by “Contact Name”? I don’t even know what a Contact Name is! Is it the patient? Is it the OEP user???

4. In fact, I *never* have to click the Get Orders button. Why is it even there on the screen?

6. Why waste screen space with a “Select” column (with the word “Select” repeated over and over again–this is UGLY) when you could eliminate that column and make the Order Reference number clickable? That would conserve screen space.

7. Why does OEP restrict the display list to only 10 items? It would be better if it allowed longer lists, so that there wouldn’t need to be so much searching around.

8. Why are there “View Notes” links for every item, when most items don’t have any notes associated with them? It seems like the View Notes link should only appear for those records that actually have notes.

9. Same question as above, for “Show History Records”.

10. Also, why is it “Show History Records” instead of just “History”, which would be more elegant, given the width of the column?

11. Speaking of that, why not just have “History” and “Notes” as the column headers, and pleasant icons in those rows where History or Notes exist? That would be much more pleasing to the eye.

12. In the History section, you have a “Record Comment” column and an “Action Performed” column. You’ll notice that there is NEVER a situation where the “Action Performed” column shows any useful information beyond what you can read in the “Record Comment” field. Why include something on the screen if it’s not going to provide useful information to the user?

For example:

Record Comment: Order checked out by user -TSIAdmin-
Action Performed: CheckOut

That is redundant information.

In addition to that, in this example the Record Create User ID field says “TSIAdmin”. That’s more redundant information.

There must be some other useful information that can be put on this screen.

13. Why does the History list restrict the display to only 5 items? Why not 20 items? Why not give the user the option to “display all on one page”?

14. In Notes section of the screen, the column widths seem wrong. The Date and User ID columns are very wide, leaving lots of white space on the screen.


Estimating Testing Times: Glorified Fortune-Telling?


Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.

Douglas Hofstadter

A good friend of mine is a trainer for CrossFit, and has been for years. For a long time he trained clients out of his house, but his practice started outgrowing the space. His neighbors were complaining about the noise (if you’ve ever been in a CrossFit gym you can easily imagine that they had a point). Parking was becoming a problem, too.

So, in September, 2009, he rented a suite for a gym, in a building with an excellent location and a gutted interior–perfect for setting up the space exactly how he wanted it. It needed new flooring, plumbing, framing, drywall, venting, insulation, dropped ceiling, electricity, and a few other minor things. At the time, he told me they’d be putting the finishing touches on the build-out by mid-December. I remember thinking, “Wow. Three months. That’s a long time.”

As it turned out, construction wasn’t completed until late June, 2010, seven months later than originally estimated.

Let’s think about that. Here’s a well-defined problem, with detailed plans (with drawings and precise measurements, even!) and a known scope, not prone to “scope creep.” The technology requirements for this kind of project are, arguably, on the low side–and certainly standardized and familiar. The job was implemented by skilled, experienced professionals, using specialized, efficiency-maximizing tools. And yet, it still took more than 3 times longer than estimated.

Contrast that with a software project. Often the requirements are incomplete, but even when they’re not, they’re still written in words, which are inherently ambiguous. What about tools? Sometimes even those have to be built, or existing tools need to be customized. And the analogy breaks down completely when you try to compare writing a line of code (or testing it) with, for example, hanging a sheet of drywall. Programmers are, by definition, attempting something that has never been done before. How do you come up with reasonable estimates in this situation?

This exact question was asked in an online discussion forum recently. A number of self-described “QA experts” chimed in with their answers. These all involved complex models, assumptions, and calculations based on things like “productivity factors,” “data-driven procedures,” “Markov chains,” etc. My eyes glazed over as I read them. If they weren’t all committing the Platonic fallacy then I don’t know what it is.

Firstly, at the start of any software project you are, as Jeffrey Friedman puts it, radically ignorant. You do not know what you do not know. The requirements are ambiguous and the code hasn’t even been written yet. This is still true for updates to existing products. You can’t be certain what effect the new features will have on the existing ones, or how many bugs will be introduced by re-factoring the existing features. How can you possibly know how many test cases you’re going to need to run? Are you sure you’re not committing the Ludic Fallacy when you estimate the “average time” per test case? Even if you’ve found the perfect estimation model (and how would you know this?), your inputs for it are bound to be wrong.
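To see how much rides on those inputs, here is a minimal sketch in Python. The model (number of test cases times average minutes per case) and every number in it are hypothetical, invented purely for illustration; none of them come from a real project or from the forum discussion.

```python
def estimate_hours(test_cases: int, minutes_per_case: float) -> float:
    """Naive model: total time = number of test cases x average time per case."""
    return test_cases * minutes_per_case / 60


# Hypothetical guesses made before any code exists. Each one is defensible,
# and nothing available at the start of the project says which is right.
optimistic = estimate_hours(test_cases=200, minutes_per_case=10)
planned = estimate_hours(test_cases=400, minutes_per_case=15)
pessimistic = estimate_hours(test_cases=800, minutes_per_case=30)

# How far apart the "reasonable" answers are.
spread = pessimistic / optimistic

print(f"optimistic:  {optimistic:.0f} hours")
print(f"planned:     {planned:.0f} hours")
print(f"pessimistic: {pessimistic:.0f} hours")
print(f"pessimistic is {spread:.0f}x the optimistic figure")
```

Doubling or halving each guess is hardly an exotic scenario, yet the answers end up an order of magnitude apart. The arithmetic is flawless; the trouble is that nobody knows which inputs to feed it.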

To attempt an estimate in that situation is to claim knowledge that you do not possess. Is that even ethical?

Secondly, your radical ignorance goes well beyond what the model’s inputs should be. What model takes into account events like the following (all of which actually happened, on projects I’ve been a part of)?

  1. The database containing the company’s live customer data–all of it–is inadvertently deleted by a programmer who thought at the time that he was working in the developer sandbox.
  2. The Director of Development, chief architect of the project, with much of the system design and requirements kept only in his head, fails to appear at work one day. Calls to his home go unanswered for two weeks. When someone finally gets in touch with him he says he won’t be coming back to work.
  3. A disgruntled programmer spends most of his time putting derogatory Easter eggs in the program instead of actually working. When they are found by a particularly alert tester (sadly, I can’t claim it was me), the programmer is fired.
  4. A version of the product is released containing an egregious bug, forcing the company to completely reassess its approach to development (and blame the testers for missing the “obvious” bug, which then destroys morale and prompts a tester to quit).
  5. The company’s primary investor is indicted for running a Ponzi scheme. The majority of the employees are simply let go, as there is not enough revenue from sales to continue to pay them.

The typical response from the “experts” has been, “Well, that’s where the ‘fudge factor’ comes in, along with the constant need to adjust the estimate while the project is underway.”

To that I ask, “Isn’t that just an implicit admission that estimates are no better than fortune-telling?”

I heard from Lynn McKee recently that Michael Bolton has a ready answer when asked to estimate testing time: “Tell me what all the bugs will be, first, then I can tell you how long it will take to test.”

I can’t wait to use that!


Testing’s Quiet Evidence

Since my post about Nassim Taleb’s The Black Swan I’ve continued to muse about the book and its ideas. In particular, while thinking about the notion of “silent evidence,” I realized there was a connection to testing that I hadn’t noticed before. I won’t flatter myself by thinking I missed the link due to some inherent subtlety in the concept. More likely it’s because I already associated the idea with something else: get-rich-quick schemes. Bear with me for a minute while I explain…

A few years ago I was consumed with debunking network marketing companies and real estate investing scams (If you’re genuinely curious why, I explain it all here). My efforts were focused primarily on a real estate “guru” who lives nearby, in Glendale, AZ. As with most con artists, his promotional materials include dozens of narrative fallac–uh… testimonials from former “students” who claim they achieved “great success” using the investment methods he teaches.

Of course, what the “guru” doesn’t promote are the undoubtedly much higher number of people who attended his “boot camps” or bought his materials and either, A) did nothing, or B) tried his techniques and lost money (On the rare occasions such people are even mentioned, there’s always a ready explanation for them: The blame lies not with the technique, but with the practitioner. Voilà! The scammer has just removed your ability to falsify his claims!). Taleb calls these people the “silent evidence.” To ignore them when evaluating a population (in this case, the customers of a particular guru) is to engage in survivorship bias and miscalculate what you’re trying to measure. Con artists of all stripes make millions encouraging their customers to do this.
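A toy calculation shows how much the silent evidence matters. The figures below are made up for illustration (I have no data on any guru’s actual customers); the point is only the gap between the two averages.

```python
from statistics import mean

# Hypothetical net results, in dollars, for ten boot camp attendees:
# two did well, four did nothing, four lost money.
outcomes = [50_000, 30_000, 0, 0, 0, 0, -5_000, -10_000, -15_000, -20_000]

# The promotional materials feature only the winners.
testimonials = [result for result in outcomes if result > 0]

# Average as seen through the testimonials versus across everyone.
advertised_average = mean(testimonials)
actual_average = mean(outcomes)

print(f"average among testimonials: ${advertised_average:,.0f}")
print(f"average among all students: ${actual_average:,.0f}")
```

With these invented numbers the testimonials suggest a result more than ten times better than what the whole group actually experienced. Nothing in the testimonials is false; the distortion comes entirely from who gets left out.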

Testers are not con artists (as a rule, I mean), but we do have something that, while perhaps not silent, should be considered at least very quiet. In contrast to the scammers, it’s not to our advantage that it stay quiet. In fact, I’m starting to wonder if keeping it quiet is not at least in part to blame for some of the irrational practices you find in dysfunctional test teams, such as the obsession with test cases.

What am I talking about?

I am referring, dear reader, to all the bugs that were found and fixed prior to release. All those potentially serious issues that were neutralized before they could do any damage. No one thinks about them, because they don’t exist, except as forgotten items in a database no one cares about any more. But they’re there–hundreds, maybe thousands of them–quietly paying tribute to averted disasters, maintained reputations, even saved money (which is why I call them “quiet” rather than silent: they’re still there if you look for them).

Meanwhile, the released product is out in the world, exposing its inevitable and embarrassing flaws for all to see, prompting CEOs and sales teams to wonder, “What are those testers doing all day? Why aren’t they assuring quality?” Note that this reaction is precisely the survivorship bias I mentioned above. The error causes them to undervalue the test team, in a way exactly analogous to how dupes of the real estate gurus overvalue the guru.

Okay, so what to do about this? I confess that, as yet, I do not know. Right now all I can say is it behooves us as testers to come up with ways of better publicizing the bugs that we find–to turn our quiet evidence into actual evidence. As to how to go about that, well, I’m open to suggestions.