Testers Cannot Be Perfect Proxies for End Users

The following is a snippet of a Skype conversation I was a part of recently. I wanted to share it, since I think it’s a perfect example of a perennial tester conversation. See if you recognize yourself in it. The participants’ names have been removed in the interest of privacy. Other minor changes were made for the sake of clarity.

[11:25:37 AM] Dev Mgr: Programmer– i really need you to look into that “flash wont resize without a reload” fact
[11:25:42 AM] Dev Mgr: its not being believed [by the CEO]
[11:25:50 AM] Dev Mgr: and quite honestly
[11:26:02 AM] Dev Mgr: i went to some other pages that clearly have graphs on them
[11:26:07 AM] Dev Mgr: that dont need to be reloaded
[11:26:29 AM] Dev Mgr: when they get resized
[11:26:29 AM] Programmer: i’ll read fusionchart docs
[11:26:32 AM] Dev Mgr: thats
[11:26:33 AM] Dev Mgr: thanks
[11:26:40 AM] Dev Mgr: also Tester 1/Tester 2
[11:26:49 AM] Dev Mgr: i’m not really sure how that was not part of a QA
[11:27:08 AM] Dev Mgr: when you are looking for UI stuff before a release
[11:27:16 AM] Dev Mgr: not everyone has a big ass monitor
[11:27:25 AM] Dev Mgr: please add qa on your laptop screen to the test
[11:27:49 AM] Dev Mgr: this is not a successful launch
[11:28:18 AM] Tester 1: I agree that it should be added as a regular part of our tests–NOW that we know it’s important.
[11:28:34 AM] Tester 1: There are any number of things that one could complain about with these pages.
[11:28:59 AM] Tester 1: I strongly suspect that, had Tester 2 or I complained about this to Programmer prior to the launch we would have been seen as horrible nit pickers.
[11:29:16 AM] Dev Mgr: What??????????????
[11:29:26 AM] Tester 1: “Why are you complaining about something so trivial when we’re trying to get this out?”
[11:29:28 AM] Dev Mgr: how can the UI not be important
[11:29:34 AM] Tester 1: I’m not saying it’s not.
[11:29:43 AM] Tester 1: I’m saying that there are MANY things that one could pick to complain about.
[11:29:49 AM] Tester 1: The colors are one.
[11:29:58 AM] Tester 1: The fact that many of the graphs don’t have proper labels is another.
[11:30:22 AM] Tester 1: I could come up with more.
[11:31:19 AM] Tester 1: Unfortunately, prior to today, we weren’t aware that resizing would be the important thing.
[11:32:46 AM] Programmer: i agree with Tester 1, on a scale of 1 to 10, resizing is low on the totem pole since there is an almost obvious solution vs. the colors
[11:33:09 AM] Dev Mgr: i strongly disagree
[11:34:38 AM] Tester 1: Dev Mgr, could you have predicted that CEO would be up-in-arms about resizing?
[11:34:53 AM] Tester 2: This was my responsibility, and I will take the hit for it, that’s fine. That being said, however, considering that there were over a dozen pieces to this [specification], and we, as a team, fixed things that were given to us incorrectly in the 1040 from CEO, I would consider this issue is not the barometer of whether or not this was a successful launch.
[11:34:53 AM] Dev Mgr: this isn’t about CEO
[11:35:02 AM] Dev Mgr: i went to BBW on my laptop
[11:35:06 AM] Dev Mgr: and i had to resize it
[11:35:11 AM] Dev Mgr: and it happened to me
[11:36:19 AM] Tester 1: No doubt. But the fact that it happened is distinct from whether or not it is an important enough issue to have said “Let’s hold off on release until this is resolved.”
[11:36:37 AM] Dev Mgr: i would not have released that
[11:36:51 AM] Dev Mgr: USER EXPERIENCE
[11:36:55 AM] Tester 1: I’m very glad that I, Tester 2, and Programmer now know this.
[11:37:05 AM] Dev Mgr: this makes us look bad
[11:37:07 AM] Dev Mgr: to others
[11:37:10 AM] Dev Mgr: its a window into us
[11:37:20 AM] Dev Mgr: people/ clients see this page
[11:37:21 AM] Dev Mgr: we’re a tech company
[11:37:28 AM] Dev Mgr: and we can’t get our alignment right?
[11:38:49 AM] Dev Mgr: you didn’t see it and decide whether or not to push it
[11:38:55 AM] Dev Mgr: you said you never tested for it
[11:39:07 AM] Dev Mgr: did you test in IE?
[11:39:26 AM] Tester 2: yes, but did not resize window from full laptop view
[11:39:37 AM] Tester 1: What I said was that I believe that, HAD we tested it, Programmer, Tester 2, and I would’ve all concluded that it wasn’t a big enough issue to stop the launch.
[11:39:59 AM] Dev Mgr: then we have a bigger issue to discuss separately
[11:40:14 AM] Dev Mgr: image is everything
[11:40:19 AM] Dev Mgr: we are part of a public company
[11:40:25 AM] Dev Mgr: this is not an internal tool
[11:40:37 AM] Dev Mgr: our visual output
[11:40:49 AM] Dev Mgr: is more important to some than the guts that go into it
[11:41:52 AM] Tester 1: I wonder if perhaps the deeper lesson learned here is that sometimes we *need* to get buy-in from CEO prior to going live with stuff.
[11:42:18 AM] Dev Mgr: this isn’t about CEO
[11:42:29 AM] Dev Mgr: other than it sucks that he found it first
[11:42:54 AM] Tester 1: But many other people aside from CEO saw it and didn’t say anything about it being important.
[11:42:56 AM] Dev Mgr: if you have a question about a UI that you can decide you can use me as a gauge
[11:43:18 AM] Tester 1: My fundamental issue is that it’s very hard for us to be certain that we are truly aware of all the important things.
[11:43:20 AM] Dev Mgr: Tester 1…you all say this wasn’t tested
[11:43:25 AM] Dev Mgr: so that’s all that matters
[11:43:38 AM] Dev Mgr: not if you had what you would have done
[11:43:45 AM] Tester 1: Had we tested it and decided it wasn’t important we’d be in the same boat.
[11:44:07 AM] Tester 1: The fact that we didn’t test it probably means that if we had we wouldn’t have concluded it needed to be fixed prior to launch.
[11:44:49 AM] Tester 1: I’m trying to get at the deeper point:
[11:44:58 AM] Tester 1: We won’t always know what is important to the end users.
[11:49:26 AM] Dev Mgr: well if you dont think that this opens up some eyes to what needs to be acceptable to users then i will now have to approve everything with a UI component
[11:49:55 AM] Dev Mgr: and that is not the right answer…might i add
[11:53:17 AM] Tester 1: The best answer I can give you at this point: We have learned as a result of this. Our testing in the future will be better. Unfortunately we aren’t omniscient, and don’t have unlimited amounts of time, so I fear we will make other mistakes, which we will also learn from.

My Wish List for a Test Case Tracking Tool

Let’s talk for a bit about tracking test cases. Now, before we get bogged down in semantics and hair-splitting, let me point out that I’ve already made my case against them, as have others. I want to focus here on the tracking, so I’ll just speak broadly about the “test case” – which, for the present discussion, is meant to include everything from test “checks” to testing “scenarios” or even test “charters.”

Speaking provisionally, it’s probably a good idea to keep track of what your team has tested, when it was tested, by whom, and what the results of the test were. Right? I’ll assume for now that this is an uncontroversial claim in the testing world. I’ll venture further and assert that it’s also probably a good idea to keep a record of the clever (and even not-so-clever) testing ideas that strike you from time to time but that you can’t do at the moment, for whatever reason. Even more riskily, I’ll assert that, in general, less documentation is preferable to more.

Assuming you agree with the previous paragraph, how are you tracking your sapient testing?

I’ve used a number of methods. They’ve each had their good and bad aspects. All have suffered from annoying problems. I’ll detail those here, then talk about my imagined “ideal” tool for the job, in the hopes that someone can tell me either a) “I’ll build that for you” (ha ha! I know I’m being wacky), or b) “It already exists and its name is <awesome tracking tool x>.”

MS Word (or Word Perfect or Open Office)

The good: Familiar to everyone. Flexible. I’ve used these only when required by (apparently horribly delusional) managers to do so.

The bad: Flat files. Organizational nightmare. I’ve never seen a page layout for a test case template that didn’t make me depressed or annoyed (perhaps this can be chalked up to a personal problem unique to me, though). Updates to “fields” require typing everything out manually, which is time consuming and error-prone.


Spreadsheets

The good: Familiar to everyone. Flexible. The matrix format lends itself to keeping things relatively organized and sortable. Easy to add new test cases right where they’re most logical by adding a new row where you want it.

The bad: It’s still basically a flat file with no easy way to track history or generate reports. Long test descriptions look awful in the cells (though in some ways this can be seen as a virtue). Large matrices become unwieldy, encouraging the creation of multiple spreadsheets, which leads to organizational headaches.


Wiki

The good: Flexible. Generally the wiki tool automatically stores document revision histories. Everyone is always on the same page about what and where the latest version is. Wikis are now sophisticated enough to link to definable bug lists (see Confluence and Jira, for example).

The bad: Still essentially flat files. Barely better than MS Word, really, except for the history aspect.

FileMaker Pro

The good: It’s an actual database! You can customize fields and page layout exactly how you want them without needing to be a DB and/or Crystal Reports expert. I was in love with FileMaker Pro when I used it, actually.

The bad: It’s been a long time since I’ve used it. I stopped when we discovered that it was prone to erasing records if you weren’t careful. I’m sure that bug has been fixed, but I haven’t had a mind to check back. It’s hard to do some things with it that I started seeing as necessary for true usability (I’ll get to those in my wish list below).


This is a proprietary, web-based database tool in use at “Mega-Corp.”

The good: It’s a database. It tracks both bugs and test cases, and links the two, as appropriate. Can store files related to the test case, if needed. Stores histories of the test cases, and you can attach comments to the test case, if needed.

The bad: Slow. Horrible UI. Test team relies on convoluted processes to get around the tool limitations.

Test Director

The good: A database. A tool designed specifically for testers, so it tracks everything we care about, including requirements, bugs, and test cases. It takes screen shots and automatically stores them, making it easy to “prove” that your test case has passed (or failed). Plus, it helps create your entry into the bug database when you fail a test case.

This tool has come closer to my ideal than anything else I’ve used so far.

The bad: The UI for test set organization leaves a lot to be desired. It forces a particular framework that I don’t particularly agree with, though I can see why they made the choices they did. I also think it doesn’t need to be as complicated as they made it. It would be nice to be given the flexibility to strip out the stuff in the UI that I didn’t care about. Lastly, this tool is exorbitantly expensive! Yikes! For it to be useful at all you need to buy enough seats to cover the entire test team plus at least two, for the business analysts and programmers to have access.

My Imaginary Ideal Tool

What I want most…

I want a tool that organizes my tests for me! I want to be able to quickly add a new test to it at any time without worrying about “where to put it.” This is perhaps the biggest failing of flat files. Some tests just defy quick categorization, so they don’t easily “fit” anywhere in your list.

The database format takes care of this problem, to a large degree, to be sure. The tool will have fields that, among other things, specify the type of test (function, data, UI, integration, etc.), the location of the test, both in terms of the layout of the program from the user’s standpoint and in terms of which parts of the code it exercises, et cetera.

All that is great, but I’d like the system to go a step beyond that. I want it to have an algorithm that uses things like…

  • the date the test was last executed
  • the date the related source code was updated (note that this implies the tool should be linked to the programmers’ source control tool)
  • the perceived importance and/or risk level of the test and/or the function being tested
  • other esoteric stuff that takes too many words to explain here

I want it to use that algorithm to determine which test in the database is, at this moment, the most important test (or set of tests, if you choose) I could be running. I then want it to serve it up to me. When I accept it by putting it into a “testing” status, the system will know to serve up the next most important test to whatever tester comes along later. Same goes for when I pass or fail the given test. It “goes away” and is replaced by whatever the system has determined is now “most important” according to the heuristic.

The way I see it, what this does for me is free me from the hassle of document maintenance and worrying about test coverage. The tests become like a set of 3×5 cards all organized according to importance. You can add more “cards” to the stack, as you think of them, and they’re organized for you. You may not have time to get through the whole “pile” before you run out of time, but at least you can be reasonably confident that the tests you did run were the “right” ones.
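To make the “stack of cards” idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `TestCard` fields, the status names, the dates, and especially the weights in `score`, which are arbitrary guesses rather than a tuned heuristic. It only uses the three inputs listed above (last execution date, last code update, and perceived risk) and ignores the “other esoteric stuff.”

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TestCard:
    """One test in the stack. All field names here are hypothetical."""
    name: str
    last_run: Optional[date]   # None means the test has never been executed
    code_changed: date         # last update to the related source code
    risk: int                  # perceived importance/risk, 1 (low) to 5 (high)
    status: str = "waiting"    # waiting, testing, passed, or failed


def score(card: TestCard, today: date) -> float:
    """Higher score means 'run this sooner'. The weights are arbitrary."""
    if card.last_run is None:
        staleness = 365                      # treat never-run tests as a year stale
        changed_since_run = True
    else:
        staleness = (today - card.last_run).days
        changed_since_run = card.code_changed > card.last_run
    points = card.risk * 10 + min(staleness, 365) / 10
    if changed_since_run:
        points += 50                         # code changed after the last run
    return points


def next_test(cards: List[TestCard], today: date) -> Optional[TestCard]:
    """Serve up the most important waiting test and mark it as in progress."""
    waiting = [c for c in cards if c.status == "waiting"]
    if not waiting:
        return None
    best = max(waiting, key=lambda c: score(c, today))
    best.status = "testing"                  # the next tester gets a different card
    return best


# Made-up example data.
today = date(2010, 9, 1)
cards = [
    TestCard("login page layout", date(2010, 8, 30), date(2010, 8, 1), risk=2),
    TestCard("report totals", date(2010, 8, 1), date(2010, 8, 20), risk=4),
    TestCard("export to CSV", None, date(2010, 7, 1), risk=3),
]
first = next_test(cards, today)
second = next_test(cards, today)
print(first.name, "then", second.name)
```

With these invented numbers, the never-run export test is served first and the recently changed report test second. A real tool would pull `code_changed` from source control instead of having someone type it in, and the scoring would need far more thought than this.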

The other stuff…

Aside from “what I want most,” this list is in no particular order. It’s not exhaustive either, though I tried my best to cover the essentials. Obviously the tool should include all of the “good” items I’ve already listed above.

  • It should have a “small footprint mode” (in terms of both UI and system resources) so it can run while you’re testing (necessary so you can refer to test criteria, or take and store screen shots) but have a minimal impact on the actual test process.
  • As I said above it should link to the programmers’ source control tool, so that when the programmers check in updates to code it will flag all related test cases so you can run them again.
  • It should link to your bug tracking tool (this will probably require that the tool be your bug tracking database, too. Not ideal, but perhaps unavoidable).
  • It should make bug creation easy when a test has failed (by, e.g., filling out all the relevant bug fields with the necessary details automatically). Conversely, it should make test case creation easy when you’ve found a bug that’s not yet covered by existing test cases.
  • It should be possible to create “umbrella” test scenarios that supersede other test cases, because those tests are included implicitly. In other words, if you pass one of these “über-cases,” the other test cases must be considered “passed” as well, because they’re inherent in the nature of the über-case. The basic idea here is that the tool should help you prevent avoidable redundancies in your testing efforts.
  • Conversely, the failure of a test case linked to one or more über-cases should automatically mark those über-cases as not testable.
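The last two items, the über-case rules, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Case` and `Tracker` names, the `covers` field, and the status strings are all hypothetical, and the sketch assumes an über-case simply lists the names of the cases it implicitly includes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Case:
    """A test case. 'covers' names the cases an über-case implicitly includes."""
    name: str
    status: str = "untested"   # untested, passed, failed, or not testable
    covers: List[str] = field(default_factory=list)


class Tracker:
    def __init__(self, cases: List[Case]) -> None:
        self.cases: Dict[str, Case] = {c.name: c for c in cases}

    def record(self, name: str, passed: bool) -> None:
        """Record a result and propagate it according to the two rules above."""
        case = self.cases[name]
        if passed:
            self._mark_passed(case)
        else:
            case.status = "failed"
            self._block_umbrellas(name)

    def _mark_passed(self, case: Case) -> None:
        # Passing an über-case passes everything it covers, recursively.
        if case.status == "passed":
            return                           # already handled; also stops cycles
        case.status = "passed"
        for covered in case.covers:
            self._mark_passed(self.cases[covered])

    def _block_umbrellas(self, name: str) -> None:
        # A failure makes every über-case that covers this case not testable.
        for case in self.cases.values():
            if name in case.covers and case.status != "not testable":
                case.status = "not testable"
                self._block_umbrellas(case.name)


def make_tracker() -> Tracker:
    # Made-up example: one über-case covering two ordinary cases.
    return Tracker([
        Case("checkout end to end", covers=["add to cart", "enter payment"]),
        Case("add to cart"),
        Case("enter payment"),
    ])


t = make_tracker()
t.record("checkout end to end", passed=True)
print({c.name: c.status for c in t.cases.values()})

t = make_tracker()
t.record("enter payment", passed=False)
print({c.name: c.status for c in t.cases.values()})
```

In the first run all three cases end up passed. In the second, the failed payment case leaves the über-case not testable and the cart case untouched. Whether a later pass of an über-case should be allowed to overwrite an earlier failure of a covered case is a design question this sketch doesn’t try to settle.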

I’d love comments and criticisms on all this. Please feel free to suggest things that you’d like to see in your own ideal tool. Maybe someone will actually be inspired to build it for us!


Likely Posting Rates for the Near Future

In May when I started this blog I was working the final weeks as a contractor at a soul-killing corporation on a testing project that was as mind-numbing as it was dysfunctional. I started the blog as a creative outlet for me; a means of venting my frustrations constructively, since I felt like nothing I said at “Mega-Corp” made any difference.

In addition, I thought, I’d soon be back on the job market. The blog might become a good extension to the tired and typical job-seeker’s resume and cover letter. I saw it as a potential means of showcasing my philosophy and thought processes, as well as my writing style and personality, outside the tight confines of a job interview.

I had no expectations beyond that. I figured blog traffic would max out at around a visit a week. Probably those would be my polite and supportive friends, whom I’d pester to check out my latest ramblings (even though they had no interest in testing, software, or epistemology).

Then something funny happened. As near as I can figure it, a friend tweeted about one of the posts. This tweet was apparently seen by Michael Bolton, who presumably read it, liked it, and also tweeted about it. Suddenly there were intelligent comments from strangers (and respected industry celebrities) who were located all over the world. Suddenly posts were being mentioned elsewhere and included in blog carnivals. Suddenly people other than me were tweeting my posts. Wow!!! Who knew there was a large and vibrant testing community out there? Who knew I actually said anything interesting? Suddenly I felt pressure to maintain a consistent output of new, interesting material.

My contract with Mega-Corp ended. Based on the sparse job prospects over the previous six months, I fully expected to be facing a long stretch of unemployment. I have significant savings, so the idea didn’t scare me. In fact, I was genuinely looking forward to it. Aside from now having ample time to write blog posts, I could engage with this newly discovered testing community via Twitter, their own blogs, LinkedIn, the Software Testing Club, and elsewhere. I could spend a few hours a day learning Ruby–something I’d wanted to do for a while but seemed never to have time for.

Although I went to one interview during the first week of unemployment at the behest of the staffing firm I’d been contracting with, I wasn’t particularly interested in looking for work. I jokingly referred to my unemployment as an “involuntary sabbatical.” What little effort I put toward a job search was haphazard and frivolous. The few job listings that turned up were basically of the sort that had been appearing for the previous several months. They fit into one of three categories:

  1. Positions for which I was overqualified
  2. Positions which I knew I could do but I’d never get the interview for, since they listed specific technical requirements I couldn’t in good conscience put on my resume or in my cover letter
  3. Positions that were the software development equivalent of Gitmo prisoner stress positions

Then something funny happened. On day 11 of the sabbatical I got an email from a headhunter asking if I was looking for work. I wrote back and said that I was. She called. We talked for about 20 minutes. I think most of that time was me saying that my technical skills didn’t match what they had on the list of requirements. She said, “Let’s submit anyway.” I said, “Sure. What the hell?” I was convinced it would go nowhere and went back to the exercises in my Ruby book. Less than an hour later the headhunter called back and said that the company wanted to interview me the next day at 9 a.m. I said, “Sure. What the hell?”

Armed with the company’s name and address, I started the requisite Googling. I found out that the company’s culture included things like letting people bring their dogs to work, giving everyone a Nerf gun and, most importantly to me, no dress code (based on photos on the company’s blog, shorts and flip-flops were standard fare, so my Vibrams would fit right in). So far, so good. Even better, the company was apparently wildly profitable and newly purchased by a larger firm, also profitable. No worries about job evaporation due to investor indictment!

I’ve been on a lot of job interviews this year. In all of them I felt a lack of control, like I was being forced to justify myself or excuse myself. For this one, though, I decided to take a different tack, since I truly didn’t care if I got the job or not. I took a copy of the advertised job requirements with me and went through them line-by-line with the interviewer, saying “What do you mean by X? My current experience with it is limited. I have no doubt I can learn it, but if it’s really important to you, then I’m probably not your guy.” I must’ve said some variant of that a half dozen times. I felt like I was trying to talk them out of picking me.

Somehow the interview lasted three hours. They told me they were going to talk with two more people, but that they wanted to move fast on a decision, so I would know either way by the next day. I could tell they liked me. For my part, the company struck me as a happy place, and what they wanted to hire me for seems to have become my own career specialty: Use your skills and expertise to do whatever is necessary to create a test department where there is none. As I was driving home I was thinking, “Dammit! I may have to cut my sabbatical short.”

I got a call from the headhunter less than two hours later. They were offering me the job. They wanted me to start tomorrow, if I was willing. I agonized over the decision for most of the afternoon. Three to six months of taking it easy, blogging, and learning Ruby, while looking for the perfect job–I had a really hard time giving up this romantic notion, but it seemed like the perfect job had already arrived, just way ahead of schedule. What if I turned it down and the next one didn’t come along for another year, well after my savings had evaporated?

I took the job.

This post has turned into something much more long-winded and shamelessly self-indulgent than I imagined it would be. Thanks for putting up with it. My only point has been to explain that my new job responsibilities over the coming weeks will probably sap my time and my creative energies. The testing problem I’ve been given is very interesting, and I need to focus on how to solve it. So, for the next few weeks, at least, there’s little chance I’ll be writing a post per week. I can’t imagine, though, that it will be too long before I feel a strong urge to vent again.


Requirements: Placebo or Panacea?

[Image caption: Perhaps the definitive commentary on “requirements”]

I vividly remember the days when I matured as a tester. I was the fledgling manager of a small department of folks who were equal parts tester and customer support. The company itself was staffed primarily by young, enthusiastic but inexperienced people, perhaps all given a touch of arrogance by the company’s enormous profitability.

We had just released a major update to our software–my first release ever as a test manager–and we all felt pretty good about it. For a couple days. Then the complaints started rolling in.

“Why didn’t QA find this bug?” was a common refrain. I hated not having an answer.

“Well… uh… No one told me the software was supposed to be able to do that. So how could we know to test for it? We need more detailed requirements!” (I was mindful of the tree cartoon, which had recently been shared around the office, to everyone’s knowing amusement.)

The programmers didn’t escape the inquisition unscathed, either. Their solution–and I concurred–was, “We need a dedicated Project Manager!”

Soon we had one. In no time, the walls were papered with PERT charts. “Critical path” was the new buzzword–and, boy, did you want to stay off that thing!

You couldn’t help but notice that the PERT charts got frequent revisions. They were pretty, and they gave the impression that things were well-in-hand; our path forward was clear. But they were pretty much obsolete the day after they got taped to the wall. “Scope creep” and “feature creep” were new buzzwords heard muttered around the office–usually after a meeting with the PM. I also found it odd that the contents of the chart would change, but somehow the target release date didn’t move.

As for requirements, soon we had technical specs, design specs, functional specs, specs-this, specs-that… Convinced that everything was going to be roses, I was off and running, creating test plans, test cases, test scripts, bla bla…

The original target release date came and went, and was six months gone before the next update was finally shipped. Two days later? You guessed it! Customers calling and complaining about bugs with things we’d never thought to test.

Aside from concluding that target release dates and PERT charts are fantasies, the result of all this painful experience was that I came to really appreciate a couple things. First, it’s impossible for requirements documents to be anything close to “complete” (yes, a heavily loaded word if I’ve ever seen one. Let’s provisionally define it as: “Nothing of any significance to anyone important has been left out”). Second, having document completeness as a goal means spending time away from other things that are ultimately more important.

Requirements–as well as the team’s understanding of those requirements–grow and evolve throughout the project. This is unavoidable, and most importantly it’s okay.

Apparently, though, this is not the only possible conclusion one can reach after having such experiences.

Robin F. Goldsmith, JD, is a guy who has been a software consultant since 1982, so I hope he’s seen his fair share of software releases. Interestingly, he asserts here that “[i]nadequate requirements overwhelmingly cause most project difficulties, including ineffective ROI and many of the other factors commonly blamed for project problems.” Here, he claims “The main reason traditional QA testing overlooks risks is because those risks aren’t addressed in the system design… The most common reason something is missing in the design is that it’s missing in the requirements too.”

My reaction to these claims is: “Oh, really?”

How do you define “inadequate” in a way that doesn’t involve question begging?
How do you know it’s the “main” reason?
What do you mean by “traditional QA testing”?

Goldsmith addresses that last question with a bit of a swipe at the context-driven school:

Many testers just start running spontaneous tests of whatever occurs to them. Exploratory testing is a somewhat more structured form of such ad hoc test execution, which still avoids writing things down but does encourage using more conscious ways of thinking about test design to enhance identification of tests during the course of test execution. Ad hoc testing frequently combines experimentation to find out how the system works along with trying things that experience has shown are likely to prompt common types of errors.

Spontaneous tests often reveal defects, partly because testers tend to gravitate toward tests that surface commonly occurring errors and partly because developers generally make so many errors that one can’t help but find some of them. Even ad hoc testing advocates sometimes acknowledge the inherent risks of relying on memory rather than on writing, but they tend not to realize the approach’s other critical limitations.

By definition, ad hoc testing doesn’t begin until after the code has been written, so it can only catch — but not help prevent — defects. Also, ad hoc testing mainly identifies low-level design and coding errors. Despite often being referred to as “contextual” testing, ad hoc methods seldom have suitable context to identify code that is “working” but in the service of erroneous designs, and they have even less context to detect what’s been omitted due to incorrect or missing requirements.

I’m not sure where Goldsmith got his information about the Context-driven approach to testing, but what he’s describing ain’t it! See here and here for much better descriptions.

Goldsmith contrasts “traditional QA testing” with something he calls “proactive testing.” Aside from “starting early by identifying and analyzing the biggest risks,” the proactive tester

…enlists special risk identification techniques to reveal many large risks that are ordinarily overlooked, as well as the ones that aren’t. These test design techniques are so powerful because they don’t merely react to what’s been stated in the design. Instead, these methods come at the situation from a variety of testing orientations. A testing orientation generally spots issues that a typical development orientation misses; the more orientations we use, the more we tend to spot. [my emphasis]

What are these “special risk identification techniques”? Goldsmith doesn’t say. To me, this is an enormous red flag that we’re probably dealing with a charlatan, here. Is he hoping that desperate people will pay him to learn what these apparently amazing techniques are?

His advice for ensuring a project’s requirements are “adequate” is similarly unhelpful. As near as I can figure it, reading his article, his solution amounts to making sure that you know what the “REAL” [emphasis Goldsmith] requirements are at the start of the project, so you don’t waste time on the requirements that aren’t “REAL”.

Is “REAL” an acronym for something illuminating? No. Goldsmith says he’s capitalizing to avoid the possibility of page formatting errors. He defines it as “real and right” and “needed” and “the requirements we end up with” and “business requirements.” Apparently, then, “most” projects that have difficulties are focusing on “product requirements” instead of “business requirements.”

Let’s say that again: To ensure your requirements are adequate you must ensure you’ve defined the right requirements.

I see stuff like this and start to wonder if perhaps my reading comprehension has taken a sudden nose dive.


Interview with a CEO

“So, tell me: How are you going to guarantee the accuracy and integrity of the data?” he asked.

I glanced at the clock on the wall: 2:25 p.m. The CEO and I had been talking since 2:00, and he had to be at his next meeting in 5 minutes.

I felt frozen, like a tilted pinball machine. For a moment I wasn’t even sure I’d heard the question right. He couldn’t seriously be asking a tester for… whaaa??? I could feel my adrenal glands dumping their contents into my blood stream.

“This is the moment,” I thought. “The point when this interview goes south.”

Part of me wanted to simply stand up, shake the CEO’s hand, thank him for the opportunity, and walk out. I could still salvage a nice afternoon before I had to be back at the airport.

Time seemed to slow to an agonizing crawl. Involuntarily, I pondered the previous 12 hours…

2:45AM Wake up, shower… 3AM Dress up in suit and tie (20 minutes devoted to fighting with tie)… 3:45AM Drive to airport… 5AM Sit in terminal… 6AM Board flight to San Francisco… 9AM Arrive SFO… 9:15AM Sit in (completely stationary) BART train… 10:00AM Miss Caltrain connection… 10:30AM Arrive at office, thanks to a ride from their helpful administrative assistant… 10:45AM Interview with the head of products… 11:30AM Interview with the head of development… 12:15PM Lunch… 2:00PM Interview with CEO…

As the epinephrine circulated through my body, creating a sensation akin to somersaulting backwards, I began to feel resentful. I’d flown there on my own dime, after having already talked with these guys by phone for several hours. I was under the impression that the trip would be more of a “meet & greet the team” social hour. Not a repeat of the entire interview process, from square one. The Head of Products had given me several assurances that I was his top choice and that they’d only be asking me to fly out if the position were essentially mine to refuse.

So, there I was. The CEO sat across the table from me, expecting an answer.

What I wanted to say was that I was in no position to guarantee anything of the sort, given my radical ignorance of the data domain, the data’s source(s), the sources’ track record(s) for accuracy, or how the data get manipulated by the in-house systems.

What I wanted to say was that his question was prima facie absurd. That I, as a tester, couldn’t “guarantee” anything other than that I would use my skills and experience to find as many of the highest risk issues as quickly as possible in the given time frame. However, when you’re dealing with any black box, you can’t guarantee that you’ve found all the problems. Certainty is not in the cards.

What I wanted to say was that anyone who sat in front of the CEO claiming that they could guarantee the data’s accuracy and integrity was clearly a liar and should be drummed out of the profession of software testing.

I wanted to say all that and more, but I didn’t. Given the day’s exhausting schedule, all these thoughts were little more than fleeting, inchoate, nebulous impressions. Plus, it seemed highly unlikely that the CEO, who struck me as an impatient man (your typical “Type A” personality), would be interested in spending the remaining 4 or 5 minutes discussing epistemology with me. Honestly, I’m not sure what I said, exactly. The question, and the CEO’s demeanor while asking it, had drained away any enthusiasm I had for the position. In all likelihood, my response was along the lines of “I have no idea how to answer that question.”

Whatever I said, it was obviously not how to impress an MBA from Wharton. I didn’t get offered the job.


Irreverence Versus Arrogance

Everything sacred is a tie, a fetter.
— Max Stirner

I am an irreverent guy. I’m a fan of South Park and QA Hates You, for example. Furthermore, I think it’s important–nay, essential–for software testers to cultivate a healthy irreverence. Nothing should be beyond question or scrutiny. “Respecting” something as “off limits” (also known as dogmatism) is bound to lead to unexamined assumptions, which in turn can lead to missed bugs and lower quality software. If anything, I think testers should consider themselves akin to the licensed fools of the royal court: Able–and encouraged–to call things as they see them and, especially, to question authority.

Contrast that with arrogance–an attitude often confused with irreverence. The distinction between them may be subtle, but it is key. Irreverence and humility are not mutually exclusive, whereas arrogance involves a feeling of smug superiority; a sense that one is “right.” Arrogance thus contains a healthy dose of dogmatism. The irreverent, on the other hand, are comfortable with the possibility that they’re wrong. They question all beliefs, including their own. The arrogant only question the beliefs of others.

I pride myself (yes, I am being intentionally ironic, here) on knowing this difference. So, it pains me to share the following email with you. It’s an embarrassing example of a moment when I completely failed to keep the distinction in mind. Worse, I had to re-read it several times before I could finally see that my tone was indeed arrogant, not irreverent, as I intended it. I’ll spare you my explanations and rationalizations about how and why this happened (though I have a bunch, believe me!).

The email–reproduced here unmodified except for some re-arranging, to improve clarity–was meant only for the QA team, not the 3rd-party developer of the system. In a comedy of errors and laziness it ended up being sent to them anyway. Sadly, I think its tone ensured that none of the ideas for improvements were implemented.

After you’ve read the email, I invite you to share any thoughts you have about why it crosses the line from irreverence into arrogance. Naked taunts are probably appropriate, too. On the other hand, maybe you’ll want to tell me I’m wrong. It really isn’t arrogant! I won’t hold my breath.

Do you have any stories of your own where you crossed the line and regretted it later?

The user interface for OEP has lots of room for improvement (I’m trying to be kind).

Below are some of my immediate thoughts while looking at the OEP UI for the front page. (I’ll save thoughts on the other pages for later)

1. Why does the Order Reference Number field not allow wildcards? I think it should, especially since ORNs are such long numbers.

2. Why can you not simply click a date and see the orders created on that date? The search requires additional parameters. Why? (Especially if the ORN field doesn’t allow wildcards!)

3. Why, when I click a date in the calendar, does the entire screen refresh, but a search doesn’t actually happen? I have to click the Search button. This is inconsistent with the way the Process Queue drop down works. There, when I select a new queue, it shows me that instantly. I don’t have to click the “Get Orders” button.

5. What does “Contact Name” refer to? When is anyone going to search by “Contact Name”? I don’t even know what a Contact Name is! Is it the patient? Is it the OEP user???

4. In fact, I *never* have to click the Get Orders button. Why is it even there on the screen?

6. Why waste screen space with a “Select” column (with the word “Select” repeated over and over again–this is UGLY) when you could eliminate that column and make the Order Reference number clickable? That would conserve screen space.

7. Why does OEP restrict the display list to only 10 items? It would be better if it allowed longer lists, so that there wouldn’t need to be so much searching around.

8. Why are there “View Notes” links for every item, when most items don’t have any notes associated with them? It seems like the View Notes link should only appear for those records that actually have notes.

9. Same question as above, for “Show History Records”.

10. Also, why is it “Show History Records” instead of just “History”, which would be more elegant, given the width of the column?

11. Speaking of that, why not just have “History” and “Notes” as the column headers, and pleasant icons in those rows where History or Notes exist? That would be much more pleasing to the eye.

12. In the History section, you have a “Record Comment” column and an “Action Performed” column. You’ll notice that there is NEVER a situation where the “Action Performed” column shows any useful information beyond what you can read in the “Record Comment” field. Why include something on the screen if it’s not going to provide useful information to the user?

For example:

Record Comment: Order checked out by user -TSIAdmin-
Action Performed: CheckOut

That is redundant information.

In addition to that, in this example the Record Create User ID field says “TSIAdmin”. That’s more redundant information.

There must be some other useful information that can be put on this screen.

13. Why does the History list restrict the display to only 5 items? Why not 20 items? Why not give the user the option to “display all on one page”?

14. In Notes section of the screen, the column widths seem wrong. The Date and User ID columns are very wide, leaving lots of white space on the screen.


Assumptions Are Dangerous Things

Sometimes – heh, usually – your product oracles are vague. There’s little or no user manual, requirements specification, online help, tool tips, etc. In these situations I generally become “Annoying Question Man,” constantly badgering the programmers, project managers, sales team, or anyone else, asking: “How’s this thing supposed to work?” I’ve learned from hard experience that assumptions about expectations can and will come back to bite you. Quite often, unfortunately, it’s difficult to know that you’re even making them.

It seems I am doomed to learn this particular lesson over and over again.

Last week I took a trip to the Great Smoky Mountains of Tennessee. My sole purpose for going there was to see their synchronous fireflies. Sadly, I was late to the show by about 2 weeks. They were nowhere to be found. I was crushed!

My experience with fireflies comes from living in Northern Virginia for several years, where the fireflies all appear between roughly June 20th and July 5th. Since (you’ll notice) this page – as well as several others I looked at while I was researching for the trip back in December – does not give dates, I assumed that meant they’d be on the same schedule. All fireflies are the same, right? That must be why no one mentions specific dates, right? Uh, apparently not. I’m often wrong about things, but most of the time it doesn’t sting this bad! “I should have known!” has been my mantra the last few days.

In an effort to learn what I might have done differently to prevent this mistake, I went looking everywhere (meaning in nearby Gatlinburg and the Park’s visitor’s center) for information on the fireflies. I could find nothing. This was weird. I can’t be the only one who thinks these things sound super cool to see! I started wondering if I was suffering from some sort of psychological self-protection mechanism, forcing me to miss all the signs proving that I was a dummy and it was all my fault. Confirmation bias writ large and very pathological.

Regardless of blame, however, I literally missed the bugs because of my hidden assumptions!


The Blackest of Boxes

At a recent gig I was one of several veteran testers brought in to shore up a team that… I don’t want to say “lacked experience.” Probably the best description is that they were “insular.” They were basically re-inventing the wheel.

I had a number of discussions with the team’s manager and he seemed receptive to my suggestions, up to a point. Past that, though, all I heard was “We don’t do that.” When I first heard this my mind was suddenly struck with visions of Nomad, the robot from Star Trek, whose favorite thing to say was “Non sequitur. Your facts are uncoordinated.”

Don’t worry. While I was tempted to, I didn’t actually say it out loud.

Let me back up a little. The job involved testing a system that is sandwiched between two others–a digital scanner and a database. The system is designed to take the scanned information and process it so that the right data can be updated and stored, with a minimum of human intervention. That last part is key. Essentially most of what happens is supposed to be automatic and invisible to the end user. You trust that checks are in place such that any question about data integrity is flagged and presented to a human.

Testers, however, are not end users. This is a complicated system, with a lot of moving parts (metaphorically speaking). When erroneous data ends up in the database it’s not a simple matter to figure out where the problem lies. When I asked for tools to help the team “look under the hood” (for example, at the original OCR data or the API calls sent to the database) and eliminate much of the overhead involved with some of the tests, I was told “But we are User Acceptance Testing”, as if that were somehow an argument.

Worse, the tool was not being developed in-house. This meant that our interaction with the programmers was essentially non-existent. We’d toss our bugs over a high wall and they’d toss their fixes back over it. The team was actively discouraged from asking the programmers questions directly (the programmers were in another state, so obviously face-to-face communication was not possible, but even emails and phone calls were verboten).

One thing I’ve learned over the years is that it’s never a good idea to separate programmers and testers. It fosters an us-versus-them mentality instead of a healthy one where the testers are viewed as helping the programmers. It lengthens the lifespan of bugs since a programmer doesn’t have the option to walk over to a tester and ask to be shown the issue. Testers are deprived of the opportunity to learn important information about how the system is built–stuff that’s not at all apparent from the user interface, like snippets of code used in several unrelated places. A change to one area could break things elsewhere and the testers might otherwise never think to look there.

So, when I hear “We don’t do that” it strikes me as a cop out. It’s an abdication of responsibility. A non sequitur.


(Good) Testers Are Not Robots!

Reading James Bach’s recent blog post this morning, “The Essence of Heuristics” – in particular the list of questions at the end – I was reminded, by way of stark contrast, of the testing culture I found when I started my current consulting gig.

One of the first things I was told was one of their testing “rules” – every test case should be repeated, with different data, 15 times. At first I simply marveled at this, privately. I figured someone must have a good reason for choosing 15 as the magic number. Why not 5? Or, for that matter, 256? Why every test case? Surely my time would be better spent doing a new test case instead of the 15th iteration of the current one, right?

Sooner or later, I thought, the rule’s reasonableness should become apparent. After a couple weeks I knew the team a little better, but the rule still seemed as absurd to me as when I first heard it, so I broached the topic.

“Why do you run 15 iterations of every test case?”

“Well, sometimes when we run tests, the first 10 or 12 will pass, but then the 11th or 13th, for example, will fail.”

“Okay, well, do you ever then try to discover what exactly the differences were between the passing and failing tests? So that you can be sure in the future you’ll have tests for both scenarios?”

<blank stare>

I quickly came to realize that this testing “rule” was symptomatic of a larger issue: an attitude in management that the team couldn’t be trusted to approach the testing problem intelligently. I saw evidence of this attitude in other ways. For example, we were told that all bug descriptions needed to include the date and time the bug occurred, so that the programmers would know where to look in the log files. When I pointed out that not all bugs will involve issues with logged events, I was told that they just didn’t want to confuse the junior team members.

Another example – and a particular pet peeve of mine – is the requirement that every test case include detailed step-by-step instructions to follow, leaving no room for creative thinking, interpretation, or exploration. The reasoning behind the excruciating detail, of course, is so the newest team members can start testing right away. My first objection to this notion is that the fresh eyes of a new user can see problems that veterans have become blind to. As such, putting blinders on the newbies is not a good idea. Also, why bypass the testing of the product’s usability and/or the help documentation and user manual? New users are a great resource for that.

In short, testers are not robots, and treating them like they are will result in lower quality testing efforts.
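
The question I asked about the failing iterations can be made concrete. What follows is only a minimal sketch in Python, not anything the team actually used: the `suspects` helper, the field names, and the recorded runs are all invented for illustration. It lists the input values that show up in failing runs but never in passing ones, which is a starting point for writing one deliberate test per scenario instead of repeating a test 15 times and hoping.

```python
from collections import defaultdict

def suspects(runs):
    """Return (field, value) pairs seen in failing runs but never in passing ones."""
    outcomes_by_pair = defaultdict(set)  # (field, value) -> set of pass/fail outcomes
    for data, passed in runs:
        for field, value in data.items():
            outcomes_by_pair[(field, value)].add(passed)
    # Keep only the pairs whose every recorded outcome was a failure.
    return sorted(pair for pair, outcomes in outcomes_by_pair.items()
                  if outcomes == {False})

# Hypothetical record of four iterations: (test data, did it pass?)
runs = [
    ({"state": "VA", "apt": "no"}, True),
    ({"state": "TN", "apt": "no"}, True),
    ({"state": "VA", "apt": "yes"}, False),
    ({"state": "TN", "apt": "yes"}, False),
]

print(suspects(runs))  # [('apt', 'yes')]
```

A result like this proves nothing by itself. It is a lead to investigate, and it only works if someone records what data each iteration used.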


The Post Hoc Fallacy

Correlation is not causation.

It seems a simple statement when you look at it. Just because night follows day does not mean that day causes night. However, it’s clear that people fall prey to this fallacy all the time. It’s what’s behind, for example, the superstitious rituals of baseball pitchers.

A far less trite example is modern medicine. You have a headache. You take a pill. Your headache goes away. Did it go away because of the pill you took? Maybe it would have gone away on its own. How do you know?

Teasing out causation from mere correlation in cases like that, with potentially dozens of unknown and uncontrolled variables, is notoriously difficult. The entire industry of complementary and alternative medicine banks on the confusion.

I was thinking about all this the other day when I was testing a tool that takes mailed orders for prescription drugs, digitizes the data, and then adds it all to a central database. I was focusing specifically on the patient address information at the time, so the rest of each order, like the payment information, was fairly simple–meaning all my test orders were expected to get assigned a payment type of “invoice”, which they did. So in the course of my address testing I “passed” the test case for the invoice payment type.

It wasn’t until later that I realized I had committed the fallacy Post hoc ergo propter hoc (“After this, therefore because of this”), just like the person who attributes the disappearance of their headache to the sugar pill they’ve just taken. I discovered that all orders were getting a payment type of “Invoice”, regardless of whether they had checks or credit card information attached.

Inadvertently, I had succumbed to confirmation bias. I forgot, momentarily, that proper testing always involves attempting the falsification of claims, not their verification.
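
The trap is easy to reproduce in a few lines of Python. This is a sketch under stated assumptions, not the real system: `assign_payment_type` is a hypothetical stand-in for the tool under test, with the bug built in on purpose, and the order fields and the oracle are invented for illustration. The point is that test data for which “invoice” is always the right answer cannot expose a system that always answers “invoice”.

```python
def assign_payment_type(order):
    # Hypothetical stand-in for the system under test; the bug is deliberate.
    return "invoice"  # ignores any check or card data attached to the order

def expected_payment_type(order):
    # Illustrative oracle; the field names are assumptions.
    if order.get("card_number"):
        return "credit_card"
    if order.get("check_number"):
        return "check"
    return "invoice"

def failures(orders):
    """Return the orders whose assigned payment type differs from the oracle."""
    return [o for o in orders
            if assign_payment_type(o) != expected_payment_type(o)]

# Confirming data: every order should be invoiced, so this check cannot fail.
address_orders = [{"zip": "20190"}, {"zip": "37738"}]

# Falsifying data: orders for which "invoice" would be the wrong answer.
payment_orders = address_orders + [
    {"zip": "20190", "check_number": "1042"},
    {"zip": "37738", "card_number": "4111111111111111"},
]

print(len(failures(address_orders)))  # 0
print(len(failures(payment_orders)))  # 2
```

The first result looks like a pass, but it would look the same whether or not the payment logic worked. Only the second data set gives the claim a chance to fail.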