Monthly Archives: June 2010

Assumptions Are Dangerous Things

Sometimes – heh, usually – your product oracles are vague. There’s little or no user manual, requirements specification, online help, tool tips, etc. In these situations I generally become “Annoying Question Man,” constantly badgering the programmers, project managers, sales team, or anyone else, asking: “How’s this thing supposed to work?” I’ve learned from hard experience that assumptions about expectations can and will come back to bite you, and quite often, unfortunately, it’s difficult to know that you’re even making them.

It seems I am doomed to learn this particular lesson over and over again.

Last week I took a trip to the Great Smoky Mountains of Tennessee. My sole purpose for going there was to see their synchronous fireflies. Sadly, I was late to the show by about 2 weeks. They were nowhere to be found. I was crushed!

My experience with fireflies comes from living in Northern Virginia for several years, where the fireflies all appear between roughly June 20th and July 5th. Since (you’ll notice) this page – as well as several others I looked at while researching the trip back in December – does not give dates, I assumed that meant they’d be on the same schedule. All fireflies are the same, right? That must be why no one mentions specific dates, right? Uh, apparently not. I’m often wrong about things, but most of the time it doesn’t sting this badly! “I should have known!” has been my mantra the last few days.

In an effort to learn what I might have done differently to prevent this mistake, I went looking everywhere (meaning in nearby Gatlinburg and the park’s visitor center) for information on the fireflies. I could find nothing. This was weird. I can’t be the only one who thinks these things sound super cool to see! I started wondering if I wasn’t suffering from some sort of psychological self-protection mechanism, forcing me to miss all the signs proving that I was a dummy and it was all my fault. Confirmation bias writ large and very pathological.

Regardless of blame, however, I literally missed the bugs because of my hidden assumptions!


The Black Swan

I hate the idea of writing a book review for a post. Somehow it strikes me as cheap and lazy to rely so heavily on the work of others for content, particularly when my blog is so new. Shouldn’t I be concerned with sharing my own thoughts instead of parroting the thoughts of others?

Even worse: choosing Nassim Taleb’s The Black Swan (second edition, just released a few weeks ago) as the review’s subject matter. Taleb has notorious disdain for reviewers, many of whom seem to either miss his message entirely* or distort it in some consequential fashion. Given the book’s Kolmogorov complexity, any attempt at encapsulation is bound to leave out something significant (in contrast to the easily summarizable journalistic “idea book of the week” that excites the MBAs and is the intellectual equivalent of fast food – anyone remember Who Moved My Cheese??).

I think the book’s message is important, and Taleb, a champion of skeptical empiricism, says a great deal that should excite and inspire the software tester. So I’m willing to risk appearing lazy, but let’s not call this post a review so much as a somewhat desultory sampler. The Black Swan is a philosophical essay that is both dense and broad, and it explores many interesting ideas – irreverently, I might add. My aim here will be to stick to those ideas that pertain to testing. I’ll leave the rest for you to discover on your own if you decide to pick up a copy of the book for yourself.

The Black Swan

“All swans are white.”

Before 1697, you could say this, and every sighting of another swan would add firmness to your conviction of its “truth”. But then Europeans discovered a black swan in Western Australia. A metaphor for the problem of induction was born.

Taleb’s Black Swan (note the capitalization) is distinct from the philosophical issue, however. I’ll let Taleb define it:

First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact (unlike the bird). Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

I stop and summarize the triplet: rarity, extreme impact, and retrospective (though not prospective) predictability. A small number of Black Swans explain almost everything in our world, from the success of ideas and religions, to the dynamics of historical events, to elements of our own personal lives. [emphasis original]

I’m confident that you can already see where this applies in the world of software. A Black Swan would be any serious bug that made it into a released product and caused some sort of harm–either to customers or the company’s reputation (or both!).

Toyota’s recent brake system problems are a perfect example. Clearly they didn’t see this coming, and it’s cost them an estimated $2 billion. You can bet they’re now trying to figure out why they didn’t catch the problem earlier – why, in hindsight, it seems they should have – and how to prevent similar problems in the future.

And there’s the rub! The problem with Black Swans is that they are unpredictable by nature. Reality has “epistemic opacity”, says Taleb, owing to various inherent limitations to our knowledge, coupled with how we often deal erroneously with the information we do have. Toyota might spend billions ensuring that their cars will never have brake problems of any kind ever again, only, perhaps, to find one day that, in certain rare situations, their fuel system catches fire. It happens precisely because it’s not planned for.

So, what can we, as testers, do about the Black Swans we might face? The Black Swan counsels primarily how not to deal with them, and Taleb openly laments the typical reaction to his “negative advice.”

…[R]ecommendations of the style “Do not do” are more robust empirically [see “Negative Empiricism,” below]. How do you live long? By avoiding death. Yet people do not realize that success consists mainly in avoiding losses, not in trying to derive profits.

Positive advice is usually the province of the charlatan [see “Narrative Fallacy,” below]. Bookstores are full of books on how someone became successful [see “Silent Evidence,” below]; there are almost no books with the title What I Learned Going Bust, or Ten Mistakes to Avoid in Life.

Linked to this need for positive advice is the preference we have to do something rather than nothing, even in cases when doing something is harmful. [emphasis original]

I’m reminded of a consulting gig where I explained to the test team’s managers that their method for tracking productivity was invoking Goodhart’s Law and was thus worse than meaningless, since it encouraged counterproductive behavior in the team. The managers agreed with my analysis, but did not change their methodology. After all, they said, they were required to report something to the suits above them. They didn’t seem to have an ethical problem with tracking numbers that they knew were bullshit.

Platonicity

The ancient Greek philosopher Plato had a theory that abstract ideas or “Forms,” such as the idea of the color red, were the highest kind of reality. He believed that Forms were the only means to genuine knowledge. The error of Platonicity, then, as defined by Taleb, is

…our tendency to mistake the map for the territory, to focus on pure and well-defined “forms,” whether objects, like triangles, or social notions, like utopias (societies built according to some blueprint of what “makes sense”), even nationalities. When these ideas and crisp constructs inhabit our minds, we privilege them over other less elegant objects, those with messier and less tractable structures…

Platonicity is what makes us think that we understand more than we actually do. But this does not happen everywhere. I am not saying that Platonic forms don’t exist. Models and constructions, these intellectual maps of reality, are not always wrong; they are wrong only in some specific applications. The difficulty is that a) you do not know beforehand (only after the fact) where the map will be wrong, and b) the mistakes can lead to severe consequences. These models are like potentially helpful medicines that carry random but very severe side effects.

The error of platonification has a lot in common with the error of reification, but there is a subtle difference. Platonification doesn’t require that you believe your model is real (as in, “concrete”), only that it is accurate.

Again I’m sure you’re already thinking of ways this applies in software testing. You build a model of a system you’re testing. Soon you forget that you’re using a model and become blind to scenarios that might occur outside of it. Even worse, you write a few hundred test cases based on your model and convince yourself that, once you’ve gone through them all, you’ve “finished testing.”
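To make that concrete, here’s a toy sketch – the normalize_name function, the test cases, and the “surprise” inputs are all made up for illustration, not taken from any real project. The test cases are generated from a tidy little model of what a name looks like, they all pass, and the inputs the model quietly excluded never get exercised at all.

```python
# A hypothetical illustration of testing the model instead of the territory.
# The function under test and all of its inputs are invented for this sketch.

def normalize_name(name: str) -> str:
    """Toy system under test: trims whitespace and title-cases a name."""
    return name.strip().title()

# The tester's mental model: names are short, non-empty ASCII words.
model_based_cases = ["alice", "BOB", "  carol  ", "dave smith"]

for case in model_based_cases:
    result = normalize_name(case)
    assert result == result.strip() and result[0].isupper()
print("All model-based cases pass -- but only the model has been tested.")

# Inputs the model quietly excluded. Some would behave in ways the cases
# above can never reveal: "" blows up the assertion with an IndexError,
# and "McDonald" comes back as "Mcdonald".
outside_the_model = ["", "émile", "x" * 10_000, "McDonald"]
```

The toy function isn’t the point; the point is that a test suite derived from a model can only ever confirm the model.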

Negative Empiricism

I mentioned above that The Black Swan is almost entirely advice about what not to do. However, in the chapter he devotes to confirmation bias and its brethren, Taleb introduces the heuristic of “falsification.” I hope you’ll forgive my quoting rather liberally from the section here. He seems, for a moment, to be speaking directly to software testers:

By a mental mechanism I call naïve empiricism, we have a natural tendency to look for instances that confirm our story and our vision of the world – these instances are always easy to find. Alas, with tools, and fools, anything can be easy to find. You take past instances that corroborate your theories and you treat them as evidence. For instance, a diplomat will show you his “accomplishments,” not what he failed to do. Mathematicians will try to convince you that their science is useful to society by pointing out instances where it proved helpful, not those where it was a waste of time, or, worse, those numerous mathematical applications that inflicted a severe cost on society owing to the highly unempirical nature of elegant mathematical theories.

The good news is that there is a way around this naïve empiricism. I am saying that a series of corroborative facts is not necessarily evidence. Seeing white swans does not confirm the nonexistence of black swans. There is an exception, however: I know what statement is wrong, but not necessarily what statement is correct. If I see a black swan I can certify that all swans are not white!

This asymmetry is immensely practical. It tells us that we do not have to be complete skeptics, just semiskeptics. The subtlety of real life over the books is that, in your decision making, you need to be interested only in one side of the story: if you seek certainty about whether the patient has cancer, not certainty about whether he is healthy, then you might be satisfied with negative inference, since it will supply you the certainty you seek. So we can learn a lot from data – but not as much as we expect. Sometimes a lot of data can be meaningless; at other times one single piece of information can be very meaningful. It is true that a thousand days cannot prove you right, but one day can prove you to be wrong.

The person who is credited with the promotion of this idea of one-sided semiskepticism is Sir Doktor Professor Karl Raimund Popper, who may be the only philosopher of science who is actually read and discussed by actors in the real world (though not as enthusiastically by professional philosophers)… He writes to us, not to other philosophers. “We” are the empirical decision makers who hold that uncertainty is our discipline, and that understanding how to act under conditions of incomplete information is the highest and most urgent human pursuit. [emphasis original]

It always rankles when I hear someone (who is – usually – not a tester) declare something like “We need to prove the program works.” Obviously anyone who says this has a fundamental misconception of what is actually possible. And how many times has a programmer come to you claiming that he tested his code and “the feature works” – only for you to discover, after a couple of tests, that his “tests” covered only a narrow range of inputs, outside of which the feature breaks immediately?
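Here’s what that asymmetry looks like in code – a toy sketch, with an invented safe_divide “feature” standing in for whatever the programmer swore he tested. A thousand confirming runs inside his narrow range prove nothing; a single input outside it settles the question.

```python
# A hypothetical sketch of the asymmetry Taleb (via Popper) describes.
# The "feature" and its inputs are invented for illustration.

def safe_divide(a: float, b: float) -> float:
    """Toy feature under test -- claimed by its author to 'work'."""
    return a / b

# The programmer's "proof": a thousand confirming cases inside a narrow range.
for a in range(1, 1001):
    assert safe_divide(a, 3) > 0   # a thousand white swans

# None of those passes proves the claim "safe_divide works."
# One input outside the narrow range falsifies it:
try:
    safe_divide(1, 0)
except ZeroDivisionError:
    print("One black swan: the claim 'the feature works' is falsified.")
```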

All The Rest

I’ve only touched on a very small part of the contents of The Black Swan, but hopefully enough to convince you that it’s required reading for software testers. I’ll close the post with short descriptions of a few of the bigger ideas in the book that I skipped:

  • Mediocristan – A metaphorical country where deviations from the median are small and relatively rare, and those deviations can’t meaningfully affect the total. Think heights and weights of people. Black Swans aren’t possible here.
  • Extremistan – A metaphorical country where Black Swans are possible, because single members of a population can affect the aggregate. Think income or book sales.
  • Ludic Fallacy – Roughly speaking, the belief that you’re dealing with a phenomenon from Mediocristan when it’s actually from Extremistan. The Ludic Fallacy is a special case of the Platonic Fallacy.
  • Narrative Fallacy – The tendency to believe or concoct explanations that fit a complicated set of historical facts because they sound plausible. Conspiracy theories are only a small facet of this. These narratives cause us to think that past events were more predictable than they actually were. We become, as Taleb puts it, “Fooled by Randomness.”
  • Silent Evidence – That part of a population that is ignored because it is “silent,” meaning either difficult or impossible to see. We see all the risk-takers who succeeded in business, but not all risk-takers who failed. The result is the logical error called survivorship bias.

*An example of this is found in the quote from GQ magazine that appears, ironically, on the front cover of the book itself: “The most prophetic voice of all.” Taleb’s point is to be wary of anyone who claims he can predict the future. He says of himself, “I know I cannot forecast.”


The Blackest of Boxes

At a recent gig I was one of several veteran testers brought in to shore up a team that… I don’t want to say “lacked experience.” Probably the best description is that they were “insular.” They were basically re-inventing the wheel.

I had a number of discussions with the team’s manager and he seemed receptive to my suggestions, up to a point. Past that, though, all I heard was “We don’t do that.” When I first heard this my mind was suddenly struck with visions of Nomad, the robot from Star Trek, whose favorite thing to say was “Non sequitur. Your facts are uncoordinated.”

Don’t worry. While I was tempted to, I didn’t actually say it out loud.

Let me back up a little. The job involved testing a system that is sandwiched between two others–a digital scanner and a database. The system is designed to take the scanned information and process it so that the right data can be updated and stored, with a minimum of human intervention. That last part is key. Essentially most of what happens is supposed to be automatic and invisible to the end user. You trust that checks are in place such that any question about data integrity is flagged and presented to a human.

Testers, however, are not end users. This is a complicated system, with a lot of moving parts (metaphorically speaking). When erroneous data ends up in the database it’s not a simple matter to figure out where the problem lies. When I asked for tools to help the team “look under the hood” (for example, at the original OCR data or the API calls sent to the database) and eliminate much of the overhead involved with some of the tests, I was told “But we are User Acceptance Testing”, as if that were somehow an argument.
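For what it’s worth, the kind of thing I was asking for is not exotic. Here’s a rough sketch of the idea – every name in it (build_record, FakeDbClient, the shape of the OCR payload) is hypothetical, since the real product’s internals were exactly what we weren’t allowed to see. The point is simply to log each intermediate artifact so a tester can see where a bad value crept in, not just that it did.

```python
# A sketch of the "look under the hood" hook I had in mind. All names and
# data shapes here are invented stand-ins, not the real product's internals.
import json
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")

def build_record(raw_ocr: dict) -> dict:
    """Stand-in for the system's scan-to-record transformation step."""
    return {"customer_id": raw_ocr.get("id", "").strip(),
            "amount": raw_ocr.get("amount", "0").replace(",", "")}

class FakeDbClient:
    """Stand-in for the database layer, which was a black box to us."""
    def upsert(self, record: dict) -> str:
        return f"OK ({record['customer_id']})"

def traced_pipeline(raw_ocr: dict, db: FakeDbClient) -> str:
    """Run the pipeline, logging each intermediate artifact along the way."""
    logging.debug("OCR payload: %s", json.dumps(raw_ocr))
    record = build_record(raw_ocr)
    logging.debug("Derived record: %s", json.dumps(record))
    response = db.upsert(record)
    logging.debug("DB response: %s", response)
    return response

traced_pipeline({"id": " 1047 ", "amount": "1,250.00"}, FakeDbClient())
```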

Worse, the tool was not being developed in-house. This meant that our interaction with the programmers was essentially non-existent. We’d toss our bugs over a high wall and they’d toss their fixes back over it. The team was actively discouraged from asking the programmers questions directly (the programmers were in another state, so face-to-face communication was obviously not possible, but even emails and phone calls were verboten).

One thing I’ve learned over the years is that it’s never a good idea to separate programmers and testers. It fosters an us-versus-them mentality instead of a healthy one where the testers are viewed as helping the programmers. It lengthens the lifespan of bugs since a programmer doesn’t have the option to walk over to a tester and ask to be shown the issue. Testers are deprived of the opportunity to learn important information about how the system is built–stuff that’s not at all apparent from the user interface, like snippets of code used in several unrelated places. A change to one area could break things elsewhere and the testers might otherwise never think to look there.

So, when I hear “We don’t do that” it strikes me as a cop out. It’s an abdication of responsibility. A non sequitur.


Opportunity Cost

I’ve always had an abiding love of economics – which, contrary to what seems to be popular belief, is not about how to balance your checkbook, but about what it means to make choices in the face of scarcity. Once you’re even a little familiar with the economic way of thinking, you never see the world in quite the same way again.

A fundamental component of economic thought is the notion of “opportunity cost.” The idea was probably most famously and elegantly described by the French economist Frédéric Bastiat in his 1850 essay “What Is Seen and What Is Not Seen.” The basic idea is this: the real cost of a thing is not the time or the money you spend on it, but the alternative you give up by choosing it. For example, the opportunity cost of the omelet you had for breakfast is the pancakes you didn’t have.

Testers – ever familiar with tight schedules, under-staffing, and a potentially infinite list of tests to perform – probably understand opportunity cost at a more visceral level than your typical academic economist. When testers keep opportunity cost in mind, they’ll constantly be asking themselves, “Is this the most valuable test I could be doing right now?”
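Put some numbers on it – entirely made-up numbers, purely to translate the idea into testing terms – and the question answers itself: if an hour spent exploring a risky new feature is expected to turn up more important problems than an hour spent re-running stable regression checks, then the true cost of that regression hour is the exploratory hour you gave up.

```python
# Made-up numbers, used only to illustrate opportunity cost in testing terms.
expected_problems_per_hour = {
    "explore the risky new feature": 3.0,
    "re-run stable regression checks": 0.4,
}

chosen = "re-run stable regression checks"
forgone = max(v for k, v in expected_problems_per_hour.items() if k != chosen)

# The real cost of the chosen hour isn't the hour itself; it's the value of
# the best alternative given up by spending the hour this way.
print(f"Opportunity cost of '{chosen}': roughly {forgone} important problems forgone")
```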

This brings us to the perennial question: Should testing be automated?

Setting aside the deeper question of “Can it?”, the answer, obviously, should involve weighing the benefits against the costs – including, most importantly, the opportunity cost. The opportunity cost of automated testing is the manual testing you have to give up, because – let’s not kid ourselves – it’s the rare software company that will hire a whole new team whose sole job is planning, writing, and maintaining the test scripts.

And let’s say the company does hire specialist automation testers. Well, it seems there’s always an irresistible inclination to, in a pinch (and Hofstadter’s Law ensures there’s always a pinch), have the automation team help with the manual testing efforts – “only to get us through this tough time”, of course. Funny how that “tough time” has a tendency to get longer and longer. Meanwhile, the automation scripts are not being written, or else they’re breaking or becoming obsolete. (Note that, of course, when a company puts automation on hold in favor of doing manual testing, it’s an implicit recognition of the opportunity cost of the automation.)

There’s another factor to consider: How valuable are these automated tests going to be?

I’ve already made the argument that good testers are not robots. Do I need to point out what automated testing is? Scripts are the pinnacle of inflexible and narrowly focused testing. A human performing the same test is going to see issues that the machine will miss. This is why James Bach prefers to call manual testing “sapient”, instead. He has a point!

Furthermore, except in particular domains (such as data-driven testing), any single test will lose value each time it is performed, thanks to something else from economic theory: the law of diminishing returns. Sure, sometimes developers break stuff that was working before, but unless there’s no chance the manual – sorry, sapient – testers would have found the problem in time during their – ahem! – sapient tests, the value of any given automated test will asymptotically approach zero. Meanwhile its cost of maintenance and execution will only increase.
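A toy model makes the shape of that argument visible. Every number below is invented; the point is the curves, not the figures: if the chance that a given script catches an otherwise-missed regression decays as the surrounding code stabilizes, its cumulative value flattens out while its cumulative cost keeps climbing.

```python
# A toy model of the diminishing-returns argument. All numbers are invented.
p_catch = 0.10           # chance the first run catches an otherwise-missed regression
decay = 0.7              # that chance decays as the surrounding code stabilizes
value_per_catch = 100    # relative worth of catching such a regression
maintenance_per_run = 2  # relative cost to keep the script alive and run it

cum_value = cum_cost = 0.0
for run in range(1, 21):
    cum_value += p_catch * (decay ** (run - 1)) * value_per_catch
    cum_cost += maintenance_per_run
    if run in (1, 5, 10, 20):
        print(f"run {run:>2}: cumulative value {cum_value:5.1f}, cumulative cost {cum_cost:5.1f}")
```

With these made-up numbers the cumulative value levels off just above 33 while the cumulative cost climbs right past it around the seventeenth run – which is the whole argument in miniature.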

I won’t go so far as to say that test automation is never worth it. I’m sure there are situations where it is. However, given the precious opportunity cost involved, I think it’s called for less frequently than is generally believed.
