Tag Archives: Narrative Fallacy

Testing’s Quiet Evidence

Since my post about Nassim Taleb’s The Black Swan I’ve continued to muse about the book and its ideas. In particular, while thinking about the notion of “silent evidence,” I realized there was a connection to testing that I hadn’t noticed before. I won’t flatter myself by thinking I missed the link due to some inherent subtlety in the concept. More likely it’s because I already associated the idea with something else: get-rich-quick schemes. Bear with me for a minute while I explain…

A few years ago I was consumed with debunking network marketing companies and real estate investing scams (If you’re genuinely curious why, I explain it all here). My efforts were focused primarily on a real estate “guru” who lives nearby, in Glendale, AZ. As with most con artists, his promotional materials include dozens of narrative fallac–uh… testimonials from former “students” who claim they achieved “great success” using the investment methods he teaches.

Of course, what the “guru” doesn’t promote are the undoubtedly much higher number of  people who attended his “boot camps” or bought his materials and either, A) did nothing, or B) tried his techniques and lost money (On the rare occasions such people are even mentioned, there’s always a ready explanation for them: The blame lies not with the technique, but with the practitioner. Voilà! The scammer has just removed your ability to falsify his claims!). Taleb calls these people the “silent evidence.” To ignore them when evaluating a population (in this case, the customers of a particular guru) is to engage in survivorship bias and miscalculate what you’re trying to measure. Con artists of all stripes make millions encouraging their customers to do this.

Testers are not con artists (as a rule, I mean), but we do have something that, while perhaps not silent, should be considered at least very quiet. In contrast to the scammers, it’s not to our advantage that it stay quiet. In fact, I’m starting to wonder if keeping it quiet is not at least in part to blame for some of the irrational practices you find in dysfunctional test teams, such as the obsession with test cases.

What am I talking about?

I am referring, dear reader, to all the bugs that were found and fixed prior to release. All those potentially serious issues that were neutralized before they could do any damage. No one thinks about them, because they don’t exist, except as forgotten items in a database no one cares about any more. But they’re there–hundreds, maybe thousands of them–quietly paying tribute to averted disasters, maintained reputations, even saved money (hence, why I call them “quiet” rather than silent: they’re still there if you look for them).

Meanwhile, the released product is out in the world, exposing its inevitable and embarrassing flaws for all to see, prompting CEOs and sales teams to wonder, “What are those testers doing all day? Why aren’t they assuring quality?” Note that this reaction is precisely the survivorship bias I mentioned above. The error causes them to undervalue the test team, in a way exactly analogous to how dupes of the real estate gurus overvalue the guru.

Okay, so what to do about this? I confess that, as yet, I do not know. Right now all I can say is it behooves us as testers to come up with ways of better publicizing the bugs that we find–to turn our quiet evidence into actual evidence. As to how to go about that, well, I’m open for suggestions.


The Black Swan

A Black SwanI hate the idea of writing a book review for a post. Somehow it strikes me as cheap and lazy to rely so heavily on the work of others for content, particularly when my blog is so new. Shouldn’t I be concerned with sharing my own thoughts instead of parroting the thoughts of others?

Even worse: choosing Nassim Taleb’s The Black Swan (second edition, just released a few weeks ago) as the review’s subject matter. Taleb has notorious disdain for reviewers, many of whom seem to either miss his message entirely* or distort it in some consequential fashion. Given the book’s Kolmogorov complexity, any attempt at encapsulation is bound to leave out something significant (in contrast to the easily summerizable journalistic “idea book of the week” that excites the MBAs and is the intellectual equivalent of fast food. Anyone remember Who Moved My Cheese??).

I think the book’s message is important, and Taleb, being a champion of the skeptical empiricist, says a great deal that should excite and inspire the software tester. So, I’m willing to risk appearing lazy, but let’s not call this post a review so much as a somewhat desultory sampler. The Black Swan is a philosophical essay that is both dense and broad, and explores many interesting ideas–irreverently, I might add. My aim here will be to stick to those ideas that pertain to testing. I’ll leave the rest for you to discover on your own if you should decide to pick up a copy of the book for yourself.

The Black Swan

“All swans are white.”

Before 1697, you could say this, and every sighting of another swan would add firmness to your conviction of its “truth”. But then Europeans discovered a black swan in Western Australia. A metaphor for the problem of induction was born.

Taleb’s Black Swan (note the capitalization) is distinct from the philosophical issue, however. I’ll let Taleb define it:

First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact (unlike the bird). Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

I stop and summarize the triplet: rarity, extreme impact, and retrospective (though not prospective) predictability. A small number of Black Swans explain almost everything in our world, from the success of ideas and religions, to the dynamics of historical events, to elements of our own personal lives. [emphasis original]

I’m confident that you can already see where this applies in the world of software. A Black Swan would be any serious bug that made it into a released product and caused some sort of harm–either to customers or the company’s reputation (or both!).

Toyota’s recent brake system problems are a perfect example. Clearly they didn’t see this coming, and it’s cost them an estimated $2 billion.  You can bet they’re trying to figure out why they didn’t catch the problem earlier–and why they should have–and how to prevent similar problems in the future.

And there’s the rub! The problem with Black Swans is that they are unpredictable by nature. Reality has “epistemic opacity”, says Taleb, owing to various inherent limitations to our knowledge, coupled with how we often deal erroneously with the information we do have. Toyota might spend billions ensuring that their cars will never have brake problems of any kind ever again, only, perhaps, to find one day that, in certain rare situations, their fuel system catches fire. It happens precisely because it’s not planned for.

So, what can we, as testers, do about the Black Swans we might face? The Black Swan counsels primarily how not to deal with them, and Taleb openly laments the typical reaction to his “negative advice.”

…[R]ecommendations of the style “Do not do” are more robust empirically [see “Negative Empiricism,” below]. How do you live long? By avoiding death. Yet people do not realize that success consists mainly in avoiding losses, not in trying to derive profits.

Positive advice is usually the province of the charlatan [see “Narrative Fallacy,” below]. Bookstores are full of books on how someone became successful [see “Silent Evidence,” below]; there are almost no books with the title What I Learned Going Bust, or Ten Mistakes to Avoid in Life.

Linked to this need for positive advice is the preference we have to do something rather than nothing, even in cases when doing something is harmful. [emphasis original]

I’m reminded of a consulting gig where I explained to the test team’s managers that their method for tracking productivity was invoking Goodhart’s Law and was thus worse than meaningless, since it encouraged counterproductive behavior in the team. The managers agreed with my analysis, but did not change their methodology. After all, they said, they were required to report something to the suits above them. They didn’t seem to have an ethical problem with tracking numbers that they knew were bullshit.


The ancient Greek philosopher Plato had a theory that abstract ideas or “Forms,” such as the idea of the color red, were the highest kind of reality. He believed that Forms were the only means to genuine knowledge. The error of Platonicity, then, as defined by Taleb, is

…our tendency to mistake the map for the territory, to focus on pure and well-defined “forms,” whether objects, like triangles, or social notions, like utopias (societies built according to some blueprint of what “makes sense”), even nationalities. When these ideas and crisp constructs inhabit our minds, we privilege them over other less elegant objects, those with messier and less tractable structures…

Platonicity is what makes us think that we understand more than we actually do. But this does not happen everywhere. I am not saying that Platonic forms don’t exist. Models and constructions, these intellectual maps of reality, are not always wrong; they are wrong only in some specific applications. The difficulty is that a) you do not know beforehand (only after the fact) where the map will be wrong, and b) the mistakes can lead to severe consequences. These models are like potentially helpful medicines that carry random but very severe side effects.

The error of platonification has a lot in common with the error of reification, but there is a subtle difference. Platonification doesn’t require that you believe your model is real (as in, “concrete”), only that it is accurate.

Again I’m sure you’re already thinking of ways this applies in software testing. You build a model of a system you’re testing. Soon you forget that you’re using a model and become blind to scenarios that might occur outside of it. Even worse, you write a few hundred test cases based on your model and convince yourself that, once you’ve gone through them all, you’ve “finished testing.”

Negative Empiricism

I mentioned above that The Black Swan is almost entirely advice about what not to do. However, in the chapter he devotes to confirmation bias and its brethren, Taleb introduces the heuristic of “falsification.” I hope you’ll forgive my quoting rather liberally from the section, here. He seems, for a moment, to be speaking directly to software testers:

By a mental mechanism I call naïve empiricism, we have a natural tendency to look for instances that confirm our story and our vision of the world – these instances are always easy to find. Alas, with tools, and fools, anything can be easy to find. You take past instances that corroborate your theories and you treat them as evidence. For instance, a diplomat will show you his “accomplishments,” not what he failed to do. Mathematicians will try to convince you that their science is useful to society by pointing out instances where it proved helpful, not those where it was a waste of time, or, worse, those numerous mathematical applications that inflicted a severe cost on society owing to the highly unempirical nature of elegant mathematical theories.

The good news is that there is a way around this naïve empiricism. I am saying that a series of corroborative facts is not necessarily evidence. Seeing white swans does not confirm the nonexistence of black swans. There is an exception, however: I know what statement is wrong, but not necessarily what statement is correct. If I see a black swan I can certify that all swans are not white!

This asymmetry is immensely practical. It tells us that we do not have to be complete skeptics, just semiskeptics. The subtlety of real life over the books is that, in your decision making, you need to be interested only in one side of the story: if you seek certainty about whether the patient has cancer, not certainty about whether he is healthy, then you might be satisfied with negative inference, since it will supply you the certainty you seek. So we can learn a lot from data – but not as much as we expect. Sometimes a lot of data can be meaningless; at other times one single piece of information can be very meaningful. It is true that a thousand days cannot prove you right, but one day can prove you to be wrong.

The person who is credited with the promotion of this idea of one-sided semiskepticism is Sir Doktor Professor Karl Raimund Popper, who may be the only philosopher of science who is actually read and discussed by actors in the real world (though not as enthusiastically by professional philosophers)… He writes to us, not to other philosophers. “We” are the empirical decision makers who hold that uncertainty is our discipline, and that understanding how to act under conditions of incomplete information is the highest and most urgent human pursuit. [emphasis original]

It always rankles when I hear someone (who is – usually – not a tester) declare something like “We need to prove the program works.” Obviously anyone who says this has a fundamental misconception of what is actually possible. And how many times has a programmer come to you claiming that he tested his code and “the feature works” – but you discover after only a couple tests that his “tests” were within only a narrow range, outside of which the feature breaks immediately?

All The Rest

I’ve only touched on a very small part of the contents of The Black Swan, but hopefully enough to convince you that it’s required reading for software testers. I’ll close the post with short descriptions of a few of the bigger ideas in the book that I skipped:

  • Mediocristan – A metaphorical country where deviations from the median are small and relatively rare, and those deviations can’t meaningfully affect the total. Think heights and weights of people. Black Swans aren’t possible here.
  • Extremistan – A metaphorical country where Black Swans are possible, because single members of a population can affect the aggregate. Think income or book sales.
  • Ludic Fallacy – Roughly speaking, the belief that you’re dealing with a phenomenon from Mediocristan when it’s actually from Extremistan. The Ludic Fallacy is a special case of the Platonic Fallacy.
  • Narrative Fallacy – The tendency to believe or concoct explanations that fit a complicated set of historical facts because they sound plausible. Conspiracy theories are only a small facet of this. These narratives cause us to think that past events were more predictable than they actually were. We become, as Taleb puts it, “Fooled by Randomness.”
  • Silent Evidence – That part of a population that is ignored because it is “silent,” meaning either difficult or impossible to see. We see all the risk-takers who succeeded in business, but not all risk-takers who failed. The result is the logical error called survivorship bias.

*An example of this is found in the quote from GQ magazine that appears, ironically, on the front cover of the book itself: “The most prophetic voice of all.” Taleb’s point is to be wary of anyone who claims he can predict the future. He says of himself, “I know I cannot forecast.”