Guest post: Michael Blastland on Uncertainty

This week we have a guest post from journalist, broadcaster and author Michael Blastland. In addition to creating the BBC Radio 4 programme ‘More or Less’, he has authored several books including The Tiger That Isn’t (published in the US as The Numbers Game: The Commonsense Guide to Understanding Numbers in the News, in Politics, and in Life) and The Only Boy in the World, about his son’s autism. He is a well-known campaigner for statistical literacy. His most recent book, The Norm Chronicles: Stories and numbers about danger, looks at the risks of everyday life and how to decode them.

People tend not to like uncertainty. It’s confusing. It makes our choices riskier. What are we supposed to do when we’re not sure what’s going on?

No, if it can be nailed down, nail it. If it can be settled, sort it. And even if it can’t, maybe any answer is better than none. Faced with the stranger on the moor who says the true path is definitely this way, or the one who says ‘not sure, maybe over there somewhere,’ which do you choose?

For the stranger on the moor substitute the political leader, or the business leader. We like people who seem to know.

Then, a few weeks ago, an old friend, Oli Hawkins, said he’d had an idea.

Understatement.

What’s more, it was an idea about how to show the uncertainty in data.

Hazardous understatement.

More accurately, it was an idea about how to bring uncertainty to life so that we see its full extent and implications.

And I thought: this is brilliant; some people will hate it.

What Oli had done, I think, was find a way of making statistical doubt more visible. This is no small trick. In doing so, he might have helped us see the world differently. But there’s also little doubt that it makes life less comfortable.

The nub of the problem he has been trying to overcome is, in a word, pictures.

I agree, that doesn’t sound like a problem. In fact, pictures are often the answer to the problem of how to interpret data. They can crystallise ideas and make vagueness vivid. Turned into pictures, numbers escape the fog of evidence for the blue sky of clarity. We take in so much more from a picture than from columns of data: we spot patterns faster, we remember the picture, and it can even be beautiful.

As with a character in film compared with a character in a novel, the wry smile and the twinkle in the eye are given settled form. For some of us, it’s hard to stop thinking that James Bond is Sean Connery.

‘So?’ you say. ‘What’s wrong with that? Isn’t this exactly what visualisation strives to do?’ Well, sometimes there’s nothing wrong at all. Sometimes it’s fab.

And sometimes it’s fantasy. Especially when the ideas themselves ooze doubt, when vagueness and uncertainty might be half the point, when the numbers are more mush than concrete.

I’m a huge fan of visualisation. Who isn’t? But uncertainty is visualisation’s portrait in the attic: a dodgy secret, an orthogonal truth, in keeping with the human tendency to avoid it.

How to say that the line is most likely here, doing this, but could be way over there doing that? This has never, in my view, been satisfactorily sorted. The understandable tendency of a lot of data-viz is to ignore it.

On those occasions when uncertainty is acknowledged, a standard approach is the error bar. Here’s an example from Oli’s discussion of the problem:

[Figure: chart of estimates with 95% error bars, from Oli Hawkins’s discussion]

‘The margin of error’, he says, ‘reflects the 95% confidence interval for the estimate, which means there is a 95% chance that the actual value is within the range shown by the error bar and a 5% chance that it is outside this range. The size of the error bar is determined by the size of the sample on which the estimate is based.’
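For concreteness, here is the standard recipe behind an error bar like that (my sketch, with invented numbers, assuming a simple random sample and the usual normal approximation): the bar spans the estimate plus or minus 1.96 standard errors, and the standard error shrinks with the square root of the sample size.

```python
import math

# Illustrative only: how a 95% margin of error relates to sample size.
# Figures are invented; real surveys are more complicated than this.
def margin_of_error(sample_sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a sample mean."""
    return z * sample_sd / math.sqrt(n)

estimate = 250_000  # hypothetical central estimate
for n in (500, 2000, 8000):
    m = margin_of_error(sample_sd=40_000, n=n)
    print(f"n={n:5d}: {estimate - m:,.0f} to {estimate + m:,.0f}")
```

Quadrupling the sample size halves the error bar, which is why small samples come with such wide ranges.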

But as Oli points out, the error bars simply follow the trend.

They move up and down in a neat little dance either side of the central estimate, and our eyes follow, as if all estimates dance in the same direction. In fact, the true value might lie at any point along those error bars, or beyond, though with diminishing probability. That is, the true value could be at the top of one error bar and the bottom of the next. So this visualisation – improvement though it is on a plain bar chart – arguably obscures the potential movement.
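How much can that matter? Enough to reverse the story. Take two adjacent, independent estimates (invented figures: 200,000 then 220,000, each with a margin of ±35,000). The central values say ‘up’, but the chance that the truth said ‘down’ can be worked out directly, because the difference of two independent normal estimates is itself normal:

```python
import math

# Invented figures: the central estimates say 'up', but how often
# would the truth say 'down'?
m1, m2 = 200_000, 220_000        # adjacent central estimates
se = 35_000 / 1.96               # standard error implied by a +/-35k margin

# year2 - year1 is normal with mean m2 - m1 and sd sqrt(se^2 + se^2)
mean_diff = m2 - m1
sd_diff = math.sqrt(2) * se

# P(true year2 < true year1) = Phi(-mean_diff / sd_diff)
z = -mean_diff / sd_diff
p_fell = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(f"Chance the true values actually fell: {p_fell:.0%}")  # about 21%
```

Roughly one chance in five that the ‘rise’ was really a fall, and nothing in the neat little dance of the error bars invites you to see it.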

Another example is the Bank of England’s fan charts for GDP, which apply both to future estimates and, more to the point here, to GDP in the past, about which we also remain uncertain. These fan charts show a range of estimates of the true value, in bands of probability.

They’re good. I like them. But they have exactly the same problem. All estimates echo the central line and visually reinforce our impression of the trend. Not the idea at all.

[Figure: Bank of England fan chart of GDP growth]

What we tend to ‘see’ in this chart, I think, is a rise and then a fall in the rate of growth in the past few years that might have happened higher or lower than the central estimate, but was basically in lockstep with it. And people draw all sorts of conclusions from that supposed trend about the conduct of economic policy.

But is it true? Because what could have happened is that the rate of GDP growth rose continually from 2009, as it swung from the bottom to the top of the Bank’s range of estimates. Rather than an economy that skirted double- or even triple-dip recession, maybe we had an economy going from strength to strength for more than three years. Or maybe it was the other way round, and we recovered spectacularly in late 2009 and then slammed into reverse and another shallow but protracted recession.

You’ll find little economic comment to this effect, and it’s neither the Bank’s nor the ONS’s best guess, but it is perfectly within what the Bank thinks are reasonable bounds of uncertainty. Maybe one reason this discussion doesn’t happen, and the doubts tend to be smothered in the rush to an appalled/euphoric (delete as applicable) reaction, is that we don’t have the right way of showing their extent.

And fan charts like these are a relatively recent innovation. Before them, the lines were even more concrete.

There are other techniques for representing uncertainty. Howard Wainer’s ‘Picturing the Uncertain World’ is an interesting exploration of the subject. But we can, and should, do more.

‘You know…’ I say, trying to inspire audiences of designers, ‘you have an opportunity here to work out how to use visual techniques to bring uncertainty properly to life. Do that, and you could help people see, maybe for the first time, the way that statistical evidence relates to real events. This could change the way we see the world.’

But if that sounds too much like hard work, well then, as I’ve put it elsewhere, we can always carry on with the same old statistical blah… only prettier. As Tim Harford has said, misinformation can be beautiful too.

My own attempt at the uncertainty problem was to make some fantasy league tables in which the position of each imagined school, or hospital, or whatever, bounced randomly within the confidence intervals, all over the shop. Who really ranked where? You couldn’t be sure. Which is irritating, but often as it should be.
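A toy version of what I mean (my sketch, with invented scores and margins): bounce each school’s score somewhere within its confidence interval, uniformly at random, and re-rank on every draw. The table refuses to settle.

```python
import random

# Invented league table: (name, estimated score, 95% margin of error).
schools = [("A", 62, 7), ("B", 60, 7), ("C", 58, 7), ("D", 55, 7)]

random.seed(1)
for frame in range(5):
    # One bounce: pick each school's score anywhere in its interval.
    draw = [(name, random.uniform(score - moe, score + moe))
            for name, score, moe in schools]
    ranking = [name for name, _ in sorted(draw, key=lambda t: -t[1])]
    print(f"frame {frame}: {' > '.join(ranking)}")
```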

But how to make this movement proportionate to the real probabilities? Cue Oli. He has found a way (http://olihawkins.com/visualisation/1) to animate the estimates within the confidence intervals so that they pop up just as often as probability suggests they should, given the data. He shows that this can be done with interval data, so that we discover how different a trend might look over time, as well as with categorical data, like the school league-table example. He’s done it as a series of snapshots rather than a continuous fluid movement, which helps pick out more clearly what the true trend might have been.
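The gist, as I understand it, can be sketched in a few lines (a minimal illustration, not Oli’s actual code; figures invented, assuming independent normal sampling errors): instead of bouncing uniformly, each frame redraws every year’s value from its own sampling distribution, so values near the centre of an interval appear often and values near its ends rarely.

```python
import random

# Invented yearly estimates with 95% margins: {year: (estimate, margin)}.
series = {
    2004: (170_000, 35_000), 2005: (200_000, 35_000),
    2006: (190_000, 35_000), 2007: (210_000, 35_000),
}

def one_story(series):
    """One plausible history: draw each year from its sampling distribution."""
    return {year: random.gauss(est, moe / 1.96)  # standard error = margin / 1.96
            for year, (est, moe) in series.items()}

random.seed(0)
for frame in range(3):
    story = one_story(series)
    print({year: f"{value:,.0f}" for year, value in story.items()})
```

Each printed line is one ‘snapshot’: a complete history that is consistent with the published estimates and their margins.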

And…? Isn’t all this obvious? If that’s what you think, you’d be right in the sense that it is all implied by the existing maths of confidence intervals.

The answer may be that all that is new here is the articulation of an idea. And it may be true that the idea is already latent in the prior concept of confidence intervals. So what’s the big deal?

The big deal for me is that an idea that is latent – except in the minds of a few – isn’t an idea at all for the many. Articulating it is every bit as important as knowing it. I would say that, being in the communication business. But maybe the proof of how important it is to articulate these things, and also the proof of how well it’s been done to date, is how little there is in public argument about the extent of the uncertainty around numbers like these or what that uncertainty implies. If the idea is obvious, where’s it been?

Now you could just put that absence down to the ignorance of the commentariat and politicians, or you could add that maybe we could do it differently.

The acid test is what we see with the new method. Applied to the migration data, the effect is electric. Here are a few grabs from Oli’s visualisation as it runs through the variety of stories that could have been told.

Like this one…

[Animation frame: a broadly flat migration trend]

Fairly flat, bit of a crest around 2010 maybe, maybe a hint of a rising trend – though this could be no more than a couple of weird years. Nothing to my eye leaps off the page over the long run.

Or like this.

[Animation frame: a step change in 2004]

Which looks pretty clearly like a step change in 2004. The numbers roughly double. A good one for those who want to say we ‘lost control of the borders’ and a sharply different reading of history.

Or what about this?

[Animation frame: a rising trend until about 2010]

In which the key date moves back six years as we see a broadly rising trend all the way until about 2010, when ‘determined action by the Coalition finally brought it under control,’ presumably.

Or like this, when determined action by the Coalition since 2010 made hardly any difference.

[Animation frame: little change after 2010]

Just click and play to see the variety of stories that could be true. The implications of the uncertainty are easier to grasp and harder to ignore. What also emerges is that some stories are more common and consistent than others. Very few iterations show 2012 higher than 2010, for example. So we see both what is most uncertain and what is most likely. It’s not at all the case that the upshot of all this is to throw up our hands and say we’re clueless about what happened.
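Claims like that can be checked by brute force: simulate many iterations and count (again my sketch, with invented central estimates and margins, not the real migration figures):

```python
import random

# Invented figures: how often does a redrawn 2012 beat a redrawn 2010?
est_2010, est_2012 = 250_000, 180_000
se = 35_000 / 1.96               # standard error implied by a +/-35k margin

random.seed(2)
n_iter = 100_000
higher = sum(random.gauss(est_2012, se) > random.gauss(est_2010, se)
             for _ in range(n_iter))
print(f"2012 higher than 2010 in {higher / n_iter:.2%} of iterations")
```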

Not new? It’s revelatory. What if we did it to the GDP lines on the Bank of England’s fan chart, and animated them through a range of possible stories in all their top-to-bottom potentially volatile variety? What if we did the same to the monthly unemployment data?

Yes, it’s disturbing, destabilising, unsatisfactory in so many ways. It makes the world less nailable, less sorted. And I love it.

What’s especially thought-provoking is that it makes you wonder how many more techniques there might be that could bring life to statistical insights, rather than bringing design or false clarity to dodgy data.

Don’t get me wrong. I think there’s some fantastic stuff out there. And anyway, uncertainty isn’t always a big factor. All the same, data visualisation is no more than a fancy distraction if it doesn’t help us see better. But when it does…  wow.

Norm Chronicles interactive site

Profile in the Guardian


2 thoughts on “Guest post: Michael Blastland on Uncertainty”

  1. Great post. I really like the idea of using animation to convey uncertainty, but I’m a little worried that the approach you’ve suggested might actually exaggerate the uncertainty.

    First of all, Oli notes that the data are independent insofar as they are drawn from non-overlapping time periods. But it seems reasonable to suppose that there might be some dependency in the error from year to year. This would make it relatively unlikely that you’d have an extreme overestimate one year immediately followed by an extreme underestimate the next.

    Second, it ignores regression to the mean. The larger values are likely to be overestimates and the smaller values are likely to be underestimates. This means that smoother trajectories are more likely than the animations suggest.

    I’m not a statistician, so I may well be wrong on both counts. I think there are ways around the second issue. Not so sure about the first, unless you had a really good model of how the data are generated.
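    A quick sketch of that first point (illustrative only, assuming an AR(1) error model with made-up numbers): when each year’s error partly carries over from the last, extreme over- and underestimates rarely follow one another, so simulated trajectories come out smoother than independent draws suggest.

    ```python
    import random

    # Compare independent errors with AR(1)-correlated errors (rho = 0.8).
    # Both series have the same marginal standard deviation; only the
    # year-to-year dependence differs.
    random.seed(3)
    se, rho, years = 1.0, 0.8, 10

    indep = [random.gauss(0, se) for _ in range(years)]

    corr, e = [], 0.0
    for _ in range(years):
        # Keep 80% of last year's error, then add fresh noise scaled so
        # the marginal standard deviation stays equal to se.
        e = rho * e + random.gauss(0, se * (1 - rho ** 2) ** 0.5)
        corr.append(e)

    print("independent:", [round(x, 2) for x in indep])
    print("correlated: ", [round(x, 2) for x in corr])
    ```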

