People tend not to like uncertainty. It’s confusing. It makes our choices riskier. What are we supposed to do when we’re not sure what’s going on?
No, if it can be nailed down, nail it. If it can be settled, sort it. And even if it can’t, maybe any answer is better than none. Faced with the stranger on the moor who says the true path is definitely this way, or the one who says ‘not sure, maybe over there somewhere,’ which do you choose?
For the stranger on the moor substitute the political leader, or the business leader. We like people who seem to know.
Then, a few weeks ago an old friend, Oli Hawkins, said he’d had an idea.
What’s more, it was an idea about how to show the uncertainty in data.
More accurately, it was an idea about how to bring uncertainty to life so that we see its full extent and implications.
And I thought: this is brilliant; some people will hate it.
What I think Oli had done was to find a way of making statistical doubt more visible. This is no small trick. In doing so, he might have helped us see the world differently. But there’s also little doubt that it makes life less comfortable.
The nub of the problem he has been trying to overcome is, in a word, pictures.
I agree, that doesn’t sound like a problem. In fact, pictures are often the answer to the problem of how to interpret data. They can crystallise ideas and make vagueness vivid. Turned into pictures, numbers escape the fog of evidence for the blue sky of clarity. We take in so much more from a picture than from columns of data: we spot patterns faster, we remember the picture, and it can even be beautiful.
As with a character in film compared with a character in a novel, the wry smile and the twinkle in the eye are given settled form. For some of us, it’s hard to stop thinking that James Bond is Sean Connery.
‘So?’ you say. ‘What’s wrong with that? Isn’t this exactly what visualisation strives to do?’ Well, sometimes there’s nothing wrong at all. Sometimes it’s fab.
And sometimes it’s fantasy. Especially when the ideas themselves ooze doubt, when vagueness and uncertainty might be half the point, when the numbers are more mush than concrete.
I’m a huge fan of visualisation. Who isn’t? But uncertainty is visualisation’s portrait in the attic: a dodgy secret, an orthogonal truth, in keeping with the human tendency to avoid it.
How to say that the line is most likely here, doing this, but could be way over there doing that? This has never, in my view, been satisfactorily sorted. The understandable tendency of a lot of data-viz is to ignore it.
On those occasions uncertainty is acknowledged, a standard approach is the error bar. Here’s an example from Oli’s discussion of the problem:
‘The margin of error,’ he says, ‘reflects the 95% confidence interval for the estimate, which means there is a 95% chance that the actual value is within the range shown by the error bar and a 5% chance that it is outside this range. The size of the error bar is determined by the size of the sample on which the estimate is based.’
But as Oli points out, the error bars simply follow the trend.
They move up and down in a neat little dance either side of the central estimate, and our eyes follow, as if all estimates dance in the same direction. In fact, the true value might lie at any point along those error bars, or beyond, though with diminishing probability. That is, the true value could be at the top of one error bar and the bottom of the next. So this visualisation – improvement though it is on a plain bar chart – arguably obscures the potential movement.
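To put a rough number on that last point, here is a minimal sketch in Python. It rests on assumptions that are mine rather than Oli’s or the ONS’s: that each estimate’s uncertainty can be treated as a normal distribution whose standard deviation is the 95% margin of error divided by 1.96, and that the two estimates are independent. The figures and the function name `share_of_reversals` are invented for illustration.

```python
import random

Z_95 = 1.96  # standard normal multiplier for a two-sided 95% interval


def share_of_reversals(est_a, moe_a, est_b, moe_b, draws=10_000, seed=0):
    """Share of simulated pairs in which the change from a to b runs the
    opposite way to the change between the two central estimates.

    Assumes independent, normally distributed uncertainty around each
    estimate, with standard deviation = margin of error / 1.96.
    """
    rng = random.Random(seed)
    estimated_rise = est_b > est_a
    reversals = 0
    for _ in range(draws):
        a = rng.gauss(est_a, moe_a / Z_95)
        b = rng.gauss(est_b, moe_b / Z_95)
        if (b > a) != estimated_rise:
            reversals += 1
    return reversals / draws


# Hypothetical figures: two neighbouring estimates of 200 and 230,
# each with a margin of error of plus or minus 40.
share = share_of_reversals(200, 40, 230, 40)
print(f"Share of draws in which the apparent rise is really a fall: {share:.1%}")
```

Under these assumptions the bars appear to step upward, yet a noticeable minority of the simulated pairs go the other way, which is exactly the movement a static error bar hides.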
Another example is the Bank of England’s fan charts for GDP, which apply both to future estimates and, more to the point here, to GDP in the past, about which we also remain uncertain. These fan charts show a range of estimates of the true value, in bands of probability.
They’re good. I like them. But they have exactly the same problem. All estimates echo the central line and visually reinforce our impression of the trend. Not the idea at all.
What we tend to ‘see’ in this chart, I think, is a rise and then a fall in the rate of growth in the past few years that might have happened higher or lower than the central estimate, but was basically in lockstep with it. And people draw all sorts of conclusions from that supposed trend about the conduct of economic policy.
But is it true? Because what could have happened is that the rate of GDP growth rose continually since 2009, as it swung from the bottom to the top of the Bank’s range of estimates. Rather than an economy that skirted double or even triple-dip recession, maybe we had an economy going from strength to strength for more than three years. Or maybe it was the other way round and we recovered spectacularly in late 2009 and then slammed into reverse and another shallow but protracted recession.
You’ll find little economic comment to this effect, and it’s neither the Bank’s nor the ONS’s best guess, but it is perfectly within what the Bank thinks are reasonable bounds of uncertainty. Maybe one reason this discussion doesn’t happen, and the doubts tend to be smothered in the rush to an appalled/euphoric (delete as applicable) reaction, is that we don’t have the right way of showing their extent.
And fan charts like these are a relatively recent innovation. Before them, the lines were even more concrete.
There are other techniques for representing uncertainty. Howard Wainer’s ‘Picturing the Uncertain World’ is an interesting exploration of the subject. But we can, and should, do more.
‘You know…’ I say, trying to inspire audiences of designers, ‘you have an opportunity here to work out how to use visual techniques to bring uncertainty properly to life. Do that, and you could help people see, maybe for the first time, the way that statistical evidence relates to real events. This could change the way we see the world.’
But if that sounds too much like hard work, well then, as I’ve put it elsewhere, we can always carry on with the same old statistical blah… only prettier. As Tim Harford has said, misinformation can be beautiful too.
My own attempt at the uncertainty problem was to make some fantasy league tables in which the position of each imagined school, or hospital, or whatever, bounced around randomly within its confidence interval, moving up and down all over the shop. Who really ranked where? You couldn’t be sure. Which is irritating, but often as it should be.
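For the curious, the fantasy league table idea can be sketched in a few lines of Python. This is a reconstruction under my own assumptions, not the original code: each score is redrawn uniformly from within its interval (so every point inside is treated as equally likely), and the school names, scores and margins are all invented.

```python
import random


def shuffled_table(schools, rng):
    """Return school names ranked best-first after one random redraw.

    schools maps name -> (score, margin). Each score is redrawn uniformly
    from [score - margin, score + margin], an assumption made for
    illustration only.
    """
    drawn = {
        name: rng.uniform(score - margin, score + margin)
        for name, (score, margin) in schools.items()
    }
    return sorted(drawn, key=drawn.get, reverse=True)


# Hypothetical schools: (score, margin of error)
schools = {
    "Ash": (62, 6),
    "Beech": (60, 6),
    "Cedar": (58, 6),
    "Dogwood": (45, 6),
}

rng = random.Random(1)  # fixed seed so the run is repeatable
tables = [shuffled_table(schools, rng) for _ in range(1000)]

# How often does each school come top across the redrawn tables?
top_counts = {name: sum(t[0] == name for t in tables) for name in schools}
print(top_counts)
```

The three schools with overlapping intervals swap places freely, while the one whose interval sits clear of the rest never leaves the bottom. Some doubts matter for the ranking and some don’t.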
But how to make this movement proportionate to the real probabilities? Cue Oli. He has found a way (http://olihawkins.com/visualisation/1) to animate the estimates within the confidence intervals so that they pop up just as often as probability suggests they should, given the data. He shows that this can be done with interval data so that we discover how different a trend might look over time, as well as with categorical data – like the school league-table example. He’s done it as a series of snapshots rather than a continuously fluid movement, which helps pick out more clearly what the true trend might have been.
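I can’t speak for how Oli’s code works, but the general principle can be sketched as follows. The assumptions are mine: each year’s uncertainty is treated as an independent normal distribution centred on the estimate, with standard deviation equal to the 95% margin of error divided by 1.96. The figures are invented and are not the migration data.

```python
import random

Z_95 = 1.96  # standard normal multiplier for a two-sided 95% interval


def snapshot(estimates, margins, rng):
    """One possible 'true' series: each point is drawn from a normal
    distribution centred on its estimate (an illustrative assumption)."""
    return [rng.gauss(e, m / Z_95) for e, m in zip(estimates, margins)]


# Hypothetical yearly estimates and 95% margins of error
years = [2008, 2009, 2010, 2011, 2012]
estimates = [160, 200, 250, 215, 175]
margins = [35, 35, 35, 35, 35]

rng = random.Random(42)  # fixed seed so the run is repeatable
snapshots = [snapshot(estimates, margins, rng) for _ in range(2000)]

# Check the draws respect the intervals: about 95% should fall inside.
inside = sum(
    abs(v - e) <= m
    for s in snapshots
    for v, e, m in zip(s, estimates, margins)
) / (len(snapshots) * len(years))

# How often does a snapshot show 2012 higher than 2010?
i_2010, i_2012 = years.index(2010), years.index(2012)
share_2012_above_2010 = sum(s[i_2012] > s[i_2010] for s in snapshots) / len(snapshots)

print(f"Share of points inside their interval: {inside:.1%}")
print(f"Share of snapshots with 2012 above 2010: {share_2012_above_2010:.1%}")
```

Each list in `snapshots` is one frame of the animation: a story that could have been true. Values near the central estimate turn up often and values near the edge of the interval rarely, which is what makes the movement proportionate rather than merely random.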
And…? Isn’t all this obvious? If that’s what you think, you’d be right in the sense that it is all implied by the existing maths of confidence intervals.
The answer may be that all that is new here is the articulation of an idea. And it may be true that the idea is already latent in the prior concept of confidence intervals. So what’s the big deal?
The big deal for me is that an idea that is latent – except in the minds of a few – isn’t an idea at all for the many. Articulating it is every bit as important as knowing it. I would say that, being in the communication business. But maybe the proof of how important it is to articulate these things, and also the proof of how well it’s been done to date, is how little there is in public argument about the extent of the uncertainty around numbers like these or what that uncertainty implies. If the idea is obvious, where’s it been?
Now you could just put that absence down to the ignorance of the commentariat and politicians, or you could add that maybe we could do it differently.
The acid test is what we see with the new method. Applied to the migration data, the effect is electric. Here are a few grabs from Oli’s visualisation as it runs through the variety of stories that could have been told.
Like this one…
Fairly flat, bit of a crest around 2010 maybe, maybe a hint of a rising trend – though this could be no more than a couple of weird years. Nothing to my eye leaps off the page over the long run.
Or like this.
Which looks pretty clearly like a step change in 2004. The numbers roughly double. A good one for those who want to say we ‘lost control of the borders’ and a sharply different reading of history.
Or what about this?
In which the key date moves back six years as we see a broadly rising trend all the way until about 2010, when ‘determined action by the Coalition finally brought it under control,’ presumably.
Or like this, when determined action by the Coalition since 2010 made hardly any difference.
Just click and play to see the variety of stories that could be true. The implications of the uncertainty are easier to grasp and harder to ignore. What also emerges is that some stories are more common and consistent than others. Very few iterations show 2012 higher than 2010, for example. So we see both what is most uncertain, and what is most likely. It’s not at all the case that the upshot of all this is to throw up our hands and say we’re clueless about what happened.
Not new? It’s revelatory. What if we did it to the GDP lines on the Bank of England’s fan chart, and animated them through a range of possible stories in all their top-to-bottom potentially volatile variety? What if we did the same to the monthly unemployment data?
Yes, it’s disturbing, destabilising, unsatisfactory in so many ways. It makes the world less nailable, less sorted. And I love it.
What’s especially thought-provoking is that it makes you wonder how many more techniques there might be that could bring life to statistical insights, rather than bringing design or false clarity to dodgy data.
Don’t get me wrong. I think there’s some fantastic stuff out there. And anyway, uncertainty isn’t always a big factor. All the same, data visualisation is no more than a fancy distraction if it doesn’t help us see better. But when it does… wow.