Advanced Techniques in Lying using Data Visualizations
Discover the power of chart design to manipulate an audience towards any narrative
source: unsplash.com
This article is satire. Using bogus visualizations to spread misinformation is not cool. The purpose of this article is to teach you how to spot misinformation.
Do you have a narrative you need to sell? Perhaps you’ve predicted something at work, which you now need to prove. Maybe someone on X said your political opinion was wrong, so you need ammunition for a counter-attack.
Data visualizations are often the final layer in data analysis, used for presenting insights quickly and easily. The audience can vary, from board members to TV viewers to government officials to social media followers. However, despite their differences, these audiences often have a few things in common. They’re usually not data professionals, and they usually don’t have time to delve into the details. This makes data visualizations the perfect tool for manipulating your audience, whatever the data, whatever your narrative.
Using real-world examples, this article delves into four advanced techniques that data-savvy manipulators use to get their own message across, regardless of what the raw data actually says. These are:
Omit unwanted data points
Exploit pattern psychology
Selectively categorize data
Strategically adjust readability
By the end of this article, your tool bag will be fully equipped to spread misinformation in its most potent form: statistics.
TL;DR
Selectively removing or shuffling data around is often powerful enough to support contradictory narratives.
For audiences that don’t have time to critically analyze the data, using simple psychological patterns, such as red = bad, green = good, can be enough to sway them.
The only way to stop manipulative data techniques is to call them out when you see them.
Omitting unwanted data points
If you’re into healthy living, you will likely be aware of the best-selling book “Why We Sleep” by Matthew Walker. Health books don’t just casually fall into the no. 1 spot on the Sunday Times Bestseller list. So, how did “Why We Sleep” achieve such a remarkable feat? If you’ve read the book, as I have, you will know exactly why. It is a powerful, bordering on frightening, piece of non-fiction with the central message: if you don’t take your sleep seriously, you will suffer a world of health issues.
It’s not just the message that’s powerful. The book contains an abundance of academic research and data analysis, all of which unequivocally supports the narrative. By the end of the book, there is no doubt about it: the message is true, and we, the enlightened few, then go out to tell everyone else they should read it too. The end result for the publisher is tens of millions of dollars in revenue. I’m sure Penguin gave Matthew Walker a big pat on the back, and a few million for himself, naturally.
What Matthew Walker did brilliantly to ensure a faultless narrative was, of course, omit unwanted data. There is no mention of research suggesting that less sleep makes no difference, or could even be good for you. But what is truly impressive is that unwanted data points have been removed even from the analyses that are actually used in the book!
For example, when discussing the likelihood of injury in sports versus average sleep duration, the book shows a chart, on page 129, presenting the results of a research paper. In it we have four data points: 6hr = 72%, 7hr = 60%, 8hr = 34%, 9hr = 18%. The underlying data that built this chart comes from research published in the Journal of Pediatric Orthopaedics, titled “Chronic lack of sleep is associated with increased sports injuries in adolescent athletes”. The paper’s data is exactly the same, except that it contains one additional data point: athletes with an average of five hours of sleep, who had an injury risk of 54%.
Why might this data point have been omitted from Matthew Walker’s chart? It’s in a peer-reviewed journal, so it’s not as if the five-hour data point isn’t usable data. Of course, the answer is that it goes against the narrative. It’s much better to have a chart that goes from least sleep = worst to most sleep = best. Next time you have a pesky data point or two, remember that you can just remove it. If it’s acceptable enough for a Penguin-published book by an author who is a UC Berkeley professor, then it’s acceptable enough for us.
Created by the author using Google Sheets.
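How much work that one missing point does can be checked in a few lines. Below is a minimal Python sketch using only the five figures quoted above. The helper names `is_strictly_decreasing` and `risks_by_sleep` are made up for illustration and do not come from the book or the paper.

```python
# Injury risk (%) by average hours of sleep, as quoted in the text above.
injury_risk = {5: 54, 6: 72, 7: 60, 8: 34, 9: 18}


def is_strictly_decreasing(values):
    """Return True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def risks_by_sleep(data, omit=()):
    """List the risks in order of sleep duration, skipping any omitted hours."""
    return [risk for hours, risk in sorted(data.items()) if hours not in omit]


full_series = risks_by_sleep(injury_risk)               # all five points
trimmed_series = risks_by_sleep(injury_risk, omit={5})  # the four charted points

print(is_strictly_decreasing(full_series))     # False: 54 -> 72 is a rise
print(is_strictly_decreasing(trimmed_series))  # True: 72, 60, 34, 18
```

With all five points, the "more sleep, fewer injuries" staircase has a step going the wrong way. Drop the five-hour point and the staircase is perfect.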
Exploit pattern psychology
This may seem a little less obvious than the first technique, but it’s actually quite simple. It is about being aware of how people instinctively react to what they see on charts, and taking advantage of that for your benefit. Common perceptual patterns are:
Green = good
Red = bad
Sharp rise = significant
Big = significant
Correlation = causation
Fox News are great at this sort of stuff. However, depending on your audience’s ability to critically assess your work, you may need to adjust how much you take advantage of this technique.
Created by the author using Google Sheets.
The chart above uses the popular trick of starting the Y-axis above zero. The effect is a much sharper visual difference between data points, which translates to a feeling of significance. Of course, if the direction between data points goes against your narrative, you will want to start the Y-axis as low as possible, so as to convey a feeling that the difference is insignificant.
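To put a number on that feeling of significance, here is a small Python sketch. It assumes a bar chart, where each bar is drawn from the axis minimum up to its value. The function name `bar_height_ratio` and the two values are hypothetical, chosen only to illustrate the effect.

```python
def bar_height_ratio(low, high, axis_min=0.0):
    """Ratio of the taller bar's drawn height to the shorter bar's.

    Bars are drawn from axis_min upward, so the drawn height of a value
    is (value - axis_min).
    """
    if axis_min >= low:
        raise ValueError("axis_min must be below the smaller value")
    return (high - axis_min) / (low - axis_min)


# HYPOTHETICAL values: two bars that differ by 5%.
low, high = 100.0, 105.0

honest = bar_height_ratio(low, high, axis_min=0.0)
truncated = bar_height_ratio(low, high, axis_min=95.0)

print(f"Axis from 0:  taller bar is {honest:.2f}x the shorter one")
print(f"Axis from 95: taller bar is {truncated:.2f}x the shorter one")
```

The data never changes, only the baseline. With the axis starting at 0, the taller bar is drawn 1.05 times as high as the shorter one. With the axis starting at 95, it is drawn twice as high.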
Another way of presenting the data to suit your agenda is using colors. For most people, the color red is associated with something bad. Presenting a chart with a wall of red can be very effective in instilling a sense of fear, crisis, or alarm in the audience. Climate change is a particularly great topic to use red for, as red is also used to symbolize heat and fire.
A chart by The Guardian in the article “What’s happening with the climate crisis and heat-trapping emissions in Australia” perfectly executes the use of red.
‘World on fire’ version of chart. Created by the author using Tableau. Data source: Global Carbon Budget, 2022
Notice how every country on the map is red. Even countries that produce zero fossil fuel emissions would be highlighted in a pinky red. Using the exact same data, without a full red color scale, produces a chart with a much less fearful punch to it, such as the one below:
‘Only parts of the world on fire’ version of chart. Created by the author using Tableau. Data source: Global Carbon Budget, 2022
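The difference between the two maps comes down to where the color scale starts. The Python sketch below is one possible way to express that, not the scale used in either chart above. The RGB values and the function names are all assumptions picked for illustration.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB colors; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# HYPOTHETICAL colors, chosen for illustration only.
PINK = (255, 200, 200)
DARK_RED = (150, 0, 0)
LIGHT_GREY = (235, 235, 235)


def alarmist_color(share):
    """Every value, including zero, gets a shade of red."""
    return lerp_color(PINK, DARK_RED, share)


def neutral_color(share):
    """Zero maps to grey; red only appears as the value grows."""
    return lerp_color(LIGHT_GREY, DARK_RED, share)


# share = a country's emissions divided by the largest emitter's (0 to 1).
for share in (0.0, 0.5, 1.0):
    print(share, alarmist_color(share), neutral_color(share))
```

Both scales agree on the biggest emitter. They only disagree about what a country with nothing to report should look like, and that is enough to decide whether the whole world appears to be on fire.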
Be selective in categorizing
This technique is somewhat similar to omitting data. However, it isn’t quite the same. Omitting data is simply removing the unwanted data from view, whereas selective categorization is about shuffling data around until we find an adequate story. It is also less susceptible to criticism, and as such is most commonly used in areas where there is an attempt at critically assessing the analysis, such as papers published in peer-reviewed journals.
I’ll start with a simple toy example to explain what I mean by shuffling data around. Say I am studying the number of people who live in each household of a village. I have collected the following raw data:
If I want to convey the message that the village has many overcrowded households, then I can present the data like so:
Created by the author using Google Sheets.
All I have done is bundle 5, 6, 7, and 8 together to create one single data point, 5+. The final result is a chart in which the largest people-per-household category has the tallest bar. Perfect for my narrative.
But what if I don’t want that? In fact, what if I want to convey the exact opposite message? I can shuffle the data around a bit, and end up presenting the data like so:
Created by the author using Google Sheets.
Here, what I have done is bundle 0, 1, and 2 together, 3, 4, and 5 together, and 6, 7, and 8 together. As you can see, without even omitting a single point of raw data, I have completely changed the story.
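The two bundlings can be reproduced in a short Python sketch. The household counts below are HYPOTHETICAL and are not the raw data behind the charts above. The helper `bin_counts` is likewise made up for illustration.

```python
# HYPOTHETICAL raw data: people per household -> number of households.
households = {0: 2, 1: 8, 2: 9, 3: 9, 4: 9, 5: 4, 6: 3, 7: 2, 8: 1}


def bin_counts(data, bins):
    """Sum household counts into labelled bins.

    `bins` maps a label to the inclusive (low, high) range it covers.
    """
    return {
        label: sum(n for size, n in data.items() if low <= size <= high)
        for label, (low, high) in bins.items()
    }


# Scheme 1: keep small sizes separate, lump everything from 5 up.
overcrowded_story = bin_counts(households, {
    "0": (0, 0), "1": (1, 1), "2": (2, 2),
    "3": (3, 3), "4": (4, 4), "5+": (5, 8),
})

# Scheme 2: three equal-width bins.
roomy_story = bin_counts(households, {
    "0-2": (0, 2), "3-5": (3, 5), "6-8": (6, 8),
})

print(overcrowded_story)  # "5+" is the tallest bar
print(roomy_story)        # "6-8" is the shortest bar
```

Both schemes count every household exactly once, so neither can be accused of omitting data. The only thing that changes is where the bin edges fall.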
This technique was brilliantly implemented in an academic research paper focusing on whether more guns reduce or increase crime. A highly cited paper by John Lott and David Mustard titled, “Crime, Deterrence, and Right‐to‐Carry Concealed Handguns” provides a remarkable chart of the consequences of implementing handgun concealment laws designed to increase the number of guns on the street.
source: “Crime, Deterrence, and Right‐to‐Carry Concealed Handguns” by John Lott and David Mustard
Take another second to really look at what this chart is saying, because it is quite remarkable. It would suggest that violent crimes are almost universally due to the lack of concealed handguns, as evidenced by the lack of variance, i.e. the crime rate declines smoothly after the introduction of handgun laws. So, how did the authors achieve such a remarkable chart?
Firstly, it takes advantage of technique number two, exploit pattern psychology, by starting the Y-axis above zero. Secondly, and most influentially, it selectively chooses what is and isn’t considered a ‘violent crime’, a ‘concealed handgun law’, and the ‘population’ of the study. For example, in the violent crime category, the authors chose to include robberies and aggravated assaults committed with a gun, but not those committed without one, e.g. with a knife. Why? Because robberies and aggravated assaults as a whole increased on average after the adoption of handgun laws, which goes against the narrative. But in a small slice of this violent crime, i.e. robberies and aggravated assaults with a gun, the average conveniently decreased after adoption.
Similarly, some regions and years were excluded from the population sample. These regions and years had huge spikes in violent crime after concealed handgun laws were adopted. The reason the authors gave for removing them was ‘higher drug prices’ (I’m not making this up: see page 24 of “Crime, Deterrence, and Right‐to‐Carry Concealed Handguns”).
Repeat this sort of selective categorization a few more times, and voila — the perfect chart!
Strategically adjust readability
If, sadly, you can’t avoid publishing some unsightly data in your charts, there is one final technique you can use to maintain your narrative. Make the charts unbearable to look at. A classic is to present the data as a cold hard table. For example:
Created by the author using Microsoft Word
Did you read it all? If so, you’re a stronger person than I am. I’ll be honest with you: I have no idea whether this data is trying to manipulate me or not. All I know is that I don’t want to read that ugly table, and you can’t make me!
If you do have some good data sprinkled around the nasty narrative-killing data, then do your best to make the good stuff as readable as possible. Nice simple line or bar charts will do. And, of course, make sure all the bad data gets the table treatment.
Conclusion
A message to data scientists, business analysts, academic researchers, and everyone else involved in presenting data: Manipulating your audience can be easy, but it’s not cool. It is important to make a deliberate effort to consider whether your work uses any of these four techniques. It may not have been intentional, but confirmation bias, as well as pressure from others, does cause manipulative data tricks to slip into one’s work.
A message to data consumers (basically, everyone): I hope this article has empowered you to critically evaluate the data and statistics you encounter. Furthermore, I encourage you to be vocal about any unethical data practices you discover. Whether it is via verbal discussion, a social media post, blog, video essay, note in the comment section… the list goes on. Please don’t shy away from criticising manipulation when you see it.
If you’d like more of my insights on uncovering data issues, I encourage you to read my previous article, Stop being data driven, and to follow me on Medium for future articles. If you’d like to spread the reach of this article, please give it a clap (or 50 😏) and share it on social media. Thanks for reading. Until next time, peace out data champs! ✌️
Advanced Techniques in Lying using Data Visualizations was originally published in Towards Data Science on Medium.