Bonus Post: Big Data

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are, by Seth Stephens-Davidowitz (2017)


Forget all those social science surveys that ask 200 people about their sexual preferences or their attitudes towards those outside their own racial group. Everybody lies, or at least enough people lie to make the results of such surveys highly suspect. We also lie to our families, to our friends, and to our doctors. This is the message from economist Seth Stephens-Davidowitz, who mines vast troves of anonymous data from Google searches, social media sites, and similar sources to try to get closer to the truth.

Stephens-Davidowitz has an engaging way of presenting the complex statistical analyses that he performs. He proceeds by topic, telling stories that uncover fallacies in our assumptions about subjects such as prejudice, child abuse, abortion, economic mobility, and basketball stardom. For example, there’s an assumption that African American boys from impoverished neighborhoods have a good chance of making it in the National Basketball Association. Stephens-Davidowitz crunches the Big Data and finds that it’s actually mostly middle-class African American boys who succeed in basketball, though there are notable exceptions, like LeBron James.

The analyses of Americans’ views on race, particularly in relation to the presidential elections of 2008, 2012, and 2016, are enlightening. Stephens-Davidowitz studied millions of Google searches for such topics as racist jokes, as well as the rise of the website Stormfront, which he describes as “America’s most popular online hate site” (137). He concludes, “Trump rode a wave of white nationalism. There is no evidence here that he created a wave of white nationalism. Obama’s election led to a surge in the white nationalist movement. Trump’s election seems to be a response to that. . . . States disproportionately affected by the Great Recession saw no comparative increase in Google searches for Stormfront.” (139) In other words, racism has probably played a larger role than economic hardship in recent elections.

To his credit, Stephens-Davidowitz does not view everything through the lens of the internet. “The Big Data revolution is less about collecting more and more data. It is about collecting the right data. But the internet isn’t the only place where you can collect new data and where getting the right data can have profoundly disruptive results.” (62) He recounts the story of how one horse enthusiast’s meticulous data collection about the physical characteristics of racehorses led to a highly accurate method for predicting winners.

Stephens-Davidowitz does touch on the ethics of tapping Big Data to understand human nature, particularly with respect to financial transactions. “Do we want to live in a world in which companies use the words we write to predict whether we will pay back a loan? It is, at a minimum, creepy—and quite possibly, scary.” (260) He also offers plenty of cautions against confusing correlation with causality. But I would have liked to see more discussion in this book about the ethical implications of using Big Data in the first place. Do we give up all our rights to privacy when we initiate a search on Google, even if Big Data is supposedly anonymous? Where are the protections for human subjects that are required in more conventional social science surveys? How can we be sure of the motives of the data seekers typing in those Google queries?

And what about corporate abuse of Big Data? Stephens-Davidowitz says, “Data on the internet . . . can tell businesses which customers to avoid and which they can exploit. It can also tell customers the businesses they should avoid and who is trying to exploit them. Big Data to date has helped both sides in the struggle between consumers and corporations. We have to make sure it remains a fair fight.” (265) I’m skeptical that consumers can be protected against corporations in the current political climate. And the conclusions that Stephens-Davidowitz presents about Americans’ racial prejudices must be pretty disheartening to anyone interested in societal equity and social justice. All the more reason to read this important book, which explains an effective means of probing the truth beneath the lies that everybody tells.

This post was a mid-week bonus. Come back to the Cedar Park Book Blog on Friday for the regular post!