Data versus Conventional Wisdom: A Book Review

By Chuck | August 16, 2005

Freakonomics by Steven Levitt and Stephen Dubner provides an economist’s analysis of social issues not normally associated with economics, such as the causes of crime, the impact of parenting on child development, and the power of information in combating racial discrimination. This review identifies four common themes running throughout the book and offers a critical assessment of its strengths and weaknesses from one data librarian’s perspective.

Following the suggestion of two IASSIST friends, I added Freakonomics: A Rogue Economist Explores the Hidden Side of Everything by Steven Levitt and Stephen Dubner to my vacation reading list, although I must admit that this was the only non-mystery I read this summer. Levitt, who a few years ago was selected as the best American economist under forty, grapples with social issues not normally associated with economists, such as the causes of crime, the impact of parenting on child development, and the power of information in combating racial discrimination. Dubner is a journalist and author who in 2003 wrote an article about Levitt for The New York Times Magazine. Together they tell several stories based on economic analyses that challenge conventional wisdom regarding these social issues. What is freakonomics? They use this term to describe their application of economic analysis to any ‘freakish’ curiosity that catches their attention.


While the authors deny using a common thread to unify their book, I found four distinctive strands running throughout their stories. First, they follow in the footsteps of John Kenneth Galbraith in questioning simplistic explanations rooted in conventional wisdom (pp. 89-90). They are, in a sense, social science muckrakers.


Second, Levitt and Dubner remind the reader repeatedly of the difference between correlation and causation. This is particularly evident in the chapter “Where Have All the Criminals Gone?”, where they tackle head-on the fundamental confusion between correlation and causality.


[I]f you just look at raw correlations between police and crime, you will find that when there are more police, there tends to be more crime. That doesn’t mean, of course, that the police are causing the crime … (p. 126)


The third strand, which is closely related to the second, is their commitment to move beyond simple correlation to a case for causality. The authors present several examples that incorporate solid research design to establish firmer ground for causal explanations, although they never speak directly about research design. The methods upon which they rely include establishing sequential time order, comparing control to experimental groups, controlling for spurious effects, and normalizing measures to facilitate fair comparisons. In the chapter “What Makes a Perfect Parent?”, they mention a natural experiment resulting from a program by the Chicago Public School (CPS) system to desegregate its schools, one that exemplifies, in their eyes, an ideal for social research.


In the interest of fairness, the CPS resorted to a lottery. For a researcher, this is a remarkable boon. A behavioral scientist could hardly design a better experiment in his laboratory. Just as the scientist might randomly assign one mouse to a treatment group and another to a control group, the Chicago school board effectively did the same. Imagine two students, statistically identical, each of whom wants to attend a new, better school. Thanks to how the ball bounces in the hopper, one goes to the new school and the other stays behind. Now imagine multiplying those students by the thousands. The result is a natural experiment on a grand scale. This is hardly the goal in the mind of the Chicago school officials who conceived the lottery. But when viewed in this way, the lottery offers a wonderful means of measuring just how much school choice – or, really, a better school – truly matters (p. 158).


A final interwoven theme is the authors’ reliance on data, although at times they mistakenly use information and data interchangeably. They assert, “If you learn to look at data in the right way, you can explain riddles that otherwise might have seemed impossible” (p. 14). You find them often asking, “What do the data reveal?” In the tradition of one tribe of economists, measurement is paramount: “Knowing what to measure and how to measure it makes a complicated world much less so” (p. 14).


These four threads occur throughout the book, underlining the importance of good social science research in a quest to understand complex social issues. As a data professional in the social sciences, I enjoyed the ease with which they present social science explanations without slipping too far into jargon. As social researchers struggle to communicate their findings to the general public, the style applied in Freakonomics is worth consideration.


This book demonstrates the immense value of accessible research data for secondary analysis. Yet one of the toughest challenges of secondary analysis is locating data sources that contain the appropriate mix of variables to pursue a specific line of investigation. Operationalising a research question remains an art form rarely discussed by researchers, and insight into how the authors accomplished this feat is not to be found in Freakonomics. How difficult was it to find data allowing them to investigate their research questions? How often did they have to reshape their initial research question to fit existing data because they could not find data that would support their original research? These are questions that I would have liked them to address.


I was disappointed that the authors did not spend more time explaining their choice of dependent variables, a concept that they never broach. For example, when discussing the impact of parenting, the authors move from the nature-versus-nurture debate to explaining student performance on standardised tests. Why did they choose test scores as their key dependent variable in this instance? Was their choice based merely on convenience? We are never given an explanation other than being told that this variable exists in the Early Childhood Longitudinal Study. In conjunction with this, I wanted to hear explicitly how their dependent variables related directly to their original research questions.


While Levitt and Dubner rely extensively on a rich variety of data, they do not always cite their sources. As a data librarian, I wish that the authors had followed the data citation practice of Robert Putnam in Bowling Alone: The Collapse and Revival of American Community, where a separate appendix was dedicated to all data sources used in the book. If we are expected to believe the authors truly value data, evidence of this conviction would have been most easily demonstrated through a comprehensive listing of data sources. This was a missed opportunity to place data prominently in the eye of the public.


Chuck Humphrey