Questionable Things, Extraordinary Things, and Making Sure the Data Gods Let You Into Heaven: A Case Study of Introducing Graduate Students to Cleaning Data
The Guardian's 10 Rules of Data Journalism include the following: Data journalism is 80% perspiration, 10% great idea, and 10% output (https://www.theguardian.com/news/2014/mar/17/facts-are-sacred-exclusive-extract). While the percentages vary from situation to situation, this rule captures a simple truth: the grubby work of getting data into shape is often far more time-consuming than the (maybe) glamorous, red-carpet work of analyzing data and presenting results. However, methods classes often focus more on the glamor than on the grubbiness. Thesis and dissertation students are often left to their own devices to figure out how to work with data that are much "messier" than the cleaned-up data they encounter in classes.
How, then, can we address this disparity? In this presentation, I will talk about organizing data-cleaning labs for a Spring 2024 Public Health class on using administrative and geospatial data to research drug-related harms and policy interventions. I will describe how I became involved in the class, the principles and particulars I tried to convey in the labs, and the replication assignment I created to help students bring together the material from the labs. The presentation will also discuss how I am updating the labs and course material for the upcoming Spring 2025 semester.