Cooking Data Reflection Paper

Project 3

Course: Intro to Quantitative Analysis for Geographers

Date: Spring 2021

Type: Literature Review

Background: Throughout this course we learned how to use quantitative analysis in many different ways to draw conclusions. To conclude the semester, we reviewed a book that explains how raw data is collected and how that data is then processed. It also shows the difference between clean and dirty data, a distinction that emerges as raw data is collected and processed, as well as the difficulties that come with cross-cultural research.

Summary: Cooking Data takes the reader through the process of collecting primary data. Along the way, the book explains the difference between clean and dirty data using survey data collected in Malawi on the AIDS epidemic. The author explains that clean data is “data that are accurate, reliable, efficiently and ethically collected and representative of sufficiently large and bounded samples over time” (Biruk 7). Such data can be collected under the proper circumstances, but a survey questionnaire that is easy to use and understand is needed to keep the data as clean as possible. Clean data is made possible by the people who prepare the survey before the surveyors ever enter the field, since they set the surveyors up for success. Dirty data, on the other hand, is raw data that is “deformed, dirty, or useless through bad data practices and human error or other contingencies in the field” (Biruk 4). Raw data can become dirty very easily through human mistakes or even through a surveyor making up numbers on a questionnaire. Surveyors who struggle with the questionnaire are also more likely to collect dirty data, because even a small mistake could throw off the entire survey. The book also discusses the difficulties of cross-cultural research: translating a survey question into another language can reduce the accuracy of the results, because some concepts are very difficult to translate directly, which leads to misleading or dirty data.

Comments and Critiques: Collecting clean data is a very challenging task, which is why dirty data is such a common occurrence. Many of the people followed in this book also faced a language barrier while researching in Malawi, which caused mishaps in the data that surfaced as dirty data during processing. Overall, raw data can very easily become dirty, which is why it is so important to have multiple people working on a dataset so that any discrepancies can be noticed and resolved before the data is published. Raw data also becomes dirty when the researchers collecting it are tired and overworked, which causes them to make mistakes while processing the data. All of this shows that the preparation stage, which occurs before the data scientists ever enter the field, is extremely important: good preparation yields more clean data and leaves less dirty data to clean up afterward.