Overview
Teaching: 5 min
Exercises: 0 min
Questions
How do you approach and evaluate data of unknown origin?
What are the pain points when inheriting a data project?
Objectives
Evaluate a project for reproducibility.
Identify assumptions and red flags.
Recognize documentation and structure gaps.
You have just started a new job and have to take over the work of a previous employee who has left the lab and gone off the grid. You receive an Excel file containing this person’s life’s work. Your boss has instructed you to:
- Make sense of all the data they collected
- Write a report on the findings to share with others in the lab, so they may use the data and analyses in their own work.
The file which was sent to you can be downloaded here: gapminderDataFiveYear_superDirty.xlsx.
Download the file. With the goal of making sense of the data, what can you tell me about this data, and how do you know that?
- What is it?
- Where did it come from?
- When was it collected?
- Has anything been changed? If so, why was it changed?
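One way to start answering these questions is a quick first-pass profile of the columns. The sketch below is only an illustration, not part of the original exercise: it assumes the sheet has already been loaded into a list of dictionaries (for example after exporting it to CSV), the function name `profile_rows` is hypothetical, and the sample rows are invented placeholders rather than values from the actual file.

```python
from collections import Counter


def profile_rows(rows):
    """Summarize each column: blank cells, value types, and text labels
    that differ only by case or surrounding whitespace."""
    report = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty/whitespace-only cells as blank.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        blanks = len(values) - len(present)
        types = Counter(type(v).__name__ for v in present)
        # Group text values by a normalized form to expose spelling variants.
        variants = {}
        for v in present:
            if isinstance(v, str):
                variants.setdefault(v.strip().lower(), set()).add(v)
        inconsistent = {k: sorted(vs) for k, vs in variants.items() if len(vs) > 1}
        report[col] = {
            "blanks": blanks,
            "types": dict(types),
            "inconsistent": inconsistent,
        }
    return report


# Hypothetical sample rows, invented for illustration only.
sample = [
    {"country": "Country A", "year": 1952, "pop": 1000},
    {"country": "country a ", "year": "1957", "pop": None},
    {"country": "Country A", "year": 1962, "pop": 1200},
]

summary = profile_rows(sample)
for column, info in summary.items():
    print(column, info)
```

A summary like this does not say where the data came from or why it was changed, but mixed types, blank cells, and near-duplicate labels are useful red flags to note before writing the report.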
Key Points
Using disorganized data is time-consuming and error-prone.
Collaborators like your past self do not respond to email.