So you have a new data set. Before you dive into running models and tests, you need to inspect your data. John Tukey, a prominent statistician, coined the term “exploratory data analysis”. Data exploration can inform a number of decisions.
In this lesson, we begin with a messy version of the Gapminder data and explore it together. We will find some issues with the data and teach you how to correct them. After making the data tidy, you will be able to plot the variables in different ways and see patterns.
Prerequisites
Some experience with Python is helpful, but not strictly needed.
Data Exploration | Tidying, summarizing, and plotting data | Lesson narrative | Student notebook |