Data & Project Organization

This lesson shows how to organize data in projects by adopting naming conventions, directory structures, and metadata standards in ways that encourage reproducibility.

We’ll explore a data set, identify the pitfalls of working with real-world data, and set out a framework for the rest of the workshop.
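As a rough illustration of the conventions this lesson covers, the sketch below creates a project skeleton with a README stub and builds a date-prefixed file name. The folder names, README fields, and the helpers `make_filename` and `create_project` are hypothetical choices made for this example, not a layout prescribed by the lesson.

```python
from datetime import date
from pathlib import Path

# Hypothetical layout: these folder names are illustrative only.
SUBDIRS = ["data/raw", "data/processed", "scripts", "docs"]


def make_filename(day: date, description: str, ext: str) -> str:
    """Build a sortable name: ISO date, then lowercase words joined by hyphens."""
    slug = "-".join(description.lower().split())
    return f"{day.isoformat()}_{slug}.{ext}"


def create_project(root: Path) -> Path:
    """Create the folder skeleton and a README stub; return the README path."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    # Do not overwrite a README that someone has already filled in.
    if not readme.exists():
        readme.write_text(
            "Project: <title>\n"
            "Data origin: <where the data came from>\n"
            "Modifications: <what was changed, when, and by whom>\n",
            encoding="utf-8",
        )
    return readme
```

For example, `make_filename(date(2024, 3, 5), "Field Survey", "csv")` returns `2024-03-05_field-survey.csv`. Leading with the ISO date keeps files in chronological order when sorted by name.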

Prerequisites

Schedule

Setup  Download files required for the lesson

00:00  1. Introduction
       How do you approach and evaluate data of unknown origin?
       What are the pain points when inheriting a data project?

00:05  2. Project Structure
       How can you organize a project so it makes sense to your future self?
       What are some useful file naming strategies?
       Why and for what should we use README files?
       How do you document modifications that have been made?

00:25  3. Metadata
       Why should we use README files?
       What format should README files be in?
       What type of information goes into a README file?
       When should a README file be updated?

00:35  4. Modifying data
       What are the pitfalls of modifying data by hand?
       How do you document modifications that have been made?
       How should you structure the data cleaning process?

00:50  5. Concluding thoughts
       How do you organize data to encourage reproducibility across different projects?
       How do you record the origin and history of data?
       How do you build up a project that can be understood by yourself or others in the future?

00:55  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.