Metadata

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • Why should we use README files?

  • What format should README files be in?

  • What type of information goes into a README file?

  • When should a README file be updated?

Objectives
  • Describe the purpose of including README files with your project.

  • Describe common locations for README files.

  • Describe the appropriate level of detail to include in a README.

Why READMEs

Every project should tell its users what the project is for. This is commonly done in a README file. As the starting point for a project, the README file is formatted as plain text (or Markdown) so that it is easy to read. A README file should include the following information: the project name, the date, the maintainers' names and contact information, the origin of any data, and a brief description of the goal of the project.

Think about the beginning of this lesson, when we had nothing but a file with a name. These are the things that would have made it easy to make sense of that data.

So, before we make any modifications to the raw data, we need a practice for how to record the initial state of the data, as well as our modifications.

Adding a Top Level README

To add a README to our project, open a text editor. On a Mac this can be BBEdit; on Windows, Notepad++.

Now, let’s make a README that contains:

Project name
Today's date
Maintainer's contact info
Data Origin
3-4 sentences about the goal of the project

This file serves as the starting point for future you, or anyone who receives this data.
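
The items above can also be written out by a short script. The following is a minimal Python sketch, not part of the lesson's own materials: the function names `build_readme` and `write_readme`, the field labels, and the `README.md` filename are assumptions chosen for illustration.

```python
from datetime import date
from pathlib import Path


def build_readme(project, maintainer, data_origin, goal, today=None):
    """Return README text holding the project name, date, maintainer,
    data origin, and a short statement of the project's goal."""
    today = today or date.today()
    lines = [
        project,
        "=" * len(project),  # underline the title, Markdown style
        "",
        f"Date: {today.isoformat()}",
        f"Maintainer: {maintainer}",
        "",
        "## Data Origin",
        "",
        data_origin,
        "",
        "## Project Summary",
        "",
        goal,
        "",
    ]
    return "\n".join(lines)


def write_readme(directory, text, filename="README.md"):
    """Write the README into `directory`, refusing to overwrite an existing one."""
    path = Path(directory) / filename
    if path.exists():
        raise FileExistsError(f"{path} already exists")
    path.write_text(text, encoding="utf-8")
    return path
```

Refusing to overwrite is a deliberate choice in this sketch: a README that already exists probably holds notes worth keeping, so it is safer to edit it by hand.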

Adding a README in a Subdirectory

README files in subdirectories are a good idea too. Often there are many files, and it’s distracting to fill the top-level README with details about smaller pieces of the project.
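
One way to see which subdirectories still lack a README is to list them. This is a small Python sketch under stated assumptions: the function name `dirs_missing_readme` is hypothetical, only immediate subdirectories are checked, and any file whose name starts with "readme" (in any letter case) counts.

```python
from pathlib import Path


def dirs_missing_readme(project_root):
    """Return the names of immediate subdirectories that contain no README* file."""
    missing = []
    for sub in sorted(Path(project_root).iterdir()):
        # Skip plain files and hidden directories such as .git
        if not sub.is_dir() or sub.name.startswith("."):
            continue
        has_readme = any(
            p.is_file() and p.name.lower().startswith("readme")
            for p in sub.iterdir()
        )
        if not has_readme:
            missing.append(sub.name)
    return missing
```

Calling `dirs_missing_readme("gapminder")` on a project laid out with directories such as `00_raw`, `01_cleaning`, and `02_cleaned` would return the names of those that have no README yet.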

Keeping the READMEs up-to-date

Self-documenting Projects

READMEs are commentary on what we consider the “real work”, and, realistically, they can be an afterthought. We’ve all had projects with a looming deadline or someone asking for a result, and the documentation step is easy to defer until later.

Later never comes, or we forget the details by the time it does. So another good practice is to use good, descriptive names for files, directories, and code. These are for our benefit, not the computer's.

Project README

gapminder/README.md

gapminder
=========

## Project Summary

This project analyzes population-level statistics about many countries to
determine if there is a relationship between x and y.

Started: 2017-03-15
Maintainer: Dan Leehr dan.leehr@duke.edu

## Data Origin

This data is the gapminder dataset, originally collected and published in XXX,
and retrieved from [1]. The dataset reflects population-level statistics about
many countries spanning the last several decades.

[1] https://github.com/Reproducible-Science-Curriculum/organization-RR-Jupyter/raw/gh-pages/data/gapminderDataFiveYear_superDirty.xlsx


## Summary of changes

2017-03-15	dan.leehr@duke.edu	Inherited gapminderDataFiveYear_superDirty.xlsx from Frank Grimes.
					Placed raw data in 00_raw.
2017-03-15	dan.leehr@duke.edu	Cleaned gapminderDataFiveYear_superDirty.xlsx file in 01_cleaning and
					exported to CSV format in 02_cleaned.

Good, Better, Best

  • Good: Plain text
  • Better: Date, name, contact info, short summary, Markdown
  • Best: Plus history of all changes to the project, checksums
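
The checksums mentioned under "Best" can be computed with Python's standard `hashlib` module. The following is a minimal sketch, assuming SHA-256 as the algorithm (the lesson does not name one); the function names `sha256_of` and `checksum_lines` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(directory):
    """Return one 'digest  filename' line per file in `directory`, sorted by name."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return [f"{sha256_of(p)}  {p.name}" for p in files]
```

The lines returned by `checksum_lines("00_raw")` could be pasted into the README, so that anyone who receives the data can confirm the raw files have not changed.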

Key Points

  • All projects should include a README file in the top directory.

  • README files should include the names and contact information of the maintainers, the date, a brief description of the intent of the project, and the source of any data files.

  • Use a README to record changes made over time.

  • README files should be made in a plain text format.