Documentation

Overview

Teaching: 10 min
Exercises: 15 min
Questions
  • Why should I invest time in good documentation?

  • How does my target audience influence my documentation strategy?

  • What are some published examples of good documentation?

Objectives
  • Describe how documentation is useful to yourself and to others

  • Evaluate and rank the quality of comments in published notebooks

  • Evaluate and rank the quality of existing metadata records.

  • Describe types of metadata directly relevant for research reproducibility.

Overview

Documenting your process, especially as it concerns your data, is a key element of making your research more reproducible. If you do not thoroughly record every step you used to manipulate and process your data, it will likely be impossible for you, or anyone else, to repeat the analysis in the future (Wilson et al. 2016). Using the Jupyter Notebook for scripting your data processing is powerful because it saves the code (the what) and, interspersed with it, the motivations behind each step (the why).
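As a minimal sketch of the what and the why side by side, consider a single hypothetical processing step. The function name, the sentinel value -999 and the sample readings are assumptions made for illustration; they do not come from a real dataset.

```python
# Hypothetical processing step: the code records the "what",
# the docstring and comments record the "why".

# Assumed sentinel that an imaginary sensor writes for failed readings.
MISSING = -999.0


def drop_missing(readings):
    """Return the readings with sentinel values removed.

    Why: the sentinel is not a measurement. Leaving it in would
    drag the mean far below any plausible value.
    """
    return [r for r in readings if r != MISSING]


raw = [12.1, MISSING, 11.8, 12.4]  # illustrative values only
clean = drop_missing(raw)
mean_clean = sum(clean) / len(clean)
print(clean, mean_clean)
```

In a notebook, the docstring text could equally live in a Markdown cell placed directly above the code cell; what matters is that the reason for the step travels with the step.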

There is also project-level documentation, which is needed not to understand a particular series of data processing steps but to understand the organization of the project as a whole. Finally, documentation can be used to aid discoverability.

In this lesson, we will discuss the types and styles of documentation, their utility, and how you might tailor them for different audiences.

Learning objectives

Documentation best practices

Consider the target audiences

README file

It is important to write a brief overview of your project. A README file is a short file (think one-pager) in the project’s home directory. It is typically the main entry point to the project, and in particular to its code, for readers. It should thus answer questions others will commonly have when they come upon the project, including the following:

A README should be written in plain text, with markup that is easy to read (such as Markdown; Reitz 2016).

Based on the above, a README file should include the following items:

Exercise 1

Compare and contrast different research product archives for the quality and value of their documentation, and their corresponding utility for reuse.

Metadata quality: Good - Better - Best

Metadata are the contextual information required to interpret data (Fig 1) and should be clearly defined and tightly integrated with the data. The importance of metadata for context, reusability, and discovery has been written about at length in guides for data management best practices (Hart et al. Ten Simple Rules for Digital Data Storage. PLoS Comput Biol. 2016;12: e1005097).

Metadata include information about data points, observations (rows, columns), samples, etc. There are also record-level metadata (metadata of research inputs and products as records), which typically include the following:

Good metadata are important for reproducible research because they describe the data at various levels, including measurement protocols, observations, and versions of software and other tools, and thus provide the context for interpreting the data, analysis, and results.
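To make these levels concrete, here is a minimal sketch that gathers a measurement protocol, column descriptions, and software versions into one small record and writes it out as JSON. The field names (`measurement_protocol`, `columns`, `software`) and all example values are assumptions chosen for illustration; they do not follow any particular metadata standard.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def build_metadata(title, protocol, columns):
    """Assemble a small metadata record as a plain dictionary.

    The keys used here are hypothetical, not a published schema.
    """
    return {
        "title": title,
        "created": datetime.now(timezone.utc).isoformat(),
        "measurement_protocol": protocol,
        "columns": columns,  # column name -> description with units
        "software": {
            # Captured at run time so the record matches the environment used.
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }


record = build_metadata(
    title="Example temperature series",
    protocol="Hourly readings from a single logger (illustrative)",
    columns={
        "time": "UTC timestamp, ISO 8601",
        "temp_c": "air temperature, degrees Celsius",
    },
)
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Saving such a record next to the data file keeps the context with the data. A real project would more likely adopt the fields its target repository or community standard expects.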

Metadata also aid discovery.

Exercise 2

This is a continuation of Exercise 1. Rank the following Zenodo records from 1 (most helpful/informative) to 3 (least helpful/informative) for metadata quality.

Discuss the following questions:

  • What were the criteria that you used to rank?
  • What was missing?
  • What was the most helpful?
  • What was the most critical piece of information?

Examples for learning what’s possible

Key Points

  • Your code tells what you did. Your documentation tells why you did it and why it is important.

  • Documentation is the key to communicating your workflow and findings with your future self, collaborators, peers, and the general public.

  • Jupyter Notebooks are powerful because they allow documenting the what (the code) and the why (the motivation and/or interpretation) interspersed with each other.

  • Good, better, best: some metadata are already much better than none, and more metadata make better metadata.