Reproducible Research using Jupyter Notebooks:
Workshop Overview

Prerequisites

The course is aimed at graduate students, postdocs, and other researchers who perform computational analysis or work. The material uses basic Python for teaching and illustrating the key concepts. Advanced knowledge of Python is not needed, but some familiarity with Python will aid in absorbing the material.

Workshop Overview

This document gives instructors basic information about the Reproducible Science with Jupyter Notebooks workshops.

All of our material is on GitHub with a CC0 copyright waiver: Reproducible Science Curriculum on GitHub

Learning Objectives

The following are the overarching learning objectives for the curriculum.

Workshop outline

A Reproducible Science with Jupyter Notebooks Curriculum workshop currently has five modules:

  1. Introduction
  2. Data and Project Organization
  3. Data Exploration
  4. Automation
  5. Publication and Sharing

I. Introduction

Goals: Students will understand the concept, importance, and components of reproducible research; understand the strengths of Jupyter Notebooks as a tool for reproducible research; be able to create and navigate a Jupyter Notebook containing Markdown and Code cells; and know how to access the broader Jupyter and Python ecosystems and communities.

Instructor’s skills: Familiarity with Jupyter notebooks; familiarity with Markdown; basic Python skills.

Materials
Repository: https://github.com/Reproducible-Science-Curriculum/introduction-RR-Jupyter
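
As a rough illustration of what "a notebook containing Markdown and Code cells" means, a notebook file is a JSON document holding a list of typed cells. The sketch below builds a minimal one with the standard library only. The helper name make_notebook and the cell contents are hypothetical and are not taken from the module's materials.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    # Format version 4.4 is used here so per-cell "id" fields are not needed.
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical content: one narrative cell followed by one code cell.
notebook = make_notebook([
    ("markdown", "# Analysis notes\nWhy this analysis exists."),
    ("code", "total = 1 + 2\nprint(total)"),
])
notebook_json = json.dumps(notebook, indent=1)
```

Writing notebook_json to a file with an .ipynb extension should give a file that Jupyter can open, although in a workshop students would normally create cells through the notebook interface.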

II. Data and Project Organization

Goals: Students will learn to recognize common data file formats and how to import them into a Jupyter notebook; be able to design and justify a directory structure and file naming convention for a project; and be able to move from an empty notebook through exploratory analysis into a more refined script or set of notebooks that communicates results reproducibly.

Instructor’s skills: Good understanding of file organisation in research projects. Understanding of file structure on major operating systems (Windows, Linux/Unix, Mac OS) and the interface/commands for managing files and folders. Understanding of basic file types (binary vs. text). At least a basic overview of how files are stored (and deleted) in different operating systems. Understanding of file and folder naming conventions (names, extensions etc.).

Materials
Repository: https://github.com/Reproducible-Science-Curriculum/organization-RR-Jupyter
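
To make the directory-structure and file-naming goals of this module concrete, here is a minimal sketch using only the standard library. The folder names in PROJECT_DIRS and the date-prefixed naming scheme are illustrative assumptions, not the convention prescribed by the module.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical layout; adapt the folder names to the project at hand.
PROJECT_DIRS = ["data/raw", "data/processed", "notebooks", "results/figures"]


def create_project(root):
    """Create the project folders under root and return their paths."""
    root = Path(root)
    created = []
    for relative in PROJECT_DIRS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def dated_name(description, extension, day):
    """Build a sortable file name: ISO date, underscore, hyphenated slug."""
    slug = "-".join(description.lower().split())
    return f"{day.isoformat()}_{slug}.{extension}"


# Build the layout in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_project(tmp)
    layout = sorted(f.relative_to(tmp).as_posix() for f in folders)

example_name = dated_name("Field counts", "csv", date(2020, 1, 31))
```

Keeping raw data apart from processed data, and starting file names with an ISO date, are common choices because they make it harder to overwrite originals and let files sort chronologically.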

III. Data Exploration

Goals: Students will be able to assess the structure and cleanliness of their dataset; be able to describe their findings, translate results, and summarize their thought process in a narrative composed of Markdown text and Python code in a Jupyter Notebook; learn practices for modifying raw data to prepare a clean data set in a reproducible and documented way; and be able to assess whether their data is “Tidy” and, if it is not, arrange it into a tidy format.

Instructor’s skills:

Materials
Repository: https://github.com/Reproducible-Science-Curriculum/data-exploration-RR-Jupyter
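
For the "Tidy" goal above, a tidy table is usually taken to mean one row per observation and one column per variable. The sketch below reshapes a small wide table into that form in plain Python. The data and the function name wide_to_long are made up for illustration; the module itself may well use a dataframe library for this step.

```python
def wide_to_long(rows, id_column, var_name, value_name):
    """Turn one-column-per-variable rows into one-row-per-observation rows."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return tidy_rows


# Made-up wide table: the year is hidden in the column names.
wide = [
    {"site": "A", "2019": 12, "2020": 15},
    {"site": "B", "2019": 7, "2020": 9},
]
tidy = wide_to_long(wide, id_column="site", var_name="year", value_name="count")
```

After the reshape, the year is an ordinary column, so filtering or grouping by year no longer depends on parsing column names.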

IV. Automation

Goals: Students will learn how to programmatically assemble a manuscript using elements generated by a notebook, including text, headings and figures generated from code and data.

Instructor’s skills: Good understanding of programming concepts, in particular code modularisation, writing and using functions, code reusability and so on. Good understanding of selected software engineering concepts such as project build and automation, code testing, continuous integration and so on. Solid knowledge of Python, Jupyter, and relevant packages (consult the materials for details). Understanding of basic statistical concepts (consult the materials for details).

Materials
Repository: https://github.com/Reproducible-Science-Curriculum/automation-RR-Jupyter
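
One way to picture "programmatically assembling a manuscript" is a text template whose numbers and figure references are filled in from data, so that rerunning the code updates the document. The following is a minimal standard-library sketch under that assumption. The template wording, the measurements, and the figure path are all hypothetical, and the module's actual tooling may differ.

```python
import statistics
from string import Template

# Markdown manuscript skeleton; $placeholders are filled from computed values.
MANUSCRIPT = Template(
    "# $title\n\n"
    "## Results\n\n"
    "We measured $n samples with a mean of $mean "
    "and a standard deviation of $sd.\n\n"
    "![Figure 1: $caption]($figure_path)\n"
)


def build_manuscript(title, values, figure_path, caption):
    """Render the manuscript text from raw values and a figure reference."""
    return MANUSCRIPT.substitute(
        title=title,
        n=len(values),
        mean=f"{statistics.mean(values):.2f}",
        sd=f"{statistics.stdev(values):.2f}",  # sample standard deviation
        figure_path=figure_path,
        caption=caption,
    )


# Made-up measurements and a hypothetical figure location.
measurements = [2.0, 4.0, 4.0, 6.0]
manuscript = build_manuscript(
    "Example report",
    measurements,
    "results/figures/hist.png",
    "Distribution of measurements",
)
```

Because every number in the text comes from measurements, changing the data and rerunning the code regenerates a consistent manuscript without manual copying.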

V. Publication and Sharing

Goals: Students will learn how to export their notebooks in a variety of formats for publication; be able to describe the utility of documentation to themselves and others; be able to describe and compose appropriate and descriptive keywords for a given record; be able to define and describe the importance of unique identifiers for data, publication and software; and learn how to select an appropriate license for their research artifacts.

Instructor’s skills: Understanding of the requirements for reproducible publication. Understanding of the differences between publication and sharing. Understanding of the difference between open and restricted access publication. Overview of tools and repositories for publishing research outputs. Knowledge of different licensing models and ability to discuss major differences between the most commonly used licenses in research.

Materials
Repository: https://github.com/Reproducible-Science-Curriculum/publication-RR-Jupyter
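
For the export goal above, one common route is the jupyter nbconvert command-line tool; whether this module relies on it is an assumption here. The sketch below only builds the command for a chosen output format and does not run it. The function name export_command and the file name analysis.ipynb are hypothetical.

```python
from pathlib import Path

# A subset of the output formats accepted by `jupyter nbconvert --to`.
EXPORT_FORMATS = {"html", "pdf", "markdown", "latex", "slides", "script"}


def export_command(notebook, fmt):
    """Return the argument list that converts a notebook to the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    return ["jupyter", "nbconvert", "--to", fmt, str(Path(notebook))]


# Hypothetical notebook name; the command is built but not executed here.
command = export_command("analysis.ipynb", "html")
```

With Jupyter installed, the list can be passed to subprocess.run(command, check=True). Note that PDF export additionally needs a working LaTeX installation.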

Workshops held previously

This curriculum is in early development and has not been taught yet.

Ongoing work

These materials are being developed and revised on an ongoing basis. The list of GitHub issues for the Reproducible-Science-Curriculum shows what is in progress and what remains to be done.