A Reproducible Research Workshop

Duke University, Perkins LINK 072 (Classroom 6)

Mar 15-16, 2017

9:00am - 5:00pm

Instructors: Karen Cranston (Duke, Biology), Hilmar Lapp (Duke, GCB), Dan Leehr (Duke, GCB), R. Burke Squires (NIAID), Jamie Whitacre (UC Berkeley)

General Information

This hands-on workshop teaches basic concepts, skills, and tools for working more effectively and reproducibly with data in Jupyter notebooks.

The following are the overarching learning objectives for the curriculum.

  • Understand the value of reproducible research practices for more effective research for the current and future you.
  • Understand the value of reproducible research practices for advancing research as a whole.
  • Understand what is meant by making your research more reproducible.
  • Know practices to make your research more reproducible, in particular by using Jupyter Notebooks, and have the skills to do so.
  • Have the confidence and foundation to continue improving reproducibility of your research.
  • Understand what’s possible to further advance reproducibility of your research.
Please also see the general overview of the workshop and the curriculum.

Participants should bring their laptops and plan to participate actively. By the end of the workshop learners should be able to more effectively manage and analyze data and be able to apply the tools and approaches directly to their ongoing research.

Who: This workshop is aimed at graduate students, postdocs, and other researchers who perform computational analysis or work. The material uses basic Python and Jupyter Notebooks for teaching and illustrating the key concepts. Advanced knowledge of Python is not needed, but some familiarity with it will aid in absorbing the material.

Where: Perkins LINK 072 (Classroom 6). Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Contact: Please mail hilmar.lapp@duke.edu for more information.


Preliminary Schedule

Day 1

Morning Introduction, Data & Project Organization
Afternoon Data Exploration

Day 2

Morning Automation
Afternoon Publishing and Sharing

Etherpad: http://pad.software-carpentry.org/2017-03-15-duke.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

Introduction

  • Intro to and motivation for reproducible research
  • Intro to Jupyter Notebooks
  • Navigating the Notebook
  • Python and Jupyter Resources

Data & Project Organization

  • Evaluating data for reproducibility
  • Organizing project files and directories
  • Writing metadata files
  • Modifying data reproducibly
Lesson Materials

Data Exploration

  • Importing data into Jupyter Notebooks
  • Assessing structure and cleanliness
  • Data cleaning
  • Summarizing data
  • Visualization with Matplotlib & Seaborn
Lesson materials and download

Automation

  • Best practices of variable naming
  • Learn about "Don't Repeat Yourself" (DRY)
  • Modularizing your code
  • Refactoring your code
  • Defining functions
  • Importing custom functions and using them in a Jupyter notebook
  • Basic Python testing (time permitting)
Lesson materials, Data download

Publishing and Sharing

  • Exporting the Notebook
  • Documentation
  • Record-level Metadata
  • Publication: Identifiers and licensing for research products.
Lesson materials

Setup

To participate in this workshop, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of the workshop. You should bring and use your own laptop to ensure the proper setup of tools for an efficient workflow once you leave the workshop.

Jupyter notebook

The best way to install the Jupyter notebook is to use Anaconda.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
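To confirm which Python version you ended up with, a short check like the following can help. This is only a minimal sketch: the helper name is_python3 is made up for illustration and is not part of the workshop materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the major version of the given version tuple is 3.

    `is_python3` is a hypothetical helper name used only in this sketch.
    """
    return version_info[0] == 3


# Print the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])

if is_python3():
    print("This interpreter is Python 3.x.")
else:
    print("This interpreter is not Python 3.x; please install Python 3.")
```

You can run this in a Jupyter notebook cell or save it to a file and run it with your Python interpreter.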

Windows

  1. Open http://continuum.io/downloads with your web browser.
  2. Download the Python 3 installer for Windows.
  3. Install Python 3 using all of the defaults for installation except make sure to check *Make Anaconda the default Python*.

Mac OS X

  1. Open http://continuum.io/downloads with your web browser.
  2. Download the Python 3 installer for Mac OS X.
  3. Install Python 3 using all of the defaults for installation.
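
On either platform, once the installer has finished, you may want to check that the packages used in the workshop can be found by Python. The sketch below is an assumption about what is worth checking: it looks for the import names notebook (the Jupyter notebook), matplotlib, and seaborn, and the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name.

    `is_installed` is a hypothetical helper name used only in this sketch.
    """
    return importlib.util.find_spec(module_name) is not None


# Assumed import names for the tools named in the syllabus.
for name in ["notebook", "matplotlib", "seaborn"]:
    status = "found" if is_installed(name) else "NOT found"
    print(name + ": " + status)
```

If a package is reported as not found, it may simply not be part of your installation yet; ask an instructor before the workshop starts.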

Text Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords. The default text editor on Mac OS X and Linux is usually set to Vim, which is not famous for being intuitive. If you accidentally find yourself stuck in it, try pressing the Escape key, followed by :q! (colon, lower-case 'q', exclamation mark), then hitting Return to return to the shell.

Windows

Video Tutorial

nano is a basic editor and the default that instructors use in the workshop. To install it, download the Software Carpentry Windows installer and double click on the file to run it. This installer requires an active internet connection.

Other editors that you can use are Notepad++ or Sublime Text. Be aware that you must add the editor's installation directory to your system path. Please ask your instructor to help you do this.
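
One way to see whether an editor is already reachable through your system path is to ask Python to look it up. This is a rough sketch only: the command names notepad++ and subl are assumptions about how these editors are typically launched and may differ on your machine, and find_on_path is a made-up helper name.

```python
import shutil


def find_on_path(command):
    """Return the full path of `command` if it is on the system path, else None.

    `find_on_path` is a hypothetical helper name used only in this sketch.
    """
    return shutil.which(command)


# Assumed command names; adjust them to match your installation.
for command in ["nano", "notepad++", "subl"]:
    location = find_on_path(command)
    if location is None:
        print(command + ": not found on the system path")
    else:
        print(command + ": " + location)
```

An editor reported as not found may still be installed; it only means that its installation directory is not on the system path.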

Mac OS X

nano is a basic editor and the default that instructors use in the workshop. It should be pre-installed. See the Git installation video tutorial for an example of how to open nano.

Other editors that you can use are Text Wrangler or Sublime Text.