A curriculum for teaching Reproducible Computational Science bootcamps

Hilmar Lapp, Duke University
Participants of the Reproducible Science Curriculum Hackathon

BOSC 2015, Dublin, Ireland
CC0

Reproducibility crisis

  • Only 6 of 53 landmark oncology papers confirmed
  • 43 of 67 drug target validation studies failed to reproduce
  • Effect size overestimation is common

Nature Special Issue on Challenges in Irreproducible Research

Reproducibility matters

Lack of reproducibility in science causes significant issues

  • For science as an enterprise
  • For other researchers in the community
  • For public policy

Science retracts gay marriage paper

  • Science retracted (without lead author's consent) a study of how canvassers can sway people's opinions about gay marriage

  • Original survey data was not made available for independent reproduction of results; in addition, survey incentives were misrepresented and the sponsorship statement was false

  • Two Berkeley grad students attempted to replicate the study and discovered that the data must have been faked.

Source: http://news.sciencemag.org/policy/2015/05/science-retracts-gay-marriage-paper-without-lead-author-s-consent

Reproducibility matters

Lack of reproducibility in science causes significant issues

  • For science as an enterprise
  • For other researchers in the community
  • For public policy
  • For patients

Seizure study retracted after authors realize data got "terribly mixed"

From the authors of Low Dose Lidocaine for Refractory Seizures in Preterm Neonates (doi:10.1007/s12098-010-0331-7):

The article has been retracted at the request of the authors. After carefully re-examining the data presented in the article, they identified that data of two different hospitals got terribly mixed. The published results cannot be reproduced in accordance with scientific and clinical correctness.

Source: Retraction Watch

Reproducibility matters

Lack of reproducibility in science causes significant issues

  • For science as an enterprise
  • For other researchers in the community
  • For policy making
  • For patients
  • For oneself as a researcher

Reproducibility = Accelerating science

  • If my research is difficult to reproduce, it impedes my lab and my future self.

Any work you do to make your analysis more reproducible pays dividends for colleagues and your future self.

Jeremy Leipzig

Reproducible computational research is challenging

  • Most software has many dependencies, any one of which can fail to install.
  • Gaps and errors in docs may be harmless for experts, but are often fatal for “method novices”.
  • Software evolution means that parameters that worked a year ago may now throw an error.
  • Dependency hell: baseline software and packages differ from one system to another.

NESCent Informatics experiment on reproducing reproducible computational research

Bewildering technology soup

  • Version control
  • Git, Mercurial, Subversion
  • Provenance
  • SHA256
  • Docker, Docker Hub
  • Continuous Integration

  • Literate programming
  • RMarkdown, Knitr
  • DataCite DOIs
  • Dryad, Zenodo, Figshare
  • HIPAA, PHI

These are all about technology, not scientific discovery.

Reproducible Science Curriculum Workshop & Hackathon

To develop an open source curriculum for a two-day workshop on reproducibility for computational research

Reproducible Science Curriculum Workshop & Hackathon

  • Held December 11-14 at NESCent in Durham, NC
  • Two days of brainstorming / unconference, followed by two days of curriculum development
  • 21 participants comprising statisticians, biologists, bioinformaticians, open-science activists, programmers, graduate students, postdocs, untenured and tenured faculty

Reproducible Science Curriculum Workshop & Hackathon