Reproducible Computational Science:
Challenges and opportunities for research and IT

Panel at Duke TechExpo 2015
April 17, 2015

Moderator: Hilmar Lapp
Center for Genomic and Computational Biology (GCB)

You can find (and copy, edit, ...) the slides online:

All material for this panel is online (to be copied, edited, ...)

The Reproducibility Crisis

  • Only 6 of 53 landmark oncology papers confirmed
  • 43 of 67 drug target validation studies failed to reproduce
  • Effect size overestimation is common

Computational research:
Availability and technical challenges

Reproducing reproducible computational science:
an experiment

  • Software with many dependencies -> exponentially lower probability that all install

  • Holes or errors in documentation -> harmless for experts, often fatal for "method novices"

  • Software evolution & rot -> parameters that worked 1 year ago now throw an error

  • Dependency hell: baseline software and packages differ depending on who is trying to reproduce
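
The first point above can be made concrete with a little arithmetic. As a rough sketch, assume each dependency installs cleanly with the same probability and that installs succeed or fail independently of each other; both are simplifying assumptions, and the 95% figure and the function name are illustrative, not numbers or code from the experiment. The chance that everything installs is then that probability raised to the number of dependencies:

```python
def prob_all_install(p_each: float, n_deps: int) -> float:
    """Chance that all n_deps dependencies install cleanly.

    Assumes (hypothetically) that every dependency succeeds with the same
    probability p_each and that installs are independent of one another.
    """
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    if n_deps < 0:
        raise ValueError("n_deps must be non-negative")
    return p_each ** n_deps


if __name__ == "__main__":
    # Illustrative per-dependency success rate, not a measured value.
    p_each = 0.95
    for n_deps in (1, 5, 10, 20, 50):
        p_all = prob_all_install(p_each, n_deps)
        print(f"{n_deps:>3} dependencies: {p_all:.1%} chance that all install")
```

Under these assumptions, ten dependencies at 95% each already bring the chance of a clean install down to about 60%, which is why pinning and packaging dependencies matters more as a software stack grows.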

Good - Better - Best

Peng, R. D. “Reproducible Research in Computational Science.” Science 334, no. 6060 (2011): 1226–1227.

Lessons re: End-to-end reproducibility

Any work you do to make your analysis more reproducible pays dividends for colleagues and your future self.

Jeremy Leipzig

A bewildering tech soup

  • Version control
  • Distributed version control
  • Git, Mercurial, Subversion
  • Provenance
  • SHA256
  • Docker
  • Docker Hub
  • Container tagging
  • Drone, Travis, Circle CI
  • VM memory, storage limits
  • Literate programming
  • Markdown
  • RMarkdown
  • Knitr
  • packrat
  • HIPAA, protected data
  • Firewalls
  • DataCite DOIs
  • Zenodo, Figshare
  • Dryad

A huge opportunity for Research Informatics to accelerate science

Panelists

  • Hilmar Lapp, Dan Leehr (Center for Genomic and Computational Biology)
  • Mine Çetinkaya-Rundel (Department of Statistical Science)
  • Karen Cranston (National Evolutionary Synthesis Center - NESCent)
  • Mark Delong (OIT, Research Computing)
  • Erich Huang (Div. of Translational Bioinformatics, Department of Biostatistics & Bioinformatics)
  • Darin London (OIT, Office of Research Informatics)

Intro Talks - 5 minutes each

  • Erich Huang: Provenance and metadata APIs enabling reproducible research data
  • Karen Cranston: Reproducible Science Curriculum - how to make computational research more reproducible for the rest of us