"Reproducibility is actually all about being as lazy as possible!"
– Hadley Wickham (via Twitter, 2015-05-03)

Depending on your previous experience with R Markdown, this lesson might seem too advanced, because it covers ways to deal with annoyances you may run into when you use R Markdown within a large project.
If you have never used R Markdown before, this is a good opportunity to learn efficient practices right away.
None of these tools is particularly hard to learn or to grasp, but trying to learn everything at once might feel overwhelming.
R Markdown allows you to mix code and prose, which is wonderful and very powerful, but it can be difficult to manage if you don't have a good plan for getting organized.
In this lesson, we demonstrate how to write functions that generate the clean version of your data, your figures, your tables, and your manuscript.
Why? Generating all the content of your manuscript with functions makes the manuscript much easier to maintain, because it forces you to be organized.
By breaking down your analysis into functions, you end up with blocks of code that interact and depend on each other in explicit ways, so you can focus on the important stuff!
If all your analysis is made up of scripts, with pieces that are repeated in multiple parts of your document, things can get out of hand pretty quickly.
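To make this concrete, here is a minimal sketch of an analysis split into one function per step. The function names (clean_data(), make_summary_table(), make_figure()) and the toy data are hypothetical and are not taken from the lesson's example-manuscript; the point is only that the table and the figure explicitly depend on the output of the cleaning step.

```r
# Hypothetical example: one function per step of the analysis.

# Input: raw data.frame with columns `group` (character) and `value` (numeric).
# Output: data.frame with the same columns, without missing or non-positive values.
clean_data <- function(raw) {
  out <- raw[stats::complete.cases(raw), ]
  out[out$value > 0, ]
}

# Input: cleaned data.frame. Output: data.frame with the mean `value` per `group`.
make_summary_table <- function(cleaned) {
  aggregate(value ~ group, data = cleaned, FUN = mean)
}

# Input: cleaned data.frame and a file path. Writes a boxplot to `path` (PDF)
# and returns the path invisibly.
make_figure <- function(cleaned, path) {
  grDevices::pdf(path)
  on.exit(grDevices::dev.off())  # always close the device, even on error
  graphics::boxplot(value ~ group, data = cleaned)
  invisible(path)
}

# Both the table and the figure are built from clean_data()'s output.
raw <- data.frame(group = c("a", "a", "b", "b", "b"),
                  value = c(1, NA, 3, -2, 5))
cleaned <- clean_data(raw)
summary_table <- make_summary_table(cleaned)
fig_path <- make_figure(cleaned, tempfile(fileext = ".pdf"))
print(summary_table)
```

If the cleaning rules change, only clean_data() needs editing; everything downstream picks up the change the next time the manuscript is knit.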
The easiest way to document your functions is to add comments around them that explicitly state the purpose of each function, what the arguments are supposed to be (class and format), and the kind of output you will get from it.
Document not only the kind of input your function takes, but also the format and structure of the output.
roxygen is a comment format for documenting functions; the roxygen2 package can convert these comments into the .Rd files that R uses for its help pages.
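As an illustration, a roxygen block for a hypothetical summarize_by_group() function might look like the sketch below. The @param, @return and @examples tags are standard roxygen tags; note how @return describes the structure of the output, not just its class. To R itself these lines are ordinary comments, so the function works whether or not roxygen2 ever processes them.

```r
#' Compute the mean value per group
#'
#' @param cleaned A data.frame with one row per observation, containing
#'   a character or factor column `group` and a numeric column `value`
#'   without missing values.
#' @return A data.frame with one row per group and two columns: `group`
#'   and `value`, the mean of `value` within that group.
#' @examples
#' summarize_by_group(data.frame(group = c("a", "b", "b"), value = c(1, 2, 4)))
summarize_by_group <- function(cleaned) {
  aggregate(value ~ group, data = cleaned, FUN = mean)
}

result <- summarize_by_group(data.frame(group = c("a", "b", "b"), value = c(1, 2, 4)))
print(result)
```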
In RStudio, place the cursor inside a function and use Code > Insert Roxygen Skeleton, or press Ctrl + Alt + Shift + R on your keyboard.

If these issues break something in your analysis, you might be able to find the problem easily, but more often than not, they produce subtle differences in your results that you may not be able to detect.
If all your code is made up of functions, you can control the input and test the output. This would be difficult, if not impossible, if your entire analysis were one long script.
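A minimal sketch of "control the input, test the output" using only base R: run a (hypothetical) clean_data() function on a tiny dataset whose correct result was worked out by hand, and let stopifnot() raise an error if the output ever drifts. The function and the toy data are assumptions for illustration only.

```r
# Hypothetical cleaning function: drops rows with a missing or non-positive `value`.
clean_data <- function(raw) {
  out <- raw[!is.na(raw$value), ]
  out[out$value > 0, ]
}

# A small, controlled input whose expected output we know in advance.
toy_input <- data.frame(id = 1:4, value = c(2, NA, -1, 7))
toy_output <- clean_data(toy_input)

# Stop with an error if the output differs from what we expect.
stopifnot(
  nrow(toy_output) == 2,
  identical(toy_output$id, c(1L, 4L)),
  all(toy_output$value > 0)
)
```

A check like this catches the subtle kind of change (one extra row silently dropped, for example) that would be easy to miss by eye.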
The testthat package provides a powerful and easy-to-use framework to build tests for your functions.
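A sketch of what a testthat test file might contain, assuming a hypothetical summarize_by_group() function. In a real project the function would live in the R folder and be loaded with source(); it is defined inline here so the example runs on its own. test_that(), expect_equal() and expect_named() are part of testthat.

```r
library(testthat)

# Hypothetical function under test: mean `value` per `group`.
summarize_by_group <- function(cleaned) {
  aggregate(value ~ group, data = cleaned, FUN = mean)
}

test_that("summarize_by_group() returns one row per group", {
  input <- data.frame(group = c("a", "b", "b"), value = c(1, 2, 4))
  result <- summarize_by_group(input)
  expect_equal(nrow(result), 2)
  expect_named(result, c("group", "value"))
})

test_that("summarize_by_group() computes the group means", {
  input <- data.frame(group = c("a", "b", "b"), value = c(1, 2, 4))
  result <- summarize_by_group(input)
  expect_equal(result$value, c(1, 3))
})
```

Files like this, saved in the tests folder, can be run together with testthat::test_dir("tests").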
We organize the project into the following folders:

- data-raw: the original data; you shouldn't edit or otherwise alter any of the files in this folder.
- data-output: intermediate datasets generated by the analysis.
- fig: the folder where we store the figures used in the manuscript.
- R: our R code (the functions).
- tests: the code to test that our functions behave properly and that all our data is included in the analysis.

Today we are going to work on functionalizing a knitr document that is more complex than what we have seen so far, but not quite as complex as a "real" research document might be.
Let's take a look at the example-manuscript folder…
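If you want to set up the same folder layout for your own project, a minimal base-R sketch could look like the following. The project path is a placeholder, and sourcing every file in the R folder at the top of the manuscript is one common convention, not necessarily how example-manuscript loads its functions.

```r
# Placeholder project root; replace with the path to your own project.
root <- file.path(tempdir(), "my-manuscript")

# Create the data-raw, data-output, fig, R and tests folders
# (no warning if they already exist).
for (d in c("data-raw", "data-output", "fig", "R", "tests")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# In the manuscript, load every function file stored in R/ before using it.
r_files <- list.files(file.path(root, "R"), pattern = "\\.[Rr]$", full.names = TRUE)
for (f in r_files) source(f)
```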