June 1, 2015
We learned that everyone struggles with reproducibility, and that poor reproducibility hinders scientific progress
We focused on four problems (organization, documentation, automation, and dissemination) using a fairly simple analysis. Over the two-day workshop, the data analysis tasks will become more complex as we gather more data and ask more complicated questions
Documentation: the difference between binary files (e.g. .docx) and text files, and why text files are preferred for documentation; use Markdown to document your workflow so that anyone can pick up your data and follow what you are doing
Organization: tools to organize your projects so that you don't have a single folder with hundreds of files
Automation: the power of scripting in the R programming language, and how to integrate R code into Markdown (R Markdown) to create automated data analyses
Dissemination: publishing is not the end of your analysis; rather, it is a way station towards your future research and the future research of others
Many other programming languages and tools besides R can also improve the reproducibility of your analysis
The key is to use a language that lets you both automate and document your analysis
Once you master one language you'll probably find it easier to learn another
RStudio layout:
Upper right: the workspace and a history of the commands you have previously entered
Lower right: any plots you generate, plus access to files, help, and packages
Left: the console
intro-01-template.Rmd
The goal is NOT to understand every R command, but to get the big picture of how using R this way facilitates reproducible analyses
Append the new data files gapminder-7080.csv and gapminder-90plus.csv to your existing data set.
Be careful when you do so: the ordering of columns may not match between the different CSV files!
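One way to do the append step in R, sketched under assumptions: the helper `append_csvs` is hypothetical, the columns `country`, `year`, and `lifeExp` stand in for whatever columns your data set actually has, and the rows are made-up placeholders rather than real gapminder values. Each new file is put into the existing column order by name before its rows are stacked, so a file whose columns come in a different order cannot silently put values in the wrong column.

```r
# Hypothetical helper: stack new CSV files onto an existing data frame,
# first forcing each file into the existing column order (by name).
append_csvs <- function(existing, paths) {
  cols <- names(existing)
  pieces <- lapply(paths, function(path) {
    d <- read.csv(path, stringsAsFactors = FALSE)
    # Stop with a clear message if a file has missing or extra columns
    if (!setequal(names(d), cols)) {
      stop("column names in ", path, " do not match the existing data")
    }
    d[cols]  # reorder columns by name, not by position
  })
  do.call(rbind, c(list(existing), pieces))
}

# Placeholder stand-ins for gapminder-7080.csv and gapminder-90plus.csv:
# made-up values (NOT real gapminder data), deliberately written with
# different column orders.
existing <- data.frame(country = "Country A", year = 1952, lifeExp = 50,
                       stringsAsFactors = FALSE)
tmp <- tempdir()
write.csv(data.frame(year = 1972, country = "Country A", lifeExp = 55),
          file.path(tmp, "gapminder-7080.csv"), row.names = FALSE)
write.csv(data.frame(lifeExp = 60, year = 1992, country = "Country A"),
          file.path(tmp, "gapminder-90plus.csv"), row.names = FALSE)

combined <- append_csvs(
  existing,
  file.path(tmp, c("gapminder-7080.csv", "gapminder-90plus.csv"))
)
print(combined)
```

With the real files you would pass your existing data frame (hypothetically named `gap`) and the two file paths. Base R's `rbind` for data frames already matches columns by name; the explicit check and reordering mainly make the intent visible in the script and give a clearer error when a file is missing a column.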
Create line plots of life expectancy over time for Canada, Mexico, and the United States that run from 1952 to 2007.
Stretch goal: In the same plot, add similar lines for Cambodia, China, and Japan, and for Uganda, Egypt, and South Africa.
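A base-R sketch of the line-plot exercise, assuming the data frame has columns named `country`, `year`, and `lifeExp` (the usual gapminder layout; check your own files). The helper `plot_lifeexp` is hypothetical, and the demo rows are synthetic values generated from a formula, not gapminder data, so the sketch runs on its own.

```r
# Hypothetical helper: one line per country showing life expectancy over
# time. Assumes columns named country, year, and lifeExp.
plot_lifeexp <- function(gap, countries, from = 1952, to = 2007) {
  sub <- gap[gap$country %in% countries &
               gap$year >= from & gap$year <= to, ]
  sub <- sub[order(sub$country, sub$year), ]  # lines need years in order
  cols <- rainbow(length(countries))          # one colour per country
  plot(NULL, xlim = c(from, to), ylim = range(sub$lifeExp),
       xlab = "Year", ylab = "Life expectancy (years)")
  for (i in seq_along(countries)) {
    d <- sub[sub$country == countries[i], ]
    lines(d$year, d$lifeExp, col = cols[i])
  }
  legend("bottomright", legend = countries, col = cols, lty = 1)
  invisible(sub)
}

# Synthetic placeholder data (NOT real gapminder values) so the sketch runs.
years <- seq(1952, 2007, by = 5)
toy <- data.frame(
  country = rep(c("Country A", "Country B"), each = length(years)),
  year = rep(years, times = 2),
  lifeExp = c(50 + 0.3 * (years - 1952), 45 + 0.4 * (years - 1952)),
  stringsAsFactors = FALSE
)

pdf(file.path(tempdir(), "lifeexp.pdf"))  # draw to a file, not the screen
shown <- plot_lifeexp(toy, c("Country A", "Country B"))
invisible(dev.off())
```

On the real data you would call `plot_lifeexp(gap, c("Canada", "Mexico", "United States"))`, assuming those spellings match the CSV. Because the helper draws one line per country on a single set of axes, the stretch goal is the same call with all nine countries; `rainbow()` keeps the nine colours distinct, whereas the default palette would start repeating.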
Create a scatter plot depicting GDP vs. life expectancy of countries in Europe for 2007.
Stretch goal: In the same plot, add points for countries in Asia, Africa, and the Americas, giving all countries from the same region (continent) the same color.
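A base-R sketch of the scatter-plot exercise, assuming columns named `continent`, `year`, `gdpPercap` (GDP per capita), and `lifeExp`; these names, the helper `plot_gdp_lifeexp`, and the four demo rows are all hypothetical placeholders, not real gapminder values. Colouring by continent is done by looking up each row's continent in the list of continents requested, so every country from the same continent gets the same colour.

```r
# Hypothetical helper: scatter plot of GDP per capita vs. life expectancy
# for one year, with every country in a continent sharing one colour.
# Assumes columns named continent, year, gdpPercap, and lifeExp.
plot_gdp_lifeexp <- function(gap, continents, yr = 2007) {
  sub <- gap[gap$continent %in% continents & gap$year == yr, ]
  cols <- rainbow(length(continents))
  sub$colour <- cols[match(sub$continent, continents)]  # colour by continent
  plot(sub$gdpPercap, sub$lifeExp, col = sub$colour, pch = 19,
       xlab = "GDP per capita", ylab = "Life expectancy (years)")
  legend("bottomright", legend = continents, col = cols, pch = 19)
  invisible(sub)
}

# Made-up placeholder rows (NOT real gapminder data).
toy <- data.frame(
  country = c("Country P", "Country Q", "Country R", "Country S"),
  continent = c("Europe", "Europe", "Asia", "Asia"),
  year = c(2007, 2007, 2007, 2002),
  gdpPercap = c(30000, 25000, 5000, 4000),
  lifeExp = c(80, 78, 70, 68),
  stringsAsFactors = FALSE
)

pdf(file.path(tempdir(), "gdp_lifeexp.pdf"))  # draw to a file
europe <- plot_gdp_lifeexp(toy, "Europe")
both <- plot_gdp_lifeexp(toy, c("Europe", "Asia"))
invisible(dev.off())
```

With the real data, `plot_gdp_lifeexp(gap, "Europe")` covers the main task and `plot_gdp_lifeexp(gap, c("Europe", "Asia", "Africa", "Americas"))` the stretch goal, assuming those continent labels match your files. GDP per capita spans several orders of magnitude, so you may find adding `log = "x"` to the `plot()` call easier to read.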
intro-02-template.Rmd
make clean; make