Writing functions to generate figures

Now that we have functions to generate the datasets we need for our paper, we can start using them to generate the figures.

To illustrate this concept, we are going to generate a figure before converting it into a function.

gdp <- read.csv(file="data-output/gdp.csv")

gdp %>%
    filter(country %in% c("United States", "France", "Zimbabwe", "South Africa")) %>%
    ggplot(aes(x = year, y = lifeExp, color = country)) +
    geom_point(aes(size = gdpPercap)) + geom_line() +
    geom_vline(xintercept = 1985, color = "red", linetype = 2)

Writing to PDF

If we want to make a PDF file of this figure we could do:

pdf(file="gdp_comparison.pdf", width=8, height=6)

gdp %>%
    filter(country %in% c("United States", "France", "Zimbabwe", "South Africa")) %>%
    ggplot(aes(x = year, y = lifeExp, color = country)) +
    geom_point(aes(size = gdpPercap)) + geom_line() +
    geom_vline(xintercept = 1985, color = "red", linetype = 2)


dev.off()

Shortcomings

It's not very neat, we are hardcoding where the year break (the vertical dotted red line) should occur. This example is relatively simple but if you are building a more complex figure that relies on several variables, it means that they will be globally available in your knitr document, potentially causing conflicts down the road.

Improvements

Let's convert this into a function:

fig_gdp_comparison <- function() {

    gdp %>%
      filter(country %in% c("United States", "France", "Zimbabwe", "South Africa")) %>%
      ggplot(aes(x = year, y = lifeExp, color = country)) +
      geom_point(aes(size = gdpPercap)) + geom_line() +
      geom_vline(xintercept = 1985, color = "red", linetype = 2)

}

Prettier

so this part gets a little prettier:

pdf(file="fig_gdp_comparison.pdf", width=8, height=6)
fig_gdp_comparison()
dev.off()

If you start making a lot of figures, it would be nice to have to repeat this first and third lines…

Automated

Let's create another function that will automate this process:

## An example of a function that generates a PDF file from a function
## that creates a plot
## See http://nicercode.github.io/blog/2013-07-09-figure-functions/
make_pdf <- function(expr, filename, ..., verbose = TRUE) {
    if (verbose) {
        message("Creating: ", filename)
    }
    pdf(file = filename, ...)
    on.exit(dev.off())
    eval.parent(substitute(expr))
}
make_pdf(fig_gdp_comparison(), "fig_gdp_comparison.pdf", width = 8, height = 6)

Further improvements

We can even improve our fig_gdp_commparison to make it a little more general. For instance, we can add arguments such that the vertical line isn't always at 1985 but is specified. We can use the same approach for the list of countries to be included in the plot:

fig_gdp_comparison <- function(year_break = 1985,
                               countries = c("United States", "France", "Zimbabwe", "South Africa")) {

    gdp %>%
      filter(country %in% countries) %>%
      ggplot(aes(x = year, y = lifeExp, color = country)) +
      geom_point(aes(size = gdpPercap)) + geom_line() +
      geom_vline(xintercept = year_break, color = "red", linetype = 2)

}

Your turn

## Create your own function that generates a plot and use it with make_pdf.

## If you are looking for some inspiration (or not too familiar with R
## syntax), the code below compares the relationship between GDP and
## life expectancy for Japan and Finland.

finland <- read.csv(file = "data-raw/Finland-gdp-percapita.csv")
japan <- read.csv(file = "data-raw/Japan-gdp-percapita.csv")
comp_finland_japan <- rbind(finland, japan)

ggplot(comp_finland_japan, aes(x = gdpPercap, y = lifeExp, color = country)) +
   geom_point() +
   stat_smooth(method = "lm", se = FALSE)