Saturday, July 4, 2015

Scatterplots with ggplot2

While the standard R installation comes with basic scatterplot functions, one often needs more advanced functions. To this end, one very popular package is ggplot2, where gg refers to "grammar of graphics".

library(ggplot2)
library(choroplethr)
data(df_state_demographics)
names(df_state_demographics)

outputs

[1] "region"            "total_population"  "percent_white"     "percent_black"  
[5] "percent_asian"     "percent_hispanic"  "per_capita_income" "median_rent"    
[9] "median_age"    

Suppose we want to see the relationship between per capita income and median rent in each state. A simple way of doing this would be the following (note that each + goes at the end of a line, so that R knows the command continues):

ggplot(df_state_demographics, aes(x=per_capita_income, y=median_rent)) +
    geom_point(shape=1)

The first argument names the dataset we want to use. We then specify the x and y variables within aes. Think of aes as defining an "aesthetic mapping": it says which variable goes on which axis (or controls colour, size, and so on).
On its own, ggplot only sets up the data and axes; we must add at least one geom layer to say how the data should be drawn. Layers are added with +, hence "+ geom_point(shape=1)", where shape=1 draws hollow circles. The output follows:



If we want a linear regression line, we can tack on another layer: 

ggplot(df_state_demographics, aes(x=per_capita_income, y=median_rent)) +
    geom_point(shape=1) +
    geom_smooth(method=lm)

which gives us


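To make the regression layer less of a black box, here is a minimal sketch of fitting the same kind of line by hand. With its default formula (y ~ x), geom_smooth(method=lm) fits an ordinary least-squares line, so calling lm directly should give the same intercept and slope. The data frame toy is simulated stand-in data (not real state figures) that merely reuses the column names above, so the snippet runs without choroplethr; fit and p_manual are illustrative names.

```r
library(ggplot2)

# Simulated stand-in data (NOT real state figures), reusing the post's column names
set.seed(1)
toy <- data.frame(per_capita_income = runif(50, 20000, 45000))
toy$median_rent <- 200 + 0.025 * toy$per_capita_income + rnorm(50, sd = 80)

# Fit the simple linear model that the regression layer is based on
fit <- lm(median_rent ~ per_capita_income, data = toy)
coef(fit)  # intercept and slope of the fitted line

# Draw that fitted line by hand from the coefficients
p_manual <- ggplot(toy, aes(x = per_capita_income, y = median_rent)) +
    geom_point(shape = 1) +
    geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
```

Printing p_manual should show the same straight line as the geom_smooth version, only without the confidence band.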

What if we don't want confidence intervals? Then we can try

ggplot(df_state_demographics, aes(x=per_capita_income, y=median_rent)) +
    geom_point(shape=1) +
    geom_smooth(method=lm, se=FALSE)

 and...


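Finally, what if we want a LOESS smoother instead? Just omit the arguments within geom_smooth; for small datasets like this one (fewer than 1,000 observations), it defaults to LOESS.

ggplot(df_state_demographics, aes(x=per_capita_income, y=median_rent)) +
    geom_point(shape=1) +
    geom_smooth()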
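If you would rather not rely on the default, the smoother can also be requested explicitly. The sketch below assumes simulated stand-in data (toy, not real state figures) with the same column names, and p_loess is just an illustrative name. The span argument controls how wiggly the curve is: smaller values follow the points more closely.

```r
library(ggplot2)

# Simulated stand-in data (NOT real state figures), reusing the post's column names
set.seed(1)
toy <- data.frame(per_capita_income = runif(50, 20000, 45000))
toy$median_rent <- 200 + 0.025 * toy$per_capita_income + rnorm(50, sd = 80)

# Ask for LOESS by name instead of relying on the default choice
p_loess <- ggplot(toy, aes(x = per_capita_income, y = median_rent)) +
    geom_point(shape = 1) +
    geom_smooth(method = "loess", formula = y ~ x, span = 0.5)
```

Spelling out method and formula also silences the message that geom_smooth prints when it has to pick them itself.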


Of course, we would want to make things fancier. For example, we might want to add a title. To make things simple, let's save our base plot as base_plot:

base_plot <- ggplot(df_state_demographics, aes(x=per_capita_income, y=median_rent)) +
    geom_point(shape=1) +
    geom_smooth()



How can we:
1. Add a title?

base_plot + ggtitle("State per capita income and median rent")

2. Add labels?

base_plot + xlab("Per capita income ($)") + ylab("Median rent ($)")
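The title and both axis labels can also be set in one go with labs. A small sketch, again on simulated stand-in data (toy, not real state figures) so that it runs on its own; labelled_plot is an illustrative name.

```r
library(ggplot2)

# Simulated stand-in data (NOT real state figures), reusing the post's column names
set.seed(1)
toy <- data.frame(per_capita_income = runif(50, 20000, 45000))
toy$median_rent <- 200 + 0.025 * toy$per_capita_income + rnorm(50, sd = 80)

base_plot <- ggplot(toy, aes(x = per_capita_income, y = median_rent)) +
    geom_point(shape = 1) +
    geom_smooth(method = "loess", formula = y ~ x)

# Title and axis labels set together in a single layer
labelled_plot <- base_plot +
    labs(title = "State per capita income and median rent",
         x = "Per capita income ($)",
         y = "Median rent ($)")
```

This is equivalent to chaining ggtitle, xlab and ylab; which form to use is mostly a matter of taste.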

Of course, this is just the tip of the iceberg. You may wish to see this excellent tutorial (part of this blogpost was drawn from there).
