ggplot2 continued
Note: Simply add the questions for this lab to the end of the R Markdown document you have been using for Lab 2. We will have some time in class on Monday to wrap things up in case you do not finish.
This lab will be a continuation of our exploration of ggplot2. Whereas the first lab was oriented around creating
a number of standard plots from the data, here we will focus on a number
of ancillary issues, including titles and labels, legends, and themes.
The bulk of this lab will be focused on the topic of scales,
which manage the relationship between the data and the resulting
aesthetics. We will conclude by taking a closer look at some of the
arguments that can be used to augment different layers.
By default, plots made with ggplot2 do not include a title, and the labels for the axes are taken from the variable names given in aes(). This is the case, for example, in our plot of engine displacement (displ) and highway miles per gallon (hwy):
library(ggplot2)
## Prettier graphs
theme_set(theme_bw())
ggplot(mpg, aes(displ, hwy)) +
geom_point()
We can add a title or change the x- and y-axis labels with the functions ggtitle(), xlab(), and ylab(), respectively:
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
ggtitle("Engine size to fuel economy") +
xlab("Displacement") +
ylab("Fuel Economy (Highway)")
Note that, just like in the first lab, we can add subsequent components with +. It is also common to start a new line for each layer for readability.
As is typically the case with ggplot2, there are multiple ways to accomplish the same goal. The labs() function allows us to modify multiple labels at once by passing each one as a named argument:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
labs(x = "Displacement", y = "Fuel Economy (Highway)", title = "Engine size to fuel economy")
The labs() function also takes arguments for any grouping aesthetics. For example, if we use the shape aesthetic in aes(), we can then pass a shape argument to labs() to rename the legend of the plot. (Note: the function factor(), as in factor(cyl), turns a numeric variable into a categorical one; we will learn more about this later.)
The argument name is the same as the aesthetic used to create the groups, and changing it makes the corresponding change in the legend:
## Without label
ggplot(mpg, aes(displ, hwy, shape = factor(cyl))) +
geom_point()
## With label
ggplot(mpg, aes(displ, hwy, shape = factor(cyl))) +
geom_point() +
labs(shape = "Cylinders") # Since we used shape aesthetic, we use "shape" here
Question 10: Using the mpg dataset, create a box plot with class on the x-axis and cty on the y-axis. Add a color aesthetic that accounts for year (by default, year is a continuous variable; use factor() to make it categorical). Create appropriate labels for the axes, title, and legend.
As you might imagine, there is a tremendous variety of options to modify the style of your graphic. The collection of non-data elements of your plot, including the appearance of titles, labels, legends, tick marks, and lines, makes up what is known as the theme. Elements related to the theme are modified with the theme() function; a quick look at ?theme demonstrates how comprehensive this list can be. Here, however, we
consider only a small subset of these items to demonstrate how the
process works. It is less important that any of these are memorized;
rather, knowing that such possibilities exist should assist you when
using search engines to learn how to modify your graphics.
The system for modifying themes consists of two components: the theme element to be modified (for example, axis.text.x) and the element function that describes its new appearance.
For example, elements consisting of text are modified with the element function element_text(). We can also see some of the particulars that can be modified with ?element_text.
To motivate this, consider the following box plot:
ggplot(mpg, aes(class, hwy)) +
geom_boxplot()
Because of the width of our figure, all of the labels on the x-axis are bunched together. We can help fix this problem by rotating the axis text on the x-axis. That is, we are modifying the element axis.text.x (the text located on the x-axis) with the element function element_text():
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 45))
Here, we see that the rotation has helped with the overlapping, but now the text is running into our plot. We can further alter the vertical adjustment with vjust:
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
It is highly unlikely (and completely unnecessary) that you would remember on your own that text on the x-axis is specified with axis.text.x. However, if you have a general idea of what you want to change, looking through the arguments listed in ?theme will likely turn up something matching what you want to do. This, along with diligent search engine use, makes for a potent strategy in solving most ggproblems.
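As a complement to reading ?theme, the element names can also be searched from the console. The following is a minimal sketch, assuming that names() on a complete theme such as theme_gray() returns the names of its elements; the variable element_names is introduced here only for illustration.

```r
library(ggplot2)

# Names of the elements defined in a complete theme (assumption:
# names() returns the element names of the theme object)
element_names <- names(theme_gray())

# Keep only the elements whose name mentions axis text
grep("axis.text", element_names, value = TRUE, fixed = TRUE)
```

Any name found this way can be looked up in ?theme to see which element function it expects.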
Question 11: For this question, use the code in the block below. To the plot that is generated, modify the following:
plot.title by changing its color to red and writing it in italics. You will want to investigate ?theme to find the appropriate ways to do this.
ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
geom_point() +
labs(title = "My plot")
As mentioned above and explored in the previous section, ggplot2 manages the relationship between the data and aesthetics through the use of scales. We also saw that the scales used for the axes differed depending on whether the associated variables were continuous or discrete. As we will now see, the relationship between data and the color aesthetic is no different.
Consider the last lab, for example, in which we plotted the relationship between displacement and highway miles per gallon, colored by cylinder. When cyl was stored as a numeric (or integer) vector, the resulting color scale was continuous, taking all values between dark and light blue. However, once we converted cyl to a factor, the color scale became discrete, offering four distinct colors to represent our groups.
This is an illustration of color being treated as either a continuous or discrete scale. These scales are modified with the functions scale_color_continuous() and scale_color_discrete().
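The two cases described above can be reproduced with the following sketch. The object names p_continuous and p_discrete are introduced here only for illustration; the only difference between the two plots is whether cyl is wrapped in factor().

```r
library(ggplot2)

# cyl stored as a number: ggplot2 chooses a continuous color scale
p_continuous <- ggplot(mpg, aes(displ, hwy, color = cyl)) +
  geom_point()

# cyl converted to a factor: ggplot2 chooses a discrete color scale
p_discrete <- ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
  geom_point()

p_continuous
p_discrete
```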
There are primarily two types of continuous color scales we will concern ourselves with, and this will depend upon what we are trying to demonstrate. Generally speaking, there are two possible options:
Roughly corresponding to these two options are two types of color scales readily available for ggplot: viridis and gradient:
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "viridis")
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "gradient")
The viridis scales constitute a set of different color maps that are designed with a few thoughts in mind:
A range of different viridis scales is provided in ggplot2, though they are not particularly well documented. You can select different scales by passing an additional argument, option, with options available for “A”-“H”. Here are a few for illustration:
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "viridis", option = "A") + ggtitle("Magma")
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "viridis", option = "D") + ggtitle("Viridis")
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "viridis", option = "E") + ggtitle("Cividis")
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "viridis", option = "H") + ggtitle("Turbo")
The gradient color type, on the other hand, gives you a bit more control. Here, you can specify a high and low value, indicating the two colors between which the gradient runs. Choosing colors on opposite sides of the color wheel will give you the best contrast.
ggplot(mpg, aes(displ, hwy, color = cty)) +
geom_point() +
scale_color_continuous(type = "gradient", high = "orange", low = "blue")
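ggplot2 also provides dedicated functions for each type of continuous color scale, which some readers may find easier to search for. The sketch below uses scale_color_viridis_c() and scale_color_gradient(); it is offered as an alternative spelling of the plots above, not as the approach this lab requires. The object names p_viridis and p_gradient are illustrative.

```r
library(ggplot2)

# Viridis scale for a continuous variable, selecting the "A" (magma) map
p_viridis <- ggplot(mpg, aes(displ, hwy, color = cty)) +
  geom_point() +
  scale_color_viridis_c(option = "A")

# Two-color gradient running from blue (low cty) to orange (high cty)
p_gradient <- ggplot(mpg, aes(displ, hwy, color = cty)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "orange")

p_viridis
p_gradient
```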
A list of the named colors provided in R is available by running colors() in the console.
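The names of R's built-in colors can also be searched directly. This is a small sketch using the base R function colors(); the variable all_colors is introduced only for illustration.

```r
# Every color name that base R recognizes
all_colors <- colors()
length(all_colors)

# Search for all shades whose name contains "orange"
grep("orange", all_colors, value = TRUE)
```

Any of the names returned can be passed as the high or low argument of a gradient scale.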
Question 12: For this question, we are going to use another dataset built into R, USArrests (see ?USArrests). To do this, simply copy this line into your R Markdown file and run it:
data("USArrests")
Create a scatter plot using this data with the urban population on the x-axis and the number of assaults per 100,000 residents on the y-axis. Then, choose two sensible colors and add a color gradient corresponding to the murder rate. Looking at this plot, does it seem that high rates of murder are more likely to correspond with larger urban population or with states with high rates of assault?
While there is an associated scale_color_discrete() function for use with discrete variables, we will instead use a similar function, scale_color_brewer(), which comes with a full suite of pre-built palettes for use with discrete variables. These can be found in the documentation for ?scale_color_brewer().
The great thing about this is that, with minimal effort, we can feel confident that our colors are going to look good:
ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
geom_point() + scale_color_brewer(palette = "Spectral")
ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
geom_point() + scale_color_brewer(palette = "Set2")
We conclude by considering the fill aesthetic, which is similar to color but is used in bar charts and box plots. It works in an identical way, except that we use scale_fill_brewer() rather than scale_color_brewer():
## Load college dataset
college <- read.csv("https://collinn.github.io/data/college2019.csv")
ggplot(college, aes(Enrollment, Region, fill = Type)) +
geom_boxplot() +
scale_fill_brewer(palette = "Pastel2")
You can see a full list of the palettes available by looking in ?scale_color_brewer() or ?scale_fill_brewer().
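The palettes can also be listed and previewed from the console. The following sketch assumes the RColorBrewer package is installed (it is normally pulled in alongside ggplot2 as a dependency of the scales package); the variable palette_info is introduced only for illustration.

```r
library(RColorBrewer)

# One row per palette; the category column is "seq", "div", or "qual"
palette_info <- brewer.pal.info

# Names of the qualitative palettes, suited to unordered groups
rownames(palette_info)[palette_info$category == "qual"]

# Draw every palette in the plotting window
display.brewer.all()
```

The palette names shown here are the values accepted by the palette argument of scale_color_brewer() and scale_fill_brewer().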
Question 13: For this question, we are going to use the built-in dataset in R, ToothGrowth (?ToothGrowth), which measures the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs in response to the administration of supplemental vitamin C. Three different doses of the vitamin were given via two different delivery methods, either orange juice or ascorbic acid. Begin by creating a box plot with the length of odontoblasts on the x-axis and the dose (use factor(dose) to make it categorical) on the y-axis. Use the fill aesthetic to indicate supplement type, and use scale_fill_brewer() to select a different palette for the colors. What can we learn from this plot? Is a higher dose of vitamin C associated with increased tooth growth? Does either of the supplements appear any better than the other?