This is an R Markdown document.
At present, you are reading it formatted as an HTML document, but as we have seen, it can also be rendered as a PDF. When working with an .Rmd file, you can choose how it is presented by selecting the arrow next to the Knit ball of yarn and choosing between “Knit to HTML” or “Knit to PDF”. For most of the work in this class, we will be knitting to a PDF to facilitate submissions to Gradescope.
When you Knit an R Markdown file, it will ask you to first save and name your document. I recommend sticking to a consistent format throughout the semester: something like “lastname_firstname_lab/hwk#”. For example, I would name this “nolte_collin_lab1.Rmd”. After you knit, this will also be the name of your PDF.
For this lab, we will be following along with this document. We can begin by downloading the .Rmd file associated with it here; this will allow us to see side by side how the .Rmd file looks compared to the finished output.
For example, reading this online we see that this sentence is bold, while in the .Rmd file we see that the sentence is a plain-text sentence surrounded by two asterisks (*) on each side. This is an example of using Markdown to format our documents. For this lab, we will be following along with the .Rmd file, filling bits in as prompted. Let’s begin by introducing some of the more common markdown syntax that we will use in this class.
We can use .Rmd files to write text just as we would with a standard word processor. As we saw previously, using two asterisks creates bold text, whereas using single asterisks creates text that is italicized.
Headers can be created using the # symbol. Using one creates a large header, two a smaller header, and so on. You can see examples of this in the text above. Take note that you need to leave a space between the # and the header text – forgetting to do so will render the text exactly as you have written it.
#For example, this will not show up as a header
We can also create sequential lists. To do so, start new lines with sequential numbers
We can create un-ordered lists using stars or dashes
We can also create horizontal dividers with a line of stars (at least 3, but any number greater is ok). This is a good way to separate your answers from a given problem
Things are more clear if I put my answers between bars, but I need to be sure to leave a blank line between my text and the bottom star
Question 1:
Between the stars below, do the following:
Once you have done this, Knit to PDF by clicking the ball of yarn above and verify that everything looks like it should.
Whereas R is the programming language responsible for doing our relevant computation, our interactions with R will be primarily through R Studio. When you first open R Studio, your screen will be partitioned into 3 panes: Console, Environment, and Files (and often, Editor).
The console makes up the entire left-hand side of the screen, and it is here that we can interact with R directly. Some information will appear on start-up, followed by a prompt (indicated with >). Try copying and pasting the following lines into your console one at a time and pressing enter:
5 + 2
sqrt(25)
pi
(3^2 + 1)*5
x <- 5
The environment panel is located in the top right and includes all of the named variables that we have created in R. If you copied the lines above correctly, you should see a single line here, indicating that the variable x has the value 5. When starting a new R session, this pane will be empty.
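The contents of the Environment pane can also be checked from the console. The short sketch below assumes the variable x from the lines above; ls() lists the names of the objects currently defined, and rm() removes one of them.

```r
# Create a variable, as in the console example above
x <- 5

# ls() returns the names of the objects in the current environment
"x" %in% ls()   # TRUE

# rm() removes an object; it also disappears from the Environment pane
rm(x)
"x" %in% ls()   # FALSE
```

This is only for illustration; nothing in this lab requires removing variables.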
In the bottom right is the Files pane which, in addition to giving us information about our file system, also includes tabs for Plots and Help. Copy each of the lines below to see how this pane changes:
?sqrt
plot(1:10)
You can click any of the tabs in this pane to change what is currently being shown.
Finally, we have the editor. If you start a new R Studio session, there will be nothing in the editor, but this will change if we open either a .R or .Rmd file. An R Script is the simplest type of file which can be created from the menus at the top:
File -> New File -> R Script
The editor pane will appear with a blank R file in which you can write and store R code. Try copying the R commands from the Console section into an R Script, click on a line of code, and press “Ctrl+Enter” on your keyboard. R Studio should execute this code and display the results in the Console. You do not have to save this file, and you can close it when you are finished.
While it is true that R scripts (files that end in .R) are used for the vast majority of written R code, we will primarily be using R Markdown files (files that end in .Rmd).
Perhaps the most obvious utility in using R Markdown documents for this course is their ability to interweave regular text with R. We can include and run R code by including it in a code chunk, such as the one shown below:
# This is a code comment, beginning with #
x <- 4
x^2
## [1] 16
This code will run and display its results when we push the Knit button, but we can also run the code without knitting by again clicking the line with our cursor and pressing “Ctrl+Enter”.
For this portion of the lab, we will be introducing some of the basic mechanics in the R language. This will include vectors, similar to the variables we discussed in class, as well as data.frames, the primary data type we will be using for collections of observations.
You are welcome to follow along on the course website or read the R Markdown directly. In either event, I highly encourage you to try running the code that we see, either by copying it from the website and pasting it into the console, or by running the R Markdown code chunks.
Vectors in R are the simplest type of object. Every vector has two
attributes that will dictate how we use it: length and type. We have
already seen vectors of length one, which are created by assigning a
number to a name using <-
# This is a numeric vector of length 1
x <- 4
x
## [1] 4
We can create vectors of length greater than one using c(), which is short for “combine”:
y <- c(2, 4, 6, 8, 10)
y
## [1] 2 4 6 8 10
Vectors of sequential whole numbers can also be created with :
z <- 1:5
z
## [1] 1 2 3 4 5
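Since every vector has a length and a type, it can help to see how to ask R for both. The sketch below reuses the names y and z from the examples above and checks them with length() and class(). Note that a sequence made with : is stored as an integer vector, which for our purposes behaves like a numeric one.

```r
# Recreate the two vectors from above
y <- c(2, 4, 6, 8, 10)
z <- 1:5

# length() reports how many elements a vector has
length(y)   # 5
length(z)   # 5

# class() reports the type of the vector
class(y)    # "numeric"
class(z)    # "integer"
```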
Every element of a vector has a position that we can access directly using square brackets [] in a process called subsetting. Here, we grab the 3rd element of the vector w:
w <- c(3, 6, 9, 12)
w[3]
## [1] 9
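Square brackets are not limited to a single position. As a small sketch going slightly beyond the example above, the code below passes a vector of positions inside the brackets to grab several elements at once, and uses a negative position to drop an element.

```r
w <- c(3, 6, 9, 12)

# Several positions at once: pass a vector of positions
w[c(1, 3)]   # 3 9

# A range of positions using :
w[2:3]       # 6 9

# A negative position drops that element and keeps the rest
w[-1]        # 6 9 12
```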
Question 2:
Let’s practice creating vectors and subsetting with a short exercise.
Create a vector x that has all of the numbers from 1 to 10.
Just as we saw in class that variables can be quantitative or categorical, so it is with vectors in R. There are primarily 3 types of vectors we need to be aware of:
x <- c(1, 2, 3)
x <- c("a", "b", "c")
x <- c(TRUE, FALSE, TRUE)
Don’t worry about memorizing everything – we will continue to review them together as they appear in the course. For now, we will be satisfied with simply having made our introductions.
As we mentioned in class, the type of a variable will often dictate what we are able to do with it. For example, it should make sense to take the mean of a numeric variable, but less so with a character vector:
# This will return a warning and NA
z <- c("a", "b", "c")
mean(z)
## Warning in mean.default(z): argument is not numeric or logical: returning NA
## [1] NA
x <- c(1, 2, 3)
mean(x)
## [1] 2
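The warning above mentions that mean() expects a numeric or logical vector. As a brief illustration of how type shapes what we can do, the sketch below (with made-up vector names) shows that logical values are treated as 1 for TRUE and 0 for FALSE in arithmetic, while character vectors have their own text functions such as toupper().

```r
# Logical vectors: TRUE counts as 1 and FALSE as 0
flags <- c(TRUE, FALSE, TRUE)
sum(flags)    # 2, the number of TRUE values
mean(flags)   # 0.6666667, the proportion of TRUE values

# Character vectors: arithmetic does not apply, but text functions do
initials <- c("a", "b", "c")
toupper(initials)   # "A" "B" "C"
```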
Some situations can be confusing based on appearances. The class() function will tell us what type a vector is if we are not sure:
# This looks numeric
z <- c("1", "2", "3")
mean(z)
## Warning in mean.default(z): argument is not numeric or logical: returning NA
## [1] NA
class(z)
## [1] "character"
Usually in situations like this, we can coerce a vector to a different type:
z_num <- as.numeric(z)
class(z_num)
## [1] "numeric"
mean(z_num)
## [1] 2
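Coercion only works when R can make sense of each value. The sketch below uses a hypothetical vector in which one entry is spelled out as a word: as.numeric() gives a warning and turns that entry into NA, and the mean is then NA unless we ask mean() to skip missing values with na.rm = TRUE.

```r
# "two" cannot be interpreted as a number
mixed <- c("1", "two", "3")

# This gives a warning: NAs introduced by coercion
mixed_num <- as.numeric(mixed)
mixed_num                       # 1 NA 3

# Any NA makes the mean NA...
mean(mixed_num)                 # NA

# ...unless missing values are removed first
mean(mixed_num, na.rm = TRUE)   # 2
```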
One important characteristic of a vector to note: all elements of a vector must be the same type. They are either all numeric, all logical, or all character.
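To see this rule in action, the sketch below tries to combine values of different types. R does not raise an error; instead, it quietly converts every element to a single common type. The vector names here are made up for illustration.

```r
# Mixing a number, a character, and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"

# Mixing numbers and logicals: everything becomes numeric
num_lgl <- c(1, TRUE, FALSE)
num_lgl          # 1 1 0
class(num_lgl)   # "numeric"
```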
Question 3:
Can you think of a situation in which we start with a numeric vector that we would like to coerce to a character vector? Write your answer in the space below:
Most of the data we will be working with in this class will be stored in objects known as data frames. Put simply, a data frame is a collection of vectors that are all the same length.
# Create two vectors of the same length
x <- c("Alice", "Mary", "Bob")
y <- c(1, 4, 10)
# We can create a data frame with columns named `name` and `value`
df <- data.frame(name = x, value = y)
df
## name value
## 1 Alice 1
## 2 Mary 4
## 3 Bob 10
As we can see, this has rows as observations and columns as variables. We can select one of the variables from our data.frame using $, which will extract the vector:
# Extract the "name" column
df$name
## [1] "Alice" "Mary" "Bob"
# Extract the "value" column and use it to find median
median(df$value)
## [1] 4
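Because $ hands back an ordinary vector, everything we learned about vectors still applies to a column. The sketch below rebuilds the same small data frame, checks its size with nrow() and ncol(), and then combines $ with square brackets and with mean().

```r
# Rebuild the small data frame from above
df <- data.frame(name = c("Alice", "Mary", "Bob"), value = c(1, 4, 10))

# Number of observations (rows) and variables (columns)
nrow(df)    # 3
ncol(df)    # 2
names(df)   # "name" "value"

# A column is a vector, so we can subset it with []
df$value[2]   # 4

# ...or pass it to any function that takes a vector
mean(df$value)   # 5
```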
While we can create data frames directly, it is far more likely that we will be reading into R data that has already been created. The most common way to do so is with the read.csv() function (CSV stands for comma separated values). A file can be read directly from your computer, or you can pass in a URL:
# Use read.csv to pull Happy Planet data
planet <- read.csv("https://remiller1450.github.io/data/HappyPlanet.csv")
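Reading a file from your own computer works the same way, with a file path in place of the URL. The sketch below is self-contained: it first writes a tiny, made-up data frame to a temporary CSV file (so that there is something to read), and then reads it back in with read.csv(). The names scores and csv_path are hypothetical and used only for illustration.

```r
# A tiny made-up data frame, written out so we have a CSV file to read
scores <- data.frame(name = c("Alice", "Mary", "Bob"), value = c(1, 4, 10))
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back in from its path on the computer
scores_from_file <- read.csv(csv_path)

# Check that it has the rows and columns we expect
dim(scores_from_file)           # 3 2
names(scores_from_file)         # "name" "value"
mean(scores_from_file$value)    # 5
```

In practice, the path would point at a file you have downloaded, rather than at a temporary file.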
(Images in this next section will appear online but not on your own computer as you do not have these images)
Once a data frame is read in, you will see it show up in your environment. Right away, we get two critical pieces of information: this data frame has 143 observations and 11 variables.
If we click on the blue arrow to the left, we will get a drop down showing us a more detailed description of the variables we do have. Note also the additional information given about the vectors (num = numeric, int = integer, chr = character).
Clicking on the white box pane to the right of planet brings us to the tabular view in R, giving us the opportunity to navigate and explore the data more directly.
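The same information can also be pulled up with code instead of clicking. The sketch below uses a small made-up data frame called people so that it runs on its own; the same functions should work on planet once it has been read in. Here, str() plays the role of the blue arrow drop down, and head() shows the first few rows of the table.

```r
# A small made-up data frame standing in for a larger data set
people <- data.frame(name = c("Alice", "Mary", "Bob"), value = c(1, 4, 10))

# Number of observations and variables, as shown in the Environment pane
dim(people)   # 3 2

# Compact description of each variable and its type
str(people)

# The first two rows of the table
head(people, 2)

# In R Studio, View(people) opens the same tabular view as clicking the box
```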
Question 4:
For this question, we will be using the HappyPlanet data that we have just read in:
Part A Looking at the Happy Planet data, explain in one or two sentences what constitutes an observation in this dataset
Part B Using $ to extract columns from the dataset, what is the average life expectancy of countries in the dataset?
Part C Are there any variables in this dataset that are stored as a numeric that would be better suited as a categorical variable? Explain your answer
This concludes our first introduction to R! Once you have finished with your lab, make sure that you are able to successfully Knit to PDF. Once that is finished, we are ready to submit to Gradescope.
If you are in STA-209-03 (1-2:20), the Entry Code is B272WD
If you are in STA-209-05 (2:30-3:50), the Entry Code is B2WB5B