Your homework should follow the style given in this template. Feel free to download and use this for your assignment.

Textbook Questions

From the Introduction to Modern Statistics textbook, complete the following exercises (you do not need to read anything in the textbook to answer them):

Class Questions

Question 1

For this question, we will use the iris dataset, which gives the measurements, in centimeters, of sepal length, sepal width, petal length, and petal width.

To load these data into R, copy and paste the following into an R code chunk in your Rmd file:

data(iris)
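As a minimal sketch, the chunk could also print the first few rows so you can confirm the data loaded. The call to `head()` is just one convenient check and is not a required part of the assignment.

```r
# Load the iris data frame that ships with R's built-in datasets package
data(iris)

# Print the first six rows to confirm the data loaded as expected
head(iris)
```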

Use this data to answer the following questions:

  • Part A How many observations and variables are in the iris dataset? In one sentence, briefly describe what constitutes an observation in this data.
  • Part B Create the appropriate plot to visualize the relationship between the variables Sepal.Width and Sepal.Length. Do these two variables appear to be associated? If so, comment on the strength of this association.
  • Part C Create the plot again, this time adding an aesthetic for the variable Species. Has anything changed in the association between Sepal.Width and Sepal.Length? Comment on the strength, form, and direction of any associations you see.

Question 2

This question will involve the penguins dataset.

pengy <- read.csv("https://collinn.github.io/data/penguins.csv")

Part A How many observations are included in the penguins dataset? What does each observation represent?

Part B How many penguins of each species are included in the dataset? Which species has the greatest number of observations?

Part C What type of plot would be most appropriate to summarize the flipper length, measured in millimeters, of the penguins in the dataset? Produce this plot and comment on what you observe.

Part D Observing multiple potential centers in a distribution can often suggest the presence of multiple groups. How many centers do there appear to be? Use faceting to create plots that help you determine which “groups” might be present in the distribution of flipper length.
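Faceting in ggplot2 splits one plot into separate panels, one for each level of a categorical variable. The sketch below assumes the ggplot2 package is installed and uses R's built-in mtcars data (not the penguins data) purely to illustrate `facet_wrap()`. The variables and bin count are illustrative choices.

```r
library(ggplot2)

# Histogram of fuel efficiency (mpg), split into one panel per
# number of cylinders (cyl) using facet_wrap()
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ cyl)

print(p)
```

Swapping `~ cyl` for another categorical variable produces a different set of panels, so comparing a few facetings is a quick way to see which grouping separates the centers.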

Part E Reproduce the following plot as closely as you can. (The palette for this color scheme is “Set1”.)
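In ggplot2, a ColorBrewer palette such as “Set1” can be applied to a discrete color aesthetic with `scale_color_brewer()`, or to a discrete fill aesthetic with `scale_fill_brewer()`. The sketch below assumes ggplot2 is installed and uses the built-in mtcars data only to demonstrate the palette. Which scale you need depends on whether your plot maps a variable to color or to fill.

```r
library(ggplot2)

# Scatterplot of weight vs. fuel efficiency, colored by number of cylinders.
# factor() makes cyl discrete so a qualitative Brewer palette can be applied.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Set1")

print(p)
```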