Your homework should follow the style given in this template. Feel free to download and use this for your assignment.
From the Introduction to Modern Statistics Textbook, do the following exercises (you do not need to read anything from the textbook to answer these):
For this question, we will be using the iris dataset, which gives measurements, in centimeters, of sepal length and width and petal length and width.
To load this data into R, simply copy and paste the following into an R code chunk in your Rmd file:
data(iris)
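As a quick check that the data loaded, a chunk along the following lines could be run. This is only a sketch: `str()` and `head()` are base R functions chosen here for illustration, and the assignment itself only requires `data(iris)`.

```r
# Load the built-in iris data frame into the workspace
data(iris)

# Compact overview: one line per column, with its type and first few values
str(iris)

# Show the first six rows
head(iris)
```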
Use this data to answer the following questions:
[...] iris dataset? In one sentence, briefly describe what constitutes an observation in this data.
[...] Sepal.Width and Sepal.Length. Do these two variables appear to be associated? If so, comment on the strength of this association.
[...] Species. Has anything changed in the association between Sepal.Width and Sepal.Length? Comment on the strength, form, and direction of any associations you see.
This question will involve the penguins dataset.
pengy <- read.csv("https://collinn.github.io/data/penguins.csv")
Part A How many observations are included in the penguin dataset? What does each observation represent?
Part B How many of each species of penguin are included in the dataset? Which species has the greatest number of observations?
Part C What type of plot would be most appropriate to summarize the flipper length, measured in millimeters, of the penguins in the dataset? Produce this plot and comment on what you observe.
Part D Observing multiple potential centers in a distribution can often suggest the presence of multiple groups. How many centers do there appear to be? Create different plots with faceting to see if you can determine which “groups” might be present in the distribution of flipper length.
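The idea of splitting a distribution into panels by group can be sketched as follows. This example assumes the ggplot2 package is installed (the assignment does not name a plotting package) and uses the built-in iris data as a stand-in, so the variable names are not those of the penguins data.

```r
library(ggplot2)  # assumption: ggplot2 is available; not named in the assignment

# Histogram of one numeric variable, drawn as one panel per level of a
# grouping variable. iris is only a stand-in dataset here.
p <- ggplot(iris, aes(x = Petal.Length)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ Species, ncol = 1)  # panels stacked in a single column

print(p)
```

Stacking the panels in a single column keeps them on a common x-axis, which makes it easier to compare where each group is centered.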
Part E Reproduce the following plot as closely as you can. (The palette for this color scheme is “Set1”)
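One way a ColorBrewer palette such as "Set1" might be applied in ggplot2 is sketched below. It assumes ggplot2 is installed and uses the built-in mtcars data purely as a placeholder; it is not an attempt at the plot in the assignment.

```r
library(ggplot2)  # assumption: ggplot2 is available; not named in the assignment

# Scatterplot colored by a categorical variable, using the "Set1" palette.
# mtcars is a placeholder dataset; cyl is converted to a factor so that it is
# treated as discrete.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Set1")

print(p)
```

If the grouping variable is mapped to `fill` rather than `color` (for example in a histogram or boxplot), `scale_fill_brewer(palette = "Set1")` is the corresponding function.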