Due on Gradescope Thursday, Feb 12 at 10pm.
Chapter 4
Write a short sentence indicating that you read the chapter and have internalized as a critical part of your core identity the importance of maintaining a legible style when writing code.
Chapter 9
The data frame economics is included in the ggplot2 package and contains US economic data provided by the US Federal Reserve.
library(ggplot2)
data(economics)
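Before starting, it can help to confirm that the variables the question refers to are present and to see their types. This is a minimal sketch that assumes only that ggplot2 is installed. econ_subset is a hypothetical name used for illustration.

```r
library(ggplot2)
data(economics)

# Keep only the three columns this question refers to and look at the first rows
econ_subset <- economics[, c("date", "psavert", "uempmed")]
head(econ_subset)

# str() lists each column's type; date is stored as an R Date
str(econ_subset)
```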
Using the economics dataset, do the following:
date, psavert (personal savings rate), and uempmed (median duration of unemployment)
For this question, we need to install the Lahman package, which is a database of Major League Baseball statistics collected by Sean Lahman from the 1871 season onward. The database contains several data frames, which can be loaded into our environment using the data() function.
# install.packages("Lahman")
library(Lahman)
data("Teams")
data("People")
data("Batting")
Part A: Use the group_by() and summarize() functions to find the total number of home runs for each player in the Batting data frame. Then, store the top 30 players (those with the most career home runs) in a separate data frame. Hint: While there are a number of ways to select the top 30 players, the dplyr function slice_head() might be useful once the data are sorted (see ?slice_head).
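The group_by() and summarize() pattern can be sketched on a small made-up data frame rather than the real Batting table; toy_batting, career_hr, and top_2 are hypothetical names and the values are invented. Because a player can appear in many rows of Batting (one per season and stint), summing within each playerID gives a career total. slice_head() keeps the first n rows in their current order, so this sketch sorts with arrange(desc()) first; slice_max() is an alternative that combines both steps (see ?slice_max, including its with_ties argument).

```r
library(dplyr)

# Hypothetical season-level records: one row per player per season
toy_batting <- tibble(
  playerID = c("a01", "a01", "b02", "c03", "c03", "c03"),
  HR       = c(10,    15,    40,    5,     5,     5)
)

career_hr <- toy_batting |>
  group_by(playerID) |>                            # one group per player
  summarize(total_HR = sum(HR, na.rm = TRUE)) |>   # career total across rows
  arrange(desc(total_HR))                          # largest totals first

# After sorting, the first n rows are the top n players
top_2 <- career_hr |> slice_head(n = 2)
top_2
```

With the real data, the same steps apply to Batting with n = 30.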
Part B: It has been hypothesized in several sports that an athlete’s birth month is related to future success. Using your data from Part A, join the birth month information from the People data frame. Then create a data visualization exploring whether birth months appear to be uniformly distributed among the players. Be sure that every month is represented on your axis, even if no players have a birthday in that month.
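The join and the every-month requirement can be sketched on toy data. Here top_players and people are hypothetical stand-ins for the Part A result and the People table, with made-up values; the sketch assumes birthMonth is coded 1 to 12, as it is in People. Converting the month to a factor with all twelve levels is what lets empty months survive: count(.drop = FALSE) reports them with n = 0, and scale_x_discrete(drop = FALSE) keeps them on the axis.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for the Part A result and the People table
top_players <- tibble(playerID = c("a01", "b02", "c03"), total_HR = c(40, 25, 15))
people      <- tibble(playerID   = c("a01", "b02", "c03", "d04"),
                      birthMonth = c(1, 1, 7, 3))

with_month <- top_players |>
  left_join(people, by = "playerID") |>   # adds birthMonth for each top player
  # A factor with all 12 levels keeps months in which no player was born
  mutate(birth_month = factor(birthMonth, levels = 1:12, labels = month.abb))

# .drop = FALSE reports empty months with n = 0
month_counts <- with_month |> count(birth_month, .drop = FALSE)

# drop = FALSE keeps unused factor levels on the x-axis; print(p) draws it
p <- ggplot(with_month, aes(x = birth_month)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Birth month", y = "Number of players")
```

Player d04 is not among the top players, so the left join leaves March with a count of 0 even though d04 was born in March.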
Using the Teams data frame in the Lahman
package, display the top ten teams in terms of “slugging percentage”
(SLG) since 1969.
SLG is computed as the team’s total bases divided by its total “at bats” (AB in the data set). To find the total number of bases, count 1 base for each single, 2 for each double, 3 for each triple, and 4 for each home run, and then add these together.
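Written as a formula, with 1B, 2B, 3B, and HR standing for the counts of singles, doubles, triples, and home runs:

```latex
\[
\text{SLG} = \frac{\text{TB}}{\text{AB}}, \qquad
\text{TB} = 1\cdot\text{1B} + 2\cdot\text{2B} + 3\cdot\text{3B} + 4\cdot\text{HR}
\]
```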
Hint: The variables X2B, X3B, and HR represent doubles, triples, and home runs, respectively. There is no variable for singles, but one can be computed using the variable H, which represents the total number of hits. If we subtract the total number of doubles, triples, and home runs from the hits, we are left with the total number of singles.
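A worked example of the arithmetic on two made-up teams: toy_teams and its numbers are hypothetical, not rows of the Teams table, and the X1B and TB column names are introduced here only for illustration. It derives singles from H as the hint describes and then applies the total-bases weights.

```r
# Hypothetical season totals for two made-up teams (not real Lahman rows)
toy_teams <- data.frame(
  teamID = c("AAA", "BBB"),
  AB     = c(5000, 5500),
  H      = c(1300, 1450),
  X2B    = c(250,  300),
  X3B    = c(20,   30),
  HR     = c(200,  150)
)

# Singles are the hits that were not doubles, triples, or home runs
toy_teams$X1B <- with(toy_teams, H - X2B - X3B - HR)

# Total bases weight each hit type by the number of bases it earns
toy_teams$TB <- with(toy_teams, 1 * X1B + 2 * X2B + 3 * X3B + 4 * HR)

# Slugging percentage: total bases per at bat
toy_teams$SLG <- toy_teams$TB / toy_teams$AB
toy_teams[, c("teamID", "X1B", "TB", "SLG")]
```

Substituting the singles expression into the total-bases sum gives TB = H + X2B + 2*X3B + 3*HR, which can serve as a cross-check on your own calculation.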
Sample output for only the first three teams is printed below to help you validate your solution:
## yearID teamID SLG
## 1 2023 ATL 0.50080
## 2 2019 HOU 0.49546
## 3 2019 MIN 0.49407