In the lab, we briefly discussed magic numbers: numbers that are “hard-coded” into our code in place of an expression that states what we actually intend. Magic numbers make our code less robust to accidents or changes, potentially introducing errors as we iteratively change parts of our analysis.
The code below contains several instances of magic numbers:
x <- c(1,2,3, "four", 5, "six", 7, 6, 2)
### START ###
x <- as.numeric(x)
# Get rid of NA values
x <- x[c(1,2,3,5,7,8,9)]
# Create a vector to add to x
y <- rep(1, length = 7)
# Create new data.frame with x, y, and x+y
df <- data.frame(x = x, y = y, z = x + y)
## Only keep values where z > 5 and x <= 7 and grab column "z"
z_new <- df[c(4,5,6), 3]
### END ###
z_new
## [1] 6 8 7
Copy the code from the lines between ### START ### and ### END ### into the block below and modify it to replace the magic numbers with appropriate expressions. Leave a comment for each modification explaining why the change is made. The value for z_new should remain unchanged:
x <- c(1,2,3, "four", 5, "six", 7, 6, 2)
# Write updated code here
z_new
## [1] 6 8 7
By removing instances of magic numbers, we can be sure that the “logic” of our operations will stay the same, even if the input changes. To verify this, copy your updated code again into the block below with the new input vector x. Verify that the results make sense.
## "new" vector x
x <- c(3, 7, "four", 2)
# Write same updated code here and verify it works
z_new
## [1] 6 8 7
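As a rough illustration of the general idea, here is a small sketch on made-up data that is unrelated to the exercise vector. The names scores, kept, offsets, and toy are hypothetical. It contrasts hard-coded positions and lengths with expressions that compute them from the data:

```r
# Hypothetical toy data, not the exercise vector
scores <- c(10, NA, 30, NA, 50)

# Magic-number version: positions 1, 3, 5 and length 3 are only right for this input
kept_magic <- scores[c(1, 3, 5)]
offsets_magic <- rep(1, length = 3)

# Expression version: the non-missing positions and the length are computed
kept <- scores[!is.na(scores)]
offsets <- rep(1, length(kept))

# Both versions agree on this particular input
identical(kept, kept_magic)
identical(offsets, offsets_magic)

# A logical condition replaces hard-coded row numbers in a data frame
toy <- data.frame(a = kept, b = kept + offsets)
toy[toy$b > 20 & toy$a <= 50, "b"]
```

If scores gained or lost an element, the expression version would still select the non-missing values, while the magic-number version would silently pick the wrong positions.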
The data at the URL below contains information from the Ames Assessor’s Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010. A detailed description of each of the variables can be found here.
https://remiller1450.github.io/data/AmesHousing.csv
Read this data into R and store it in a data frame named housing. Check the class of the variable MS.SubClass and compare it with the description given in the link above. Based on your assessment, should this variable be coerced to a different type? Briefly explain.
# code here
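For reference, a minimal sketch of checking and changing a column's class, using a made-up inline CSV rather than the housing file. The names csv_text, toy, and zone_code are hypothetical, and whether a coercion like this is appropriate for any real variable depends on its documentation:

```r
# Hypothetical inline CSV standing in for a downloaded file
csv_text <- "id,zone_code,price
1,20,105000
2,60,172000
3,20,98000"

toy <- read.csv(text = csv_text)

# read.csv() stores the codes as integers because they look numeric
class(toy$zone_code)

# If the codes are labels rather than quantities, a factor may be a better fit
toy$zone_code <- factor(toy$zone_code)
class(toy$zone_code)
levels(toy$zone_code)
```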
Find the total number of homes in this data set with missing values for the variable Garage.Type.
# code here
Create a subset of the data set containing homes that are not missing a value for Garage.Type. What is the mean value of the variable Garage.Area for these homes?
# code here
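The two steps above (counting missing values, then summarising the complete rows) can be sketched on a toy data frame. The column names kind and area are hypothetical stand-ins, and the sketch assumes missing values are stored as NA:

```r
# Hypothetical toy data frame, not the housing data
toy <- data.frame(
  kind = c("attached", NA, "detached", NA, "attached"),
  area = c(400, 0, 520, 0, 300)
)

# is.na() gives TRUE/FALSE per row; sum() counts the TRUEs
n_missing <- sum(is.na(toy$kind))
n_missing

# Keep only rows where kind is present, then summarise another column
toy_complete <- toy[!is.na(toy$kind), ]
mean(toy_complete$area)
```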
Using the variable Exter.Cond (exterior condition) and the full housing dataset (from Part A), create a factor ordered from “Poor” condition (Po) to “Excellent” condition (Ex) following the order given in the detailed description for this variable. Use barplot() and table() to construct a bar chart showing how many homes are in each category.
# code here
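A minimal sketch of the ordered-factor pattern, on made-up ratings with hypothetical level names (Low, Med, High) rather than the actual condition codes:

```r
# Hypothetical ratings, not the housing data
rating <- c("Med", "Low", "High", "Med", "Med", "Low")

# levels = fixes the order; ordered = TRUE records that the order is meaningful
rating_f <- factor(rating, levels = c("Low", "Med", "High"), ordered = TRUE)

# table() counts each level in the order given above
counts <- table(rating_f)
counts

# barplot() draws one bar per level, in the same order
barplot(counts, xlab = "Rating", ylab = "Count")
```

Without the levels argument, factor() would order the levels alphabetically, which is rarely the order wanted for a rating scale.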
The Washington Post maintains a database of fatal shootings by police officers in the line of duty. Details on their methodology can be found here.
The URL below contains data for all individuals entered into the database between 2015 and 2019:
https://remiller1450.github.io/data/Police2019.csv
Write code that reads data from the given URL and stores it as a data.frame named police. Find the average age of individuals in the data set, removing missing values as necessary.
# put code here
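As a small reminder of how missing values interact with mean(), here is a sketch on a made-up vector (ages is a hypothetical name, not a column from the data):

```r
# Hypothetical ages with one missing value
ages <- c(34, NA, 51, 27)

# Without na.rm the result is NA, because one input is unknown
mean(ages)

# na.rm = TRUE drops missing values before averaging
mean(ages, na.rm = TRUE)
```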
Included in this data is an indicator for state. Report the five states with the largest number of fatal shootings by police. Using magic numbers is OK here. (Hint: ?sort).
# put code here
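One possible pattern for finding the most frequent categories, sketched on made-up labels (region, counts, and top2 are hypothetical names). It combines table() with sort(), as the hint suggests, and uses head() to keep the first few entries:

```r
# Hypothetical category labels
region <- c("N", "S", "N", "E", "S", "N", "W", "S", "N", "E")

# Count each label, then sort the counts from largest to smallest
counts <- sort(table(region), decreasing = TRUE)

# head() keeps the first few entries of the sorted table
top2 <- head(counts, 2)
top2
```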
Two important variables to consider when analyzing police shootings are whether or not the suspect was fleeing and whether they were considered a threat, contained in the variables flee and threat_level, respectively. In this final part, we are going to investigate the relationship between these two variables. We will do so by taking the following steps:
1. Create a subset of the police data called police2 where we remove all observations where threat_level is equal to “undetermined”.
2. Use the table() function to create a table from police2 where the first variable is threat_level and the second variable is a logical vector indicating if flee is equal to “Not fleeing” or not. Assign this table to the variable tab.
3. Using the prop.table() function, first investigate by setting the argument margin = 1, so that the proportions are computed by row. Assuming that the subject was not fleeing, does it appear as if the threat level changed the probability of a fatal shooting? Now investigate with margin = 2 so that the proportions are computed by column. Assuming that the threat level of the subject was “attack”, does it appear as if the subject fleeing or not changed the probability of a fatal shooting? Based on this, how would you describe the relationship between the flee and threat_level variables?
# put code here
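The mechanics of these steps can be sketched on a toy data frame. The columns level and status and their values are hypothetical and do not come from the police data. The sketch only shows how the margin argument changes which sums equal 1:

```r
# Hypothetical toy data, not the police data
toy <- data.frame(
  level  = c("high", "high", "low", "low", "low", "high", "unknown"),
  status = c("still", "moving", "still", "still", "moving", "still", "still"),
  stringsAsFactors = FALSE
)

# Drop rows with an uninformative level
toy2 <- toy[toy$level != "unknown", ]

# Second argument is a logical vector, so the columns are FALSE and TRUE
tab <- table(toy2$level, toy2$status == "still")
tab

# margin = 1: each row sums to 1 (proportions within each level)
by_row <- prop.table(tab, margin = 1)
by_row

# margin = 2: each column sums to 1 (proportions within each TRUE/FALSE group)
by_col <- prop.table(tab, margin = 2)
by_col
```

Reading by_row compares the TRUE/FALSE split across levels, while reading by_col compares the split of levels within each TRUE/FALSE group.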