Introduction

This lab will introduce the idea of hypothesis testing with a particular type of test known as a \(t\)-test. By \(t\)-test, we simply mean performing a hypothesis test in which we construct a test statistic, \(t\), that follows a \(t\)-distribution.

This lab will introduce two R functions used in hypothesis testing: t.test() and chisq.test().

Testing Single Means

In R, we can perform hypothesis tests and create standard confidence intervals based on the \(t\)-distribution using the function t.test(). While the t.test() function takes a number of useful arguments, for now we concern ourselves with only three:

  • x, a numeric vector we wish to construct a confidence interval for
  • mu, a single number indicating the null hypothesis
  • conf.level, which gives the confidence level of the interval for a specified \(\alpha\) (our confidence level is \(1 - \alpha\))

The output of the t.test() function will include a collection of useful information, including the point estimate and confidence interval for the variable. For example, here we use the built-in mtcars dataset along with t.test() to find a 90% confidence interval for average miles per gallon:

t.test(mtcars$mpg, conf.level = 0.9)
## 
##  One Sample t-test
## 
## data:  mtcars$mpg
## t = 18.9, df = 31, p-value <0.0000000000000002
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##  18.284 21.897
## sample estimates:
## mean of x 
##    20.091

From the output, you should note a few things:

  1. At the bottom of the output, we see our point estimate, \(\overline{x} = 20.09\)

  2. Just above the point estimate, we see our estimate of the 90% confidence interval, here \((18.28, 21.90)\)

  3. All of the information above this is related to hypothesis testing. This includes:

    • A \(t\)-statistic, equal to \(t = 18.9\)
    • Degrees of freedom equal to \(df = 31\)
    • A \(p\)-value of \(p < 2 \times 10^{-16}\)

However, as we were not conducting any hypothesis test in this particular example, this information can be ignored.

By default, the t.test() function assumes a null hypothesis of \(H_0: \mu_0 = 0\). As our t-statistic is computed as

\[ t = \frac{\overline{x} - \mu_0}{\hat{\sigma}/\sqrt{n}} \] this gives rise to the computed statistic \(t = 18.9\) that we saw in the output (which was subsequently used to compute the p-value).
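
To see where this number comes from, we can compute the statistic by hand. This is only a quick check of the formula above, not something t.test() requires:

## Manual computation of the one-sample t-statistic under H_0: mu_0 = 0
xbar <- mean(mtcars$mpg)                   # sample mean
se <- sd(mtcars$mpg) / sqrt(nrow(mtcars))  # estimated standard error
(xbar - 0) / se                            # about 18.86, matching the output above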

To see a more realistic example, suppose we wanted to investigate the hypothesis that the average miles per gallon was equal to 18 mpg. That is, suppose we have \(H_0: \mu_0 = 18\). We could evaluate this hypothesis using the t.test() function with the mu argument:

t.test(x = mtcars$mpg, 
       mu = 18, 
       conf.level = 0.9)
## 
##  One Sample t-test
## 
## data:  mtcars$mpg
## t = 1.96, df = 31, p-value = 0.059
## alternative hypothesis: true mean is not equal to 18
## 90 percent confidence interval:
##  18.284 21.897
## sample estimates:
## mean of x 
##    20.091

Note that neither our point estimate nor our 90% confidence interval changed. What did change, however, were the calculated t-statistic and the p-value.
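
As an aside, the output of t.test() can be saved to a variable and its individual pieces extracted with the $ operator. A brief sketch using the test above:

res <- t.test(x = mtcars$mpg, mu = 18, conf.level = 0.9)
res$statistic  # the t-statistic
res$p.value    # the p-value
res$conf.int   # the 90% confidence interval
res$estimate   # the sample mean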

Question 1: Below is code to recreate the two subsets of the hawk data containing only Red-tailed Hawks:

## RT Hawk data
library(dplyr)  # needed for filter()
hawks <- read.csv("https://collinn.github.io/data/hawks.csv")
hawks <- filter(hawks, Species == "RT")

## Subset of RT Hawk data
set.seed(89)
idx <- sample(seq_len(nrow(hawks)), size = 20)
hawks2 <- hawks[idx, ]

  • Part A: Perform a t-test on both hawks and hawks2 to test the hypothesis that the variable Weight has a mean value of \(\mu_0 = 1050\). According to the output, what is the sample mean for weight for each sample? If you were testing at the \(\alpha = 0.05\) level, what would you conclude for each test?

  • Part B: Explain why the conclusions you came to were different in Part A.

Two Sample Tests

The t.test() function can also be used to perform a two-sample \(t\)-test to compare the difference in means between two groups. Typically the default argument for the null hypothesis, mu = 0, is what we want and we leave it unchanged. There are two ways that the t.test() function is used for a two-sample test, both of which we illustrate here.

Two Sample t-test

The syntax for the t.test() function depends on how the underlying data.frame is formatted. For example, consider two data frames, each consisting of observations from two groups, A and B. In the first data.frame, each group has its own column, with the values of the observations in the rows:

## Each column is a group
df1 <- data.frame(A = 1:4, 
                  B = 3:6)
df1
##   A B
## 1 1 3
## 2 2 4
## 3 3 5
## 4 4 6

In the second format, one column specifies the name of the group, while a second column records each of the observations. Data.frames formed this way are said to be in long format. In any event, a quick comparison shows that the data included in both is the same:

df2 <- data.frame(Group = rep(c("A", "B"), each = 4), 
                  Val = c(1:4, 3:6))
df2
##   Group Val
## 1     A   1
## 2     A   2
## 3     A   3
## 4     A   4
## 5     B   3
## 6     B   4
## 7     B   5
## 8     B   6
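
As an aside, a wide data.frame like df1 can be reshaped into this long format. One way (shown only for reference) uses the pivot_longer() function from the tidyr package; the resulting rows may appear in a different order than df2, but the data are the same:

library(tidyr)
pivot_longer(df1, cols = c(A, B), names_to = "Group", values_to = "Val")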

In the first case, where each group has its own column, we again use the $ construction to extract the variables from the data.frame. We perform our test like this:

## Each column is a different group
t.test(df1$A, df1$B)
## 
##  Welch Two Sample t-test
## 
## data:  df1$A and df1$B
## t = -2.19, df = 6, p-value = 0.071
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.23371  0.23371
## sample estimates:
## mean of x mean of y 
##       2.5       4.5

Again, we see a computed t-statistic, a p-value, and a confidence interval for the true difference.

The second instance uses a formula syntax, Val ~ Group, with the values we wish to test on the left-hand side and the group membership on the right. Done this way, we need to remember to pass the data.frame in as an argument as well:

t.test(Val ~ Group, data = df2)
## 
##  Welch Two Sample t-test
## 
## data:  Val by Group
## t = -2.19, df = 6, p-value = 0.071
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -4.23371  0.23371
## sample estimates:
## mean in group A mean in group B 
##             2.5             4.5

A quick inspection shows that the results of this are identical to those of the first.

Question 2: For this question, we will be using the sandwich dataset, which includes the number of ants found on a sandwich left out at a picnic, according to the toppings, bread, and butter that were on it:

sandwich <- read.csv("https://collinn.github.io/data/sandwich.csv")

Perform a two-sample t-test to determine if the mean number of ants on a sandwich was the same for sandwiches that had butter compared with those that did not.

Paired Testing

We can implement paired t-tests in R by simply setting the argument paired = TRUE in the t.test() function. Here, for example, is the data for our French language institute we saw in class:

french <- read.csv("https://collinn.github.io/data/french_institute.csv")

head(french)
##   pre post
## 1  32   34
## 2  31   31
## 3  29   35
## 4  10   16
## 5  30   33
## 6  33   36

Because pre and post test scores are each contained in their own column, we will use the first method of passing our arguments to the t.test() function, extracting each column with $:

t.test(french$pre, french$post, paired = TRUE)
## 
##  Paired t-test
## 
## data:  french$pre and french$post
## t = -3.86, df = 19, p-value = 0.001
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -3.8539 -1.1461
## sample estimates:
## mean difference 
##            -2.5

Question 3: Use dplyr and the mutate() function to create a new variable in the french dataset called diff where diff = post - pre. Perform a one-sample t-test on this variable and compare it with the paired output above. How do the t-statistic, degrees of freedom, and p-values compare with what we found above? Explain in 1-2 sentences why this is the case.

\(\chi^2\) Tests

This portion of the lab deals with tests of goodness-of-fit and tests of independence for categorical variables. Both tests are performed by constructing the \(\chi^2\) test statistic, computed as

\[ \chi^2 = \sum_{i=1}^k \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} \]

where \(i = 1, \dots, k\) iterates through each of the cells of the table and where the expected counts are what we should expect to see under the null hypothesis. The resulting test statistic follows a \(\chi^2\) distribution, with degrees of freedom calculated to be:

  • In the one categorical case, \(df = k-1\) where \(k\) is the number of unique categories
  • In the two categorical case, \(df = (\text{# rows} - 1) \times (\text{# columns} - 1)\)

In both instances, we can compute this test statistic in R using the chisq.test() function which will give us a value for the test statistic, the degrees of freedom, and the resulting p-value under the null hypothesis.

Goodness-of-fit Tests (1 Variable)

For the one categorical variable cases, all we need for the chisq.test() function is a vector including the counts for each of our categories, along with the proportions we would expect to see under the null hypothesis.

For example, consider again our example from class in which we wish to create a standardized exam with 400 questions, each having multiple choice answers A-E. If we wish to create an exam in which each answer is equally likely to be correct, we would have a null hypothesis of the form

\[ H_0: p_A = p_B = p_C = p_D = p_E = 0.2 \]

where the expected counts should be (under the null):

A B C D E
80 80 80 80 80

In class, we considered an example where the observed distribution had the following counts:

A B C D E
74 90 76 87 73

To conduct a \(\chi^2\) test that our observed values were drawn from the null distribution, we create a vector with these counts, along with the associated probabilities:

## Create obs variable with the observed counts
obs <- c(74,90,76,87,73)

## Create probabilities for each under H_0
prob <- c(0.2,0.2,0.2,0.2,0.2)
  
## Pass this into chisq.test, with null probabilities
chisq.test(obs, p = prob)
## 
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 3.12, df = 4, p-value = 0.54

By default, chisq.test() will assign an equal probability to each of the outcomes. In this case, because our null hypothesis sets each of these probabilities to be equal, we can omit the probability argument when calling the function:

chisq.test(obs)
## 
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 3.12, df = 4, p-value = 0.54

In each case, we find a test statistic of \(\chi^2 = 3.12\) and a \(p\)-value of \(p = 0.54\).
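
As a sanity check, we can reproduce these values directly from the formula given earlier, using the obs vector defined above:

## Compute the chi-square statistic by hand from observed and expected counts
expected <- 0.2 * sum(obs)                # 80 expected in each of the 5 cells
stat <- sum((obs - expected)^2 / expected)
stat                                      # 3.125
pchisq(stat, df = 4, lower.tail = FALSE)  # p-value, about 0.54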

Contrast this with the example we gave for Alameda County, where we investigated the observed counts of each race/ethnicity in the jury pool, which we compared against proportions estimated from US Census data. Because our null hypothesis in this case no longer assumes that the expected proportions are equal, we have to pass them in ourselves. The values here are taken from the lecture slides:

jury <- c(780, 117,114,384,58)
probs <- c(0.54, 0.18, 0.12, 0.15, 0.01)
chisq.test(jury, p = probs)
## 
##  Chi-squared test for given probabilities
## 
## data:  jury
## X-squared = 357, df = 4, p-value <0.0000000000000002

Again, we get the value of the \(\chi^2\) statistic, along with degrees of freedom and a \(p\)-value.

Question 4: Just as a normal distribution describes phenomena in which observations tend to fall symmetrically about the mean, the Poisson distribution is a probability distribution used to describe the number of discrete events we should expect to occur within a period of time (e.g., how many students walk into the student union each hour). This question involves assessing whether or not a particular collection of data is consistent with a Poisson distribution.

In 1898, Ladislaus von Bortkiewicz collected data on the number of soldiers in the Prussian army who were kicked to death by mules. The data follow 10 corps, each observed for 20 years, resulting in a total of 200 "corps-years". In 109 corps-years there were no deaths from mule kicks, in 65 corps-years there was one death, and so on. Below is a table with all of the observed data:

Number of Deaths    Number of Corps-Years
0                   109
1                    65
2                    22
3+                    4

If the number of deaths via mule kick did match a Poisson distribution, the expected number of corps-years with each death count should have the following proportions:

0 deaths    1 death    2 deaths    3+ deaths
0.55        0.33       0.10        0.02

Conduct a \(\chi^2\) test to determine whether or not the observed data is consistent with what we should expect to see if death-by-mule-kick truly did follow a Poisson distribution. What \(p\)-value do you find, and what does this say about your null hypothesis? (note: you may ignore the warning about expected counts)

Tests for Independence (2 Variables)

When considering two categorical variables, we are (usually) less interested in assessing goodness-of-fit in the expected counts of two variables and more interested in determining if there is any sort of association between the two variables. In other words, for a single observation, does knowing the value of one categorical variable give us information about the other?

Consider, for example, data on auto accidents in Florida recording the number of fatal and non-fatal outcomes, along with seatbelt use. That data is summarized again here in the following table:

              Fatal   Non-fatal
No Seatbelt    1085       55623
Seatbelt        703      441239

For our \(\chi^2\) test, we need only pass a matrix or table of these values and chisq.test() will automatically perform a test of independence:

seatbelt <- matrix(c(1085, 703, 55623, 441239), nrow = 2)
chisq.test(seatbelt) # we do not need to add proportions here
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  seatbelt
## X-squared = 4324, df = 1, p-value <0.0000000000000002

Our p-value here suggests that our observed data are not consistent with the null hypothesis that the variables are independent, leading us to conclude that seatbelt use and fatality status are associated.
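
If you would like to see the expected counts that the test compares the observed data against, they are stored in the object returned by chisq.test():

res <- chisq.test(seatbelt)
res$expected  # expected cell counts under independence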

Most frequently, we will be passing a table into chisq.test(). For example, consider a table from the titanic dataset tabulating the number of survivors according to class:

## Create titanic data frame
titanic <- as.data.frame(Titanic)
titanic <- titanic[rep(1:nrow(titanic), times = titanic$Freq), ]

## Create table with survival status based on Class
titanic_table <- with(titanic, table(Class, Survived))

## Print out the table
titanic_table
##       Survived
## Class   No Yes
##   1st  122 203
##   2nd  167 118
##   3rd  528 178
##   Crew 673 212

We can then pass this directly into chisq.test() to test for an association:

chisq.test(titanic_table)
## 
##  Pearson's Chi-squared test
## 
## data:  titanic_table
## X-squared = 190, df = 3, p-value <0.0000000000000002
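
While the test only tells us that an association exists, a quick (purely descriptive) look at the row proportions shows where it comes from:

## Proportion of passengers surviving within each class
prop.table(titanic_table, margin = 1)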

Question 5: This question will use the college dataset loaded below:

college <- read.csv("https://collinn.github.io/data/college2019.csv")

  • Part A: Create a proportional bar chart between the variables Region and Type. Does it appear that there is an association between these two variables? In other words, does knowing anything about the region tell you anything about the proportion of Public or Private colleges?
  • Part B: Using the table() function, create a 2-way table of these two variables. Save this table to a new variable called tab. (Using the table() function to create the two-way table will be a useful way to test the association of categorical variables in your final projects.)
  • Part C: Conduct a \(\chi^2\) test on the observed frequencies that you found in the table in Part B. Based on this, what would you conclude about there being an association between the variables Region and Type?