Type the following:
I will not copy and paste anything from this homework assignment page into my own solutions because if I do there is a high probability that my R Markdown file will not render correctly
Below is the equation used to construct confidence intervals using critical values:
\[ \overline{x} \pm C \times \frac{\hat{\sigma}}{\sqrt{n}} \]
Explain what impact each term has on the size and location of confidence intervals, and explain how the sample size impacts the critical values for a given confidence level.
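As a rough illustration of how the pieces of this formula fit together, the sketch below builds an interval from made-up summary values. The values of x_bar, sigma_hat, and n are assumptions chosen for illustration, not values from this assignment, and the critical value is assumed to come from a t distribution with n - 1 degrees of freedom.

```r
## Hypothetical summary statistics (illustrative only)
x_bar     <- 10    # sample mean: sets the center of the interval
sigma_hat <- 2     # estimated standard deviation
n         <- 25    # sample size
conf      <- 0.95  # confidence level

## Critical value from the t distribution (assumes n - 1 degrees of freedom)
C <- qt(1 - (1 - conf) / 2, df = n - 1)

## Margin of error and the resulting interval
margin <- C * sigma_hat / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci

## Critical values for several sample sizes at the same confidence level
sizes <- c(5, 10, 25, 100, 1000)
crit  <- qt(0.975, df = sizes - 1)
data.frame(n = sizes, critical_value = round(crit, 3))
```

Changing one input at a time (for example, doubling sigma_hat or quadrupling n) and re-running the block is one way to see which term moves the center of the interval and which terms change its width.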
Which of these would have a larger impact on the size of a 95% confidence interval when \(\hat{\sigma} = 5\) and \(n = 20\)? Use qt() to find the correct critical values.
As we have seen in class, a \(z\)-statistic gives us a measurement of how an individual observation compares to the mean in units of standard deviation. Suppose that we have a population where we know with certainty that \(\mu = 50\) and \(\sigma = 3\). Suppose we then collect two samples of different sizes and find the following estimates of the mean:
Consider the distribution of \(\overline{x}_1\) and \(\overline{x}_2\). Use this information to answer the following:
Part A In terms of absolute distance, which of these samples has a sample mean that is further from the true mean?
Part B Find the z-score for each of these sample statistics. Based on this, which sample mean appears to be the greater number of standard deviations away from the mean?
Part C When talking about distance from the mean (i.e., which is “furthest”), do we care if our statistic is positive or negative? Why or why not?
Part D Reconcile the differences you found between Part A and Part B. Why does the mean from Sample 2 appear to be further from \(\mu\) than Sample 1, despite \(\overline{x}_2\) being closer than \(\overline{x}_1\)?
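The two kinds of distance compared in Parts A and B can be sketched with a small helper. The sample means and sample sizes below are hypothetical placeholders, not the values from this assignment, and z_for_mean is a name introduced here purely for illustration. It standardizes a sample mean using the standard error \(\sigma / \sqrt{n}\).

```r
## Known population parameters from the problem statement
mu    <- 50
sigma <- 3

## Hypothetical helper: z-score of a sample mean based on n observations
z_for_mean <- function(x_bar, n, mu, sigma) {
  (x_bar - mu) / (sigma / sqrt(n))
}

## Made-up sample means and sample sizes (illustrative only)
x_bar <- c(a = 51.5, b = 49.2)
n     <- c(a = 9,    b = 100)

## Absolute distance from mu, in the original units
abs(x_bar - mu)

## Distance from mu, in units of the standard error of each sample mean
z <- z_for_mean(x_bar, n, mu, sigma)
z
```

Because the standard error shrinks as \(n\) grows, the same raw distance from \(\mu\) corresponds to a larger \(z\)-score in a larger sample.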
As we saw in class, the Central Limit Theorem tells us that, for a population with mean \(\mu\) and standard deviation \(\sigma\), the sample mean will follow an approximately normal distribution with
\[ \overline{x} \sim N \left( \mu, \frac{\sigma}{\sqrt{n}} \right) \]
In fact, the CLT also applies to our estimates of proportions, which are themselves a type of mean. For example, consider ten coin flips in which four of them result in heads. We understand this proportion to be
\[ \hat{p} = \frac{4}{10} = 0.4 \]
If we were to record these flips as 1s and 0s instead and take the mean, we would find the same value:
```r
## Ten flips, four heads
flips <- c(0, 1, 1, 0, 0, 0, 1, 0, 0, 1)
mean(flips)
## [1] 0.4
```
In fact, there is a special formula for the distribution of a sample proportion, \(\hat{p}\), based on the calculation for its variance. For “large enough” \(n\), the CLT tells us that the distribution of the sample proportion is approximately
\[ \hat{p} \sim N \left( p, \ \ \sqrt{\frac{p(1-p)}{n}} \right) \]
Given a sample proportion, we can use the CLT to construct a confidence interval for the proportion, just as we did for the sample mean. Use this information to answer the following problem:
In a study, Johns Hopkins University researchers investigated the survival of babies born prematurely. They searched their hospital’s medical records and found that of 39 babies born at 25 weeks gestation (15 weeks early), 31 went on to survive at least 6 months.
Part A Using a normal approximation, construct a 95% confidence interval to estimate the true proportion of babies born at 25 weeks that are expected to survive at least 6 months.
Part B An article on Wikipedia suggests that of babies born at 25 weeks, 72% are expected to survive at least 6 months. Is this estimate consistent with what we found in Part A?
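For reference, a normal-approximation interval for a proportion can be sketched as follows. The counts and the claimed value are hypothetical and are not the study data above. The sketch also assumes the usual plug-in step of using \(\hat{p}\) in place of the unknown \(p\) when estimating the standard error.

```r
## Hypothetical counts (illustrative only, not the study data)
successes <- 60
n         <- 100
conf      <- 0.95

p_hat <- successes / n                  # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)  # plug-in standard error
z     <- qnorm(1 - (1 - conf) / 2)      # standard normal critical value

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
ci

## Check whether a hypothetical claimed proportion falls inside the interval
claimed <- 0.55
inside  <- claimed >= ci["lower"] && claimed <= ci["upper"]
inside
```

A claimed value that falls inside the interval is consistent with the sample at the chosen confidence level, while a value outside the interval is not.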