Last updated: Thu Feb 15 06:56:51 PM 2024

Exam 1 Review Material

The first exam focuses primarily on the basics of data: what data are, how we quantify them, and how we can present them to others.

Data overview

Statistics is concerned with learning about populations and their parameters by collecting samples and computing statistics from them.

An observation is the smallest unit of study within a population. Attributes of observations are called variables.

Variables:

  • Quantitative
    • Continuous (blood pressure)
    • Discrete (number of people)
  • Categorical
    • Binary (disease status)
    • Nominal (favorite color)
    • Ordinal (education attained)

In a data frame, rows are observations and columns are variables. A column holding an ID or a name for each observation is known as an identifier.
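
To make the row/column picture concrete, here is a minimal sketch in plain Python (an assumption; these notes don't name the course software). The table is a list of rows, one dict per observation, and every column name and value is made up for illustration. It covers one example of each variable type from the list above, plus an identifier column.

```python
# A tiny hypothetical data frame: one dict per row (observation).
patients = [
    {"id": "P01", "systolic_bp": 118.5, "n_visits": 3, "diseased": False, "education": "HS"},
    {"id": "P02", "systolic_bp": 131.0, "n_visits": 1, "diseased": True,  "education": "BA"},
    {"id": "P03", "systolic_bp": 124.2, "n_visits": 5, "diseased": False, "education": "MS"},
]

# Each row is one observation; each key is one column (variable).
n_observations = len(patients)
variables = list(patients[0].keys())

# The identifier column labels observations; it is usually not analyzed itself.
identifier = "id"
analysis_variables = [v for v in variables if v != identifier]

# Pulling out one column gives the values of one variable across all observations.
systolic = [row["systolic_bp"] for row in patients]

print(n_observations)
print(analysis_variables)
print(systolic)
```

Here `systolic_bp` is continuous, `n_visits` discrete, `diseased` binary, and `education` ordinal.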

Summaries of Data

A distribution describes what values a variable takes and how frequently those values occur.
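
For a categorical variable, the distribution can be written down directly as a frequency table (counts) or a proportion table (counts divided by the total). The sketch below is a minimal example using Python's `collections.Counter` on a hypothetical sample of favorite colors. The data are invented.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (favorite color), 10 observations.
colors = ["red", "blue", "blue", "green", "red", "blue", "green", "blue", "red", "blue"]

# Frequency table: each distinct value and how many times it occurs.
freq = Counter(colors)

# Proportion table: each count divided by the number of observations.
n = len(colors)
props = {value: count / n for value, count in freq.items()}

# Print from most to least frequent.
for value, count in freq.most_common():
    print(f"{value:>6}  {count:>2}  {props[value]:.2f}")
```

Every observation falls in exactly one category, so the proportions sum to 1.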

Visualization:

  • Categorical
    • Bar charts
      • stacked
      • dodged
      • conditional/proportion
  • Quantitative
    • Histogram
      • shape
      • center
      • spread
      • skew (what do left and right skew mean?)
    • Boxplot
      • Can I draw histogram from boxplot?
      • Can I draw boxplot from histogram?
    • Scatter plots
      • form (linear/nonlinear)
      • direction
      • strength (correlation)
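
A boxplot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum), usually with points beyond the 1.5 × IQR fences plotted individually. The sketch below computes these for a hypothetical sample using Python's `statistics.quantiles`. Assumption: quartiles use `method="inclusive"` (linear interpolation), and other software or hand rules can give slightly different quartiles.

```python
import statistics

# Hypothetical sample of a quantitative variable (e.g., systolic blood pressure).
data = [112, 115, 118, 120, 121, 123, 125, 128, 131, 160]

# Quartiles by linear interpolation; other quartile conventions differ slightly.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Five-number summary: the ingredients of a boxplot.
five_num = (min(data), q1, median, q3, max(data))

# Common 1.5 * IQR rule for flagging points drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", five_num)
print("IQR:", iqr, "outliers:", outliers)
```

The summary keeps only five numbers, which is one way to think about the review questions above: a histogram determines the boxplot (approximately), but a boxplot discards the detailed shape a histogram shows.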

What plots and summaries would I use if I have:

  • One categorical
  • One quantitative
  • Two or more categorical
  • Two or more quantitative
  • Categorical and quantitative

Numerical Summaries:

  • Categorical
    • tables
      • Frequency tables
      • Proportions
    • conditional tables
  • Quantitative
    • Median/mean (centrality)
    • Quartiles
    • standard deviation/IQR (spread)
    • Correlation
      • Pearson (linear)
      • Spearman (monotone)
    • Regression
      • slope and intercept
      • \(R^2\)
      • Interpret
      • Correlation and ‘regression to the mean’
      • Categorical variable as predictor
    • Robust statistics – insensitive to outliers
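
The robustness bullet can be checked numerically: add one extreme value to a sample and compare how much each summary moves. The sketch below uses hypothetical data and Python's `statistics` module. Quartiles again use the `method="inclusive"` convention (an assumption).

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value appended.
base = [4, 5, 5, 6, 7, 8, 9]
with_outlier = base + [90]

def summarize(xs):
    """Return (mean, median, sample standard deviation, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (
        statistics.mean(xs),
        statistics.median(xs),
        statistics.stdev(xs),  # sample SD, divides by n - 1
        q3 - q1,
    )

for label, xs in [("original", base), ("with outlier", with_outlier)]:
    mean, median, sd, iqr = summarize(xs)
    print(f"{label:>12}: mean={mean:.2f} median={median} sd={sd:.2f} IQR={iqr}")
```

The mean and SD jump sharply, while the median and IQR barely change. This is why the median and IQR are the robust choices for skewed data or data with outliers.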

Two variables are associated if the value of one tells us something about the value of the other.
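
For two quantitative variables, association is often summarized with Pearson correlation (linear) or Spearman correlation (monotone). Spearman correlation is Pearson correlation computed on the ranks. The sketch below writes both from scratch in plain Python so the definitions are visible. The data are a made-up curved but strictly increasing relationship, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength and direction of the linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # always increasing, but curved (not linear)

print(round(pearson(x, y), 3))
print(spearman(x, y))
```

Spearman equals 1 because the relationship is perfectly monotone. Pearson is high but below 1 because the points do not fall on a straight line.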

Contingency tables, odds, and odds ratios
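
For a 2×2 contingency table, the odds of an outcome within a row are (count with outcome) / (count without outcome). The odds ratio compares those odds across rows. The sketch below uses a hypothetical exposure-by-disease table with invented counts. The row and column labels are illustrative only.

```python
# Hypothetical 2x2 contingency table (counts are made up):
#                 disease   no disease
#   exposed          30          70
#   unexposed        10          90
table = {
    "exposed":   {"disease": 30, "no disease": 70},
    "unexposed": {"disease": 10, "no disease": 90},
}

def prop_disease(row):
    """Conditional proportion with disease within one row."""
    return row["disease"] / (row["disease"] + row["no disease"])

def odds(row):
    """Odds of disease within one row: P(disease) / P(no disease) = ratio of counts."""
    return row["disease"] / row["no disease"]

odds_exposed = odds(table["exposed"])        # 30 / 70
odds_unexposed = odds(table["unexposed"])    # 10 / 90
odds_ratio = odds_exposed / odds_unexposed   # equals (30 * 90) / (70 * 10)

print(f"P(disease | exposed)   = {prop_disease(table['exposed']):.2f}")
print(f"P(disease | unexposed) = {prop_disease(table['unexposed']):.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```

An odds ratio of 1 would mean the odds are the same in both rows (no association). Here it is above 1, so the exposed row has higher odds of disease.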

Standardized variables: what are they? What are z-scores, and what must be true of the z-scores for the correlation to be positive?
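
A z-score measures how many standard deviations a value lies above or below its mean. Using z-scores built from the sample SD, Pearson's r is exactly the sum of the products z_x · z_y divided by n − 1. The sign of r therefore depends on whether same-sign pairs (both above or both below their means) outweigh opposite-sign pairs. The sketch below, with made-up paired data, also shows the least-squares line in original units and in standard units, where the prediction r · z_x illustrates regression to the mean.

```python
import statistics

def z_scores(xs):
    """Standardize: subtract the sample mean, then divide by the sample SD."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(v - m) / s for v in xs]

# Hypothetical paired measurements; the numbers are made up.
x = [150, 160, 165, 170, 180]
y = [50, 58, 61, 66, 75]
n = len(x)

zx, zy = z_scores(x), z_scores(y)

# Pearson's r = (sum of z-score products) / (n - 1).
products = [a * b for a, b in zip(zx, zy)]
r = sum(products) / (n - 1)

# A product is positive when both z-scores share a sign.
same_sign = sum(p > 0 for p in products)

# Least-squares line in original units: slope = r * s_y / s_x, and the line
# passes through (mean of x, mean of y). With one predictor, R^2 = r^2.
slope = r * statistics.stdev(y) / statistics.stdev(x)
intercept = statistics.mean(y) - slope * statistics.mean(x)
r_squared = r ** 2

# In standard units the predicted z-score of y is r * z_x. Since |r| <= 1,
# each prediction is no farther from the mean than z_x: regression to the mean.
predicted_zy = [r * a for a in zx]

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
print(f"pairs with same-sign z-scores: {same_sign} of {n}")
```

The middle pair sits exactly at the mean of x, so its product is 0. It neither raises nor lowers r.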