Last updated: Thu Feb 15 06:56:51 PM 2024
Exam 1 Review Material
The first exam revolves primarily around the basics of data: what it
is, how we quantify it, and how we can present it to others.
Data overview
Statistics is concerned with studying populations and their parameters
by collecting samples and computing statistics from them.
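A minimal sketch of the parameter/statistic distinction, assuming a simulated population (the blood pressure values, the seed, and the sample size of 50 are all made up for illustration). The population mean is the parameter; the mean of a random sample is the statistic that estimates it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated blood pressure values.
population = [random.gauss(120, 15) for _ in range(10_000)]
parameter = statistics.mean(population)   # population mean (a parameter)

# A simple random sample of 50 observations drawn without replacement.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)       # sample mean (a statistic)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is unknown, which is why we rely on the statistic; here it is computable only because the population is simulated.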
An observation is the smallest unit of study within a population.
Attributes of observations are called variables.
Variables:
- Quantitative
  - Continuous (blood pressure)
  - Discrete (number of people)
- Categorical
  - Binary (disease status)
  - Nominal (favorite color)
  - Ordinal (education attained)
In a data frame, rows are observations and columns are variables. The
column holding an ID or a name is known as an identifier.
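A small sketch of the rows/columns idea, using a plain list of dictionaries as a stand-in for a data frame. The column names and values are hypothetical; each dictionary is one observation and each key is one variable, with "id" playing the role of the identifier.

```python
# Hypothetical toy data frame: one dict per observation (row),
# one key per variable (column). All values are made up.
patients = [
    {"id": "P01", "blood_pressure": 118.5, "household_size": 3,
     "disease": False, "education": "college"},
    {"id": "P02", "blood_pressure": 131.0, "household_size": 1,
     "disease": True, "education": "high school"},
    {"id": "P03", "blood_pressure": 124.2, "household_size": 4,
     "disease": False, "education": "graduate"},
]

identifier = "id"  # labels each row; not analyzed as a variable

# Every column except the identifier is a variable.
variables = [name for name in patients[0] if name != identifier]
n_observations = len(patients)

print(n_observations, "observations")
print("variables:", variables)
```

Here blood_pressure is continuous, household_size is discrete, disease is binary, and education is ordinal.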
Summaries of Data
A distribution describes what values a variable takes and how
frequently those values occur.
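As an illustration, the distribution of a categorical variable can be tabulated as counts and proportions. This is a sketch on made-up "favorite color" responses, not data from the course.

```python
from collections import Counter

# Hypothetical responses to "favorite color" (a nominal variable).
colors = ["blue", "red", "blue", "green", "blue", "red", "green", "blue"]

counts = Counter(colors)   # how often each value occurs
n = len(colors)
proportions = {value: count / n for value, count in counts.items()}

# Print the distribution, most frequent value first.
for value, count in counts.most_common():
    print(f"{value:<6} count={count}  proportion={proportions[value]:.3f}")
```

The counts are what a bar chart would display; the proportions always sum to 1.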
Visualization:
- Categorical
  - Bar charts
    - stacked
    - dodged
    - conditional/proportion
- Quantitative
  - Histogram
    - shape
    - center
    - spread
    - skew (what is left/right skew)
  - Boxplot
    - Can I draw histogram from boxplot?
    - Can I draw boxplot from histogram?
  - Scatter plots
    - form (linear/nonlinear)
    - direction
    - strength (correlation)
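To connect histograms, boxplots, and skew: a boxplot is drawn from the five-number summary (plus any flagged outliers), so it keeps far less detail than a histogram. The sketch below computes that summary for a made-up right-skewed sample and compares mean to median. The quartile convention used here (Python's "inclusive" method) is one of several; other software may give slightly different quartiles.

```python
import statistics

# Hypothetical right-skewed sample (e.g. wait times in minutes).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

# Quartiles under the "inclusive" convention (an assumption; conventions vary).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))

mean = statistics.mean(data)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr   # common 1.5 * IQR rule for flagging outliers
outliers = [v for v in data if v > upper_fence]

print("five-number summary:", five_number)
print("mean:", mean, "median:", median)
print("flagged as outliers:", outliers)
```

The long right tail pulls the mean above the median, which is the usual signature of right skew; the median and IQR barely react to the extreme value.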
What if I have:
- One categorical
- One quantitative
- Two or more categorical
- Two or more quantitative
- Categorical and quantitative
Numerical Summaries:
- Categorical
  - tables
    - Frequency tables
    - Proportions
    - conditional tables
- Quantitative
  - Median/mean (centrality)
  - Quartiles
  - Standard deviation/IQR (spread)
  - Correlation
    - Pearson (linear)
    - Spearman (monotone)
  - Regression
    - slope and intercept
    - \(R^2\)
    - Interpret
    - Correlation and ‘regression to the mean’
    - Categorical variable as predictor
  - Robust statistics – insensitive to outliers
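The correlation and regression items can be sketched from their textbook formulas. The helper names below (pearson, ranks, spearman, least_squares) are hypothetical and written for illustration, the data are made up, and the ranking helper assumes no tied values.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def least_squares(x, y):
    """Slope and intercept of the least-squares regression line."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: y = x^2, monotone increasing but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson(x, y)
rho = spearman(x, y)
slope, intercept = least_squares(x, y)
r_squared = r ** 2   # in simple linear regression, R^2 equals r squared

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(f"fitted line: y = {intercept:.1f} + {slope:.1f} x, R^2 = {r_squared:.3f}")
```

Because the relationship is perfectly monotone but curved, Spearman's coefficient is exactly 1 while Pearson's falls slightly short of 1.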
Two variables are associated if the value of one tells us something
about the value of the other.
Contingency tables, odds, and odds ratios
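A sketch of how odds and an odds ratio come out of a 2x2 contingency table. The counts and the exposed/not exposed labels are invented for illustration.

```python
# Hypothetical 2x2 contingency table of counts (made-up numbers).
#                 disease   no disease
# exposed            30         70
# not exposed        10         90
table = {
    "exposed": {"disease": 30, "no disease": 70},
    "not exposed": {"disease": 10, "no disease": 90},
}

def proportion(row):
    """Conditional proportion with disease: cases / row total."""
    return row["disease"] / (row["disease"] + row["no disease"])

def odds(row):
    """Odds of disease: cases / non-cases."""
    return row["disease"] / row["no disease"]

odds_exposed = odds(table["exposed"])          # 30 / 70
odds_unexposed = odds(table["not exposed"])    # 10 / 90
odds_ratio = odds_exposed / odds_unexposed     # (30 * 90) / (70 * 10)

for group, row in table.items():
    print(f"{group:<12} proportion={proportion(row):.2f}  odds={odds(row):.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
```

The conditional proportions differ between the rows, so exposure tells us something about disease status: the two variables are associated. An odds ratio of 1 would indicate no association.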
Standardized variables: what are they? For z-scores, what must be true
for the correlation to be positive?
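One way to explore these questions, sketched on made-up data: a z-score is (value - mean) / standard deviation, and the Pearson correlation equals the sum of the products of paired z-scores divided by n - 1 (when the sample standard deviation is used). The z_scores helper is hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize: subtract the mean, divide by the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)   # sample SD (divides by n - 1)
    return [(v - m) / s for v in values]

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

zx, zy = z_scores(x), z_scores(y)

# Each product is positive when both z-scores have the same sign,
# i.e. both values sit on the same side of their respective means.
products = [a * b for a, b in zip(zx, zy)]
r = sum(products) / (len(x) - 1)

print("z-scores of x:", [round(z, 2) for z in zx])
print("z-scores of y:", [round(z, 2) for z in zy])
print("correlation r:", round(r, 3))
```

Standardized variables have mean 0 and standard deviation 1. The correlation is positive when the products of paired z-scores are positive on balance, meaning above-average x values tend to pair with above-average y values and below-average with below-average.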