4 Introduction to Probability and Simulation

“All life is an experiment. The more experiments you make, the better.”

— Ralph Waldo Emerson

Learning objectives

Conceptual understanding of randomness and simulation
Learn the scientific definition of probability
Methods for calculating probabilities

4.1 Randomness and Simulation

In this section, we will work to both understand randomness and how it can be used with a computer to quickly simulate an outcome. To do this, we will start with a game in which two dice are rolled. Because we cannot predict the outcome of a particular roll with certainty, rolling dice is an example of a random process. In contrast, a process in which the outcome can be determined beforehand is known as deterministic. An example of a deterministic process would be computing the area of a given triangle from its base and height – no matter how many times this process is repeated, the area is always \(\frac{1}{2} * base * height\).

In our dice game, we are interested in the sum total of the roll. For example, if a 2 and a 6 are rolled, the sum total is 8. Each die has 6 outcomes, so there are 36 possible ways to roll the two die:

(1,1)	(1,2)	(1,3)	(1,4)	(1,5)	(1,6)
(2,1)	(2,2)	(2,3)	(2,4)	(2,5)	(2,6)
(3,1)	(3,2)	(3,3)	(3,4)	(3,5)	(3,6)
(4,1)	(4,2)	(4,3)	(4,4)	(4,5)	(4,6)
(5,1)	(5,2)	(5,3)	(5,4)	(5,5)	(5,6)
(6,1)	(6,2)	(6,3)	(6,4)	(6,5)	(6,6)

Exercise 4.1

There are 11 possible sum totals you can achieve when rolling two six-sided dies, {2, 3, 4, …, 11, 12}. In this game, you win if your roll adds up to 7.

Out of the 36 possible rolls listed above, how many rolls result in a win? What does this tell you about the chance of winning the game?
Using the applet, play the game 30 times and fill out the table below with whether you won and rolled a 7 (W) or lost and rolled any other total (L).

Game	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Result
Game	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30
Result

What proportion of times did you win the game? Was this close to what you expected? Why or why not?

While a relatively simple example, this game helps us to illustrate two important concepts. First, without having rolled a single die, we are still able to determine how likely we are to win the game. This was done by first enumerating the total number of possible outcomes and then counting what proportion of these outcomes resulted in a win. The collection of all possible outcomes of a random process is called the sample space, while any particular outcome is known as an event. An event can be a single outcome, or even a collection of outcomes, but it will always be contained within the sample space.

A more subtle observation is that in playing our game, we have not physically rolled any dice ourselves. Instead, we have used the computer to perform a simulation, a powerful method that quickly and consistently enables us to repeat a random process. Simulations are made up of two components:

The conditions and behavior of the experiment are determined in advance through the simulation parameters.
The experiment is performed with the aid of a computer.

In the dice rolling simulation, the parameters specified that each side of the die were equally likely and that two dies were rolled each time the game was played. These parameters were specified internally, such that the user cannot alter the specifications. Later in this chapter, we will work with simulations where the parameters can be changed. Lastly, the use of the computer may seem trivial, but it has two important implications:we were able to exactly specify the simulation parameters, and we are able to repeat this exact same experiment knowing all conditions will remain exactly the same.

Definition 4.1

Random Process: An act or process that results in an outcome that cannot be predicted with certainty

Deterministic Process: An act or process that results in an outcome that is not random

Sample Space: The set of all possible outcomes from a random process

Event: An outcome or collection of outcomes from a random process

Simulation: A tool to replicate random processes an arbitrary number of times

Simulation Parameters: Values which change the behavior of the simulation

4.2 Probability

Statistical inference is founded in the ability to quantify uncertainty. Probability is the mechanism that allows us to do so, by telling us how likely something is to happen. People talk loosely about probability all the time. For example, “What’s the chance of rain tomorrow?” or “How likely is it that drug A is better than drug B?” In order for probability to be used for statistical inference, we must be precise about with our definition.

We would all agree that the probability of heads when flipping a fair coin is 50% and the probability of rolling a 2 on a 6-sided die is 1/6, but why is that true? Well, if we were to flip a coin many, many times, we would expect half of the flips to result in heads. Similarly, if we roll a 6-sided die over and over, 1/6 of the rolls should result in a value of 2. The idea is that when we talk about probability, we are thinking about a long-run frequency or what will happen if the random process is repeated over and over again under the same conditions. If we think about probabilities in terms of the long-run frequencies, we can define and quantify probability as the fraction of times an event occurs if a random process is repeated indefinitely. This means that probabilities are always between 0 and 1, since we can never observe greater or fewer events than the number of times the process is repeated, e.g. we can never observed 12 heads on 10 coin flips.

This leads us to some important properties of probabilities:

Properties of Probability

The sum of probabilities for all outcomes in the sample space, \(\mathcal{S}\), must equal 1
For any event, the probability of that event is the sum of the probabilities for all the outcomes in that event

Consider flipping a fair coin three times. Each act of flipping the coin is random process – the coin might land on heads, and it might land on tails. Letting \(H\) be shorthand for the flip resulting in heads and \(T\) be shorthand for the flip resulting in tails, the sample space can be enumerated as

\[\mathcal{S} = \{HHH, HHT, HTH, THH, TTH, THT, HTT, TTT\},\]

giving us eight possible results from the three coin flips. As each outcome is equally likely (why?), the first property tells us that the probability of any specific outcome (say, \(HHH\)) is 1/8. The second tells us that the probability of heads on the first toss is \(4/8 = 1/2\), since four of the eight possible outcomes have heads on the first toss (\(\{HHH, HHT, HTH, HTT\}\) ). These properties underlie a lot of the more complicated formulas and concepts we will cover in this chapter, although we don’t always think about them explicitly.

Exercise 4.2

We will now investigate the long-run frequency definition of probability using an applet which simulates coin flipping. Internally, the simulation parameters specify that each coin flip has a 50% chance of heads. As a user, you are able to determine how many coin flips you would like to perform. The simulation results are summarized in three figures:

(Top right) the flip results are shown with a blue “T” indicating the flip resulted in tails and a pink “H” indicating the flip resulted in heads.
(Bottom left) the total number of heads and tails across all flips is tallied.
(Bottom right) the running total of the proportion of heads is plotted. For example, if we flip a coin three times and observe THH, the running proportion of heads is 0/1=0 after the first flip (T), 1/2=0.5 after the second flip (TH), and 2/3=0.67 after the third flip (THH). The dotted red line on this plot falls at 0.5, which translates to half of the flips resulting in heads.

Set the number of flips to 10 and click the "Flip Coins" button 15 times. Each time you perform an experiment, record the number of heads you observed in the following table:

Trial

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

# Heads
1. On any given experiment where the coin is flipped 10 times, how many flips would you expect to result in heads? Why?
2. How many times did you see exactly four heads? What about at least two heads?
Now set the number of flips to 100 and click the "Flip Coins" button.
1. What happens to the running proportion of heads as the coin is continually flipped?
2. Focusing your attention on the plot of the running proportion of heads, click the "Flip Coins" button several times. What do you notice about the plot? What characteristics of the plot stay the same and what differ?
For each of the following number of coin flips, perform one experiment and record the final proportion of heads you observed (e.g., 12 heads out of 20 flips = 12/20 =0.6).

3: 5: 10:

20: 100: 500:

1,000: 5,000: 10,000:
1. What happens to the proportion of heads you observed as the number of flips increased? How does this relate to the concept of long-run frequency?
2. How does the final proportion of heads observed relate to the probability of observing heads?

Definition 4.2

Probability: The fraction of times an event occurs when a random process is repeated indefinitely

4.3 Methods for Computing Probabilities

There are several methods that can be used to compute probabilities. We will introduce these methods in the context of coin flipping and seek to answer the question: if a coin is flipped three times, what is the probability that exactly two of the coins will be heads?

Enumeration Method: The enumeration method proceeds by listing all of the possible outcomes of the experiment and counting the total number of ways the event can be observed. Then, the probability is calculated as:

\[\text{Probability} = \dfrac{\text{Number of ways event can occur}}{\text{Total number of outcomes}}\] In our coin flipping example, we know the sample space includes eight outcomes:

\[ \mathcal{S} = \{HHH, HHT, HTH, HTT, THH, TTH, THT, TTT\} \]

Next, we count how many of those have exactly two heads:

\[ \mathcal{S} = \{HHH, \color{red}{HHT}, \color{red}{HTH}, \color{red}{THH}, TTH, THT, HTT, TTT\} \]

Dividing the events of interest by the total number of outcomes, we see that the probability of getting exactly two heads is \(P(\text{# Heads = 2}) = \frac{\text{# Heads = 2}}{\text{# Possible Outcomes}} = \frac38\).
That is, by enumerating the eight possible outcomes, we identified three of them in which the number of heads was two.

Probability function method: Often, a random process has an associated probability function that allows us to determine the probability of a set of outcomes. In this case, our coin flipping example follows what is known as a binomial distribution, which has the probability function:

\[ P(\text{# Heads = k}) = f(k; n,p) = \binom{n}{k}p^k(1-p)^{n-k}. \] For our experiment, the probability of heads is \(p = 0.5\), the total number of flips is \(n = 3\), and for our event, \(k = 2\) heads. Substituting these numbers, we have”
\[ \begin{align*} P(\text{# Heads = 2}) &= \binom{3}{2} (0.5)^2 (0.5)^{3-2} \\ &= 3 \times (0.5)(0.5)(0.5) \\ &= 0.375. \end{align*} \] We see, then, that a probability function is a function that takes an event as an argument and returns an associated probability. Just as in the enumeration method, we find that \(P(\text{# Heads = 2}) = 0.375 = \frac38\).
Simulation method: The previous methods are valid, but can quickly become impractical or even impossible once the problem grows becomes more complex. Even for an example as simple as coin flipping, the enumeration method can quickly become unfeasible. As the number of flips increases, the size of the sample space increases exponentially, making it challenging (or lengthy) to count all possible outcomes and all the ways the event can occur. If we seek to use the probability function method, we must know what the function is; for complex processes, these functions can be nearly impossible to construct, leaving statisticians and researchers with no clear way to identify the probability of events.

The simulation method allows us to mitigate both problems by using the computer to repeatedly perform an experiment (say 10,000 times), and computing the proportion of experiments in which the event was observed. We have previously defined probability as the fraction of times the event occurs if the random process is repeated many times. This is exactly what the simulation method does.

Mathematically, this looks like

\[ P(\# \text{ Heads} = 2) \approx \frac{\text{Number of times \{# Heads = 2\} } }{\text{# Experiments}} \] A careful reader might note that for a given random experiment (such as flipping a coin three times) and a specified event (exactly two heads), there is only a single correct answer to the question, “What is the probability of getting exactly two heads when flipping a coin three times?”, and this single correct answer is precisely what was found using the enumeration and probability function methods described above. By contrast, the simulation method provides an approximation to the probability as opposed to an exact calculation. However, as the number of repeated experiments increases, the simulation method will give a closer and closer approximation to the true answer.

Exercise 4.3

Definition 4.3

Enumeration method: A method to exactly compute probabilities by dividing the number of ways an event can occur by the total number of possible outcomes

Probability function method: A method to exactly compute probabilities by using a known function

Simulation method: A method to approximately compute probabilities by repeatedly simulating experiments and taking the proportion of simulations where the event occurs

Trial	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
# Heads

3:	5:	10:
20:	100:	500:
1,000:	5,000:	10,000: