Probability

We live in a stochastic (probability-driven) universe

The existence of gambling for many centuries is evidence of long-running interest in probability. But a good understanding of probability transcends mere gambling. The mathematics of probability is essential for understanding a wide range of topics in science, medicine and beyond.

In this section we will consider probability for discrete random variables. Discrete in this sense means that a variable can take on one of only a few specific values. A good example is a coin. When laying flat, only one side can possibly be showing at a time. Another is a die (singular of dice), which can show numbers 1-6 only, and only one of those at a time. In the section on continuous probability we'll consider continuous random variables, but we're not there yet.


Why is probability important?

Our universe is driven mostly by random events, so it's very important to understand randomness and the probability of any event occurring in such a universe. Here are a few examples of where you might need to understand probability, but there are many, many others.

1. Experimental science

Any experimental measurement, no matter how carefully performed, is affected by random errors or “noise.” Shown at right is data from a high resolution far-infrared spectroscopy experiment. The "peak" represents absorption of a very small amount of far-infrared light by the C3 molecule. You can easily see the noise (roughness) in the signal. Random errors follow the laws of probability, which form the basis of how we estimate the effect of those errors on our results.

For example, we might measure a length and report it as 3.45 ± 0.03 meters, where the 0.03 is a measure of the “average” random error present in the measurement. Just how we estimate that average error comes from a study of probability.

Figure: far-infrared spectroscopy data showing the noisy absorption peak of the C3 molecule

Source: Giesen et al., Astrophys. J., 551:L181-L184 (2001)

2. Chemistry & Physics

Figure: reaction probability and the orientation of colliding molecules (illustration)

Whether a chemical reaction takes place depends on a number of factors, like whether reactants collide (necessary for a reaction to occur), with what kinetic energy they collide, and in what orientation they collide (see the illustration). Because in any ensemble (group) of reacting molecules there will be a wide and randomly-occurring range of speeds, paths and orientations, these processes are best understood using the laws of probability.

There is a whole field in physics/chemistry called statistical mechanics, based on probability theory, that derives the laws of thermodynamics from a study of the behavior of large ensembles of atoms & molecules.

3. Medicine

The laws of probability are crucial in the medical sciences. Among other things, they are important for developing effective tests for diseases and in testing for the presence of drugs and other substances.

In testing the effectiveness of drugs, researchers must carefully employ the laws of probability and statistics. There are legendary cases of reliance upon a drug to treat some disease which later was proven to be completely ineffectual by careful probability-based analysis.

Much of the work of public health professionals is backed up by a solid knowledge of probability to prove or disprove cause-and-effect relationships.


Image: Wikimedia Commons

A discrete random variable is one that can only take on one of a set of specific values at a time.

  • A die, for example, can land (randomly) showing the numbers 1, 2, 3, 4, 5 or 6 only
  • A coin can land with only one of two sides showing at a time.

Discrete Probability

Discrete events are those with a finite number of outcomes, e.g. tossing dice or coins. For example, when we flip a coin, there are only two possible outcomes: heads or tails. When we roll a six-sided die, we can only obtain one of six possible outcomes, 1, 2, 3, 4, 5, or 6. Discrete probabilities are simpler to understand than continuous probabilities, so that's where we'll begin.

Let's look at flipping a coin first. The probability, we'll call it P, of obtaining a particular outcome (heads, say) is 1 chance in 2, or just ½.

The possible elementary outcomes of our experiment (coin flipping) form a set {H, T}. If we call P the probability function and H and T the two possible outcomes, then P(H) = ½ and P(T) = ½. When we flip a coin, we have to get either H or T, so the total probability is 1. Here, of course, we are ruling out the unlikely event that the coin lands in such a way that it sticks on its edge. When we flip a coin, we make the reasonable assumption that there are only two possible outcomes, and each one accounts for ½ of the total probability.


So the probability of obtaining either outcome H or outcome T from our experiment (flipping the coin) can be written:

P(H) + P(T) = 1

In other words, the probabilities of all possible discrete outcomes sum to one. Note that this is only true when outcomes H and T are mutually exclusive, i.e. when they can't occur at the same time, and when together they cover every possible outcome. The story would be different if we could get heads and tails at the same time.

The probabilities of all possible discrete outcomes of a probability experiment sum to one.

Two important rules of probability

We can now write down two important rules of probability:

$$\text{(1)}\quad P(x_i) > 0 \qquad\qquad \text{(2)}\quad \sum_{i=1}^{n} P(x_i) = 1$$

The first says that for an outcome of a probability experiment to be defined, it must have a finite positive probability. If the probability is zero, it can never happen and we don't have to worry about it, and negative probabilities don't make any sense.

If you haven't seen summation notation before, rule (2) translates like this: the sum of the probabilities of all n outcomes (from one to n, labeled with the index i) is one. It means that for a particular experiment, like flipping a coin, the sum of the probabilities of all outcomes (½ for heads, ½ for tails) must equal 1. It's another way of saying that something has to happen.
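The two rules are easy to check mechanically. Here is a minimal Python sketch, assuming the probabilities of all n outcomes are supplied as a list; `is_valid_distribution` is a hypothetical helper name, and `fractions.Fraction` keeps the arithmetic exact so that ½ + ½ really equals 1.

```python
from fractions import Fraction

def is_valid_distribution(probs):
    """Check rules (1) and (2) for a list of outcome probabilities."""
    rule1 = all(p > 0 for p in probs)   # (1) every outcome has a positive probability
    rule2 = sum(probs) == 1             # (2) the probabilities of all n outcomes sum to 1
    return rule1 and rule2

coin = [Fraction(1, 2), Fraction(1, 2)]   # P(H), P(T)
die = [Fraction(1, 6)] * 6                # P(1) through P(6) for a fair die

print(is_valid_distribution(coin))                               # True
print(is_valid_distribution(die))                                # True
print(is_valid_distribution([Fraction(1, 2), Fraction(1, 3)]))   # False: the sum is only 5/6
```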

Now let's take a look at something more complicated, rolling dice ...

Let's now assume that we have “distinguishable dice,” one white and one black, so that there is a difference between a one on the first and a two on the second (12) and a two on the first and a one on the second (21).

Then let's define an event as some possible outcome or set of outcomes. Here are some examples of how we might define a few events for two dice (in each pair of digits, the first is the white die and the second is the black die):

Event                  Elementary outcomes
Dice add to three      {12, 21}
Dice add to six        {15, 51, 24, 42, 33}
White die = 1          {11, 12, 13, 14, 15, 16}
Black die = 1          {11, 21, 31, 41, 51, 61}
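Events like these can be generated by listing all 36 outcomes and filtering them. The Python sketch below represents each outcome as a two-digit string with the white die first, as in the table, and assumes all 36 outcomes are equally likely; `event` and `prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a two-character string: white die first, black die second,
# so "12" (white 1, black 2) is distinct from "21" (white 2, black 1).
outcomes = [f"{w}{b}" for w, b in product(range(1, 7), repeat=2)]

def event(condition):
    """Return the set of outcomes for which condition(white, black) is true."""
    return {o for o in outcomes if condition(int(o[0]), int(o[1]))}

def prob(ev):
    """All 36 outcomes are equally likely, so P(event) = |event| / 36."""
    return Fraction(len(ev), len(outcomes))

add_to_three = event(lambda w, b: w + b == 3)
add_to_six = event(lambda w, b: w + b == 6)
white_is_1 = event(lambda w, b: w == 1)
black_is_1 = event(lambda w, b: b == 1)

print(sorted(add_to_three), prob(add_to_three))   # ['12', '21'] 1/18
print(sorted(add_to_six), prob(add_to_six))       # ['15', '24', '33', '42', '51'] 5/36
```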


Events can be combined, and we'll need some notation to represent the combinations.

Combination of events A & B

Combination     Meaning                            Notation
A and B         Both A and B occur                 A ∩ B
A or B          Either A or B occurs, or both      A ∪ B
not A           Event A does not occur             !A (read “not A”)

We'll make a lot of use of the notation !A, which means "not A" or "A didn't occur." Event A can either occur or not, so P(A) + P(!A) = 1. If we rearrange that to P(A) = 1 - P(!A), we call it the law of subtraction of probabilities.

The sum of the probabilities of an event (A) occurring and not occurring is one:

P(A) + P(!A) = 1
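The law of subtraction is often the easiest route to a probability. As a sketch (the event is an illustration chosen here, not one from the text), the Python below finds the probability of rolling at least one 1 with two dice by first counting the outcomes in which neither die shows 1, assuming all 36 (white, black) outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (white, black) outcomes for two distinguishable dice
outcomes = list(product(range(1, 7), repeat=2))

# A = {at least one die shows 1}; its complement !A = {neither die shows 1}
not_a = [o for o in outcomes if 1 not in o]
p_not_a = Fraction(len(not_a), len(outcomes))

# Law of subtraction: P(A) = 1 - P(!A)
p_a = 1 - p_not_a
print(p_not_a, p_a)   # 25/36 11/36
```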

Here's a concrete example of combining events. Let A = {1 on the white die} and B = {1 on the black die}. Here's a figure that shows the situation:

Figure: the 36 outcomes of rolling a white die and a gray die, with the 1s highlighted
In the figure, the black die is drawn in gray. In the top row, all of the gray dice show 1, and in the left column, all of the white dice show 1. The probability that the white die will show 1 is ⅙, and the probability that the gray die will show 1 is ⅙. The probability that they both come up 1 together is the product, ⅙ × ⅙ = 1/36.

What is the probability that at least one 1 will be rolled on either die? It's easy to count the possibilities, but we need to take care not to double count the {1,1} event (yellow box). There are six ways to roll a 1 on the white die (event A) and six ways on the gray die (event B), but only eleven possibilities for rolling at least one 1, because the {1,1} outcome appears in both counts. We write it like this:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B),

where P(A ∩ B) is the probability that 1s are rolled on both dice, 1/36. So P(A ∪ B) = 6/36 + 6/36 - 1/36 = 11/36.
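Counting confirms the formula. This Python sketch, assuming each outcome is a (white, black) pair and all 36 pairs are equally likely, computes P(A ∪ B) once by counting the union directly and once by the subtraction formula; `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely (white, black) pairs

def prob(ev):
    return Fraction(len(ev), len(outcomes))

A = {o for o in outcomes if o[0] == 1}   # 1 on the white die
B = {o for o in outcomes if o[1] == 1}   # 1 on the black (gray) die

direct = prob(A | B)                        # count the union directly
formula = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A ∩ B)
print(direct, formula)   # 11/36 11/36
```

The set operators `|` and `&` play the roles of ∪ and ∩, so the code reads almost like the formula.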

The situation is different if we define our events differently. Let A = {two dice sum to 3} and B = {two dice sum to 5}. Here's a diagram of the possibilities:

Figure: outcomes of two dice that sum to 3 and outcomes that sum to 5

Now the probability that either A or B occurs is still P(A ∪ B) = P(A) + P(B) - P(A ∩ B), except this time P(A ∩ B) = 0 because there is no overlap between the two events. These are mutually exclusive events: if the dice sum to three, they cannot also sum to five, and vice versa. So P(A ∪ B) = 2/36 + 4/36 = 1/6. Some graphical set interpretations of A ∪ B when A and B are and are not mutually exclusive are shown below.

Figure: Venn diagrams of A ∪ B for overlapping and for mutually exclusive events A and B
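For mutually exclusive events, the overlap term simply vanishes. This Python sketch represents each outcome as a (white, black) pair, assumes all 36 are equally likely, and uses a hypothetical `prob` helper to confirm that the intersection is empty and that P(A ∪ B) = P(A) + P(B).

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely (white, black) pairs

def prob(ev):
    return Fraction(len(ev), len(outcomes))

A = {o for o in outcomes if sum(o) == 3}   # {(1, 2), (2, 1)}
B = {o for o in outcomes if sum(o) == 5}   # {(1, 4), (2, 3), (3, 2), (4, 1)}

print(A & B)                              # set(): no overlap, so A and B are mutually exclusive
print(prob(A | B), prob(A) + prob(B))     # 1/6 1/6
```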

Conditional probability

Conditional probability is one of the most important concepts in probability. It lets us take some fairly simple data and extract a lot of meaningful information about cause & effect, and about risk. Conditional probabilities arise in many important ways. Consider these questions: What is the probability that a smoker will develop lung cancer? What is the probability that a non-smoker will develop lung cancer? How much more likely is a smoker to develop cancer than a non-smoker? The first two questions are conditional probabilities. A person is either a smoker or a non-smoker (with some probability) and each then has a certain probability of developing lung cancer. The third question involves taking a simple ratio, but gives the most valuable information.

Conditional probability is the probability that a second event will occur, provided that a first event has already occurred. Think about this experiment: What is the probability that the sum of two dice will be three?

We can run this experiment in two ways:

Method 1: throw distinguishable dice at the same time, and let A = {12} and B = {21}. P(A ∪ B) will be the sum of all of the probabilities of getting a sum of three.

Because A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B) = 1/36 + 1/36 = 1/18.

Method 2: throw one die first, and suppose it comes up 1. Now what is P(A ∪ B)?

Now the only way to get a sum of three is for the second die to show 2, so P(A ∪ B) is ⅙. This may seem trivial right now, but hang on ... it will become very important.
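A quick simulation makes the difference between the two methods concrete. This Python sketch rolls virtual dice with the standard `random` module; the seed and the number of trials `N` are arbitrary choices made here, and because it is a simulation the estimates only approximate 1/18 and ⅙.

```python
import random

rng = random.Random(2012)   # fixed seed so the run is reproducible
N = 100_000

# Method 1: roll both dice together and count the rolls whose sum is 3
hits = sum(rng.randint(1, 6) + rng.randint(1, 6) == 3 for _ in range(N))
method1 = hits / N   # should be close to 1/18 ≈ 0.056

# Method 2: the first die has already come up 1, so only the second die is rolled;
# the sum is 3 only when the second die shows 2
hits = sum(1 + rng.randint(1, 6) == 3 for _ in range(N))
method2 = hits / N   # should be close to 1/6 ≈ 0.167

print(round(method1, 3), round(method2, 3))
```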

We will use this notation to denote conditional probabilities:

P(B|A), read “the probability of B given A”

We define the conditional probability that event B occurs, given that A has occurred, as the probability that A and B occur together divided by the probability that event A occurs at all:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Rearranging this definition gives what we call the multiplication rule of probability:

$$P(A \cap B) = P(B \mid A)\,P(A)$$

Think about this for a bit and convince yourself that it gets P(A|A) = 1 right, and also predicts that P(B|A) = 0 if A and B are mutually exclusive events.
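The definition can be applied directly by counting outcomes. In this Python sketch, assuming the white die is the one thrown first in Method 2 and all 36 (white, black) pairs are equally likely, `cond_prob` (a hypothetical helper) computes P(B|A) = P(A ∩ B)/P(A). It reproduces both methods for a sum of three, plus the P(A|A) = 1 and mutually-exclusive checks.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely (white, black) pairs

def prob(ev):
    return Fraction(len(ev), len(outcomes))

def cond_prob(b, a):
    """P(B|A) = P(A ∩ B) / P(A); assumes P(A) > 0."""
    return prob(a & b) / prob(a)

sum_is_3 = {o for o in outcomes if sum(o) == 3}
sum_is_5 = {o for o in outcomes if sum(o) == 5}
white_is_1 = {o for o in outcomes if o[0] == 1}

print(prob(sum_is_3))                      # 1/18  (Method 1: no extra information)
print(cond_prob(sum_is_3, white_is_1))     # 1/6   (Method 2: the white die already shows 1)
print(cond_prob(white_is_1, white_is_1))   # 1     P(A|A) = 1
print(cond_prob(sum_is_5, sum_is_3))       # 0     mutually exclusive events
```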

Independence of events

Events may or may not be independent. We might want to know whether the occurrence of one event affects the occurrence of another. Two events, A and C, are independent if the occurrence of one does not affect the probability of occurrence of the other. That means

P(A) = P(A|C)  or  P(C) = P(C|A).

Equivalently, by the multiplication rule, A and C are independent when P(A ∩ C) = P(A)P(C).

Here are two examples of a two-dice experiment to illustrate how we can check for independence:

Figure: two examples of checking independence in a two-dice experiment
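Independence can also be checked by counting. The Python sketch below tests the condition P(A ∩ C) = P(A)P(C), which follows from the multiplication rule and avoids dividing by zero. The events are illustrations chosen here, not necessarily the ones in the figure, and `independent` is a hypothetical helper; all 36 (white, black) pairs are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely (white, black) pairs

def prob(ev):
    return Fraction(len(ev), len(outcomes))

def independent(a, c):
    """A and C are independent when P(A ∩ C) = P(A) P(C)."""
    return prob(a & c) == prob(a) * prob(c)

white_is_1 = {o for o in outcomes if o[0] == 1}
black_is_6 = {o for o in outcomes if o[1] == 6}
sum_is_7 = {o for o in outcomes if sum(o) == 7}
sum_is_3 = {o for o in outcomes if sum(o) == 3}

print(independent(white_is_1, black_is_6))   # True: one die can't influence the other
print(independent(white_is_1, sum_is_7))     # True: P(white = 1 | sum = 7) = 1/6 = P(white = 1)
print(independent(white_is_1, sum_is_3))     # False: P(white = 1 | sum = 3) = 1/2, not 1/6
```

The middle case is a useful surprise: knowing that the sum is 7 tells you nothing about the white die, because every white value from 1 to 6 can still be paired with exactly one black value.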

Bayes' Rule

Bayes' rule is the root of so-called Bayesian probability. It's really just a rearrangement of the multiplication rule we developed above. Because P(A ∩ B) = P(B ∩ A), we have

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Dividing both sides by P(B) gives a valuable link between P(A|B) and P(B|A) called Bayes' rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes' rule links conditional probabilities that go in "opposite directions". We will make a lot of use of it as we go on. To see how Bayes' rule can clarify problems in probability, take a look at the example below and its solution.
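Bayes' rule can be checked against direct counting. This Python sketch, assuming all 36 (white, black) dice outcomes are equally likely and using hypothetical helpers `prob`, `cond_prob` and `bayes`, computes P(A|B) once by counting and once from P(B|A) with Bayes' rule.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely (white, black) pairs

def prob(ev):
    return Fraction(len(ev), len(outcomes))

def cond_prob(b, a):
    """P(B|A) = P(A ∩ B) / P(A), by counting."""
    return prob(a & b) / prob(a)

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

A = {o for o in outcomes if o[0] == 1}     # the white die shows 1
B = {o for o in outcomes if sum(o) == 3}   # the dice sum to 3

direct = cond_prob(A, B)                               # P(A|B) by counting
via_bayes = bayes(cond_prob(B, A), prob(A), prob(B))   # P(A|B) from P(B|A)
print(direct, via_bayes)   # 1/2 1/2
```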

Example 1 – Drug testing



Suppose that a drug test is 99% sensitive (i.e. it will correctly identify a drug user 99% of the time) and 99% specific (i.e. it will correctly identify a nonuser as testing negative 99% of the time). This seems like a pretty reliable test. Assume a test group in which only 0.5% of members are drug users. Find the probability that, given a positive test, the subject is actually a drug user.


The first thing to do in a problem like this is to set up some variables so we can track what's going on. We'll let D and !D stand for "user" and "non-user," and we'll let "+" and "-" (some would use !+, and that's OK, too) stand for a positive and a negative test result, respectively. Here's the list:

D = the person is a drug user
!D = the person is not a drug user
+ = positive test result
- = negative test result

In this problem, we have a drug test that correctly identifies a user 99% of the time. That's a conditional probability: if a person uses (D), s/he will be caught 99% of the time, so we have P(+|D) = 0.99.

Likewise, we are given that if a person does not use (!D), then the probability of getting a negative test is also 99%. That translates to the conditional probability expression P(-|!D) = 0.99. Here again are the conditional probabilities we know, along with their complements (every test result is either + or -):

$$P(+ \mid D) = 0.99 \qquad P(- \mid D) = 0.01$$
$$P(- \mid !D) = 0.99 \qquad P(+ \mid !D) = 0.01$$

Finally, we assume that of all people, 0.5% are drug users, so that's P(D) = 0.005. Notice that this fact also gives us P(!D) = 0.995. That is,

$$P(D) = 0.005 \qquad P(!D) = 1 - P(D) = 0.995$$

Now what we are asking in this problem is: if a person gets a positive test, what is the probability that s/he is actually a user? If you think about it, that's really the most important question about such a test. We don't want to go around making a lot of mistaken accusations. The conditional probability we're looking for is P(D|+), and Bayes' rule gives it like this:

$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}$$

We already know the numerator because we organized ourselves at the beginning. The tricky part is the denominator: what is the probability of getting a positive test, P(+), at all? That probability is the probability that a person is a user and tests positive plus the probability that a person is a non-user and tests positive, like this:

$$P(+) = P(+ \cap D) + P(+ \cap !D)$$

We don't know what's on the right side of that equal sign until we expand those "and" expressions with the multiplication rule:

$$P(+) = P(+ \mid D)\,P(D) + P(+ \mid !D)\,P(!D) = (0.99)(0.005) + (0.01)(0.995) = 0.00495 + 0.00995 = 0.0149$$

So now we have P(+), and it's possible to step back to the P(D|+) expression to calculate

$$P(D \mid +) = \frac{(0.99)(0.005)}{0.0149} = \frac{0.00495}{0.0149} \approx 0.332$$

That's a pretty remarkable result. Even though this test seems very accurate (99% accurate at identifying both users and non-users), the probability that someone who receives a positive test result is actually a drug user is only about 1/3! There are two reasons for this:

(1) We don't get to know ahead of time who is a user and who is a non-user, so the 99% accuracies don't really help us there, and

(2) Only 0.5% of all people are actually users, so even a small false-positive rate makes a big difference in our overall accuracy. The non-users who falsely test positive are 1% of the 99.5% of people who don't use, about 1% of everyone (0.00995), which is roughly twice the share of people who are users and test positive (0.00495).
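The whole calculation fits in a few lines. Here is a minimal Python sketch, assuming the test is described by its sensitivity P(+|D), its specificity P(-|!D), and the fraction of users P(D); `posterior_positive` and its parameter names are hypothetical. It uses the total probability of a positive test as the denominator of Bayes' rule, just as in the solution.

```python
def posterior_positive(sensitivity, specificity, prevalence):
    """P(D|+) for a test with P(+|D) = sensitivity and P(-|!D) = specificity."""
    p_pos_given_user = sensitivity                  # P(+|D)
    p_pos_given_nonuser = 1 - specificity           # P(+|!D), the false-positive rate
    p_user, p_nonuser = prevalence, 1 - prevalence  # P(D), P(!D)
    # Total probability of a positive test: P(+) = P(+|D)P(D) + P(+|!D)P(!D)
    p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * p_nonuser
    # Bayes' rule: P(D|+) = P(+|D)P(D) / P(+)
    return p_pos_given_user * p_user / p_pos

print(round(posterior_positive(0.99, 0.99, 0.005), 3))   # 0.332
```

Trying other values of `prevalence` shows how strongly the answer depends on how common users are: with the same 99%/99% test, `posterior_positive(0.99, 0.99, 0.5)` comes out to about 0.99.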


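Finally, it's often useful to write out a scenario like this in a tree diagram that shows all of the probabilities. You might even find that solving these problems is easier if you just write out the whole tree. Here's the tree for this scenario, with the probability of each complete path (found with the multiplication rule) at the right:

D (0.005)
    + (0.99)     P(D ∩ +) = 0.00495
    - (0.01)     P(D ∩ -) = 0.00005
!D (0.995)
    + (0.01)     P(!D ∩ +) = 0.00995
    - (0.99)     P(!D ∩ -) = 0.98505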

Tree diagrams

Sometimes it is helpful to arrange a scenario like the one in that last problem in a tree diagram like this. If every member of some population can fall into category A or !A (e.g. has a disease or doesn't), and we let + and !+ be positive and negative test results, then we can, in principle, calculate the probabilities of each branch of the tree. At each node, the probabilities of the branches leaving it should sum to one.

Figure: probability tree with first branches A and !A, each splitting into + and !+
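The bookkeeping in a tree can be automated. As a sketch, the Python below builds the two-level tree just described (A or !A first, then + or !+) from three numbers, checks that the branches leaving each node sum to one, and multiplies along each path. `probability_tree` and its parameters are hypothetical names; plugging in the drug-testing numbers reproduces the results of that example.

```python
import math

def probability_tree(p_a, p_pos_given_a, p_pos_given_not_a):
    """Return {(first, second): probability} for the four paths of an A/!A, +/!+ tree."""
    branches = {
        "A": (p_a, {"+": p_pos_given_a, "!+": 1 - p_pos_given_a}),
        "!A": (1 - p_a, {"+": p_pos_given_not_a, "!+": 1 - p_pos_given_not_a}),
    }
    paths = {}
    for first, (p_first, second_level) in branches.items():
        # The branches leaving each node must sum to one
        if not math.isclose(sum(second_level.values()), 1.0):
            raise ValueError(f"branches below {first} do not sum to 1")
        for second, p_second in second_level.items():
            # Multiplication rule along the path: P(first ∩ second) = P(second|first) P(first)
            paths[(first, second)] = p_first * p_second
    return paths

# Drug-testing numbers: P(A) = P(D) = 0.005, P(+|D) = 0.99, P(+|!D) = 0.01
paths = probability_tree(0.005, 0.99, 0.01)
for (first, second), p in paths.items():
    print(f"{first:>2} then {second:<2}  {p:.5f}")
print(math.isclose(sum(paths.values()), 1.0))   # True: the paths cover every possibility
```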

This work by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to