We've studied 8 standard random variables, two discrete and six continuous. Here is a summary:
DISCRETE DISTRIBUTIONS
Binomial: The binomial distribution applies when you make n independent trials, each with a probability p of success and a probability q=1-p of failure. For example, you flip a coin n times, hoping for heads. Or you roll a die n times, hoping to get a 5. Or you interview n people, hoping to find a certain gene, or a supporter of Candidate A. Or your favorite batter comes to the plate n times, hoping each time to get a hit. The probability formula can be written in a few different ways:
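$$P(X=k) = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \qquad k=0,1,\ldots,n.$$
The mean, variance and standard deviation are given by
$$E(X)=np; \qquad \mathrm{Var}(X)=npq; \qquad \sigma(X)=\sqrt{npq},$$
but there is no simple formula for the median.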
Note how $E(X)$ and Var(X) scale with n: both are proportional to n.
The standard deviation $\sigma(X)=\sqrt{npq}$ grows with n, but the ratio
$\sigma(X)/E(X)=\sqrt{q/(np)}$ shrinks with n. If you're gambling
on flips of a fair coin, the more coins you flip the more you are likely to win
or lose, but the less your net winnings will be as a fraction of the
total wagered.
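To see this scaling in numbers, here is a small sketch in Python (my choice of language for illustration; the helper name `binomial_spread` is hypothetical). It tabulates the mean np, the standard deviation sqrt(npq), and their ratio for a fair coin as n grows.

```python
import math

def binomial_spread(n, p):
    """Return (mean, sigma, sigma/mean) for a binomial with n trials."""
    q = 1 - p
    mean = n * p
    sigma = math.sqrt(n * p * q)
    return mean, sigma, sigma / mean

# Fair coin: sigma keeps growing, but sigma/mean keeps shrinking.
for n in (100, 10_000, 1_000_000):
    mean, sigma, ratio = binomial_spread(n, 0.5)
    print(f"n={n:>9}: mean={mean:.0f}, sigma={sigma:.1f}, sigma/mean={ratio:.4f}")
```

Each hundredfold increase in n multiplies the standard deviation by ten and divides the ratio by ten.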
Example: If you roll a fair die n=15 times, the probability of getting
a six exactly 3 times is
$\binom{15}{3}(1/6)^3(5/6)^{12} \approx 0.2363$.
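The die example can be checked with a few lines of Python. This is only a sketch; the function name `binomial_pmf` is my own label, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fifteen rolls of a fair die, exactly three sixes.
prob = binomial_pmf(3, 15, 1 / 6)
print(f"P(exactly 3 sixes in 15 rolls) = {prob:.4f}")

# Sanity check: the probabilities for k = 0..n should add up to 1.
total = sum(binomial_pmf(k, 15, 1 / 6) for k in range(16))
print(f"sum over all k = {total:.6f}")
```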
Poisson: When n is large and p is small, the binomial distribution
can be approximated by the Poisson distribution with mean
$\lambda = np$.
The Poisson distribution also applies to discrete events that have a large
number of independent ways they can occur, each of which is highly unlikely.
Thus the number of heart attacks on a given day,
the number of people hit by lightning, the number of typos
in a book, the number of clicks of
a Geiger counter, and the number of lottery winners are all described by
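the Poisson distribution. To use the Poisson distribution you need to know
a single number, the mean $\lambda$.
The formula for the Poisson distribution is
$$P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}, \qquad k=0,1,2,\ldots$$
The mean, variance and standard deviation are
$E(X)=\lambda$, $\mathrm{Var}(X)=\lambda$, $\sigma(X)=\sqrt{\lambda}$.
Again, there is no simple formula for the median.
As before, note that as $\lambda$ grows, $\sigma(X)$ grows, but the
ratio of $\sigma(X)$ to $E(X)$ shrinks.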
Example: If there is an average of 4.2 heart attacks per day in Austin, the
probability of there being exactly 3 heart attacks tomorrow is
$e^{-4.2}(4.2)^3/3! \approx 0.1852$.
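Here is a short Python sketch of the heart attack example, together with a check of the claim that a binomial with large n and small p looks Poisson. The function names and the choice n = 10,000 are my own assumptions for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k events when the average count is `mean`."""
    return exp(-mean) * mean**k / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 3 heart attacks when the daily average is 4.2.
heart_attacks = poisson_pmf(3, 4.2)

# A binomial with many trials and the same mean n*p = 4.2
# (n = 10_000 is an arbitrary "large" value chosen for this check).
n = 10_000
binomial_version = binomial_pmf(3, n, 4.2 / n)

print(f"Poisson:  {heart_attacks:.5f}")
print(f"Binomial: {binomial_version:.5f}")
```

The two numbers agree to about three decimal places, and the agreement improves as n grows with np held fixed.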
CONTINUOUS DISTRIBUTIONS
There are three kinds of continuous distributions we have studied, the uniform, exponential and normal distributions. In each case, there is a standard version of the distribution, and a general version. If X follows the general distribution, then some function of X follows the standard distribution, so it is possible to convert any problem involving a general distribution into a problem involving the standard distribution. For the uniform and exponential distributions, this usually isn't necessary, as the general distributions aren't too hard to begin with. For the normal distribution this conversion (called "z-scores") is essential.
Standard Uniform: The standard uniform distribution is what you mean when you say ``pick a number at random between 0 and 1''. The probability density function (pdf) is
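$$f_{su}(x) = \begin{cases} 1 & \text{if } 0<x<1 \\ 0 & \text{otherwise} \end{cases}$$
while the cumulative distribution function (cdf) is
$$F_{su}(x) = \begin{cases} 0 & \text{if } x<0 \\ x & \text{if } 0\le x\le 1 \\ 1 & \text{if } x>1 \end{cases}$$
The subscript su just stands for ``standard uniform''.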
It is easy to see that F is the integral of f, and that f is the
derivative of F. As with any continuous distribution, the probability
of finding X between two numbers a and b is
$F(b)-F(a)$, and the probability of $X<a$ is F(a).
For example, the probability of being between 0.3 and 0.8 is F(0.8)-F(0.3)
= 0.8-0.3=0.5, the probability of being between 0.3 and 2.5 is F(2.5)-F(0.3)
=1-0.3=0.7, and the probability of being less than 0.3 is F(0.3)=0.3.
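The three examples above can be reproduced with a tiny Python sketch. The names `F_su` and `prob_between` are hypothetical; the cdf is just x clamped to the interval from 0 to 1.

```python
def F_su(x):
    """Standard uniform cdf: 0 below 0, x on [0, 1], 1 above 1."""
    return min(max(x, 0.0), 1.0)

def prob_between(a, b):
    """Probability that a standard uniform variable lies between a and b."""
    return F_su(b) - F_su(a)

print(prob_between(0.3, 0.8))   # between 0.3 and 0.8
print(prob_between(0.3, 2.5))   # the part above 1 contributes nothing
print(F_su(0.3))                # less than 0.3
```

The printed values may show ordinary floating-point rounding in the last digit.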
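The mean, variance, standard deviation and median are
$$E(X)=1/2; \qquad \mathrm{Var}(X)=1/12; \qquad \sigma(X)=1/\sqrt{12}\approx 0.289; \qquad \text{median}=1/2.$$
The first and third quartiles are
$Q_1=1/4$
and
$Q_3=3/4$.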
General Uniform: The general uniform distribution is what you mean when you say ``pick a number at random between a and b''. The pdf is
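$$f_u(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a<x<b \\ 0 & \text{otherwise} \end{cases}$$
and the cdf is
$$F_u(x) = F_{su}\!\left(\frac{x-a}{b-a}\right) = \begin{cases} 0 & \text{if } x<a \\ \dfrac{x-a}{b-a} & \text{if } a\le x\le b \\ 1 & \text{if } x>b \end{cases}$$
In other words, if x is
uniform between a and b, then the quantity y=(x-a)/(b-a) is
standard uniform. Given a problem involving x we could convert
everything to y-scores and use our knowledge of the standard uniform
distribution to figure things out. For example, since the mean and
median of y are both 1/2, the mean and median of x are both
$(a+b)/2$. To complete the list,
$\mathrm{Var}(X)=(b-a)^2/12$ and $\sigma(X)=(b-a)/\sqrt{12}$,
and the first and third quartiles are
$Q_1=(3a+b)/4$
and
$Q_3=(a+3b)/4$.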
(In practice, there's little use for y-scores,
since the general uniform distribution is easy enough to handle
without that trick. The term ``y-scores'' is my own invention; you
won't find it in any standard text.)
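For anyone who wants to see the y-score trick in action anyway, here is a minimal Python sketch. The endpoints a=2 and b=10 are made-up values for illustration, and the function names are hypothetical.

```python
def F_su(y):
    """Standard uniform cdf."""
    return min(max(y, 0.0), 1.0)

def y_score(x, a, b):
    """Convert x, uniform between a and b, to a standard uniform y."""
    return (x - a) / (b - a)

def from_y(y, a, b):
    """Undo the conversion: map a standard uniform y back to x."""
    return a + y * (b - a)

a, b = 2.0, 10.0                       # hypothetical endpoints

prob_below_5 = F_su(y_score(5.0, a, b))   # P(x < 5) via the standard cdf
median_x = from_y(0.5, a, b)              # median of y is 1/2
q1_x = from_y(0.25, a, b)                 # first quartile of y is 1/4
q3_x = from_y(0.75, a, b)                 # third quartile of y is 3/4

print(prob_below_5, median_x, q1_x, q3_x)
```

Mapping the standard quartiles 1/4 and 3/4 back through `from_y` gives (3a+b)/4 and (a+3b)/4, which for these endpoints are 4 and 8.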
Standard Exponential: The exponential distribution describes the amount of time you wait between completely unpredictable events. Many kinds of hardware (e.g. light bulbs) typically break down because of freak events (e.g. voltage surges), rather than accumulated stress, so the lifetime of such an object is described by an exponential distribution. So is the amount of time you wait between clicks of a Geiger counter, or the time between wins at a slot machine, or the spacing between typos on a printed page. Notice that the number of freak events is described by the Poisson distribution, but the spacing is described by the exponential distribution. The standard exponential distribution has pdf
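$$f_{se}(x) = \begin{cases} e^{-x} & \text{if } x>0 \\ 0 & \text{otherwise} \end{cases}$$
and cdf
$$F_{se}(x) = \begin{cases} 1-e^{-x} & \text{if } x>0 \\ 0 & \text{otherwise} \end{cases}$$
The mean, variance, standard deviation and median are
$$E(X)=1; \qquad \mathrm{Var}(X)=1; \qquad \sigma(X)=1; \qquad \text{median}=\ln 2\approx 0.693.$$
The first and third quartiles are obtained by setting F(x)=1/4 or 3/4,
and the results are
$Q_1=\ln(4/3)\approx 0.288$
and
$Q_3=\ln 4\approx 1.386$.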
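Setting F(x)=p and solving for x gives x = -ln(1-p), which is how the median and quartiles are found. A short Python sketch of that inversion (the names `F_se` and `quantile_se` are my own):

```python
import math

def F_se(x):
    """Standard exponential cdf."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def quantile_se(p):
    """Solve F_se(x) = p for x, with 0 <= p < 1."""
    return -math.log(1 - p)

median = quantile_se(0.5)
q1 = quantile_se(0.25)
q3 = quantile_se(0.75)
print(f"Q1={q1:.4f}, median={median:.4f}, Q3={q3:.4f}")
```

Note that the median (about 0.69) is well below the mean of 1, because the distribution has a long right tail.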
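General Exponential: The general exponential distribution is just like
the standard exponential, only stretched out by a factor
$\mu$.
The general exponential distribution has pdf
$$f_e(x) = \begin{cases} \dfrac{1}{\mu}\,e^{-x/\mu} & \text{if } x>0 \\ 0 & \text{otherwise} \end{cases}$$
and cdf
$$F_e(x) = F_{se}(x/\mu) = \begin{cases} 1-e^{-x/\mu} & \text{if } x>0 \\ 0 & \text{otherwise} \end{cases}$$
In other words, if x is exponential with parameter
$\mu$, then
$y=x/\mu$
is standard exponential.
The mean, variance, standard deviation and median of X are
$$E(X)=\mu; \qquad \mathrm{Var}(X)=\mu^2; \qquad \sigma(X)=\mu; \qquad \text{median}=\mu\ln 2.$$
The first and third quartiles are
$Q_1=\mu\ln(4/3)$
and
$Q_3=\mu\ln 4$.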
For example, if a light bulb has a mean lifetime of
$\mu=5$ months,
then the probability of it lasting between 3 and 6 months is
$F_e(6)-F_e(3) = e^{-3/5}-e^{-6/5} \approx 0.2476$. This could also have been computed as follows:
x being between 3 and 6 is the same thing as y=x/5 being between 3/5
and 6/5, so the probability is
$F_{se}(6/5)-F_{se}(3/5) = e^{-3/5}-e^{-6/5} \approx 0.2476$.
As with the uniform distribution, conversion to y-scores isn't
necessary, but it's occasionally handy.
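The light bulb example can be worked both ways in Python. This is a sketch only; `F_se` and `F_e` are hypothetical helper names for the standard and general exponential cdfs, and `mean_life` is the mean lifetime in months.

```python
import math

def F_se(y):
    """Standard exponential cdf."""
    return 1 - math.exp(-y) if y > 0 else 0.0

def F_e(x, mean_life):
    """General exponential cdf, written directly in terms of x."""
    return 1 - math.exp(-x / mean_life) if x > 0 else 0.0

mean_life = 5.0   # mean lifetime of the bulb, in months

# Directly, using the general cdf.
direct = F_e(6, mean_life) - F_e(3, mean_life)

# Via y-scores: y = x / mean_life, then use the standard cdf.
via_y = F_se(6 / mean_life) - F_se(3 / mean_life)

print(f"direct: {direct:.4f}, via y-scores: {via_y:.4f}")
```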
Standard Normal: At first glance, the standard normal distribution is crazy. There's a complicated formula for f(x), you can't do the integrals to get F(x), and it's all a big mess. However, we have no choice. Normal distributions appear incredibly frequently in the real world, and we need a way to deal with them. We do this in two steps. First we study the standard normal distribution. We can't compute F(x) in closed form, but we can compute it numerically (using Simpson's rule), and list the results in a table. Armed with our table, we are able to answer questions about the standard normal distribution. The second step is to convert questions about a general normal distribution to questions about the standard normal distribution. This is the method of ``z-scores''.
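For the standard normal distribution, the pdf is
$$f_{sn}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
The cdf $F_{sn}(x)$ cannot be written in terms of exponentials, logs, trig functions, and so on, but we still can tabulate it and give it a name, erf(x). (Many references use the name erf for a rescaled version of this integral; in these notes erf(x) always means the standard normal cdf.)
The mean, variance, standard deviation and median are
$$E(X)=0; \qquad \mathrm{Var}(X)=1; \qquad \sigma(X)=1; \qquad \text{median}=0.$$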
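To make the "compute it numerically" step less mysterious, here is a minimal Python sketch of Simpson's rule applied to the standard normal pdf. It computes the area under the pdf between 0 and z. The function names and the default of 200 subintervals are my own choices, not anything taken from the book's table.

```python
import math

def f_sn(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_0_to_z(z, steps=200):
    """Area under f_sn between 0 and z (z >= 0), by composite Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even count
    h = z / steps
    total = f_sn(0) + f_sn(z)           # endpoints get weight 1
    for i in range(1, steps):
        weight = 4 if i % 2 else 2      # interior weights alternate 4, 2, 4, ...
        total += weight * f_sn(i * h)
    return total * h / 3

for z in (0.3, 1.0, 1.1, 2.0):
    print(f"z={z}: area from 0 to z = {area_0_to_z(z):.4f}")
```

Because the integrand is smooth, a couple of hundred subintervals is already far more accurate than a four-decimal table needs.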
To calculate anything with a normal distribution you need a table of
$F_{sn}(x)$,
i.e. of the erf function. Table III in the book is almost such a table.
Table III gives, for z>0, the area under the graph of
$f_{sn}$
between
0 and z. In other words, if we call the entry in the table Table(z),
$F_{sn}(z) = 1/2 + \mathrm{Table}(z)$
when
$z>0$.
The table doesn't list anything for z<0, but we can use the symmetry
of the normal distribution, i.e.
$f_{sn}(-x)=f_{sn}(x)$, to deduce that
$F_{sn}(-z) = 1-F_{sn}(z)$, and therefore that
$$F_{sn}(-z) = 1/2 - \mathrm{Table}(z) \qquad \text{when } z>0.$$
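For example, Table(1.1)=0.3643, so
$F_{sn}(1.1)=0.5+0.3643=0.8643$,
while
$F_{sn}(-1.1)=0.5-0.3643=0.1357$. Similarly, Table(0.3)=0.1179, so
$F_{sn}(0.3)=0.6179$
and
$F_{sn}(-0.3)=0.3821$. The probability of being
between 0.3 and 1.1 is
$F_{sn}(1.1)-F_{sn}(0.3)=0.8643-0.6179=0.2464$, while the probability
of being between -0.3 and 1.1 is
$F_{sn}(1.1)-F_{sn}(-0.3)=0.8643-0.3821=0.4822$.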
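The table bookkeeping can be mimicked in Python. In this sketch the function `table` stands in for Table III by using the standard library's `statistics.NormalDist`, and `F_sn` rebuilds the full cdf from it using the symmetry argument. Both names are hypothetical, and the values carry more decimals than the printed table.

```python
from statistics import NormalDist

_standard = NormalDist()   # mean 0, standard deviation 1

def table(z):
    """Stand-in for Table III: area under the pdf between 0 and z, z >= 0."""
    if z < 0:
        raise ValueError("the table only covers z >= 0")
    return _standard.cdf(z) - 0.5

def F_sn(z):
    """Standard normal cdf rebuilt from the table and symmetry."""
    if z >= 0:
        return 0.5 + table(z)
    return 0.5 - table(-z)

between_pos = F_sn(1.1) - F_sn(0.3)     # between 0.3 and 1.1
between_mixed = F_sn(1.1) - F_sn(-0.3)  # between -0.3 and 1.1
print(f"{between_pos:.4f}  {between_mixed:.4f}")
```

When one endpoint is negative and the other positive, the two table entries end up being added rather than subtracted.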
Some useful rules of thumb can be read off from the table. The probability
of -1<x<1 is a little over 68%, or roughly 2/3. The probability of
-2<x<2 is a little over 95%, and the probability of -3<x<3 is over
99%.
The 95th percentile point is at about x=1.65, and the 99th
percentile point is at about
x=2.33.
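These rules of thumb are easy to confirm numerically. The sketch below uses Python's `statistics.NormalDist` in place of the table; `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist

std = NormalDist()   # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """Probability that a standard normal variable lies between -k and k."""
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(f"P(-{k} < x < {k}) = {prob_within(k):.4f}")

# Percentile points: the x for which F(x) equals 0.95 or 0.99.
p95 = std.inv_cdf(0.95)
p99 = std.inv_cdf(0.99)
print(f"95th percentile: {p95:.3f}, 99th percentile: {p99:.3f}")
```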
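General Normal: The pdf for a general normal distribution with
parameters
$\mu$
and
$\sigma$
is
$$f_n(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$
The mean is
$\mu$, the variance is
$\sigma^2$, the standard deviation
is
$\sigma$, and the median is
$\mu$. The cdf is
$$F_n(x) = F_{sn}(z),$$
where
$z=(x-\mu)/\sigma$. In other words, if x is distributed normally
with mean
$\mu$
and standard deviation
$\sigma$, then z is distributed
according to the standard normal distribution.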
For example, suppose x is distributed normally with mean 3 and standard deviation 5. We want the probability that x is between -1 and 8. Since z=(x-3)/5, ``x=-1'' means the same thing as ``z=-4/5'', while x equalling 8 is the same thing as z equalling 1. Therefore
$$P(-1<x<8) = F_{sn}(1)-F_{sn}(-0.8) = \mathrm{Table}(1)+\mathrm{Table}(0.8) = 0.3413+0.2881 = 0.6294.$$
The rules of thumb for the standard normal distribution then translate into
the following statements:
The probability of x being within one standard deviation of the mean
is a little over 68%, or roughly 2/3. The probability of x being within
two standard deviations of the mean is a little over 95%, and the
probability of x being within 3 standard deviations of the mean is over
99%.
The 95th percentile point is at about
$\mu+1.65\sigma$,
and the 99th percentile point is at about
$\mu+2.33\sigma$.
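As a closing check, here is a Python sketch of the z-score method applied to the example with mean 3 and standard deviation 5. It computes the probability of x lying between -1 and 8 two ways, directly and through z-scores, and then locates the 95th percentile. `statistics.NormalDist` plays the role of the table, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

mean, sd = 3.0, 5.0
x_dist = NormalDist(mu=mean, sigma=sd)   # the general normal distribution
std = NormalDist()                       # the standard normal distribution

def z_score(x, mean, sd):
    """Convert x to the number of standard deviations above the mean."""
    return (x - mean) / sd

# Probability that x is between -1 and 8, computed two ways.
direct = x_dist.cdf(8) - x_dist.cdf(-1)
via_z = std.cdf(z_score(8, mean, sd)) - std.cdf(z_score(-1, mean, sd))

# 95th percentile: take the standard one and undo the z-score.
p95 = mean + std.inv_cdf(0.95) * sd

print(f"direct: {direct:.4f}, via z-scores: {via_z:.4f}")
print(f"95th percentile of x: {p95:.2f}")
```

The two probabilities agree, which is the whole point of z-scores: every normal problem reduces to a standard normal problem.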