
We've studied 8 standard random variables, two discrete and six continuous. Here is a summary:

DISCRETE DISTRIBUTIONS

Binomial: The binomial distribution applies when you make n independent trials, each with a probability p of success and a probability q=1-p of failure. For example, you flip a coin n times, hoping for heads. Or you roll a die n times, hoping to get a 5. Or you interview n people, hoping to find a certain gene, or a supporter of Candidate A. Or your favorite batter comes to the plate n times, hoping each time to get a hit. The probability formula can be written in a few different ways:

\[ P(X=k) = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n. \]

The mean, variance and standard deviation are given by $\mu = np$; Var(X)=npq; $\sigma = \sqrt{npq}$, but there is no simple formula for the median.

Note how $\sigma$ and Var(X) scale with n. $\sigma$ grows with n (in proportion to $\sqrt{n}$), but the ratio $\sigma/n$ shrinks with n. If you're gambling on flips of a fair coin, the more coins you flip the more you are likely to win or lose, but the smaller your net winnings will be as a fraction of the total wagered.
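
A quick numerical check of this scaling can be sketched in Python. The function name `heads_spread` is only an illustrative choice, and a fair coin (p = 1/2) is assumed as the default.

```python
from math import sqrt

def heads_spread(n, p=0.5):
    """Return (sigma, sigma / n) for the number of successes in n trials."""
    sigma = sqrt(n * p * (1 - p))
    return sigma, sigma / n

# sigma grows like sqrt(n), while sigma / n shrinks like 1 / sqrt(n).
for n in (100, 10_000, 1_000_000):
    sigma, ratio = heads_spread(n)
    print(n, sigma, ratio)
```

Each hundredfold increase in n multiplies the standard deviation by ten and divides the ratio by ten.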

Example: If you roll a fair die n=15 times, the probability of getting a six exactly 3 times is $\binom{15}{3} (1/6)^3 (5/6)^{12} \approx 0.236$.
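
The die example can be checked with a few lines of Python. This is a minimal sketch; `binomial_pmf` is a name chosen here for illustration, not something from the course text.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

n, p = 15, 1 / 6                            # 15 rolls, hoping for a six
prob_three_sixes = binomial_pmf(3, n, p)    # about 0.236
mean = n * p                                # 2.5 sixes on average
std_dev = sqrt(n * p * (1 - p))             # sqrt(npq)

print(round(prob_three_sixes, 4), mean, round(std_dev, 4))
```

Summing `binomial_pmf(k, n, p)` over k = 0, ..., n should give 1, which is a handy sanity check on the formula.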

Poisson: When n is large and p is small, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$. The Poisson distribution also applies to discrete events that have a large number of independent ways they can occur, each of which is highly unlikely. Thus the number of heart attacks on a given day, the number of people hit by lightning, the number of typos in a book, the number of clicks of a Geiger counter, and the number of lottery winners are all described by the Poisson distribution. To use the Poisson distribution you need to know a single number, the mean $\lambda$.

The formula for the Poisson distribution is

\[ P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots \]

The mean, variance and standard deviation are $\mu = \lambda$, Var(X) $= \lambda$, $\sigma = \sqrt{\lambda}$. Again, there is no simple formula for the median. As before, note that as $\lambda$ grows, $\sigma$ grows, but the ratio of $\sigma$ to $\lambda$ shrinks.

Example: If there are an average of 4.2 heart attacks per day in Austin, the probability of there being exactly 3 heart attacks tomorrow is $\frac{(4.2)^3 e^{-4.2}}{3!} \approx 0.185$.
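
Here is a short Python sketch of the heart attack example, together with a comparison against a binomial distribution. The values n = 1000 and p = 0.0042 are hypothetical, picked only so that np = 4.2; they do not come from the example itself.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam**k * exp(-lam) / factorial(k)

prob_three_attacks = poisson_pmf(3, 4.2)    # about 0.185

# Hypothetical binomial with large n and small p, chosen so that n * p = 4.2.
n, p = 1000, 0.0042
binomial_exact = comb(n, 3) * p**3 * (1 - p)**(n - 3)
poisson_approx = poisson_pmf(3, n * p)

print(round(prob_three_attacks, 4), round(binomial_exact, 4), round(poisson_approx, 4))
```

The two numbers in the comparison agree to about three decimal places, which is the sense in which the Poisson distribution approximates the binomial.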

CONTINUOUS DISTRIBUTIONS

There are three kinds of continuous distributions we have studied, the uniform, exponential and normal distributions. In each case, there is a standard version of the distribution, and a general version. If X follows the general distribution, then some function of X follows the standard distribution, so it is possible to convert any problem involving a general distribution into a problem involving the standard distribution. For the uniform and exponential distributions, this usually isn't necessary, as the general distributions aren't too hard to begin with. For the normal distribution this conversion (called "z-scores") is essential.

Standard Uniform: The standard uniform distribution is what you mean when you say ``pick a number at random between 0 and 1''. The probability density function (pdf) is

\[ f_{su}(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise,} \end{cases} \]

while the cumulative distribution function (cdf) is

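\[ F_{su}(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases} \]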

The subscript su just stands for ``standard uniform''. It is easy to see that F is the integral of f, and that f is the derivative of F. As with any continuous distribution, the probability of finding X between two numbers a and b is $F(b)-F(a)$, and the probability of X<a is F(a). For example, the probability of being between 0.3 and 0.8 is F(0.8)-F(0.3) = 0.8-0.3=0.5, the probability of being between 0.3 and 2.5 is F(2.5)-F(0.3) =1-0.3=0.7, and the probability of being less than 0.3 is F(0.3)=0.3. The mean, variance, standard deviation and median are $\mu = 1/2$, Var(X) $= 1/12$, $\sigma = 1/\sqrt{12}$, and median $= 1/2$. The first and third quartiles are 1/4 and 3/4.

General Uniform: The general uniform distribution is what you mean when you say ``pick a number at random between a and b''. The pdf is

\[ f_{u}(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a < x < b, \\ 0 & \text{otherwise,} \end{cases} \]

and the cdf is

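\[ F_{u}(x) = F_{su}\!\left(\frac{x-a}{b-a}\right) = \begin{cases} 0 & \text{if } x < a, \\ \dfrac{x-a}{b-a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b. \end{cases} \]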

In other words, if x is uniform between a and b, then the quantity y=(x-a)/(b-a) is standard uniform. Given a problem involving x we could convert everything to y-scores and use our knowledge of the standard uniform distribution to figure things out. For example, since the mean and median of y are both 1/2, the mean and median of x are both $(a+b)/2$. To complete the list,

\[ \mathrm{Var}(X) = \frac{(b-a)^2}{12}, \qquad \sigma = \frac{b-a}{\sqrt{12}}, \]

and the first and third quartiles are $(3a+b)/4$ and $(a+3b)/4$. (In practice, there's little use for y-scores, since the general uniform distribution is easy enough to handle without that trick. The term ``y-scores'' is my own invention; you won't find it in any standard text.)
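
The y-score conversion for the uniform distribution is easy to mirror in code. The sketch below uses illustrative function names, and the endpoints a = 2, b = 10 are hypothetical values picked only to have something to compute.

```python
def standard_uniform_cdf(y):
    """F(y) for the standard uniform distribution: 0 below 0, y on [0, 1], 1 above 1."""
    return min(max(y, 0.0), 1.0)

def uniform_cdf(x, a, b):
    """F(x) for the uniform distribution on [a, b], computed through the y-score."""
    y = (x - a) / (b - a)
    return standard_uniform_cdf(y)

# Standard uniform examples from the text.
between_03_08 = standard_uniform_cdf(0.8) - standard_uniform_cdf(0.3)   # 0.5
between_03_25 = standard_uniform_cdf(2.5) - standard_uniform_cdf(0.3)   # 0.7

# Hypothetical general uniform on [2, 10]: probability that x lies between 4 and 7.
a, b = 2.0, 10.0
between_4_7 = uniform_cdf(7, a, b) - uniform_cdf(4, a, b)               # 3/8
mean = (a + b) / 2
first_quartile, third_quartile = (3 * a + b) / 4, (a + 3 * b) / 4
```

Clamping the y-score to the interval [0, 1] is what takes care of values of x below a or above b.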

Standard Exponential: The exponential distribution describes the amount of time you wait between completely unpredictable events. Many kinds of hardware (e.g. light bulbs) typically break down because of freak events (e.g. voltage surges), rather than accumulated stress, so the lifetime of such an object is described by an exponential distribution. So is the amount of time you wait between clicks of a Geiger counter, or the time between wins at a slot machine, or the spacing between typos on a printed page. Notice that the number of freak events is described by the Poisson distribution, but the spacing is described by the exponential distribution. The standard exponential distribution has pdf

\[ f_{se}(x) = \begin{cases} e^{-x} & \text{if } x > 0, \\ 0 & \text{otherwise,} \end{cases} \]

and cdf

\[ F_{se}(x) = \begin{cases} 1 - e^{-x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases} \]

The mean, variance, standard deviation and median are

\[ \mu = 1, \qquad \mathrm{Var}(X) = 1, \qquad \sigma = 1, \qquad \text{median} = \ln 2 \approx 0.693. \]

The first and third quartiles are obtained by setting F(x)=1/4 or 3/4, and the results are $\ln(4/3) \approx 0.288$ and $\ln 4 \approx 1.386$.
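
Setting F(x) = p and solving for x gives x = -ln(1-p) for the standard exponential distribution. The following Python sketch (function names are illustrative) uses that to recover the median and quartiles.

```python
from math import exp, log

def standard_exponential_cdf(x):
    """F(x) = 1 - e^(-x) for x > 0, and 0 otherwise."""
    return 1.0 - exp(-x) if x > 0 else 0.0

def standard_exponential_quantile(p):
    """Solve F(x) = p for x, valid for 0 <= p < 1."""
    return -log(1.0 - p)

first_quartile = standard_exponential_quantile(0.25)   # ln(4/3), about 0.288
median = standard_exponential_quantile(0.50)           # ln 2, about 0.693
third_quartile = standard_exponential_quantile(0.75)   # ln 4, about 1.386
```

Feeding each result back into `standard_exponential_cdf` returns 1/4, 1/2 and 3/4, as it should.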

General Exponential: The general exponential distribution is just like the standard exponential, only stretched out by a factor $\lambda$. The general exponential distribution has pdf

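\[ f_{e}(x) = \begin{cases} \dfrac{1}{\lambda}\, e^{-x/\lambda} & \text{if } x > 0, \\ 0 & \text{otherwise,} \end{cases} \]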

and cdf

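\[ F_{e}(x) = F_{se}(x/\lambda) = \begin{cases} 1 - e^{-x/\lambda} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases} \]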

In other words, if x is exponential with parameter $\lambda$, then $y = x/\lambda$ is standard exponential. The mean, variance, standard deviation and median of X are

\[ \mu = \lambda, \qquad \mathrm{Var}(X) = \lambda^2, \qquad \sigma = \lambda, \qquad \text{median} = \lambda \ln 2. \]

The first and third quartiles are $\lambda \ln(4/3)$ and $\lambda \ln 4$. For example, if a light bulb has a mean lifetime of $\lambda = 5$ months, then the probability of it lasting between 3 and 6 months is $F_{e}(6) - F_{e}(3) = e^{-3/5} - e^{-6/5} \approx 0.248$. This could also have been computed as follows: x being between 3 and 6 is the same thing as y being between 3/5 and 6/5, so the probability is $F_{se}(6/5) - F_{se}(3/5) = e^{-3/5} - e^{-6/5} \approx 0.248$. As with the uniform distribution, conversion to y-scores isn't necessary, but it's occasionally handy.
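
Both routes to the light bulb answer can be sketched in Python. The function name `exponential_cdf` is an illustrative choice; passing a mean of 1 gives the standard exponential distribution.

```python
from math import exp

def exponential_cdf(x, mean_lifetime):
    """F(x) = 1 - e^(-x / mean) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return 1.0 - exp(-x / mean_lifetime)

mean_lifetime = 5.0   # months

# Direct computation with the general exponential cdf.
direct = exponential_cdf(6, mean_lifetime) - exponential_cdf(3, mean_lifetime)

# Same probability through y-scores and the standard exponential (mean 1).
y_low, y_high = 3 / mean_lifetime, 6 / mean_lifetime
via_y_scores = exponential_cdf(y_high, 1.0) - exponential_cdf(y_low, 1.0)

print(round(direct, 4), round(via_y_scores, 4))   # both about 0.2476
```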

Standard Normal: At first glance, the standard normal distribution is crazy. There's a complicated formula for f(x), you can't do the integrals to get F(x), and it's all a big mess. However, we have no choice. Normal distributions appear incredibly frequently in the real world, and we need a way to deal with them. We do this in two steps. First we study the standard normal distribution. We can't compute F(x) in closed form, but we can compute it numerically (using Simpson's rule), and list the results in a table. Armed with our table, we are able to answer questions about the standard normal distribution. The second step is to convert questions about a general normal distribution to questions about the standard normal distribution. This is the method of ``z-scores''.

For the standard normal distribution, the pdf is

\[ f_{sn}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

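The cdf cannot be written in terms of exponentials, logs, trig functions, and so on, but we still can tabulate it and give it a name, erf(x). (Be warned that many books and most software use the name erf for a closely related but differently scaled function, and write $\Phi(x)$ for the standard normal cdf.)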

\[ F_{sn}(x) = \mathrm{erf}(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. \]
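
To see how such a table could be produced, here is a rough Python sketch that applies composite Simpson's rule to the standard normal density $e^{-x^2/2}/\sqrt{2\pi}$. The function names and the choice of 100 subintervals are assumptions made for illustration. The sketch also uses the fact that half of the total probability lies to the left of 0, so only the integral from 0 to x has to be computed.

```python
from math import exp, pi, sqrt

def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2   # interior weights alternate 4, 2, 4, ...
        total += weight * f(a + i * h)
    return total * h / 3.0

def standard_normal_cdf(x, n=100):
    """F(x) = 1/2 + (integral of the density from 0 to x); negative x also works."""
    return 0.5 + simpson(standard_normal_pdf, 0.0, x, n)

print(round(standard_normal_cdf(1.1), 4))    # about 0.8643
print(round(standard_normal_cdf(-1.1), 4))   # about 0.1357
```

For negative x the step size h is negative, so the integral comes out negative and the result drops below 1/2 as it should.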

The mean, variance, standard deviation and median are

\[ \mu = 0, \qquad \mathrm{Var}(X) = 1, \qquad \sigma = 1, \qquad \text{median} = 0. \]

To calculate anything with a normal distribution you need a table of $F_{sn}$, i.e. of the erf function. Table III in the book is almost such a table. Table III gives, for z>0, the area under the graph of $f_{sn}$ between 0 and z. In other words, if we call the entry in the table Table(z),

\[ F_{sn}(z) = 0.5 + \mathrm{Table}(z) \]

when $z \ge 0$. The table doesn't list anything for z<0, but we can use the symmetry of the normal distribution, i.e. $f_{sn}(-z) = f_{sn}(z)$, to deduce that $F_{sn}(-z) = 1 - F_{sn}(z)$, and therefore that $F_{sn}(-z) = 0.5 - \mathrm{Table}(z)$.

For example, Table(1.1)=0.3643, so $F_{sn}(1.1) = 0.8643$, while $F_{sn}(-1.1) = 0.1357$. Similarly, Table(0.3)=0.1179, so $F_{sn}(0.3) = 0.6179$ and $F_{sn}(-0.3) = 0.3821$. The probability of being between 0.3 and 1.1 is $0.8643 - 0.6179 = 0.2464$, while the probability of being between -0.3 and 1.1 is $0.8643 - 0.3821 = 0.4822$.
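
If no printed table is at hand, the same lookups can be imitated in Python. This is only a sketch: the function names are illustrative, and it relies on `math.erf`, which is the conventionally scaled error function rather than the cdf itself, so a rescaling is needed.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Standard normal cdf, written in terms of the conventional error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def table(z):
    """Area under the standard normal density between 0 and z, for z >= 0."""
    return standard_normal_cdf(z) - 0.5

print(round(table(1.1), 4), round(table(0.3), 4))   # 0.3643 0.1179

# Negative arguments through the symmetry rule: F(-z) = 0.5 - Table(z).
cdf_minus_03 = 0.5 - table(0.3)                     # about 0.3821

between_03_11 = standard_normal_cdf(1.1) - standard_normal_cdf(0.3)    # about 0.2464
between_m03_11 = standard_normal_cdf(1.1) - cdf_minus_03               # about 0.4822
```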

Some useful rules of thumb can be read off from the table. The probability of -1<x<1 is a little over 68%, or roughly 2/3. The probability of -2<x<2 is a little over 95%, and the probability of -3<x<3 is over 99%. The 95th percentile point is at x=1.65, and the 99th percentile point is at x=2.33.
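
These rules of thumb can be confirmed with Python's standard library, whose `statistics.NormalDist` class provides `cdf` and `inv_cdf` methods. The helper name `prob_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def prob_within(k):
    """Probability that -k < x < k for the standard normal distribution."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = prob_within(1)      # about 0.6827
within_two = prob_within(2)      # about 0.9545
within_three = prob_within(3)    # about 0.9973

percentile_95 = standard_normal.inv_cdf(0.95)   # about 1.645
percentile_99 = standard_normal.inv_cdf(0.99)   # about 2.326
```

The table values 1.65 and 2.33 are these percentile points rounded to two decimal places.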

General Normal: The pdf for a general normal distribution with parameters $\mu$ and $\sigma$ is

\[ f_{n}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}. \]

The mean is $\mu$, the variance is $\sigma^2$, the standard deviation is $\sigma$, and the median is $\mu$. The cdf is

\[ F_{n}(x) = F_{sn}(z), \]

where $z = (x-\mu)/\sigma$. In other words, if x is distributed normally with mean $\mu$ and standard deviation $\sigma$, then z is distributed according to the standard normal distribution.

For example, suppose x is distributed normally with mean 3 and standard deviation 5. We want the probability that x is between -1 and 8. Since z=(x-3)/5, ``x=-1'' means the same thing as ``z=-4/5'', while x equalling 8 is the same thing as z equalling 1. Therefore

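\[ P(-1 < x < 8) = F_{sn}(1) - F_{sn}(-0.8) = (0.5 + \mathrm{Table}(1)) - (0.5 - \mathrm{Table}(0.8)) = 0.3413 + 0.2881 = 0.6294. \]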
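
The z-score calculation can be reproduced in Python as a sketch, using `statistics.NormalDist` from the standard library in place of the printed table. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 3.0, 5.0
standard_normal = NormalDist()   # mean 0, standard deviation 1

# Convert the endpoints x = -1 and x = 8 to z-scores.
z_low = (-1 - mu) / sigma        # -0.8
z_high = (8 - mu) / sigma        # 1.0

via_z_scores = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# The same probability computed directly from the general normal distribution.
general_normal = NormalDist(mu, sigma)
direct = general_normal.cdf(8) - general_normal.cdf(-1)

print(round(via_z_scores, 3), round(direct, 3))   # both about 0.629
```

The result differs from a table-based answer only in the fourth decimal place, because table entries are themselves rounded.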

The rules of thumb for the standard normal distribution then translate into the following statements: The probability of x being within one standard deviation of the mean is a little over 68%, or roughly 2/3. The probability of x being within two standard deviations of the mean is a little over 95%, and the probability of x being within 3 standard deviations of the mean is over 99%. The 95th percentile point is at $x = \mu + 1.65\sigma$, and the 99th percentile point is at $x = \mu + 2.33\sigma$.





Lorenzo Sadun
Fri Nov 6 14:29:08 CST 1998