Sunday, December 31, 2006

Mesmerism

Dr Anton Mesmer believed in the power of "animal magnetism" as a universal force that could cure illness. His patients would gather around a large enclosed wooden tub with iron wands protruding from it. The patients would hold the wands and touch them to their afflicted parts. Mesmer claimed that the animal magnetism would flow from the tub through the wands to the patient. Patients often went into hysterics and thought they were cured. Mesmer became well known in the upper echelon in Paris and attracted the attention of King Louis XVI. The Parisian medical establishment was not pleased to see many of their highest paying patients leaving them to seek the excitement of Mesmer's salon. Thus, the King appointed a Royal Commission to study Mesmer’s theory.

The King chose the best scientific minds of the day to examine Mesmerism. Benjamin Franklin was chosen to chair the committee. Other well-respected members included Jean Bailly (1736–1793), an astronomer; Antoine de Jussieu, a botanist; Antoine Lavoisier, a famous chemist; and Joseph Guillotin, an all-around scientist.

The Commissioners first learned all about the process before setting out to test the theory of animal magnetism. Mesmer described animal magnetism as "a fluid universally diffused, the vehicle of a mutual influence between the celestial bodies, the earth and the bodies of animated beings". Mesmer's animal magnetism cannot be seen, so the Commission had to find unique ways of measuring it. In an empty treatment salon the Commissioners put an electrometer and compass next to the mesmeric tub and found nothing. The Commissioners also underwent treatment themselves, separate from the crowds, but not one of the Commissioners felt any sensation.

They then conducted experiments based on the logic of Mesmerism. They performed a series of ingenious experiments to determine to what degree the power of the imagination can influence sensation, and to establish whether imagination alone could be the cause of the effects or whether it interacted with magnetism to create them.

One experiment involved magnetizing a tree and determining whether a susceptible person would be affected by touching it. The person was brought before many trees with his eyes covered, and asked to embrace them. Each tree affected him, and before he got to the fourth tree he fainted. Unfortunately for mesmerism, none of those trees had been magnetized.

In another experiment a patient was seated next to a closed door and told that a magnetic operation was occurring right behind the door. The patient became hysterical, but there was actually nothing behind the door. Another patient was presented with several basins which she was falsely told had been magnetized. She became hysterical around these basins. Yet when she drank from a basin that actually was magnetized, she did so calmly.

As a result of these blind trials, the Commissioners concluded that Mesmer's universal fluid had no existence and that imagination, imitation, and touch were the true causes of the effects of mesmerism. Response to the Commission’s report from the backers of mesmerism was rapid and furious. One professional mesmerizer published a book of letters written by patients claiming that therapy worked for them. Eventually, mesmerism declined and, to my knowledge, is no longer practiced. (Note: Many of the Commissioners, such as Bailly and Lavoisier, as well as the King, suffered much more violent ends in the French Revolution.)

Does the legacy of mesmerism still exist? Much of the field of hypnotism was founded by Armand Puysegur, a student of Mesmer. Hypnotism has been doubted in much the same way as mesmerism. EMDR (Eye Movement Desensitization and Reprocessing) has also been accused of being a pseudoscience. Scott Lilienfeld, a psychologist who often writes about pseudoscience, says: "you can usually tell (what is pseudoscience) because there's a lot of marketing around these treatments, but there's no controlled evidence. Support consists of almost all anecdotes and personal testimony."

The participant-blind trials conducted by the Commission were ingenious for their time. In my humble opinion, the spirit of scientifically solving a problem displayed by the Commission seems to be much more powerful than simply relying on the rhetoric, bad science, and anecdotal evidence that are often used even today.

For a highly entertaining and well written article, please see this 1999 article from Richard McNally on the comparison between Mesmerism and EMDR.

Thursday, December 28, 2006

Ben Franklin's Virtues

Benjamin Franklin at the age of 20 created a self-improvement project. He sought to cultivate his character by attempting to follow thirteen virtues. His autobiography lists his thirteen virtues:

1. "TEMPERANCE. Eat not to dullness; drink not to elevation."
This virtue reminds me of the Dalai Lama’s idea that the cause of human suffering is either in the excessive pursuit of pleasures or the excessive avoidance of pain.

2. "SILENCE. Speak not but what may benefit others or yourself; avoid trifling conversation."
Benjamin Franklin wrote and spoke volumes. He seemed to benefit others quite a bit. I like adding this rule: listen to others at least twice as much as you speak.

3. "ORDER. Let all your things have their places; let each part of your business have its time."

4. "RESOLUTION. Resolve to perform what you ought; perform without fail what you resolve."
Franklin was sent to France by Congress to secure a desperately needed loan to support the American Revolutionary War. The French Prime Minister, Count Vergennes, ignored his request for a meeting. Franklin learned that Vergennes had a rare collection of books. He wrote to the Count asking to borrow one written by the French philosopher Voltaire. Vergennes lent him the volume, which Franklin returned a few weeks later with a letter of thanks. This correspondence not only broke the ice, but also appealed to Vergennes's vanity. Before long, Franklin had secured the loan.

5. "FRUGALITY. Make no expense but to do good to others or yourself; i.e., waste nothing."

6. "INDUSTRY. Lose no time; be always employ'd in something useful; cut off all unnecessary actions."
Franklin was never idle. Aside from all his other accomplishments he raised money for Philadelphia's first libraries, police forces and fire companies.

7. "SINCERITY. Use no hurtful deceit; think innocently and justly, and, if you speak, speak accordingly."

8. "JUSTICE. Wrong none by doing injuries, or omitting the benefits that are your duty."

9. "MODERATION. Avoid extremes; forbear resenting injuries so much as you think they deserve."
Franklin was considered a master at consensus building. He advised Thomas Jefferson "never to engage in personalities". Jefferson later wrote that he never heard Franklin directly contradict anyone. Franklin wrote, "The conversations I engaged in went more pleasantly. The modest way in which I proposed my opinions procured them a readier reception and less contradiction."

10. "CLEANLINESS. Tolerate no uncleanliness in body, cloaths, or habitation."

11. "TRANQUILLITY. Be not disturbed at trifles, or at accidents common or unavoidable."
Franklin didn’t let his critics upset him. He urged others to make full use of other people's envy and gossip for self-improvement. We often interpret negative remarks as malicious. Franklin wrote, "Love your enemies, for they shall tell you all your faults!" He said, "The sting of gossip is the truth of it".

12. "CHASTITY. Rarely use venery but for health or offspring, never to dullness, weakness, or the injury of your own or another's peace or reputation."

13. "HUMILITY. Imitate Jesus and Socrates."
When Franklin was a diplomat, he still introduced himself as Benjamin Franklin, a Printer. As more contemporary people to imitate, I might add Gandhi, Martin Luther King Jr., the Dalai Lama, Mother Teresa, and Barry Sanders.

Tuesday, December 26, 2006

Peer Effects

Slate.com discusses research by Alexandre Mas and Enrico Moretti, who studied behaviors of grocery store checkout clerks. Below is an excerpt from slate.com:

"What happens when an unusually hardworking (or lazy) worker joins a team?

The question is part of the broader study of "peer effects." When my neighbor, classmate, or housemate is particularly smart, dishonest, or lazy, what does that do to me? The question is tricky because most people can select their peers. For example, observing that many kids in a school play truant, we might conclude that they are a bad influence on one another, but we might also conclude that the school is in a deprived area where richer parents choose not to live.

Some economists have looked at situations where peers have been assigned randomly—to a college dormitory, for instance, or even (through a government housing program) to a particular neighborhood.

Mas and Moretti rely instead on scarily detailed data: having somehow sweet-talked a supermarket into cooperating, they compiled a data-set that tracks every single "beep," every transaction, for 370 workers in six stores, timed by the second, for two years. They can measure each worker's productivity by the second and note how it changes depending on who else is working at the same time.

It is not obvious what they should find. Since shoppers can and do move to fast-moving lines, a quick worker will tend to lighten the burden on their colleagues. That might encourage them to slack off, or it might encourage them to work harder. The positive effect dominates, according to Mas and Moretti: They find that a shop assistant sitting near someone who is 10 percent quicker than average will raise her own game by 1.7 percent.

This might be an illusory effect. Perhaps at busy times, all workers increase their speed and managers also throw on the fastest workers. What looks like a peer effect would be the coordination of two different responses to a rush of shoppers. But Mas and Moretti can tell which times are busy and which times are not; they also know that checkout staff, not managers, choose their hours (one of the few benefits of the job); and they are measuring productivity changes every 10 minutes, not over the course of an entire shift. They are convinced that the positive peer effect is real.

But why? There are, broadly, two explanations. One is that workers are spurred to greater efforts when contemplating the superior speed of their colleague. This is psychologically plausible but economically irrational. A more cynical explanation is that workers do not like it when faster colleagues are looking at them, because they fear being accused of slacking off."

Not Even Wrong

Two new books have been released recently that discuss problems with String Theory in the field of physics. One book, “Not Even Wrong: The Failure of String Theory and the Search for Unity in Physical Law” by Peter Woit, argues that String Theory is not scientific because it cannot be tested in the way that other theories can be tested. The title comes from the idea that scientific ideas can only be disproven. That is, we can only find evidence against our ideas. String theory, in Peter Woit’s opinion, cannot even be proven wrong because it is untestable, or unfalsifiable. In scientific circles, to call something “not even wrong” is derogatory (just a step below criticizing someone’s mother).

Lee Smolin has written, “The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next,” in which he describes how theoretical physics has been overtaken by groupthink. That is, the excitement about unifying quantum mechanics and relativity theory led to a process of groupthink in which the field of physics agreed to pursue the goal of string theory even though individual members did not necessarily agree with those goals. Groupthink is thought to lead to failing to look at alternatives, selective bias, and not analyzing the objectives.

Both of these books call for the field of physics to step back, reanalyze its objectives, consider alternative theories, and provide more testable hypotheses.

Monday, December 25, 2006

Scientific American: Airborne Baloney
The latest fad in cold remedies is full of hot air

Sunday, December 24, 2006

Big Ten Confounds

In any experiment there is the possibility that extraneous variables, rather than the independent variable(s) (IV), are producing the change in the dependent variable (DV). Such a variable is a confound, or threat to internal validity. If an extraneous variable cannot be eliminated, it must be held constant across conditions (for example, by achieving group equivalence through random assignment of participants to groups). Each experiment has its own confounds, but most are examples of the Big 10:

1. History - Any of the many events that occur in the outside world other than the IV that occur before the measurement of the DV. For example, if you forgot to give one group a certain set of instructions or if a new technician or student took over the experiment in the middle.

2. Maturation - Any of the conditions internal to the individual that change as a function of the passage of time. For example, in a longitudinal study, people change in different ways. In an experiment, the participant could become bored, fatigued, or figure out the purpose of the experiment.

3. Instrumentation - Any changes that occur as a function of measuring the DV. Instruments are not perfect in that sometimes they change in some way through the course of an experiment. Calibration of the measurement device may change during the course of the experiment. For example, the sound used to generate a startle response may get louder or quieter during the experiment.

4. Testing or Practice Effect - Being exposed to the testing procedures, participants can learn and devise different strategies for taking the test. A single exposure to a test is sufficient to change the way you take it the second time.

5. Statistical regression (regression toward the mean) - Any change that can be attributed to the tendency of extremely high or low scores to regress to the mean. Here is an article on why Tiger Woods has a difficult time being better than everyone all the time.

6. Selection - Any change due to the differential assignment procedures used in placing subjects in various groups. Whenever the participants are selected nonrandomly for participation in a group, there is always a chance that some unintentional bias may have occurred. For example, doing an experiment using existing groups may be affected by selection bias. This selection bias can interact with the treatment to amplify or decrease the treatment effect.

7. Mortality [attrition] - Any change due to the differential participant loss from the various groups. People may leave the experiment. If they leave or choose not to participate, it could be because of the experimental condition. This problem is often seen in treatment outcome studies. For example, a person may not like the side effects or think the procedures are too difficult in his or her condition. The reason why a particular participant dropped out is a form of self-selection. Thus, you may have experimental results which reflect only those participants who persevered or did not drop out for some reason.

8. Sequencing - Any change in the participant’s performance that can be attributed to the fact that the participant participated in more than one treatment condition. That is, the order of procedures can have an effect on the outcome.

9. Participant Effects -
a. Demand Characteristics - Any bias produced by participants trying to be good participants and behave in a manner that helps the experimenter (helps achieve data in the expected direction). Ideally, in most experiments the intent is to have the participant blind to the purpose of the experiment and/or which condition they are in.
b. Hawthorne effect - Based on studies at the Hawthorne Works of the Western Electric Company. This effect is generally defined as the problem when participants’ knowledge that they are in an experiment modifies their behavior from what it would have been without the knowledge.

10. Experimenter Effects - Any effect related to the person conducting the study knowing about the study’s hypotheses. That is, an experimenter who is aware of the research hypotheses may treat the participants in the different conditions differently. The cuing does not have to be conscious. There is the Clever Hans phenomenon, which is a form of unconscious cuing. The term refers to a horse that responded to questions requiring mathematical calculations by tapping his hoof. If asked what is the sum of 2 plus 2, the horse would tap his hoof four times. It was eventually discovered that the horse was responding to subtle physical cues from its owner. After the horse heard a question and started tapping, the owner would unconsciously give an almost imperceptible head movement, which was the horse’s cue to stop.
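
Statistical regression (confound 5 above) is easy to see in a small simulation. The sketch below is only an illustration: the score scale (mean 100, true-score SD 10, error SD 10), the sample size, and the top-10% selection rule are all made-up assumptions. No treatment is applied between the two tests, yet the group selected for extreme scores on the first test drifts back toward the mean on the second.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=10_000, seed=0):
    """Simulate two test sessions with no treatment in between.

    Each observed score is a stable true score plus independent
    measurement error, so any change in the extreme group is purely
    statistical regression. All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_scores]
    test2 = [t + rng.gauss(0, 10) for t in true_scores]

    # Select the top 10% of scorers on the first test only.
    cutoff = sorted(test1)[int(0.9 * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]

    return {
        "grand_mean_test1": mean(test1),
        "top_group_test1": mean([test1[i] for i in top]),
        "top_group_test2": mean([test2[i] for i in top]),
    }

result = simulate_regression_to_mean()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

The top group still scores above average on the retest, but noticeably lower than it did the first time. In a real study, that drop (or the matching rise in a low-scoring group) could be mistaken for a treatment effect.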

Saturday, December 23, 2006

Partial Correlation

In an earlier entry, I discussed types of correlation coefficients for variables with different scales of measurement. In this entry, I want to discuss correlation coefficients that are used with three or more variables. These coefficients are known as partial and semi-partial correlation coefficients.

A zero-order correlation coefficient describes how much variance overlaps between two variables, ignoring the third variable. This zero-order correlation is the same as a Pearson correlation coefficient as we discussed earlier. The figure below shows a) three variables and their overlaps and b) the overlap of variance described by the zero-order correlation coefficient between X1 and Y.
A partial correlation is when the influence of the third variable is removed from the two variables that we are correlating. The figure below shows a) three variables and their overlaps and b) the overlap of variance described by the partial correlation coefficient. Thus, the partial correlation coefficient has the influence of X2 removed from both X1 and Y.
A semi-partial (also known as part) correlation is when the shared variance between the two predictor variables is removed. That is, the shared influence of the third variable is removed from only one of the other variables. The figure below shows a) three variables and their overlaps and b) the overlap of variance described by the semi-partial correlation coefficient. Thus, the semi-partial correlation coefficient has the influence of X2 removed from X1 only, not from Y.
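
As a rough sketch of the three coefficients, the Python below computes the zero-order, partial, and semi-partial correlations between X1 and Y from the three pairwise Pearson correlations. The data are made up for illustration, and the function names are my own rather than from any particular statistics package.

```python
from math import sqrt

def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_x1y, r_x2y, r_x1x2):
    """Correlation of X1 and Y with X2 removed from BOTH X1 and Y."""
    return (r_x1y - r_x2y * r_x1x2) / sqrt((1 - r_x2y ** 2) * (1 - r_x1x2 ** 2))

def semipartial_r(r_x1y, r_x2y, r_x1x2):
    """Correlation of Y with X1 after X2 is removed from X1 ONLY."""
    return (r_x1y - r_x2y * r_x1x2) / sqrt(1 - r_x1x2 ** 2)

# Hypothetical data: two predictors and one outcome.
x1 = [2, 4, 5, 7, 8, 10, 11, 13]
x2 = [1, 3, 2, 5, 4, 7, 6, 9]
y = [3, 5, 4, 8, 7, 9, 12, 13]

r_x1y = pearson_r(x1, y)
r_x2y = pearson_r(x2, y)
r_x1x2 = pearson_r(x1, x2)

pr = partial_r(r_x1y, r_x2y, r_x1x2)
sr = semipartial_r(r_x1y, r_x2y, r_x1x2)

print(f"zero-order r(X1, Y):        {r_x1y:.3f}")
print(f"partial r(X1, Y | X2):      {pr:.3f}")
print(f"semi-partial r(Y, X1 | X2): {sr:.3f}")
```

The two formulas share a numerator. The partial correlation divides by one extra term for the variance of Y not explained by X2, so its absolute value is never smaller than that of the semi-partial correlation.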

Friday, December 22, 2006

Mediators and Moderators

One of the most confusing concepts in research is the distinction between mediators and moderators.

A mediator ‘mediates’ the relationship between two variables (the independent and dependent variables). That is, the two variables are correlated, but their relationship depends on the mediator: the mediator accounts for the relationship between the two variables. In a treatment outcome study, a mediator is what is thought to be responsible for the change (e.g., the mechanism of change). Thus, the mediator causes the dependent variable, but it is itself caused by the independent variable. In the picture below, pathway C would equal zero in a fully mediated relationship.
A moderator is a variable that affects the relationship between two variables. That is, it interacts with the independent variable to create different effects on the dependent variable. Thus, it modifies the relationship (whereas a mediator accounts for the relationship). In a treatment outcome study, a moderator is something that usually exists before the study begins. For example, IQ or gender might modify how a person does in treatment. In the figure below, there is a moderated relationship if pathway C is significant. That is, there is an interaction between the IV and the moderator.
In thinking about these concepts causally, a mediator is a variable that comes in between two variables. That is, it is caused by one and in turn causes the other. A moderator modifies the relationship between two variables and does not have to exist in the causal chain. Gender is a common example of a moderator in that people will behave differently depending on their gender.
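
The distinction can be made concrete with a small regression sketch. Everything below is hypothetical: the data are invented, the helper functions are my own, and the path names (a, b, direct, total) follow the common Baron and Kenny convention, which may not match the labels in the original figures. The mediation part splits the total effect of a treatment into a direct path and an indirect path through the mediator; the moderation part compares the slope of the outcome on the IV within each level of a dichotomous moderator.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """Slope from a simple regression of y on x."""
    return cov(x, y) / cov(x, x)

def slopes_two_predictors(y, x1, x2):
    """Slopes (b1, b2) from regressing y on x1 and x2 together."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# --- Mediation: treatment -> mediator -> outcome (made-up data) ---
treatment = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mediator = [2, 3, 3, 4, 3, 6, 7, 6, 8, 7]
outcome = [5, 6, 7, 8, 6, 11, 13, 12, 15, 13]

a = slope(mediator, treatment)            # IV -> mediator
c_total = slope(outcome, treatment)       # IV -> DV, ignoring the mediator
c_direct, b = slopes_two_predictors(outcome, treatment, mediator)
indirect = a * b                          # effect carried through the mediator

print(f"total = {c_total:.3f}, direct = {c_direct:.3f}, indirect = {indirect:.3f}")

# --- Moderation: does the IV's slope differ by group? (made-up data) ---
dose = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
improvement = [2, 3, 3, 4, 5, 2, 4, 6, 8, 11]

def slope_within(group):
    xs = [d for d, f in zip(dose, female) if f == group]
    ys = [i for i, f in zip(improvement, female) if f == group]
    return slope(ys, xs)

slope_male = slope_within(0)
slope_female = slope_within(1)
print(f"slope for males = {slope_male:.2f}, slope for females = {slope_female:.2f}")
```

In ordinary least squares the total effect equals the direct effect plus the indirect effect, so full mediation corresponds to a direct effect near zero. For a 0/1 moderator, the difference between the two within-group slopes is the interaction coefficient you would get from a regression that includes the IV, the moderator, and their product. Neither part of the sketch tests significance.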

For further reading:
Baron, R. M. & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83-104.
http://davidakenny.net/cm/mediate.htm
http://www.nimh.nih.gov/scientificmeetings/interventions.cfm

Correlation Coefficients

There are many different ways to calculate how much two items are associated. The calculation of these correlation coefficients depends on how the items are measured. That is, the calculation depends in part on whether the items are measured on a nominal, ordinal, interval, or ratio scale.

Pearson r
The most common correlation coefficient is the Pearson r, which is named after Karl Pearson who developed it about 1900 (although some say that Galton should get credit for its creation). It is used to calculate the correlation between two continuous variables (ratio or interval scale).

Spearman ρ (rho)
The Spearman ρ is used to calculate a correlation for two variables that are ordinal (rankings). The Greek letter ρ is used, which is an exception to the general rule that Greek letters are used for population parameters. Kendall τ (tau) is often used in place of Spearman ρ.

Biserial r
The biserial correlation coefficient is used when one variable is continuous (ratio or interval scale) and the other variable is dichotomous (nominal). However, the dichotomous variable is thought to reflect an underlying continuous variable. For example, if we had high and low anxiety, we would expect that anxiety is actually a continuous variable. A similar correlation coefficient, called a polyserial correlation, is often used when the categorical variable has more than two ordered categories.

Point-Biserial r
The point-biserial r is used when one variable is continuous (ratio or interval scale) and the other variable is dichotomous (nominal scale). An example might be depression and gender.

Rank-Biserial Correlation Coefficient
The rank-biserial correlation coefficient is used to determine the association between dichotomous (nominal) and ordinal (rankings) data.

ϕ (phi) Coefficient
If both variables are instead dichotomous (nominal), the ϕ (phi) correlation coefficient is used. The Greek letter ϕ is another exception to the general rule that Greek letters are used for population parameters. A tetrachoric correlation coefficient is often used in place of ϕ.
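
Several of these coefficients are the same Pearson formula applied to differently coded data, which the sketch below tries to show. The data and variable names are invented for illustration. Spearman ρ is computed as Pearson r on ranks, the point-biserial r as Pearson r with a 0/1 variable, and ϕ as Pearson r on two 0/1 variables.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r for two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Continuous vs. continuous: monotonic but not linear (hypothetical).
hours = [1, 2, 3, 4, 5, 6]
score = [1, 4, 9, 16, 25, 36]
print(f"Pearson r    = {pearson_r(hours, score):.3f}")
print(f"Spearman rho = {spearman_rho(hours, score):.3f}")

# Dichotomous vs. continuous: point-biserial r (hypothetical).
female = [0, 0, 0, 1, 1, 1]
depression = [10, 12, 11, 15, 14, 17]
point_biserial = pearson_r(female, depression)
print(f"point-biserial r = {point_biserial:.3f}")

# Dichotomous vs. dichotomous: phi (hypothetical).
smoker = [1, 1, 0, 0, 1, 0, 1, 0]
drinker = [1, 0, 0, 0, 1, 0, 1, 1]
phi = pearson_r(smoker, drinker)
print(f"phi = {phi:.3f}")
```

The biserial, polyserial, and tetrachoric coefficients are not covered here. They assume an underlying continuous variable and need more than a recoding of the Pearson formula.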

One and Two-Tailed Tests

When you conduct a test of statistical significance, whether it is a t-test, correlation, an ANOVA, or regression, you are given a p-value in the output. Almost always, this p-value is for a two-tailed test.

If you are using a significance level of .05, a two-tailed test divides this value in half, meaning that .025 is in each tail of the distribution (see picture below).
Splitting the significance level across the two tails makes it more difficult to achieve statistical significance. However, it also means that you do not have to make a prediction about the direction of the effect. In other words, the effect can be either positive or negative and still be statistically significant.

There is much dispute about whether one should use a two-tailed test when one has a prediction about the direction of the effect. For example, one might expect that the new experimental treatment is better than no treatment at all. This researcher would not expect that the treatment could make people worse (although it is a possibility; hence part of the argument for always doing a two-tailed test).

One advantage of the one-tailed test is that it has more power than a two-tailed test. That is, the probability of a Type II error is reduced because you are less likely to miss a significant effect with a one-tailed test (assuming that you have accurately predicted the direction of the effect; the probability of a Type I error is the same because the same alpha level is used).
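
A quick numerical sketch of the difference, using a z statistic for simplicity. The value z = 1.80 is a made-up example, and the one-tailed calculation assumes the effect was predicted to be positive.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic.

    The one-tailed value assumes the predicted direction is positive.
    """
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

alpha = 0.05
z = 1.80  # hypothetical test statistic, in the predicted direction

one_tailed, two_tailed = p_values(z)
print(f"one-tailed p = {one_tailed:.3f}, significant: {one_tailed < alpha}")
print(f"two-tailed p = {two_tailed:.3f}, significant: {two_tailed < alpha}")

# Critical z values: all of alpha in one tail versus alpha/2 in each tail.
critical_one = NormalDist().inv_cdf(1 - alpha)
critical_two = NormalDist().inv_cdf(1 - alpha / 2)
print(f"critical z: one-tailed {critical_one:.3f}, two-tailed {critical_two:.3f}")
```

The same statistic is significant with a one-tailed test but not with a two-tailed test, because the one-tailed critical value is lower. Had the effect come out in the opposite direction, the one-tailed test could not have detected it at all.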

Each picture below represents a one-tailed test (but in opposite directions).

Mplus Commands

Mplus is becoming a commonly used statistical modeling program for structural equation modeling. It has many useful features (e.g., easy syntax, item-response analysis, latent-class analysis, indirect effects calculations, and random coefficients) that make it highly desirable for researchers.

One common problem that people have is getting their data in a format that can be used by Mplus. That is, many researchers use data management programs (such as SPSS or Excel) to enter data. Thus, one has to provide Mplus with a data file that was created in another program. The UCLA Statistical Consulting page has some great resources for this difficulty (they are also the authors of the highly entertaining podcast, Stattalk).
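
As one possible approach (not the UCLA method), a few lines of Python can turn a CSV export into the kind of plain file Mplus reads: numbers only, no header row, values separated by spaces. The file contents, column names, and the -999 missing code below are all hypothetical.

```python
import csv
import io

def to_mplus_dat(rows, columns, missing_code=-999):
    """Return Mplus-ready text: numbers only, no header row, space separated.

    Empty cells are replaced by `missing_code`, which must then be declared
    in the Mplus input file with a MISSING statement.
    """
    lines = []
    for row in rows:
        values = []
        for col in columns:
            cell = row[col].strip()
            values.append(str(missing_code) if cell == "" else cell)
        lines.append(" ".join(values))
    return "\n".join(lines) + "\n"

# Hypothetical export from SPSS or Excel, saved as CSV with a header row.
csv_text = "id,female,ses,read\n1,0,2,57\n2,1,,68\n3,1,3,\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
columns = ["id", "female", "ses", "read"]

dat_text = to_mplus_dat(rows, columns)
print(dat_text, end="")

# The header row is dropped from the data file, so the variable order has
# to be repeated in the Mplus input file.
print("Names = " + " ".join(columns) + ";")
print("Missing are all (-999);")
```

In practice you would write `dat_text` to a file such as test.dat and paste the two printed statements under the Variable: command of the .inp file.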

The UCLA Statistical Consulting Group has also provided excellent FAQs on Mplus, which I quote below:

Is an Mplus .inp program case sensitive?

No. You can type commands in upper and lower case, e.g.

Data:
File is hsb2.dat ;

is the same as

DATA:
FILE IS hsb2.dat ;

Can a command span over more than one line?

Yes, for example these two commands are identical.

Variable:
Names = id female race ses schtyp prog read write math science socst;

and

Variable:
Names = id female race ses schtyp prog
read write math science socst;

How do I indicate a comment?

A comment is indicated with !

The rest of the line after the ! is commented, for example

Variable:
Names = id female race ses schtyp prog read write math; ! read the variables

or to comment out some

Variable:
Names = id female race ses schtyp prog read write math; ! science socst;

Do I have to include the full path when pointing to a data file?

It depends. If you are reading a data file that is in the same folder (directory) as the input file, you do NOT need to include the full path when pointing to a raw data file. For example, if your data file is in c:\mydata\test.dat , and your input file is c:\mydata\myinput.inp, then you can read the raw data file like this.

Data:
File is test.dat ;

If your input file was in a different folder, then you would need to specify the full path, for example.

Data:
File is c:\mydata\test.dat ;

Does the order of variables matter on NAMES= ?

Yes, this represents the order of the variables in your raw data file, like on an input statement in other stat packages. So, in the example

Variable:
Names = id female race ses;

It implies that the variables in the data file are id then female then race then ses .

Does the order of variables matter on USEVARIABLES?

No, this is simply a list of variables to be used for the analysis (kind of like a keep statement). By the way, this can be abbreviated to USEVAR . Here is an example.

Variable:
Names = id female race ses;
usevar = female ses race;

And in this analysis only female, ses and race will be included.

Can commands be abbreviated?

Yes, as you saw in the example above, where USEVARIABLES was abbreviated to USEVAR.

Can Mplus handle user missing values (numeric missing values)?

Yes, with the Missing are command.

You can give all variables the same missing value, e.g. Missing are all (-999999999) ;

You can give different values for different variables, e.g. Missing are x1 x2 (-1) y1 y2 (-5) ;

You can even have multiple missing values for a variable, e.g. Missing are x1 x2 (-1 -2) y1 y2 (-5 -9) ;

Can Mplus handle periods (dots) as missing values?

Yes, you can specify Missing are .; and it will understand a . to be a missing value.

What about the is and are and = (equal sign). Are they all the same?

Yes, they are all the same. For example, the following statements all have the same effect.

Variable:
Names are id female race ses;

or

Variable:
Names is id female race ses;

or

Variable:
Names = id female race ses;

Type I and Type II Errors, Part 2

An earlier post tried to clarify Type I and Type II errors. However, more clarification may be needed.

Another way of thinking about Type I and Type II errors is to look at the experimenter’s conclusions.

TYPE I ERROR
If the experimenter concludes that there was a significant difference between the groups in her sample, she can only be making a Type I error or a correct decision. That is, she either found the effect (correct decision) or she thinks she found an effect that actually wasn’t there (Type I error).

An example would be a positive result on an HIV test. This person can either have HIV or have a false positive (Type I error).

TYPE II ERROR
If the experimenter concludes that no effect was found, she can only be making the correct decision or failing to find that effect. That is, she can either be correct in saying that there is no effect or be missing an opportunity to find that effect.

An example would be a negative result on an HIV test. This person could actually not have HIV (a correct decision) or be missing a chance to detect the disease (Type II error).

A Type II error is a lost opportunity to correctly reject the null hypothesis.

Type I and Type II Errors

One easily confused relationship is between Type I and Type II errors.

A Type I error is when we reject the null hypothesis when in fact there was no effect (the null hypothesis always states that there is no relationship or no effect). That is, we conclude we found something when we actually did not.

A Type II error is when we fail to reject the null hypothesis when an effect actually exists. That is, we should reject the null hypothesis, but we don’t. We miss an opportunity to find an effect.

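The table below is helpful to see the errors:

Decision                  | Null is true (no effect) | Null is false (effect exists)
Reject the null           | Type I error             | Correct decision
Fail to reject the null   | Correct decision         | Type II error

We can directly affect the Type I error rate by changing the alpha level. That is, we can reduce our chances of making a Type I error by reducing our alpha level (thus, we will be more conservative with our decisions to reject the null hypothesis).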

We can indirectly affect Type II error rate by changing alpha. By raising alpha, we reduce the chance of Type II error. Type II errors are also affected by sample size (larger sample sizes reduce Type II error) and by effect size (the larger the effect the easier it is to detect a difference in effect).

To summarize, Type II error is affected by:

1) Alpha Level - Raising alpha lowers the chance of making a Type II error but raises the chances of making a Type I error

2) Sample Size - A larger sample gives us more of a chance (or more power) to detect an effect, thus reducing our chances of a Type II error

3) Effect Size - A larger effect size also lowers the chances of making a Type II error. That is, the more things are different, the easier it is for us to find that difference.

This site has a great visual display of how Type II error is affected by effect size
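
The three influences listed above can also be checked numerically. The sketch below computes the Type II error rate (beta) for a two-tailed, one-sample z-test with a known standard deviation. That test, the baseline effect size of 0.3 standard deviations, and the sample size of 50 are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(effect_size, n, alpha=0.05):
    """Beta for a two-tailed, one-sample z-test with known SD.

    effect_size is the true mean difference in standard deviation units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Power is the chance the test statistic lands in either rejection region.
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return 1 - power

baseline = type_ii_error(effect_size=0.3, n=50, alpha=0.05)
higher_alpha = type_ii_error(effect_size=0.3, n=50, alpha=0.10)
larger_sample = type_ii_error(effect_size=0.3, n=100, alpha=0.05)
larger_effect = type_ii_error(effect_size=0.5, n=50, alpha=0.05)

print(f"baseline (d=0.3, n=50, alpha=.05): beta = {baseline:.3f}")
print(f"raise alpha to .10:                beta = {higher_alpha:.3f}")
print(f"double the sample to n=100:        beta = {larger_sample:.3f}")
print(f"larger effect, d=0.5:              beta = {larger_effect:.3f}")
```

Each change lowers beta relative to the baseline. Only raising alpha also raises the Type I error rate.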

Coding Gender in Statistical Programs

When looking at statistical databases, I have seen many elegant (and inelegant) methods of coding variables. One consistent coding question is how to code gender. Some people code 1 for male and 2 for female, or the other way around. This means of coding leads to difficulties not just in political correctness (i.e., who is #1? I’ve seen people code males and females as 8 and 9 to get away from that question), but also in remembering how these variables were coded.

One solution is to name your variable female. With this naming, you could then code males as 0 (lacking the quality of femaleness) and females as 1. This means of coding clarifies how the variable was coded for anyone who looks at the database.
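
A minimal sketch of the recoding, assuming a hypothetical raw variable in which 1 means male and 2 means female:

```python
# Hypothetical raw codes where 1 = male and 2 = female.
raw_gender = [1, 2, 2, 1, 2]

# Recode into an indicator whose name states what a 1 means.
recode = {1: 0, 2: 1}
female = [recode[code] for code in raw_gender]

# With 0/1 coding, the mean is the proportion of the sample that is female.
proportion_female = sum(female) / len(female)

print(female)
print(proportion_female)
```

Using a lookup table means an unexpected code (say, a stray 9) raises an error instead of being silently counted as male. A side benefit of 0/1 coding is that the variable's mean is directly interpretable as a proportion.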

Thursday, December 21, 2006

Al Gore at TheOnion.com

From TheOnion.com:
Al Gore Caught Warming Globe To Increase Box Office Profits

Tuesday, December 19, 2006

Archive of Norms, Stimuli, and Data

The Psychonomic Society Archive of Norms, Stimuli, and Data is an excellent resource for students looking for research tools. There are stimuli such as word lists. There are statistical macros. There are also preexisting programs to replicate previous studies.

Monday, December 18, 2006

The Value of Allen Iverson


All the talk of trading Allen Iverson reminded me of an interesting book review by Malcolm Gladwell. The book, The Wages of Wins, was written by a group of economists. It uses regression analysis to determine how many wins each player is worth to his team. As usual, Gladwell's review is wonderful. Here is an excerpt (please read the full article to get Gladwell's perspective):

"The first player picked in the 1996 National Basketball Association draft was a slender, six-foot guard from Georgetown University named Allen Iverson. Iverson was thrilling. He was lightning quick, and could stop and start on a dime. He would charge toward the basket, twist and turn and writhe through the arms and legs of much taller and heavier men, and somehow find a way to score. In his first season with the Philadelphia 76ers, Iverson was voted the N.B.A.’s Rookie of the Year. In every year since 2000, he has been named to the N.B.A.’s All-Star team. In the 2000-01 season, he finished first in the league in scoring and steals, led his team to the second-best record in the league, and was named, by the country’s sportswriters and broadcasters, basketball’s Most Valuable Player. He is currently in the midst of a four-year, seventy-seven-million-dollar contract. Almost everyone who knows basketball and who watches Iverson play thinks that he’s one of the best players in the game.

But how do we know that we’re watching a great player? ...The fact that Allen Iverson has been one of the league’s most prolific scorers over the past decade, for instance, could mean that he is a brilliant player. It could mean that he’s selfish and takes shots rather than passing the ball to his teammates. It could mean that he plays for a team that races up and down the court and plays so quickly that he has the opportunity to take many more shots than he would on a team that plays more deliberately. Or he might be the equivalent of an average surgeon with a first-rate I.C.U.: maybe his success reflects the fact that everyone else on his team excels at getting rebounds and forcing the other team to turn over the ball. Nor does the number of points that Iverson scores tell us anything about his tendency to do other things that contribute to winning and losing games; it doesn’t tell us how often he makes a mistake and loses the ball to the other team, or commits a foul, or blocks a shot, or rebounds the ball. Figuring whether one basketball player is better than another is a challenge similar to figuring out whether one heart surgeon is better than another: you have to find a way to interpret someone’s individual statistics in the context of the team that they’re on and the task that they are performing.

In “The Wages of Wins” (Stanford; $29.95), the economists David J. Berri, Martin B. Schmidt, and Stacey L. Brook set out to solve the Iverson problem. Weighing the relative value of fouls, rebounds, shots taken, turnovers, and the like, they’ve created an algorithm that, they argue, comes closer than any previous statistical measure to capturing the true value of a basketball player. The algorithm yields what they call a Win Score, because it expresses a player’s worth as the number of wins that his contributions bring to his team. According to their analysis, Iverson’s finest season was in 2004-05, when he was worth ten wins, which made him the thirty-sixth-best player in the league. In the season in which he won the Most Valuable Player award, he was the ninety-first-best player in the league. In his worst season (2003-04), he was the two-hundred-and-twenty-seventh-best player in the league. On average, for his career, he has ranked a hundred and sixteenth. In some years, Iverson has not even been the best player on his own team."

Perception Bias

We process the world at many levels. In humans, the first level of information processing is done through the senses. This 'raw information' is then processed further to form our mental image of the world. Cognitive biases are known to occur commonly (see risk assessment, decision making, predicting the future, and the taxi-cab problem for a few examples).

An analogy is the camera. Sensory information from the world passes through the lens, and the camera then processes that information into the format it is programmed to produce (further refinement is done with image manipulation programs).

Perception, or sensory information, has often been thought of as raw information (like what passes through the camera lens). However, new research has shown that even our sensory information is processed and biased.

Denny Proffitt and colleagues from the University of Virginia have been studying how sensory information is processed and biased. Specifically, they have found that perception is biased by emotional and fatigue factors. Here is an excerpt from the APS Observer:

"A series of studies conducted in Proffitt's laboratory have shown that the amount of effort required to walk to a destination or throw an object to a specific target affects how far away the destination or target is perceived to be. Apparent distance decreases with fitness and increases with fatigue. A destination will also appear farther away after a person walks on a treadmill (due to the temporary illusion, produced by treadmill walking, that it takes effort just to stay in the same spot). Similarly, a person throwing a heavy object will perceive the target to be farther away than a person throwing a light object.

This finding also extends to the perceived slant of a hill: A person wearing a heavy backpack will view a hill as steeper than will someone who is unencumbered."

Of interest to fear and anxiety research is a study done in the Proffitt lab by Jeanine Stefanucci (now at The College of William and Mary). In this study, people estimated the steepness of a staircase while standing at the top of it on either a skateboard or a box of the same height. People rated the steepness of the staircase much higher when they were standing on the skateboard (perception of height was measured by multiple methods). The results indicate that people overestimate the height when they are standing on a skateboard. Thus, fear seems to change perception.

Sunday, December 17, 2006

File Drawer Problem

The File Drawer Problem is a publication bias in which statistically significant results are published more often than results that are not statistically significant. That is, non-significant findings end up in the file drawer.

The file drawer problem can come from many sources. One potential source is confirmation bias on the part of the researcher. The researcher may report only the significant findings that confirm his or her research hypotheses and leave nonsignificant results out of the published report. Another case is a researcher who does not find significant results and therefore publishes nothing from that study.

Another potential reason for this bias is that journal editors may not want to publish results that are not significant. Journal editors, to some extent, want to make their journal more important and read by more people. Nonsignificant findings rarely make the headlines and don’t really add to the popularity of the journal. To take an extreme example, I cannot come up with one example of an article in the journal Science that focuses on a nonsignificant finding.

Why is the File Drawer Problem a problem? If positive results are the only results that are published, we see only part of the picture. For example, the positive result featured in one article may have been a Type I error. It could be offset by ten other studies that did not find a significant effect, but because these ten studies were not published, they cannot be compared with the one significant effect.

Furthermore, if one wanted to compute an average effect size from the existing literature, the estimate would be inflated by the absence of published studies without a positive effect. For example, if we wanted to compute the effect of hypnosis on smoking cessation, we would look at the existing literature. However, studies that did not find that hypnosis reduces smoking may not be in the published literature. Thus, the effect of hypnosis on smoking cessation may be overestimated because there are few published studies showing no effect.
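
The inflation can be seen in a small simulation. This is only a sketch under assumptions of my own choosing: a true standardized effect of 0.2, 20 participants per group, known unit variance, and a "journal" that publishes a study only when the effect is significant in the predicted direction. The function name simulate_file_drawer is made up for this example.

```python
import random
from statistics import NormalDist

def simulate_file_drawer(true_d=0.2, n_per_group=20, n_studies=2000,
                         alpha=0.05, seed=1):
    """Run many simulated two-group studies and 'publish' only significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = sum(rng.gauss(true_d, 1) for _ in range(n_per_group)) / n_per_group
        control = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        d = treated - control
        all_effects.append(d)
        if d / se > z_crit:  # significant in the predicted direction
            published.append(d)
    mean_all = sum(all_effects) / len(all_effects)
    mean_published = sum(published) / len(published) if published else float("nan")
    return mean_all, mean_published, len(published)

mean_all, mean_published, n_published = simulate_file_drawer()
print(mean_all, mean_published, n_published)
```

Averaging every simulated study should land near the true effect, while averaging only the "published" ones should give a much larger number, which is the overestimate a literature review would inherit.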

A great article on this topic comes from Robert Rosenthal:
Rosenthal, R., 1979, The "File Drawer Problem" and Tolerance for Null Results, Psychological Bulletin, 86, pp. 638-641.

Thursday, December 14, 2006

Conjunction Fallacy

The conjunction fallacy is credited to the work of Amos Tversky and Daniel Kahneman. It is the mistake people make when they judge a specific condition to be more likely than a more general one. A specific condition joins several events together, while a general condition involves only one. The probability of two events occurring together can never be greater than the probability of either one occurring alone.

An example similar to one used in the original research by Tversky and Kahneman may be helpful:

Lucas is 28 years old, single, outspoken, and very bright. He majored in humanities. As a student, he was deeply concerned with issues of discrimination and social justice. Which statement about Lucas is more likely?

1. Lucas is a barista.
2. Lucas is a barista and is active in the anti-war movement.

People who commit the conjunction fallacy choose option 2. However, mathematically, the probability of two events occurring together (in "conjunction") is always less than or equal to the probability of either one occurring alone. That is, Lucas could be a barista and he could be active in the anti-war movement, but it cannot be more likely that he is both than that he is a barista.

Tversky and Kahneman might argue that people make the conjunction error because option 2 seems more "representative" of Lucas based on the description of him, even though it is mathematically less likely.
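
The rule can be checked with the multiplication rule for probabilities. The numbers below are invented purely for illustration; whatever values you plug in (between 0 and 1), the conjunction can never come out larger than the single event.

```python
# Hypothetical probabilities, chosen only for illustration
p_barista = 0.05                  # P(Lucas is a barista)
p_activist_given_barista = 0.60   # P(active in the anti-war movement | barista)

# Multiplication rule: P(A and B) = P(A) * P(B | A)
p_both = p_barista * p_activist_given_barista

# Because P(B | A) is at most 1, the conjunction cannot exceed P(A)
print(p_both <= p_barista)
```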

Wednesday, December 13, 2006

Taxi Cab Problem

In a study conducted by Tversky and Kahneman, participants were asked about the probability of a witness correctly identifying the right color of a cab that was in an accident. The participants were given the following information:

A cab was involved in a hit and run accident at night.
There are two cab companies in the city, one with green cars and the other with blue cars. 85% of the cabs in the city are green and 15% are blue.
An eyewitness identified the cab as being blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified the colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was blue?

Almost all participants provided a probability greater than 50%. Many gave answers of 80% or higher.

If we based our decision only on the base rate of cabs in the city, we would guess that there is an 85% chance that a green cab, rather than a blue cab, was involved in the accident.

However, people relied on the testimony of the witness to a much greater degree than on the base rate information even with the information about the unreliability of the witness. This problem is known as the base rate fallacy. That is, people will often ignore the base rate when making decisions.

Knowing how good the witness is at identifying cab colors at night, along with the base rates of blue and green cabs, we can use Bayes' theorem to calculate how likely it is that the cab was in fact blue.

There is a 12% chance that the cab is blue and the witness correctly identifies it as blue (True Positive; the base rate of blue cabs, 15%, multiplied by the probability of the witness correctly identifying the color of a cab, 80%).

There is a 17% chance that the cab is green and the witness incorrectly identifies it as blue (False Positive; the base rate of green cabs, 85%, multiplied by the probability of the witness identifying the wrong color, 20%).

There is a 29% chance (17% added to 12%) that the witness will identify a cab as blue.

This results in a 41% chance (12% divided by 29%) that a cab identified as blue is actually blue (Positive Predictive Value).
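
The same steps can be written as a few lines of Python. The function name posterior_blue and its parameter names are mine; the numbers are the ones given in the problem.

```python
def posterior_blue(p_blue=0.15, accuracy=0.80):
    """Probability the cab was blue, given that the witness said 'blue'."""
    true_positive = p_blue * accuracy                # blue cab, called blue
    false_positive = (1 - p_blue) * (1 - accuracy)   # green cab, called blue
    return true_positive / (true_positive + false_positive)

print(round(posterior_blue(), 2))  # 0.41
```

Changing p_blue shows how strongly the answer depends on the base rate: with equal numbers of blue and green cabs, the same witness would be right 80% of the time.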



See also this entry about base rates and diagnostic decisions for more information about Bayes Theorem.

Tuesday, December 12, 2006

Undervalued College Prospects

Slate magazine has an interesting article on how George Mason University attracts undervalued prospects in their economics department and on their basketball team. Here is a quote that summarizes much of their article:

“GMU lacks the resources and reputation to recruit McDonald's All-Americans or Alan Dershowitzes. So instead, GMU has hunted for inefficiencies in its markets. Coach Jim Larranaga follows the Moneyball model of recruitment: hunting for the undervalued players—the ones who everyone else thought were too short, too thin, or too fat—and then building them into a team. In its astonishing defeat of UConn, GMU's players were giving away 4 inches at nearly every position...

One reason is that coaches who take chances on oddball players risk making themselves look foolish. A coach who goes after the same jock that everyone else wants, or an investment analyst who picks the same stock that everyone else recommends, at least can't be made to look worse than average. Herd behavior means that unpopular opportunities remain unexploited. An unusual coach who's willing to look unfashionable with the in-crowd has a chance to excel.

This is also the idea behind GMU's free-market-oriented economics department. The department got started with a heretical premise: The academic market is inefficient, so how can we exploit it? GMU knew it couldn't afford to be a first-class MIT and didn't want to be a second-class MIT, so successive chairs of the department, backed by entrepreneurial university presidents George Johnson and Alan Merten, looked for unexploited opportunities.

James Buchanan, GMU's first Nobel Prize winner, has never had an Ivy League position and indeed he has never taught above the Mason-Dixon Line. Gordon Tullock, a potential future Nobelist, has no degree in economics and took only one class in the subject. Vernon Smith, who moved his team from the University of Arizona (again, no Harvard) to GMU in 2001, had to fight to get people to treat experimental economics as more than a cute parlor game.”

Malcolm Gladwell, writing for the New Yorker, discusses how the Ivy League schools keep their outstanding reputation:

“Social scientists distinguish between what are known as treatment effects and selection effects. The Marine Corps, for instance, is largely a treatment-effect institution. It doesn’t have an enormous admissions office grading applicants along four separate dimensions of toughness and intelligence. It’s confident that the experience of undergoing Marine Corps basic training will turn you into a formidable soldier. A modeling agency, by contrast, is a selection-effect institution. You don’t become beautiful by signing up with an agency. You get signed up by an agency because you’re beautiful.


At the heart of the American obsession with the Ivy League is the belief that schools like Harvard provide the social and intellectual equivalent of Marine Corps basic training—that being taught by all those brilliant professors and meeting all those other motivated students and getting a degree with that powerful name on it will confer advantages that no local state university can provide. Fuelling the treatment-effect idea are studies showing that if you take two students with the same S.A.T. scores and grades, one of whom goes to a school like Harvard and one of whom goes to a less selective college, the Ivy Leaguer will make far more money ten or twenty years down the road.


The extraordinary emphasis the Ivy League places on admissions policies, though, makes it seem more like a modelling agency than like the Marine Corps, and, sure enough, the studies based on those two apparently equivalent students turn out to be flawed. How do we know that two students who have the same S.A.T. scores and grades really are equivalent? It’s quite possible that the student who goes to Harvard is more ambitious and energetic and personable than the student who wasn’t let in, and that those same intangibles are what account for his better career success. To assess the effect of the Ivies, it makes more sense to compare the student who got into a top school with the student who got into that same school but chose to go to a less selective one. Three years ago, the economists Alan Krueger and Stacy Dale published just such a study. And they found that when you compare apples and apples the income bonus from selective schools disappears.


“As a hypothetical example, take the University of Pennsylvania and Penn State, which are two schools a lot of students choose between,” Krueger said. “One is Ivy, one is a state school. Penn is much more highly selective. If you compare the students who go to those two schools, the ones who go to Penn have higher incomes. But let’s look at those who got into both types of schools, some of whom chose Penn and some of whom chose Penn State. Within that set it doesn’t seem to matter whether you go to the more selective school. Now, you would think that the more ambitious student is the one who would choose to go to Penn, and the ones choosing to go to Penn State might be a little less confident in their abilities or have a little lower family income, and both of those factors would point to people doing worse later on. But they don’t.”


Krueger says that there is one exception to this. Students from the very lowest economic strata do seem to benefit from going to an Ivy. For most students, though, the general rule seems to be that if you are a hardworking and intelligent person you’ll end up doing well regardless of where you went to school. You’ll make good contacts at Penn. But Penn State is big enough and diverse enough that you can make good contacts there, too. Having Penn on your résumé opens doors. But if you were good enough to get into Penn you’re good enough that those doors will open for you anyway. “I can see why families are really concerned about this,” Krueger went on. “The average graduate from a top school is making nearly a hundred and twenty thousand dollars a year, the average graduate from a moderately selective school is making ninety thousand dollars. That’s an enormous difference, and I can see why parents would fight to get their kids into the better school. But I think they are just assigning to the school a lot of what the student is bringing with him to the school.””


These articles provide interesting commentary on individual differences, selection bias, and ‘correlation does not equal causation’ issues. There are so many individual differences that one might never be able to predict the outcome of a college experience. Selection effects bias the data even more. Another question is whether our graduates are good because they graduated from here or because they were good to begin with (probably both are correct to some extent, with variation dependent on individual differences).

Monday, December 11, 2006

Urban Dictionary vs. Wikipedia

Wikipedia, an online community encyclopedia, is getting some competition from the Urban Dictionary.

I can't find Hasselhoffing on Wikipedia, but I can find it on the Urban Dictionary (it is the act of changing a colleague's desktop wallpaper to an image of David Hasselhoff).

If you are Food Horny and you buy 'Food of shame,' you will need to go to the Urban Dictionary to describe your experience.

Google drift is drifting aimlessly among subjects on the internet (also not on Wikipedia, but can be accomplished using Wikipedia).

However, if you would like to learn about Rosalind Franklin or the space shuttle Columbia, you will still have to go to Wikipedia.

Sunday, December 10, 2006

Donald Rumsfeld or Rob Zombie?

Below are some quotes from Donald Rumsfeld and from Rob Zombie. Can you tell the difference?

1) Great things come out of being hungry and cold.

2) Learn to say "I don't know." If used when appropriate, it will be often.

3) You can't worry about a plane crash. What are you going to do? Complain all the way down to the ground? Might as well just enjoy the roller-coaster ride. It doesn't pay to worry.

4) Every day is filled with numerous opportunities for serious error. Enjoy it.

5) Embrace strange obsessions. Today's weirdo is tomorrow's mogul.

6) It's nice to be important. It's more important to be nice.

7) If you foul up, tell the president and correct it fast. Delay only compounds mistakes.

8) Anger is all about management. You choose to manage it, or you don't.

9) Don't think of yourself as indispensable or infallible. As Charles de Gaulle said, the cemeteries of the world are full of indispensable men.

10) Once you're pampered, you get lazy.

11) Be yourself. Follow your instincts. Success depends, at least in part, on the ability to "carry it off."

12) If you are not criticized, you may not be doing much.

13) If you get too comfortable, you lose your edge. A little pain keeps you striving.

14) Know that the amount of criticism you receive may correlate somewhat to the amount of publicity you receive.


Rumsfeld 2, 4, 7, 9, 11, 12, 14
Zombie 1, 3, 5, 6, 8, 10, 13


Friday, December 08, 2006

How Many Lottery Tickets?

Let's say that your company is having a holiday party. As part of the festivities, they give gifts of lottery tickets. In Florida, we have a game called Lotto. The Florida Lotto has a person pick 6 numbers (numbers can be 1 to 53). A person wins a prize if he or she matches 3, 4, 5, or 6 of the numbers. If the person matches all 6, he or she wins the major prize, which is determined by how many people purchase tickets (usually at least 3 million dollars). If a person matches 5 numbers, he or she wins a smaller prize (around $3000). If a person matches 4 numbers, he or she wins $70. The lowest prize is for matching 3 out of the 6 numbers and that prize is $4.50.

The Lotto odds of winning are listed on the Florida Lotto website. However, your company is buying multiple tickets, so what are the odds of at least one person winning a prize (all tickets are purchased with random numbers)?

To take one example, the probability of matching 3 out of the 6 numbers is about .014. Thus, you have only a 1.4% chance that your one ticket matches 3 out of 6 numbers. If there are 72 employees, each with one ticket, then the probability that not one of those tickets matches 3 out of 6 numbers is about .359 (1 minus the single-ticket probability, roughly .986, multiplied by itself 72 times). Thus, there is a 64% chance that someone will win $4.50.

Here are the chances for at least one person winning the prizes if your company buys 72 tickets:
3 out of 6 numbers: 64.1%
4 out of 6 numbers: 4.96%
5 out of 6 numbers: .088%
6 out of 6 numbers: .00031%
Someone winning at least one prize: 65.93%
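
For anyone who wants to rerun these numbers with a different head count, here is a short Python sketch. It assumes what the post describes: 6 numbers drawn from 1 to 53, a prize tier for matching exactly 3, 4, 5, or 6 of them, and tickets that are independent random picks. The function names are my own.

```python
from math import comb

def p_match(k, picks=6, pool=53):
    """Probability that one ticket matches exactly k of the drawn numbers."""
    return comb(picks, k) * comb(pool - picks, picks - k) / comb(pool, picks)

def p_at_least_one(p_single, tickets=72):
    """Probability that at least one of several independent tickets wins."""
    return 1 - (1 - p_single) ** tickets

for k in (3, 4, 5, 6):
    print(k, "of 6:", p_at_least_one(p_match(k)))

# Any prize at all: add the single-ticket probabilities first (the tiers
# are mutually exclusive for one ticket), then apply the same formula
p_any_prize = sum(p_match(k) for k in (3, 4, 5, 6))
print("any prize:", p_at_least_one(p_any_prize))
```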

Thursday, December 07, 2006

Swivel - Share and Explore Your Data

There is great potential in a new site called Swivel that allows people to upload and explore data. For example, if you are a fan of Orlando Cepeda (who played baseball from 1958 to 1974), you can download his career statistics. There currently seem to be many random data sets, but I imagine that the data will continue to grow. The ability to merge data sets allows for many possibilities. One can only imagine how Orlando Cepeda's batting average and the Gross Domestic Product of Paraguay relate.

Tuesday, December 05, 2006

One's Enemy is the Best Teacher

As part of writing a syllabus for statistics courses, I attempt to inspire people who may be more afraid of statistics. One such inspiration I confabulated is that a Nobel Prize winner once said that we should buy gifts for our enemies because they teach us so much about ourselves. In my attempt to actually find this quote, I found the original quote from the Dalai Lama. It is somewhat close (?):

"In the practice of tolerance, one's enemy is the best teacher."

BCS College Football Bowls

Slate magazine has a great story on the BCS College Football Bowl Championship. An earlier post on this blog discussed the error in all sports' playoff systems.

Here is an excerpt from the Slate article:
"No. The fact that the Wolverines are probably the second-best team in the country doesn't mean they've earned the right to play in the national championship game. In fact, it means the exact opposite: Michigan's No. 2 status is why they shouldn't be playing for the title.

Playoff systems are designed to determine, in a fair manner, which is the single best team in a particular sport. Their purpose is not to pit the two finest teams against each other in a season-ending game. The Yankees and Red Sox do not play annually in the World Series. The Indianapolis Colts will never be given a chance to play the New England Patriots in the Super Bowl. When the two best college basketball teams in the country face off, as they routinely do, in a Final Four semifinal or even in the round of eight, does anyone think that the loser deserves a rematch?

Take this example: Does anyone think the Seattle Seahawks were the No. 2 team in the NFL last year? No. Likewise, will anyone think the NFC champion who makes it to this year's Super Bowl is the second-best team in football? Of course not. Will the best team in the NFL still win the Super Bowl? Yes. Even if it's an NFC team!

Unlike TV commentators and sports columnists, the college-football voters understand, at least implicitly, that the season-long playoff that is the college football season should determine the single best team, not the best two teams. That's why the voters in the Harris poll and coaches' poll have consistently voted against a Michigan-Ohio State rematch. The voters cast their ballots for "not-Michigan" when they voted for USC, and they've cast their ballots for "not-Michigan" by voting for Florida.

Do we know if Florida is the second-best team in the country? Of course not. Here's what we do know: Michigan is not the best. How do we know that? By the traditional criterion: They scored fewer points in a football game than Ohio State did. The only team that has the "right" to play in the BCS championship game is the best team, Ohio State. And the only teams that should be scratched without question are teams that have already been determined to be "not the best," like Michigan."

Monday, December 04, 2006

Wide and Long Data Formats

When creating a database of variables in a longitudinal study, there is often a question of how to format the data.
Each row can be a person and for every new time he or she is tested a new variable is created (wide format):

Or, a new row can be started for each new time point measured (long format):

Some types of analyses require the wide format and others require the long format. One particular problem people run into is how to convert one format to the other. In SPSS, SAS, Stata, and JMP, the procedure is relatively simple (once you have figured it out). In SPSS, the following syntax will turn the wide formatted file into a long formatted file (see pictures). Of minor note, this syntax would not actually run as written because my variable names are too long; substituting your own 8-character variable names will work just fine:
varstocases
/make Depression from DepressionWeek1 DepressionWeek2
/make Anxiety from AnxietyWeek1 AnxietyWeek2
/index = Week.

To convert this long formatted file back to wide formatting, the following SPSS code can be used:
casestovars
/id=Subject
/index=week
/drop id.

In SAS, syntax for reshaping the data from wide to long can be found here and here; reshaping from long to wide can be found here and here.

In Stata, syntax for reshaping the data from wide to long can be found here and reshaping from long to wide can be found here.

In JMP, the transpose, sort, and stack options can get the desired results.
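
For readers who want to see the logic of the two conversions outside any statistics package, here is a rough sketch in plain Python. The functions wide_to_long and long_to_wide are my own illustrations (not SPSS, SAS, Stata, or JMP features), and the scores are made up; only the variable names follow the SPSS example above.

```python
def wide_to_long(rows, id_key, measures, index_name="Week"):
    """Turn one row per subject into one row per subject per time point.

    measures maps each long-format variable to its ordered wide-format columns,
    e.g. {"Depression": ["DepressionWeek1", "DepressionWeek2"]}.
    """
    long_rows = []
    n_times = len(next(iter(measures.values())))
    for row in rows:
        for t in range(n_times):
            record = {id_key: row[id_key], index_name: t + 1}
            for name, columns in measures.items():
                record[name] = row[columns[t]]
            long_rows.append(record)
    return long_rows

def long_to_wide(rows, id_key, measures, index_name="Week"):
    """Inverse operation: collect each subject's time points back into one row."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        for name in measures:
            record[f"{name}{index_name}{row[index_name]}"] = row[name]
    return list(wide.values())

# Made-up scores for two subjects
wide_data = [
    {"Subject": 1, "DepressionWeek1": 10, "DepressionWeek2": 8,
     "AnxietyWeek1": 5, "AnxietyWeek2": 4},
    {"Subject": 2, "DepressionWeek1": 12, "DepressionWeek2": 11,
     "AnxietyWeek1": 7, "AnxietyWeek2": 6},
]
measures = {
    "Depression": ["DepressionWeek1", "DepressionWeek2"],
    "Anxiety": ["AnxietyWeek1", "AnxietyWeek2"],
}

long_data = wide_to_long(wide_data, "Subject", measures)
for record in long_data:
    print(record)
```

Two subjects measured twice become four rows in the long format, and long_to_wide(long_data, "Subject", measures) rebuilds the original two rows.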

Sunday, December 03, 2006

Insurance and Risk Assessment

Having recently rented a vehicle, I had to decide whether I wanted to purchase the rental insurance. For only $32, I could have a $1,000,000 insurance policy (can I really do that much damage?). My choice was to not buy it, although I felt a great amount of fear and guilt in doing so. Rationally, I know that if I were at that great a risk, the cost of the insurance would be much greater. I also know that emotions play a great role in decision making and that companies use this to their advantage.

The role of probability also comes into play. In the last ten years, I have had zero accidents. That is probably in the normal range of accidents (I don’t feel that my driving skills warrant a job change to race car driver). So what is the probability that I get in an accident while driving for three hours? Probably not so high. In the long run, buying insurance is similar to gambling at the casinos. I might win a little, but in the long run, the casino and the insurance company have the better odds (or else they would not be able to operate).

Another interesting perspective comes from Daniel Kahneman, a Nobel Prize winner who describes risk aversion:
“If you think in terms of major losses, because losses loom much larger than gains you tend to be very risk-averse. When you think in terms of wealth, you tend to be much less risk-averse. I'll give you an example: Suppose someone offered you a gamble on the toss of a coin. If you guess right, you win $15,000; if you guess wrong, you lose $10,000. Practically no one wants it. Then I ask people to think of their wealth, and now think of two states of the world. In one you own [your current assets] minus $10,000 and in the other you own [your current assets] plus $15,000. Which state of the world do you like better? Everybody likes the second one. So when you think in terms of wealth--the final state--you tend to be much closer to risk-neutral than when you think of gains and losses.”

Based on this explanation, the rental company is framing my decision on what I could lose in an accident.
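
A quick calculation shows why turning down Kahneman's coin toss reflects how the choice is framed rather than the arithmetic. Assuming a fair coin (my assumption; the quote implies it but does not say so), the gamble has a positive expected value:

```python
p_win = 0.5                    # assumed fair coin
gain, loss = 15_000, -10_000   # amounts from Kahneman's example

expected_value = p_win * gain + (1 - p_win) * loss
print(expected_value)  # 2500.0
```

On average the bet is worth $2,500, yet the possible $10,000 loss looms larger than the possible $15,000 gain.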

Slate.com has an article on the topic of rental car insurance. Tim Harford also writes about the field of economics relating to inside information.
The Aplia Econ Blog has an article on the inside information that insurance companies (or insurance consumers) may have when buying a policy.

StatTalk

I really enjoy the podcast from UCLA on statistics. They give interesting tidbits and hints on statistical programs. They are actually quite entertaining.

There are currently 8 episodes. I am not sure whether they plan more, but sending them nice emails telling them how great their podcast is may help push them to resume making the podcasts.

To subscribe:
Open iTunes
Select Podcasts in the Source window
Select Subscribe to Podcast... from the Advanced menu
In the URL type (or paste) http://www.ats.ucla.edu/stat/stattalk/stattalk.xml
Click OK

Friday, December 01, 2006

Somebody Has Got to Win

Someone has always got to win the lottery.

It is true that someone does win the lottery. However, the odds of me winning the lottery are extremely small. Another way to think about it: it is highly unlikely that I will win the lottery, but it is also highly unlikely that I and two million other people will all fail to win.

In an earlier post, I discussed the probability of no one sharing the same birthday in a class of 28 students. There is only around a 35% chance that no two people will have the same birthday. The same logic can be applied to the lottery.

If the overall odds of winning the one million dollar Florida lottery are 1 in 125,000, then the probability of me winning (buying only one ticket) is .000008. It is sometimes helpful to look at it the other way. The probability of me not winning is .999992 (1 - .000008), or a 99.9992% chance of not winning the lottery.

However, someone has to win. When we think about the probability of me and one other person both not winning, we still have a 99.998% chance of the two of us not winning (my probability of not winning, .999992, multiplied by her probability of not winning, .999992). Still not such a good gamble if we both played. However, if one hundred thousand people buy tickets, there is only a 45% chance that no one will win (.999992 multiplied by itself 100,000 times). Thus, when so many people play, the chance that not one of them will win becomes small. With the millions of tickets sold, there is an extremely high probability that someone will win.
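
Here is the same calculation as a short Python sketch, so you can try other numbers of tickets. It uses the 1 in 125,000 odds from above and assumes the tickets are independent; the function name is my own.

```python
def p_nobody_wins(n_tickets, p_win=1 / 125_000):
    """Probability that none of n independent tickets wins."""
    return (1 - p_win) ** n_tickets

for n in (1, 2, 100_000, 1_000_000):
    print(n, p_nobody_wins(n))
```

With one hundred thousand tickets the result is about .45, and with a million tickets it falls well below one in a thousand.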

I think I will save my dollar and let someone else win.