Friday, September 28, 2007

Publication Bias

Below is an excerpt from Wired magazine. See also this earlier entry on the 'file drawer problem', which discusses the same issue:

In 1981, the New England Journal of Medicine published a Harvard study that showed an unexpected link between drinking coffee and pancreatic cancer. As it happened, researchers were anticipating a connection between alcohol or tobacco and cancer. But according to the survey of several hundred patients, booze and cigarettes didn't seem to increase your risk. Then came a surprise: An incidental survey question suggested that coffee did increase the chances of pancreatic cancer. So that's what got published.

Those positive results, alas, were entirely anomalous; 20 years of follow-up research showed the coffee-cancer connection to be bunk. Nonetheless, it's a textbook example of so-called publication bias, where science gets skewed because only positive correlations see the light of day. After all, the surprising findings are what makes the news (and careers).

So what happens to all the research that doesn't yield a dramatic outcome — or, worse, the opposite of what researchers had hoped? It ends up stuffed in some lab drawer. The result is a vast body of squandered knowledge that represents a waste of resources and a drag on scientific progress. This information — call it dark data — must be set free.

For the past couple of years, there's been much talk about open access, the idea that more scientific publications should be freely available — not locked behind firewalls and subscriptions. Thanks to the Public Library of Science (PLoS) and other organizations, that notion is making headway. Liberating dark data takes this ethos one step further. It also makes many scientists deeply uncomfortable, because it calls for them to reveal their "failures." But in this data-intensive age, those apparent dead ends could be more important than the breakthroughs. After all, some of today's most compelling research efforts aren't one-off studies that eke out statistically significant results; they're meta-studies — studies of studies — that crunch data from dozens of sources, producing results that are much more likely to be true. What's more, your dead end may be another scientist's missing link, the elusive chunk of data they needed. Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.

So why doesn't it happen? In part, it's a logistics problem: Advocating the release of dark data is one thing, but it's quite another to actually collect it, juggling different formats and standards. And, of course, there's the issue of storage. These days, an astronomical study of quasars or an ambitious bioinformatics project can generate several terabytes of data. Few have the capacity to store that, let alone analyze it.

Google, among others, is lending a hand with its Palimpsest project, offering to store and share monster-size data sets (making the data searchable isn't a part of the effort). As storage costs drop, similar data banks will emerge, along with format standards, and it should become ever easier to share results, good or bad.

Technology is actually the simple part. The tougher problem lies in the culture of science. More and more, research is funded by commercial entities, which deem any results proprietary. And even among fair-minded academics, the pressures of time, tender, and tenure can make openness an afterthought. If their research is successful, many academics guard their data like Gollum, wringing all the publication opportunities they can out of it over years. If the research doesn't pan out, there's a strong incentive to move on, ASAP, and a disincentive to linger in eddies that may not advance one's job prospects.
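The excerpt's core claim, that publishing only "positive" results skews the record while pooling everything gets closer to the truth, can be illustrated with a small simulation. This is a minimal sketch and not part of the Wired piece: the true effect size, the sample size, the number of studies, and the crude z-test cutoff of 1.96 are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_study(true_effect, n, rng):
    """Simulate one study: return its observed mean effect and a crude significance flag."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    # Assumption: a two-sided z-test at roughly the 5% level stands in for "significant".
    significant = abs(mean / standard_error) > 1.96
    return mean, significant


def run_simulation(num_studies=2000, true_effect=0.1, n=30, seed=0):
    """Compare the average effect across all studies with the 'published' subset."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n, rng) for _ in range(num_studies)]
    all_means = [mean for mean, _ in results]
    # Hypothetical file drawer: only significant, positive results see the light of day.
    published = [mean for mean, significant in results if significant and mean > 0]
    return statistics.fmean(all_means), statistics.fmean(published), len(published)


if __name__ == "__main__":
    pooled, published_only, count = run_simulation()
    print(f"average effect over all studies:      {pooled:.3f}")
    print(f"average effect over published studies: {published_only:.3f} ({count} studies)")
```

Under these assumptions the average over every study lands near the true effect, while the average over the "published" subset is several times larger, because a small study can only reach significance by overshooting. The unpublished results in the drawer are exactly what a meta-study needs to correct that.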

Wednesday, September 26, 2007

Data or Datums

An excerpt from the Social Science Statistics blog:

I was reminded again the other day that the word “data” is plural, since it means more than one “datum”, and thus “data” requires a plural verb. The Economist style guide says so, as does the European Union translation manual. The Oxford English Dictionary doesn’t even have an entry for “data,” subsuming it under “datum,” and it identifies sentences with singular constructions as “irregular or confused usage.”

End of story, right? Maybe, maybe not. There are a couple of problems with the “data is the plural of datum” story. (These have been discussed widely on the web, and I’m drawing freely on those discussions). First, it is not quite right even in Latin to say that “data” is the plural of the singular count noun “datum”; both are conjugations of the verb dare, to give. Second, in English, we hardly ever refer to one piece of data as a datum; at least in political science it is an observation, a case, or perhaps a data point. When the word datum is used, it usually has a specialized meaning and takes the plural form “datums.”

The bigger problem, from my perspective, is that fully adhering to “data” as a plural count noun forces you into constructions like

How many data are enough?

instead of

How much data is enough?

The first of these, “How many data are…”, is correct for a plural count noun, while the second, “How much data is…”, is appropriate for a mass noun such as “gold” or “water.” The second sentence sounds much better to me. It also wins on a Google Scholar search by a margin of 10 to 1 (2120 to 198). There are also about 400 hits for “How much data are…”, no doubt from those who want to treat “data” as a mass noun but have been reminded that “data is plural.” It seems to me that data has come to be like the mass nouns described in this post from Language Log:

A great many M nouns denote collectivities of things, but small things, especially small things whose individual identities are not usually important to us: CORN, RICE, BARLEY, CHAFF, CONFETTI, etc. Some of these contrast minimally with C nouns of similar denotations, like BEAN, PEA, LENTIL. In any case, it would be easy to think of barley in "The barley was almost cooked" as "meaning more than one" in much the same way as lentils in "The lentils were almost cooked" does -- and in fact, every so often someone misidentifies little-thing M nouns as "plural".

I kind of like the idea of data as a collection of small things that aren’t that important to us as individual objects but that are meaningful when taken together.

Tuesday, September 25, 2007

Conspiracy Theories

An excerpt from livescience.com:

Forty-three years after that Friday in Dallas, JFK is still the victim of a massive conspiracy, Elvis is still alive and presumably eating chocolate-covered fried chicken, and Paul McCartney is dead.

For those who don't know, Paul supposedly died in an automobile crash in 1966 and was replaced by a double. The new Paul—or "Faul" (for "Fake Paul")—has been impersonating him for, lo, these many years.

One Web site, at www.uberkinder.5u.com/paul/, not only has all the usual clues from the Beatles's album covers and music but compares voices and superimposes pictures of Paul at various times during his career to contrast noses, chins, bone structure, etc. There really is a lot of evidence to support the theory that McCartney is dead. The idea is presented cogently and backed by mountains of evidence. Like so many theories, it is neat and plausible, but, nonetheless, wrong.

The reason conspiracy theories are such an important subject is that we exist in a world awash with them: Kennedy, the origin of the AIDS virus, our supposedly faked moon landing, the death of almost anybody famous (Lennon, Princess Diana, John Kennedy Jr., Elvis, etc.), the government being involved in inner-city drug conspiracies, and on and on it goes.

But some might inquire what difference it makes if people believe in cloyingly clever canards.

We dwell in a world where fairy tales and fictions are already the norm. According to articles I've read, 70 percent of the public believes that there was a vast conspiracy to kill JFK, 80 percent believes in the existence of UFOs, and approximately 95 percent believes in supernatural beings such as ghosts, gods, devils, angels, and poltergeists.

In effect, everything in the world of conspiracies is the opposite of what it is in reality. Paul McCartney, who is really alive, is dead; Elvis, who is really dead, is alive. Since so many people saw JFK being fatally injured, you can't say he's alive, so they go for the next best thing: massive conspiracy. Whatever is . . . isn't. Whatever reality you don't like, you can change with the handy eraser on the end of your pencil-like head.

So what difference does it make? I maintain that one of the reasons the world is in the jolly shape it's in is that we have many people believing in and, more significantly, acting upon things that are simply not true. When we believe in fairy tales, we keep ourselves timorous children. We lose our individual strength and begin looking to things outside of ourselves for that strength and guidance.

Let's look briefly at the most famous conspiracy theory, JFK's assassination in Dallas. [See also Massimo Polidoro's column "Facts and Fiction in the Kennedy Assassination," Skeptical Inquirer, January/February 2005.] Seventy percent say it was a conspiracy. This makes not believing it was a conspiracy seem naïve, gullible—well, let's face it, downright jug-headed.

Much is made of the grassy knoll, located in front and to the right of Kennedy when the shooting occurred. Many people have opined that the shots came from there. Oliver Stone, in his movie JFK, suggests that the horrific head shot that killed Kennedy came from there.

When I was a young boy, my father took me out hunting probably hundreds of times. I shot hundreds if not thousands of animals (something I no longer do). All of the animals that met their deaths from my gun (deer, rabbits, squirrels, birds) died the same way: small holes where the bullet entered, perhaps a minuscule trickle of blood, and, if the bullet hit bone, massive, craterlike holes in their bodies where the projectile exited. It's simply a case of physics. No bullet ever made has the ability to tear large holes where it enters; it can only create them where it exits. This is because the bullet is intact when it enters and then explodes or fragments upon hitting bone.

And the same is true of the bullet that struck JFK. As anyone who has ever hunted could tell you, the head shot that took the front right part of Kennedy's head off could have come only from behind. There is no other possibility, so scratch the grassy knoll.

The second area of doubt I would like to broach is the following, which I have never, by the way, seen addressed or answered in all my years of reading about the Kennedy case. If Oswald was part of a grand conspiracy and was ordered, commissioned, and paid to execute the president, is the method of getting a job in a building and then hoping you get lucky with the target actually driving by a viable hitman strategy? Obviously, Oswald didn't go to Kennedy—as all other hitmen do; Kennedy came to Oswald. Does this really sound like a conspiratorial plot to you? Or does it sound like what it was, not a crime of conspiracy but a crime of opportunity?

By the way, Kennedy wasn't scheduled to go to Dallas until just a couple of days earlier. His staff made a last-minute change for him, so he could go stumping (i.e., campaigning) for Democratic congressional candidates. Nowhere in the years since the shooting have I ever read or heard that anybody has ever suspected Kennedy's own inner staff of setting him up. They question every other aspect of the case from A to Z, but I don't know of anybody, official or unofficial, who has suspected the staff within the White House of being part of a grand conspiracy. His being rescheduled for a visit to Dallas was just one of those things that comes up all the time in politics.

The thing that caught my attention about the McCartney conspiracy case was that it had so much detail, just like the Kennedy case. There are literally hundreds of pieces of "proof" that McCartney is dead. His height changing in different pictures, voice analysis, picture analysis, and on and on. As with the Kennedy case, there are endless pieces and tidbits here and there that can be combined to make it look like a strong case.

However, anything, anywhere or anytime, can be made to look like a conspiracy, if that is what your agenda is at the outset. There are such things as inductive and deductive reasoning. In deductive reasoning, you start with a premise or hypothesis (e.g., Kennedy's assassination was a conspiracy) and then you look for all the pertinent information, modify it to suit your hypothesis, and throw out all that doesn't fit.

It's what we use in our adversarial legal system. One side scours for what it wants to find and so does the other, to support their diametrically opposed theories of the case. What it really comes down to is modern-day sophism; you have parties spinning theories and then finding, spinning, or cooking up evidence that supports what they want to believe. (It is, in my always humble opinion, one of the principal flaws in our legal system.) Deductive logic is by far the most prevalent way of thinking in our society. The other method is using inductive reasoning. Withholding judgment or theory, looking at all the evidence, and then formulating your belief or theory based on all of the available evidence—regardless of what you may prefer the evidence to say. This is actually one part of the scientific method.

Tuesday, September 18, 2007

Gut Reactions vs. Deliberations

An excerpt from the Wired Science Blog:

The capacity for cluelessness of the clever was the subject of an Idea Festival talk by journalist Laurence Gonzales, who in Deep Survival examined the question of why some people survive crises and others die. The two questions, he said, overlap: survivors are often those who think deliberately under pressure, while deliberation is what helps people avoid stupid mistakes.

Such mistakes originate, he believes, in the tendency of people to instinctively and thoughtlessly follow already-established mental scripts rather than addressing reality directly. Of course, such patterns of behavior are what let us move through life without re-learning to tie our shoes every time we leave the house; to some extent they're necessary. And so long as the present resembles the past, this works fine in more complicated scenarios; but add a few wrinkles, and things go wrong.

Gonzales gave a personal anecdote of piloting a plane on a route he'd taken often before, feeling so comfortable that he didn't register until the last moment a looming thunderstorm that would have destroyed his plane but for a fortuitous radio message that disturbed his reveries.

"We use models and scripts, not the real world; we operate on the basis of what we learned in a slightly different situation," he said.

Tuesday, September 11, 2007

Regression to the Mean

An excerpt from Livescience:

A common media myth that sprang up after the attacks was that American tastes in entertainment would be forever changed. After seeing real-life horrors, the experts claimed, Americans would yearn for non-violent, wholesome family fare. Pundits filled pages second-guessing America’s taste in entertainment—nearly all of which turned out to be overstated or flat-out wrong.

Entertainment Weekly magazine, for example, devoted much of its Sept. 28, 2001, issue to, as the cover put it, "The challenge to our culture." The magazine joined in the media chorus talking about the death of irony and the dramatic impact terrorism would have on the entertainment industry. Writer Jeff Gordinier wrote that "it’s hard to believe that we’ll ever see anything the same way....it took only an instant of excruciating reality to render our old [entertainment] appetites moot, piddling, even nauseating." The effect was so profound, Gordinier wrote, that "the mere glimpse of a quippy sitcom was enough to induce a sour grind of physical revulsion."

That effect, if it was ever true, seems to have been short-lived.

Within months, American tastes in entertainment returned to "normal" and in fact grew even more gory, sadistic, and horrifying than before 2001. "Torture porn" films such as "Saw" and "The Hills Have Eyes" were so successful that they spawned dozens of sequels and imitators. ("Saw," which features victims being tortured to death in creative, sadistic ways, grossed more than $100 million in box office sales worldwide.) Quippy sitcoms are everywhere, and more Americans can name Britney Spears's ex-husband than the prime minister of Iraq.

Claims that tragedies fundamentally change the American character are not new, of course. Similar pronouncements followed the Columbine shootings and the Oklahoma City bombing, as well as the 1993 World Trade Center bombing. Certainly, the September 11 attacks were of a different scale, but the "everything has changed" motif has been disproven over and over again.

Americans are much more resilient than they are given credit for.

America will always live with the legacy of the September 11 attacks, in myriad ways ranging from airport security to annual memorials. But there's little evidence that the average American's life or character has been changed forever.
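The statistical idea named in this entry's title can be shown with a short simulation. This is a minimal sketch and not part of the Livescience excerpt, and it is not a model of entertainment tastes: it assumes each person has a stable underlying level plus independent noise on every measurement, and the population size and 5% cutoff are arbitrary choices.

```python
import random
import statistics


def regression_demo(num_people=10000, top_fraction=0.05, seed=1):
    """Select the most extreme first measurements and compare them with a second measurement."""
    rng = random.Random(seed)
    # Assumption: a stable underlying level per person, plus fresh noise each time it is measured.
    stable = [rng.gauss(0.0, 1.0) for _ in range(num_people)]
    first = [level + rng.gauss(0.0, 1.0) for level in stable]
    second = [level + rng.gauss(0.0, 1.0) for level in stable]

    # Pick the people whose first measurement falls in the top fraction.
    cutoff = sorted(first)[int(num_people * (1 - top_fraction))]
    extreme = [i for i in range(num_people) if first[i] >= cutoff]

    first_mean = statistics.fmean([first[i] for i in extreme])
    second_mean = statistics.fmean([second[i] for i in extreme])
    return first_mean, second_mean


if __name__ == "__main__":
    first_mean, second_mean = regression_demo()
    print(f"extreme group, first measurement:  {first_mean:.2f}")
    print(f"extreme group, second measurement: {second_mean:.2f}")
```

Under these assumptions the group picked for its extreme first reading still sits above average the second time, but noticeably closer to the overall mean, even though nothing about the people changed. Part of any extreme reading is noise, and the noise does not repeat.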

Monday, September 10, 2007

Citation Indices

Here is an excerpt from this article by Peter Lawrence:

Modern science, particularly biomedicine, is being damaged by attempts to measure the quantity and quality of research. Scientists are ranked according to these measures, a ranking that impacts on funding of grants, competition for posts and promotion. The measures seemed, at first, rather harmless, but, like cuckoos in a nest, they have grown into monsters that threaten science itself. . . . The journals are evaluated according to impact factors, and scientists and departments assessed according to the impact factors of the journals they publish in. Consequently, over the last twenty years a scientist's primary aim has been downgraded from doing science to producing papers and contriving to get them into the "best" journals they can. Now there is a new trend: the idea is to rank scientists by the numbers of citations their papers receive. Consequently, I predict that citation-fishing and citation-bartering will become major pursuits. . . .

Thursday, September 06, 2007

Satisfaction

An excerpt from Mahalanobis' Distance:

The book Satisfaction by Gregory Berns is a quick and interesting read. There are a few profound nuggets in here, chiefly that much of life is centered not on pleasure or happiness but on satisfaction. In fact, seeking pure pleasure or happiness is counterproductive, as is obvious to anyone slightly familiar with the virtues of saving, honesty, discipline and hard work. But Berns helpfully defines 'satisfaction' as that which comes from novel insights (aha! moments). The key is to put yourself in situations where you are constantly getting aha! moments, situations that are not so difficult that you are clueless (i.e., me in a super-string lecture), but not so easy that you see everything coming. Thus, I prefer Jeopardy! to Wheel of Fortune because Jeopardy! is hard, but not too hard, and I don't like Thomas Pynchon or James Joyce novels because they are too difficult to generate epiphanies of satisfaction. He vaguely ties this to 'meaning', and I wish he did more there.

I always find it interesting that the keys to life, happiness, pleasure, satisfaction, don't have majors in college. Why waste time on math or political science if what we want is The Meaning of Life, or Sum(Pleasure from t=0 to inf)? Why don't corporations have 'departments of maximizing investor returns'? Because life is mainly about means to an end, which, however vaguely defined (Subjective Well Being? Satisfaction?), is obviously achieved indirectly. And so with poverty reduction, and all sorts of vice reduction. You don't give money to poor people to get rid of poverty, nor outlaw guns to get rid of violence. It's no paradox, just the way life works, which is always a series of indirect proxies for things of interest.