Monday, February 14, 2005

What's in a Number

In an effort to justify the blogosphere's reputation as a "self-correcting" medium, I offer you the following.

An entry I posted last Thursday included this juicy morsel:
They were showing support for the policies of George W. Bush -- the same policies that have led to the deaths of as many as 100,000 innocent Iraqis.
A few days later, during one of my semi-regular excursions onto the dark side, an intellectual jouster followed me home and came upon the phrase quoted above. He then chastised me for advocating the "long debunked" 100,000 Iraqi civilian death figure.

Given where I was lurking and who was reprimanding me, I wasn't terribly concerned. But I realized that, in truth, I didn't really know all that much about where that figure came from or what it really meant. In the context of the post in which it appeared, the validity of the figure was not terribly consequential (accepting the 25,000-30,000 figure proposed by my critic would not have substantially undermined my argument). Still, I felt it was worth my while to research the issue so that I might approach it in the future from a position of authority.

And, as it turns out, a correction is in order.

The figure (which turns out to be, in fact, 98,000) arises from a study that was published this fall in The Lancet. The study was an epidemiological investigation that used a sampling methodology to determine the death rate in Iraq before and after the American invasion. The study did generate a fair amount of controversy and condemnation from both the left and the right. Unfortunately for the naysayers, much of the criticism is nothing but -- well -- horseshit.

So much for it being "long debunked."

That said, there are some clarifications that should be made.

First, as I said above, the study was attempting to determine the net change in the death rate in Iraq before and after the start of the war. The 98,000 figure, therefore, is an extrapolation from the study's results and not an actual count. Because extrapolating from a sample is a completely valid statistical method, this is a rather nuanced distinction. It is worth knowing, though.
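To make the idea of extrapolation concrete, here is a minimal sketch of how a change in the death rate measured in a sample can be scaled up to a nationwide estimate of excess deaths. The rates, population, and time span below are hypothetical round numbers chosen for illustration, not the study's figures, and the study's real calculation was more involved than this.

```python
def excess_deaths(rate_before, rate_after, population, years):
    """Scale a change in the crude death rate up to a whole population.

    Rates are deaths per 1,000 people per year, as measured in a sample.
    The result estimates deaths above the prewar baseline from all causes;
    it is not a count of bodies.
    """
    extra_per_1000_per_year = rate_after - rate_before
    return extra_per_1000_per_year * population * years / 1000

# Hypothetical inputs for illustration only (not the study's numbers).
estimate = excess_deaths(rate_before=5.0, rate_after=7.0,
                         population=10_000_000, years=1.5)
print(f"Estimated excess deaths: {estimate:,.0f}")
```

Nothing in this arithmetic requires anyone to count individual deaths; the estimate is only as good as the sampled rates feeding into it.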

More important than that, however, is what the study is not trying to say. Because of the study's macroscopic focus, the authors made a deliberate decision to record all deaths, regardless of cause. No attempt was made to differentiate between combatants and noncombatants. Therefore, it is not accurate to say, as I did, that the 98,000 figure refers to civilians. Likewise, the data was not restricted to deaths caused by coalition forces. Deaths that occurred as a result of actions by the Iraqi military, the insurgency, and natural, nonviolent events were all included in the study's data set. Therefore, one must be careful not to imply that 98,000 deaths resulted directly from actions taken by coalition forces (although it is fair to argue for indirect responsibility).

Finally, it is important to acknowledge that this is a single study and that it should not be the final word on the subject. If we truly care to know the answer to this question, other research using different methodology must be applied to the situation. This study takes an important first step, but it hardly puts the issue to rest.

OK -- enough with the caveats. Let's dig into the actual statistics.

One point that is frequently misinterpreted by both sides is the nature and meaning of the study's confidence interval. For those of you who are not conversant in the details of research statistics, a confidence interval expresses how much uncertainty surrounds a measured value: it is a range, calculated from the sample, within which the actual value of the phenomenon being studied is believed to lie. The width of this range depends on the methods used in the study (above all, the sample size) and on the degree of confidence (hence the name) that the authors want to express. By convention, researchers report 95% confidence intervals, and a result is generally treated as statistically significant when its 95% interval excludes the value that would indicate no effect at all.
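For readers who want to see the mechanics, here is a minimal Python sketch of the simplest textbook confidence interval: a normal approximation, the mean plus or minus about 1.96 standard errors. The study's own calculation was more involved than this, and the sample values are invented for illustration. The sketch shows the two levers described above: asking for more confidence widens the range, while collecting more data narrows it.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of `sample`.

    A textbook sketch only: it ignores survey-design details a real study handles.
    """
    n = len(sample)
    center = mean(sample)
    std_error = stdev(sample) / sqrt(n)          # spread of the estimate itself
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return center - z * std_error, center + z * std_error

# Hypothetical measurements, invented purely for illustration.
sample = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.2, 3.8]

low95, high95 = confidence_interval(sample, 0.95)
low99, high99 = confidence_interval(sample, 0.99)
# Same values repeated four times: a crude stand-in for a larger sample.
low_big, high_big = confidence_interval(sample * 4, 0.95)

print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}  (more confidence, wider range)")
print(f"95% CI, 4x data: {low_big:.2f} to {high_big:.2f}  (more data, narrower range)")
```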

For many, this confidence interval is where the validity of the study appears to break down. Because of the methods employed (most notably, the small sample size), the 95% confidence interval for the study is 8,000-194,000 excess postinvasion deaths. This range of values causes otherwise reasonable people to lose their minds. Case in point:
Readers who are accustomed to perusing statistical documents know what [that range] means. For the other 99.9 percent of you, I'll spell it out in plain English—which, disturbingly, the study never does. It means that the authors are 95 percent confident that the war-caused deaths totaled some number between 8,000 and 194,000. (The number cited in plain language—98,000—is roughly at the halfway point in this absurdly vast range.)

This isn't an estimate. It's a dart board.

Cute. By Mr. Kaplan's reading, any value between 8,000 and 194,000 is equally likely. For the benefit of anyone who hasn't taken Introduction to Statistics: that assertion is ridiculous. Despite this wide range of possible values, 98,000 is still the most likely value, and it is considerably more likely than values near the edges of the range.
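To see why a confidence interval is not a dart board, consider the bell curve that usually underlies one. This sketch assumes, purely for illustration, a symmetric normal curve centered on 98,000. Because the published interval is not perfectly symmetric around 98,000, the standard error here is taken from the interval's average half-width; this is an approximation, not the study's own model.

```python
from statistics import NormalDist

# Assumption: a symmetric normal curve centered on the point estimate, with the
# standard error derived from the average half-width of the published interval.
POINT_ESTIMATE = 98_000
LOWER, UPPER = 8_000, 194_000
Z_95 = NormalDist().inv_cdf(0.975)            # about 1.96
std_error = (UPPER - LOWER) / 2 / Z_95        # roughly 47,000

curve = NormalDist(mu=POINT_ESTIMATE, sigma=std_error)

def relative_likelihood(value):
    """How many times likelier the point estimate is than `value` (ratio of densities)."""
    return curve.pdf(POINT_ESTIMATE) / curve.pdf(value)

for value in (8_000, 50_000, 150_000, 194_000):
    print(f"98,000 is {relative_likelihood(value):.1f}x as likely as {value:,}")
```

In this approximation, 98,000 comes out roughly six to eight times as likely as the values at the very ends of the range, while values such as 50,000 or 150,000 trail it by less than a factor of two. A dart board would put every one of those ratios at 1.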

However, the range does demonstrate the existence of uncertainty with respect to the 98,000 figure. Because of the sample sizes involved (an unavoidable consequence of performing research in a war zone), it is difficult to say exactly how many excess deaths have occurred postinvasion. It could be 8,000 or less (a 2.5% chance exists for this possibility). It could also be 194,000 or more (again, a 2.5% probability). Honesty requires that we acknowledge this reality.
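The 2.5% figures come from splitting the 5% that a two-sided 95% interval leaves uncovered equally between its two tails. This short sketch reads the interval the way this post does and assumes the usual normal approximation; it uses Python's standard `statistics.NormalDist` to show the arithmetic, which also yields the one-sided 97.5% figure.

```python
from statistics import NormalDist

standard_normal = NormalDist()                 # mean 0, standard deviation 1
z_95 = standard_normal.inv_cdf(0.975)          # the familiar 1.96

below = standard_normal.cdf(-z_95)             # share of probability under the lower bound
above = 1 - standard_normal.cdf(z_95)          # share of probability over the upper bound
inside = 1 - below - above                     # share inside the interval

print(f"z = {z_95:.2f}")
print(f"below lower bound: {below:.1%}")
print(f"above upper bound: {above:.1%}")
print(f"inside interval:   {inside:.1%}")
print(f"at or above lower bound: {1 - below:.1%}")
```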

That said, the study makes one inarguable point that is frequently lost on those who seize upon the implied uncertainty in the confidence interval. Here's Daniel Davies to educate us:
Although there are a lot of numbers between 8,000 and 200,000, one of the ones that isn’t is a little number called zero. That’s quite startling. One might have hoped that there was at least some chance that the Iraq war might have had a positive effect on death rates in Iraq. But the confidence interval from this piece of work suggests that there would be only a 2.5% chance of getting this sort of result from the sample if the true effect of the invasion had been favourable. A curious basis for a humanitarian intervention; “we must invade, because Saddam is killing thousands of his citizens every year, and we will kill only 8,000 more”.
I lose track of the current justification for the Iraq invasion. But it seems to me that we floated the "improving lives of the Iraqis" rationale at some point. Nothing improves one's life as little as death does, so death reduction appears to be a fair metric by which to measure how we are faring in this regard. Unfortunately, this study unequivocally demonstrates our failure to accomplish this goal. A 97.5% probability exists that at least 8,000 people have died who would not have died had we not invaded.

Personally, on issues as complicated as the justification for war, I am reluctant to employ cost/benefit analysis. There is simply no rational way to evaluate, in the moment, the costs and benefits of such an action. However, my reluctance is shared by few of those who would justify its cost by pointing to the tyranny of the previous regime. To these individuals we can say, with near certainty, that death is on the rise now that we are on the scene.

And so, I stand corrected. Never again shall I imply, directly or indirectly, that 100,000 civilians have died in Iraq. I will be clear that we do not know how many deaths we are responsible for. I have learned my lesson. I will, from this point forward, merely state that it is very likely that we have greatly inflated the ranks of the Iraqi dead.

I guess he showed me.