## Statisticians Examine the Mitchell Report’s Findings

Earlier this week, I posted a link to a study published in the Milwaukee Journal-Sentinel that looked at the changes in performance by players discussed in the Mitchell Report. Frank Stephenson took the study to task for not properly interpreting the data.

In today’s New York Times, two professors with strong backgrounds in statistics, Jonathan Cole (sociologist, Columbia) and Stephen Stigler (statistician, University of Chicago), report their analysis of players mentioned in the Mitchell Report.

> For pitchers identified by the report, we looked at the annual earned run average for their major league careers. For hitters we examined batting averages, home runs and slugging percentages. We then compared each player’s yearly performance before and after he is accused of having started using performance-enhancing drugs. After excluding those with insufficient information for a comparison, we were left with 48 batters and 23 pitchers.
>
> For pitchers there was no net gain in performance and, indeed, some loss. Of the 23, seven showed improvement after they supposedly began taking drugs (lower E.R.A.’s), but 16 showed deterioration (higher E.R.A.’s). Over all, the E.R.A.’s rose by 0.5 earned runs per game. Roger Clemens is a case in point: a great pitcher before 1998, a great (if increasingly fragile) pitcher after he is supposed to have received treatment. But when we compared Clemens’s E.R.A. through 1997 with his E.R.A. from 1998 on, it was worse by 0.32 in the later period.
>
> Hitters didn’t fare much better. For the 48 batters we studied, the average change in home runs per year “before” and “after” was a decrease of 0.246. The average batting average decreased by 0.004. The average slugging percentage increased by 0.019 — only a marginal difference. So while some batters increased their totals, an equal number had falloffs. Most showed no consistent improvement, several showed variable performance and some may have extended the years they played at a high level, although that is a difficult question to answer.

This confirms Stephenson’s simple analysis. I’m sure it will be easy to find quibbles and possible alternative explanations for these results, but please keep in mind that the authors are limited in what they can say in 800 words.
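For readers who want a rough sense of whether the pitchers' split (7 improved, 16 declined, out of 23) could plausibly be chance, here is a minimal sketch of an exact sign test. This is not the method Cole and Stigler describe; it assumes, purely for illustration, that under the null each pitcher is equally likely to improve or decline, and it ignores aging entirely. The function name `sign_test_p_value` is my own.

```python
from math import comb


def sign_test_p_value(successes: int, trials: int) -> float:
    """Two-sided exact sign test p-value under a 50/50 null (an assumption)."""
    # Use the smaller of the two counts so we sum the lower tail.
    k = min(successes, trials - successes)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    # Double for a two-sided test, capping at 1.
    return min(1.0, 2 * tail)


# Counts taken from the quoted article: 7 of 23 pitchers improved.
improved, total = 7, 23
p_value = sign_test_p_value(improved, total)
print(f"{improved} of {total} improved; two-sided p = {p_value:.3f}")
# prints "7 of 23 improved; two-sided p = 0.093"
```

A p-value near 0.09 says the tilt toward decline is suggestive but not significant at the usual 5 percent level, which fits the authors' "no net gain" wording. It says nothing about what these pitchers would have done without the drugs.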

Aside: Stephen Stigler is the son of Nobel Prize-winning economist George Stigler.

Thanks to Repoz for the pointer.

### 9 Responses to “Statisticians Examine the Mitchell Report’s Findings”

1. Tom says:

I’d argue that the players who used PED’s were trying to either maintain their performance after recovering from a recent injury or simply trying to find a fountain of youth. It would be foolish to compare Roger Clemens’ stats after the alleged doping (ages 34-44) to before (ages 21-34) because this assumes that the level of performance should be held constant over an entire career, which is inaccurate. Looking for performance improvement relative to his prime years isn’t very telling. However, it would be telling if there was some way to look at what Clemens would have done without the PED’s.

I think the only way we can do that is to speculate how his career would have depreciated by comparing him to similar pitchers in their careers. But even then, we can only conclude so much.

It is never easy to value opportunities not taken– ie. Clemens staying clean for the “twilight of his career.”

2. Richard Pollock says:

Did the professors take into account the possibility (and probable likelihood) of these players declining with age and, therefore, that the success they maintained may have been the same as earlier in their careers, when normally their stats would/should deteriorate?

3. John Beamer says:

A valiant study but one that will yield few interesting results. In my mind the conclusions are more or less worthless.

For a start we don’t know the length and intensity of the supposed steroid/HGH/whatever plan.

Also, when comparing performance one should really compare projected performance to actual performance; that is the correct way to do it. And if you are comparing before and after, you’ll have to adjust for age (as Richard says). With only 10s of players in the sample, the size I suspect will be too small to draw much inference.

4. JC says:

Here is how science works.
1. Develop hypothesis.
2. Test hypothesis.
3. Test hypothesis again [repeat if necessary].

Of course, there are many possible problems with what the authors have provided within the 800 words allotted to them by the New York Times. There are also many other ways to test the hypothesis that PEDs have no effect on performance. If you want to offer a criticism, proceed to step 3. Until I see another study, I feel safe operating on the assumption that the null hypothesis has not been rejected.

5. John Beamer says:

JC — agree completely. Sorry if my remark came across as snarky.

I just don’t believe that the study these profs have done adds to the evidence either way. Based on the methodology we are no closer to knowing if there was or wasn’t a performance change.

Just because the authors only had 800 words to play with does not mean they couldn’t have done a more comprehensive statistical analysis, i.e., adjusting for age.

6. trert says:

I always figured that Clemens started roiding up in 1997. I know it’s not in the Mitchell Report but that is my opinion.

7. I am shocked, although it is obvious the two men are more statisticians than baseball fans. While they managed to appropriately use RA (unless that is a typo) the use of batting average and slugging percentage is a little strange.

While the age argument is valid, consider that the majority of players start their careers off relatively slow, taking upwards of 900 at bats to finally get into their major league groove. That said, while some may have been trying to fetch the fountain of youth, I gather that could be proven incorrect by looking at their post-PED statistics – i.e., when testing occurred.

8. Mike says:

This is a really, really bad study. Certainly there are the age effects, which others have underscored. Then there’s the length-of-treatment effect – why should Brian Roberts or Andy Pettitte have years of baseball fall under the “with steroids/HGH” if they are telling the truth when they say they only used them once or twice?

I mean, this study absolutely requires some kind of baseline or expectation with which to measure players against. Without it, you learn absolutely nothing.

I love the part about Babe Ruth too. His first 6 years he was a PITCHER, so that kind of back-end loads his HR production anyways. Also, his last 6 years were all 2 years younger than Bonds’ last 6 years. Not a huge difference, but imagine if his last 6 years were age 24-29. Obviously you expect someone to produce more HR relative to their career the closer they are to their “prime”.

Anyways, I realize this is a newspaper article and not a peer-reviewed journal article. But if these guys are going to put their titles at the end of it and represent their (good!) schools, I think they should probably make sure their work is methodologically sound. I don’t understand how you can feel comfortable with their findings, given the fact that they controlled for NOTHING in their experiment.

9. JC says:

This study is not “really, really bad.” It is simple, possibly too simple, but parsimony has its advantages. And this is normally where we begin. Do the authors claim that this is an exhaustive study of the subject?

> Our results run contrary to the prevailing wisdom. One reason might be that most baseball skills depend primarily upon reaction times and judgments, factors unaffected (or even degraded) by these drugs. Also, in a team sport like baseball, other variables affect individual performance: quality of one’s teammates, home ballpark, changes in the strike zone, injuries and pitching. These factors could mask very slight performance changes like the ones we found.
>
> It is possible (but not addressable by these data) that one effect of drugs is to help players compensate for decline as they age, and thus to extend their careers. But there is no evidence in these data for performance enhancement above previous levels.

And for all of the outrage I keep reading about this article, I’ve yet to see someone correct for these obvious problems. In any event, just from eyeballing the players’ statistics listed in the Mitchell Report, I see no obvious impacts.