A Response to My Critique of the Mitchell Report Study

I have received a response to my critique of “Did Steroid Use Enhance the Performance of the Mitchell Batters? The Effect of Alleged Performance Enhancing Drug Use on Offensive Performance from 1995 to 2007” by Brian J. Schmotzer, Jeff Switchenko, and Patrick D. Kilgo.

What follows is the authors’ response. I will comment on this response within the next few days (Response now available). I thank the authors for responding to me.

— — —
Dear J.C.,

We read your review of our paper on your sabernomics.com website. In the interest of the pursuit of the truth, we have constructed a reply. We would be grateful if you would post it on your site. We have pasted our reply below. Please let us know if a different format would be helpful.

Thanks for your consideration.

-Brian Schmotzer, Pat Kilgo, Jeff Switchenko

Thank you for your recent detailed examination of our study “Did Steroid Use Enhance the Performance of the Mitchell Batters? The Effect of Alleged Performance Enhancing Drug Use on Offensive Performance from 1995 to 2007,” published in the Journal of Quantitative Analysis in Sports. We would also like to thank you for giving us this opportunity to respond to your concerns on your blog. You state that your goal is to find the truth. We share that goal and hope that this discussion will help push us all closer to that ideal.

To start, we want to point out a few important aspects of our study that were not relayed to your readers in your initial review. First, the coding of seasons as PED seasons was made prior to any data analysis. We did not make any decisions with an eye toward a certain final result. All of the authors began this study agnostic about the effect of steroids in baseball – we undertook the study because we wanted to know what the answer actually is. Second, our study was designed to be ultra-conservative, and in almost all cases our data were transcribed directly from the Mitchell Report. Only the cases that were not entirely obvious were listed in the “adjudication” portion of our article. Third, our choices were made with full knowledge that there are hundreds of other ways this could be done. Indeed, the cases you remark upon (Barry Bonds, Mark McGwire, and David Segui) were three of the most difficult, with Segui certainly being the most frustrating case for reasons we will explain.

Because of its conservativeness, we envisioned our analysis as assigning the “lower bound” estimate of the steroid effect. (If we can agree that HGH is ineffectual, we will concentrate on steroids henceforth.) Certainly we could have taken more liberties with the text and designated many more seasons as steroid seasons; in fact, we could easily have referenced other sources of steroid abuse data. But this, for obvious reasons, is a slippery slope. The Mitchell Report was not designed as a data source, so its conclusions are already limited. Further, the ambiguity surrounding the dates for a few players (David Segui among them) limits the analysis even more when conservative criteria are in place.

Thus, in the interest of full disclosure, we included a very detailed Methods section describing some of the harder choices that had to be made. Also, we wrote an extensive section in our Discussion that plainly describes the limitations of our approach.

Your characterization of some of our decisions as “obvious factual errors” warrants special attention. In fact, these are not coding errors but simply differences of opinion. Because we wanted to avoid the aforementioned slippery slope, we coded strictly by the dates listed in the report as corresponding to alleged steroid use. In some cases, however, this simple approach couldn’t adequately handle the ambiguities in the text.

This is perhaps best illustrated in the case of David Segui. He clearly is accused in the Mitchell Report of being a steroid user. There is no question about it. However, our task was to designate seasons of abuse – simply recognizing a player as an accused abuser was not good enough. Because of our strict criteria, we did not denote any of Segui’s years as steroid seasons. (We note here an actual mistake in our manuscript. We said 1995 and 2004 were denoted as steroid seasons when we actually denoted them as HGH seasons.) Consider the quotes from the Mitchell Report that you referenced:

“In 1999 or 2000, Chuck Hawke, an attendant working in the visiting clubhouse in Kansas City, found syringes and vials that were hidden in an Oakley sunglasses bag when he was unpacking luggage for David Segui” (p. 110).

This statement does not identify a specific steroid season because (a) no direct steroid use is alleged (remember that Segui is an admitted HGH user as well) and (b) it is clear that only one year is implicated, but that year is not specified – there is no way to choose between 1999 and 2000, so we were not able to assign either.

“According to Radomski, Deca-Durabolin was Segui’s steroid of choice in the 1990s because it was safe, did not expire for three to four years, and was thought to help alleviate joint pain. Deca-Durabolin, however, stays in the body for up to a year or more and therefore is easily detectable in tests. Radomski said that Segui paid for the steroids by check although Radomski never asked him to pay for them. Radomski produced six checks drawn on David Segui’s checking account that were deposited into Radomski’s checking account….Radomski said he engaged in more than twelve transactions with Segui and dealt with Segui more than any other player. Toward the end of his career, Segui told Radomski that he had a growth hormone deficiency and was getting human growth hormone from a doctor in Florida” (p. 151).

Again, we cannot implicate any specific season – the “1990s” was not specific enough for us to code a certain set of years. There is also the same problem as before: the checks could have been written for HGH. In terms of the checks, we viewed them as weaker evidence, since many unnamed players presumably wrote checks to Radomski for innocent items. Also, several of the checks are illegible with respect to dates.

“McNamee first learned about Kirk Radomski through David Segui during the 2000 season” (p. 170).

“Kirk Radomski recalled meeting McNamee through David Segui. Radomski confirmed that he supplied McNamee with human growth hormone and anabolic steroids from 2000 to 2004” (p. 174).

“Radomski believed that Santangelo was referred to him by David Segui when both played for the Expos between 1995 and 1997” (p. 182).

“According to Radomski, he was introduced to Lansing by David Segui while Segui and Lansing played together with the Expos….Radomski produced two $1,000 money orders from Lansing, retrieved from his bank, made payable to Radomski; both were dated February 5, 2002” (pp. 196-197).

“Hairston was referred to Radomski by David Segui, his teammate on the Orioles from 2002 to 2004. Radomski said that he sold human growth hormone to Hairston on two or three occasions during 2003 and 2004” (p. 207).

While none of these quotes are particularly flattering for Segui, none of them define a steroid season allegation.

In conclusion, Segui is a frustrating case because he is clearly accused in the Report, but we found we just couldn’t assign any particular season to him based on our strict a priori conditions for deciding steroid seasons. It is easy to disagree with the outcome (no steroid seasons for Segui) because it doesn’t match common sense. However, our decisions were not made lightly, and we hope you can see the merit of our methodology even if you don’t agree with it.

Next, consider McGwire. The discrepancy here is not about the designation of the 1998 season; it is about the designation of “andro.” Should it be classified as a steroid for these purposes or not? We freely admit to not being experts in the field of PEDs. We adhered to the layman’s explanations given in the media that andro is a precursor to steroids. Though it is a banned substance, the rationale behind our inclusion of this season is certainly debatable.

Finally, consider Bonds. First, you suggest that 2001 should be labeled a steroid season. But this is based on the BALCO evidence and Game of Shadows (or, more precisely, the Mitchell Report’s references to those sources). While these sources may be reliable, they are irrelevant for the present discussion because we used the seasons denoted by the Mitchell Report as our sole data source. This is obviously suboptimal (as we acknowledge in our paper), but under this methodology the 2001 season should not be labeled a steroid season. To bring in other sources would be to slide down that slippery slope headfirst, with no chance for objectivity to survive. Second, you suggest that 2004 should not be labeled a steroid season because the BALCO mess occurred in 2003. However, the Mitchell Report states that Anderson was removed from the clubhouse in 2004 but continued to work with Bonds after that. Further, the Giants asked Bonds to have no contact with Anderson early in 2005. Although we cannot confirm that Bonds did stop dealing with Anderson at that time, our conservativeness suggested that 2005 should not be a steroid season. But it still seems reasonable to call 2004 a steroid season under that scenario (pages 126-127). This, too, is debatable.

As you can see, the denoting of steroid seasons in some cases is a complicated task. We have made every effort to be conservative in our designations and to base them (to the extent possible) on strict evidence from the Mitchell Report. Luckily, the majority of cases were straightforward and are highly unlikely to contain any mistakes of note. Unluckily, a few of the cases were more difficult. We hope the above explanations illuminate more fully our rationale on those more ambiguous adjudications.

In addition to the steroid season designations, you made several other statements about our paper that we feel deserve attention.

First, you suggest that the results are “fragile”. Quite the contrary. Under a variety of analysis assumptions, the steroid effect was always positive and nontrivial.

Second, you note that when Bonds is excluded from the model, the steroid effect is “not statistically significant”. One might infer that we did not focus our discussion on statistical significance because the p-values for some models were large; however, this is not the case. (We could easily have omitted the p-values, or the models with large p-values, from our paper if we felt we had something to hide.) In fact, we presented p-values because it is traditional to do so for statistical models. But in reality, they are largely (one could argue, completely) irrelevant for this study. This is a census, not a sample. There is no sampling variability. The effect we observed in our models is the true effect by definition.

Third, you state that you “don’t like the method used to account for aging”. Based on the data, the age effect is small. Our simple method is clearly not the best option available (as we acknowledge in our paper), but it should be sufficient to get the job done. There is certainly no evidence that our simple method would “bias” the results. The worst that it could be accused of is adding extra variance to the proceedings (which would weaken, not strengthen, our results).

Fourth, you state that a mixed model “isn’t one that I would choose”. We would be interested to hear your reasons for this. The mixed effects model is the gold standard for repeated measures data with both fixed effects (PED use) and random effects (player). We are aware of no other analysis method that would be sufficient for the problem at hand.
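For readers unfamiliar with this class of model, here is a minimal sketch of the kind of repeated-measures mixed-effects model described above: a fixed effect for the PED indicator and a random intercept for each player. This is not the authors’ code; the data file and the column names (rc27, ped, age_adj, player_id) are hypothetical.

```python
# Minimal mixed-effects sketch (hypothetical data and column names), fit with
# statsmodels: a fixed effect for coded PED seasons and a random intercept per
# player to handle the repeated measures on each player.
import pandas as pd
import statsmodels.formula.api as smf

seasons = pd.read_csv("player_seasons.csv")  # one row per player-season (hypothetical file)

# rc27: runs created per 27 outs; ped: 1 if the season was coded as a steroid
# season, 0 otherwise; age_adj: an age-adjustment covariate or pre-adjusted term.
model = smf.mixedlm("rc27 ~ ped + age_adj", data=seasons, groups=seasons["player_id"])
result = model.fit()
print(result.summary())  # the 'ped' coefficient estimates the steroid effect
```

The random intercept lets each player carry his own baseline level of performance, so the PED coefficient reflects changes relative to that baseline rather than differences between players.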

Lastly, the most common request we hear when we talk to people about our study is for a new analysis with more liberal criteria employed for determining steroid seasons (you made the same request). We resist the urge to comply because of the slippery slope. It would be easy to call many of McGwire’s best seasons (not just 1998) steroid seasons. Similarly for Bonds or even Sammy Sosa (although he’s never been formally accused). There is little doubt that we could “cherry-pick” seasons to label as steroid seasons that would lead to a massive estimate of the steroids effect. However, this would be an egregious example of selection bias on our part and is not befitting our goal of remaining impartial researchers investigating the truth.

Another arena we want to avoid is providing a new analysis every time we hear about a critic who disagrees with our steroid season designations. While you disagree with us regarding Bonds, McGwire, and Segui, the next critic will invariably disagree with us both. To look at every conceivable combination of steroid seasons is simply not possible. Nevertheless, we have attempted to provide an analysis that closely matches what your review suggests. To that end, we modified our database in the following ways (with quotes from your review):

— Bonds: We added 2001 and 2002 as steroid seasons and removed 2004 (“The Mitchell Report does not pinpoint any PED use in 2004; however, it does document his first visit to BALCO in 2001”)

— McGwire: We removed 1998 as a steroid season (“if McGwire’s hitting success was fueled by PED use, andro was not the culprit”)

— Segui: We added all of his seasons as steroid seasons (“There appears to be strong evidence that Segui was an active user of steroids continuously during his major-league career”)

— Age Adjustment: We applied an age adjustment based on your latest study on aging in baseball – specifically, we used the OPS column from Table 7, which shows the percent difference from peak performance by age (OPS is highly correlated with RC27, so it seemed like the best choice) (“I don’t like the method used to account for aging, and I believe the method might bias the results”); a rough sketch of this kind of adjustment follows the list
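As a rough illustration of the age adjustment mentioned in the last item, the sketch below divides each season’s performance by an expected fraction-of-peak factor for that age. The factors shown are placeholders, not the values from Table 7 of the aging study, and the column names are hypothetical.

```python
# Illustrative age adjustment (placeholder factors, hypothetical columns):
# scale each season by the expected fraction of peak performance at that age.
import pandas as pd

age_factor = {25: 0.97, 26: 0.99, 27: 1.00, 28: 0.99, 29: 0.97, 30: 0.95}  # placeholders

seasons = pd.DataFrame({
    "player_id": ["a", "a", "b"],
    "age":       [27, 30, 26],
    "rc27":      [5.2, 4.9, 4.4],
})

# Dividing by the fraction of peak puts seasons at different ages on a common,
# age-neutral scale before the models are fit.
seasons["rc27_age_adj"] = seasons["rc27"] / seasons["age"].map(age_factor)
print(seasons)
```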

Below is the table of results from our paper versus the new results for this analysis.

Percent increase in offensive performance due to steroids:

Model    1     2     3     4     5     6     7     8     9    10    11    12
Old    12.6   7.2   9.1  18.0   7.0   3.9   4.9  11.9  11.3   7.7   6.2   4.2
New    16.8   8.1  10.6  22.6   7.0   3.4   4.4  12.5  14.9   9.0   6.1   3.8

As you can see, the results are striking. In 9 out of 12 models, the estimated effect is the same or larger using your suggestions. The other 3 models see a modest decrease of about half a run.

(As a reminder, models 1-8 include all player seasons with at least 50 PA, while models 9-12 include only player seasons from players mentioned in the Mitchell Report. Models 1-4 and 9-10 include Bonds; the others do not. Model 1 includes a fixed effect for Mitchell players, model 2 centers each player on his own average, model 3 makes both adjustments, model 4 makes neither, and models 5-8 follow the same pattern. Models 9 and 11 are uncentered; models 10 and 12 are centered.)
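To make the “centers each player on his own average” adjustment concrete, here is a minimal sketch of within-player centering. It is an illustration with made-up numbers and hypothetical column names, not the authors’ code.

```python
# Within-player centering (illustrative data): subtract each player's own
# average performance from each of his seasons, so models that use the
# centered outcome estimate within-player changes rather than differences
# between players.
import pandas as pd

seasons = pd.DataFrame({
    "player_id": ["a", "a", "a", "b", "b"],
    "rc27":      [5.1, 6.0, 5.4, 4.2, 4.8],
    "ped":       [0, 1, 0, 0, 1],
})

seasons["rc27_centered"] = seasons["rc27"] - seasons.groupby("player_id")["rc27"].transform("mean")
print(seasons)
```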

To sum up, we feel that many of the issues you have raised as “obvious factual errors” should more fairly be described as differences of opinion. Our methodology, while not perfect, is defensible and leads to the best estimate of the steroids effect that has yet been published. Furthermore, the essential conclusion of our study does not change even when incorporating your suggested changes, which shows just how robust the result is. Based on the information that is out there in the Mitchell Report, we don’t see how you can come to any conclusion except that there is a substantial positive effect on offensive performance due to steroids.

Thanks again for the opportunity to respond. We hope this has been interesting for you and your readers; certainly your comments have been interesting and thought provoking for us.

Brian Schmotzer

Patrick Kilgo

Jeff Switchenko

— — —

My response.

2 Responses to “A Response to My Critique of the Mitchell Report Study”

  1. A.West says:

    I just looked at the original paper. Is it possible that this just proves that the authors of the Mitchell Report focused their investigations on players who had good seasons, because they assumed that a good season was presumably associated with steroid use? If they had focused their investigations on weak teams with losing records and poor player performance, they would still have found some steroid use, and the report would have shown steroids hurting performance.

    However, I’m quite fond of panel data/mixed effects models, and I think they ought to be used more in baseball statistical analysis. They seem a natural match for the structure of the data – tracking multiple individuals over time. Why don’t you like it, JC?

  2. JC says:

    I don’t have a problem with panel data/mixed effects. I run panel data models frequently; I just don’t normally employ mixed effects. In my critique I stated that I would assume that mixed effects would work; I was noting that I would not address whether or not the method works, but would assume that it does without explicitly agreeing that it would. The confusion is understandable; I probably should have excluded the sentence.

    I’ll try to post my full comments on Monday. This Gwinnett County business is taking up quite a bit of time.