My response to Brian J. Schmotzer, Jeff Switchenko, and Patrick D. Kilgo’s reply to my criticism of their study follows. I would like to thank the authors for offering their response; however, I do not think their explanations succeed in validating their study.
First, let me address a few minor issues that I will not discuss in depth. I have no problem with a mixed-effects model; it just isn’t the model I would have used. In fact, in my initial critique I stated, “I suspect that it ought to get the job done.” Aging is also not a big issue, but I appreciate that the authors took the step of re-estimating their model following my previous analysis. It appears that aging adjustments do not make much difference.
I see that our disagreements boil down to two points on which I will focus my remarks: coding and the statistical significance of the estimates.
On coding, the authors’ main argument is that coding is difficult in some cases, and that their coding choices “are not coding errors but simply differences of opinion.” We certainly do have a difference of opinion on this matter (after all, all disagreements come down to opinion), but the authors’ point is that the coding I previously attributed to human error was the result of conscious decisions. In this sense, their designations are deliberate choices, not coding errors. However, I believe that several of the designations discussed reveal a pattern of inconsistent assignment of steroid use that does not follow from their stated methodology. Two players are at issue: David Segui and Barry Bonds.
In terms of Segui, the authors state the following.
He clearly is accused in the Mitchell Report of being a steroid user. There is no question about it. However, our task was to designate seasons of abuse – simply recognizing a player as an accused abuser was not good enough. Because of our strict criteria, we did not denote any of Segui’s years as steroid seasons. (We note here an actual mistake in our manuscript. We said 1995 and 2004 were denoted as steroid seasons when we actually denoted them as HGH seasons.)
First, the authors did make an error: 1995 and 2004 were coded as HGH seasons, not steroid seasons. This explains my confusion when I stated, “I can find no explanation for the authors’ chosen steroid designations of 1994–1995 and 2004–2005.” Thus, the error was not in the coding but in the misreporting of the coding. But the correction creates new confusion: why were 1995 (especially) and 2004 coded as HGH seasons? I can find no supporting documentation for these designations in the Mitchell Report, and the authors did not provide an explanation in their response.
Second, the authors include a list of accusations quoted from the Mitchell Report to demonstrate how ambiguous the accusations can be. I appreciate the authors’ difficulty and do not have a problem with a conservative standard. In my initial critique I stated, “This is certainly a defensible method—though, I would like to have seen results with ‘a more liberal reading’ as well—however, I lack confidence that the authors employed their designation properly.” Upon further review, I still find their coding choices strange, because they do not form a consistent pattern.
The Mitchell Report includes four legible checks from Segui to Kirk Radomski written between 2002 and 2004. The authors state:
the checks could have been written for HGH. In terms of the checks, we viewed this as weaker evidence since many unnamed players presumably wrote checks to Radomski for innocent items.
On page 151 of the Mitchell Report, Radomski states that he sold steroids to Segui, even going so far as to say that the growth hormone he was getting came from a doctor in Florida. The checks themselves are the most damning evidence in the entire report. This is physical evidence, backed by testimony from both parties that Segui’s relationship with Radomski was one of steroid seller and steroid purchaser. I believe it is unreasonable to assume that these checks could have been for “innocent items.” If this does not qualify as “Other ancillary items including the source of the allegations and whether there is a paper trail of evidence,” then I don’t know what standard the authors have set, especially given the flimsy evidence on which Segui and Bonds were identified as users at other times in their careers.
On page 3 of their paper, the authors state that Segui’s “paper trail of evidence with Radomski begins in 2004 and lasts through 2005.” Where does the 2004 date come from? David Segui didn’t even play baseball in 2005. And where did the 1994 and 1995 designations (see above) come from? I understand the difficulty of deciding whether to designate other years as steroid (or HGH) years, which is why I stated, “At the minimum, 2002 and 2003 should be listed as dirty.” I was acknowledging that, by the authors’ own standards, he should have been designated a user in those seasons.
The Segui case becomes more problematic when compared with that of Bonds. If the physical evidence regarding Segui doesn’t meet the standard of “Other ancillary items including the source of the allegations and whether there is a paper trail of evidence,” then Bonds should have been listed as clean for the entire sample. On Bonds, the authors write:
First, you suggest that 2001 should be labeled a steroid season. But this is based on the BALCO evidence and the Game of Shadows (or more precisely, the Mitchell Reports references to those sources). While these sources may be reliable, they are irrelevant for the present discussion because we used the seasons denoted by the Mitchell Report as our sole data source. This is obviously suboptimal (as we acknowledge in our paper) but based on this methodology the 2001 season should not be labeled a steroid season. To bring in other sources would be to slide down that slippery slope headfirst with no chance for objectivity to survive. Second, you suggest that 2004 should not be labeled a steroid season because the BALCO mess occurred in 2003. However, the Mitchell Report states that Anderson was removed from the clubhouse in 2004 but continued to work with Bonds after that. Further, the Giants asked Bonds to have no contact with Anderson early in 2005. Although we cannot confirm that Bonds did stop dealing with Anderson at that time, our conservativeness suggested that 2005 should not be a steroid season. But it seems that it is still reasonable to call 2004 a steroid season under that scenario (pages 126-127). This, too, is debatable.
First, I don’t necessarily think 2001 should be coded as a steroid season. As readers of this website know, I have been hesitant to condemn Bonds. He did first visit BALCO in 2001, but it is the start of the steroid designations in 2004 that makes no sense to me. Nor does that designation fit with the extreme conservatism applied to Segui’s use and to Bonds’s prior use. Consider the standards implied by the authors’ choices:
— Four checks, along with testimony from both parties that the checks paid for steroids, do not constitute a paper trail for Segui.
— Leaked grand jury testimony (since released to the public), in which Bonds admits taking substances in 2002 and 2003 that prosecutors identified as performance-enhancing drugs (Bonds did not believe “The Cream” or “The Clear” were steroids), is not sufficient evidence of use. Yet the Mitchell Report relies heavily on the investigation of Bonds, just as it relies heavily on the government investigation of Kirk Radomski.
— The Giants’ dismissal of Anderson from the clubhouse in 2004, at a time when Jeff Novitzky was monitoring Bonds’s every move and MLB had instituted testing, is considered evidence of use.
In their rebuttal, the authors conclude, “our decisions were not made lightly, and we hope you can see the merit of our methodology even if you don’t agree with it.” I see merit in a conservative “paper trail” methodology, but in practice the designations do not make sense; the implementation is seriously flawed. If I arranged the three choices above on a spectrum from conservative to liberal, the one that gets coded dirty is the most liberal of the three. The coding of steroid use is not consistent, and I do not see how these designations can be defended as reasonable choices. The authors also write:
As you can see, the denoting of steroid seasons in some cases is a complicated task. We have made every effort to be conservative in our designations and to base them (to the extent possible) on strict evidence from the Mitchell Report. Luckily, the majority of cases were straightforward and are highly unlikely to contain any mistakes of note. Unluckily, a few of the cases were more difficult. We hope the above explanations illuminate more fully our rationale on those more ambiguous adjudications.
I disagree. In fact, when I re-read portions of the Mitchell Report after reading their paper, I found identifying specific years of use for most players listed in the report to be a difficult task. Segui and Bonds actually seem to be two of the simpler cases, yet there does not appear to be a consistent standard for coding their potential use of steroids. And here is the real problem with the coding: where I can see what was done, I observe that it was done incorrectly. This leads me to believe that other players were coded incorrectly as well. I have no faith in the analysis, especially considering that this study’s findings are at odds with those of other studies of Mitchell Report players by Cole and Stigler and by the Milwaukee Journal Sentinel (which interpreted its own findings incorrectly).
The authors also take the step of re-estimating the effect of steroid designations on performance after adjusting the designations according to my suggestions. The new estimates are an improvement on the old ones; however, this doesn’t allay my fear that coding errors may exist elsewhere. In addition, the authors do not report standard errors for their new estimates, so it is impossible to know whether “dirty” players improved their performance relative to “clean” players at a statistically significant level. This leads to the next point of contention: statistical significance.
Next, I want to address the authors’ response regarding the robustness of the results:
First, you suggest that the results are “fragile”. Quite the contrary. Under a variety of analysis assumptions, the steroid effect was always positive and nontrivial.
I did not dispute that the reported coefficient estimates are consistent across many specifications. I stated that the results are fragile because, when Bonds is removed from the sample, the coefficient estimates of steroid use are no longer statistically different from zero; that is, “dirty” players do not perform better than “clean” players at a statistically significant level. For readers not familiar with statistics, this means that performance in steroid-designated seasons is not meaningfully different from performance in non-steroid seasons, given the typical variation in performance among all players in the sample. The authors respond:
Second, you note that when Bonds is excluded from the model, the steroid effect is “not statistically significant”. It is understandable to infer that we did not focus our discussion on statistical significance because the p-values for some models were large. However, this is not the case. (We could have easily omitted p-values from our paper or omitted models with large p-values from our paper if we felt we had something to hide.) In fact, we presented p-values because it is traditional to do so for statistical models. But in reality, they are largely (one could argue, completely) irrelevant for this study. This is a census, not a sample. There is no sampling variability. The effect we observed in our models is the true effect by definition.
This is a curious response. In the paper’s abstract, the authors reported a statistically significant p-value (from Model 1) as evidence of a performance effect: “The effect of steroid use was an additional 0.58 ADJRC27, an increase in production of 12.6% (p=0.0108).” At one time, the authors believed the p-value was relevant, and they were correct in that belief.
Now, they suggest that the p-values are not relevant because “This is a census, not a sample. There is no sampling variability. The effect we observed in our models is the true effect by definition.”
First, this is a sample: it includes the 1,336 players who had at least 50 plate appearances in a season from 1995 to 2007. Second, even if it were a census, I do not see why that would make the p-values irrelevant. The p-value of the steroid coefficient indicates how likely it would be to observe a difference at least this large between the two cohorts (the steroid variable is binary) if steroid seasons were no different from other seasons. That is why the statistical programs the authors employed reported p-values along with the coefficient estimates. In five of the six models that exclude Bonds (5, 6, 7, 11, and 12; Model 9 is the exception), the p-values are greater than 0.05. The variation in performance (which the p-value accounts for) is key. The standard errors indicate that, though the estimated change in performance in steroid seasons is positive, it is not outside the normal variation in player performance. The fact that the authors did not provide p-values in the revised tables leads me to believe that the new estimates are not statistically significant, either.
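The census argument can be probed with a small thought experiment. Below is a minimal sketch of an exact permutation (randomization) test, a swapped-in technique rather than the mixed-effects model the authors estimated. It asks how often an arbitrary relabeling of the same observations would produce a “dirty” minus “clean” gap at least as large as the observed one, a question that still makes sense when the data cover every player. The numbers are invented for illustration and are not data from the study; the helper names `mean_gap` and `exact_permutation_p` are hypothetical, and the single value of 3.0 stands in for one extreme player.

```python
from fractions import Fraction
from itertools import combinations


def mean_gap(values, dirty_idx):
    """Mean of the observations labeled 'dirty' minus the mean of all the others."""
    dirty = [v for i, v in enumerate(values) if i in dirty_idx]
    clean = [v for i, v in enumerate(values) if i not in dirty_idx]
    return sum(dirty) / len(dirty) - sum(clean) / len(clean)


def exact_permutation_p(dirty, clean):
    """One-sided exact permutation p-value for mean(dirty) - mean(clean).

    Every way of attaching the 'dirty' label to len(dirty) of the pooled
    observations is enumerated; the p-value is the share of relabelings whose
    gap is at least as large as the observed gap.
    """
    pooled = list(dirty) + list(clean)
    k = len(dirty)
    observed = mean_gap(pooled, set(range(k)))  # the actual labels come first
    as_large = 0
    relabelings = 0
    for idx in combinations(range(len(pooled)), k):
        relabelings += 1
        if mean_gap(pooled, set(idx)) >= observed:
            as_large += 1
    return observed, Fraction(as_large, relabelings)


def fr(xs):
    # Fractions keep the arithmetic exact, so ties with the observed gap count reliably.
    return [Fraction(x) for x in xs]


# HYPOTHETICAL season-to-season changes in a performance metric (not study data).
clean = fr(["0.1", "-0.3", "-0.4", "-0.2", "0.0", "0.3", "-0.1", "0.2"])
dirty_without_outlier = fr(["0.5", "-0.1", "0.3"])
dirty_with_outlier = dirty_without_outlier + fr(["3.0"])  # one extreme player

gap_all, p_all = exact_permutation_p(dirty_with_outlier, clean)
gap_trim, p_trim = exact_permutation_p(dirty_without_outlier, clean)
print(f"with outlier:    gap={float(gap_all):.3f}, p={float(p_all):.4f}")
print(f"without outlier: gap={float(gap_trim):.3f}, p={float(p_trim):.4f}")
```

With these made-up numbers, the one-sided p-value is 1/33 (about 0.030) when the extreme value is included and 1/11 (about 0.091) when it is dropped, while the gap stays positive in both cases. A single observation moves the result across the conventional 0.05 threshold, which is the kind of fragility described above.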
In summary, though the authors have offered responses to my criticisms, their rebuttal falls short of rectifying the problems that I previously identified. The coding choices are not consistent with the stated methodology, which makes it reasonable to assume that other coding problems exist. Also, the estimates do not indicate that players coded as steroid users performed better than non-users at a statistically significant level. I thank the authors for their reply, but I do not agree with their conclusions.