Archive for February, 2008
Justin Inaz interviews me over at On Baseball and the Reds. Here is an excerpt.
Question: I’ve seen reports critiquing the use of public funds for MLB stadium projects. Hamilton County (through a half-cent sales tax increase) has invested an enormous amount of money into revitalizing the Riverfront area, which has included the construction of the Reds’ and the Bengals’ new stadiums. How do you view these sorts of stadium projects, from the perspective of economic return to the cities or counties that pay for them?
JCB: The economic return is zero. I have not seen a single study that shows a positive impact from such efforts. The state of the research is such that if an economist attempted to publish another economic impact study, the journal editor would reject it as redundant. Spending on sports replaces spending that would have happened anyway. The idea that these projects confer financial benefits is a myth. Any positive return to taxpayers is non-monetary.
And you don’t need studies to understand this. I go to about five Braves games a year. It’s been 10 years since the Braves started playing at Turner Field (which was the 1996 Olympic Stadium). The area surrounding the stadium is a dump. It’s the type of place where at night, you run red lights just to get out of the area faster. There are no restaurants, bars, or shops. Those things are inside the stadium. The same is true for many stadiums around the country.
I have no problem with a community reaching a political decision that it will choose to raise taxes because its citizens value having a sports team. But it really bothers me when people argue that these projects are beneficial. It is time to be honest and say, “if you want to pay $5 more in taxes a year, we can host a team or have a nicer stadium.” As long as everyone agrees that this will cost money, not make it, I have no problem with government choosing to fund such projects.
Thanks to Justin for asking me the questions.
Last week, I became aware of a study by economists Eric Gould and Todd Kaplan that evaluates the impact of Jose Canseco on his teammates. They examine the belief that Canseco spread his knowledge about steroids throughout baseball by introducing many of his teammates to performance-enhancing drugs. If this were the case, the authors hypothesize, he ought to have left a trail of improved performance among teammates in his wake.
The authors look at the careers of Canseco’s teammates to investigate this claim. Their method is to examine players to see how well they perform as a Canseco teammate and afterwards, relative to the years preceding involvement with Canseco. The idea is somewhat similar to what I did with my analysis of Leo Mazzone’s impact on pitchers (see Chapter 5 of my book).
After reading the study, I am not convinced by the authors’ conclusions. It’s not just one thing, but a collection of issues that form my opinion. I have problems with both the study’s design and the interpretation of the reported results. My disagreement does not mean that the effect does not exist, only that I do not see a pattern consistent with Canseco spreading steroids to his teammates.
First, I want to start with the sample. The authors look at players from 1970–2003. I find this an odd range of seasons to select. Canseco’s career spans 1985–2001. Why start a decade and a half before Canseco enters the league and stop two years after he exits? The asymmetry bothers me largely because the run-scoring environment preceding Canseco was much lower than it was during the latter part of his career. But even without this, it is a strange choice to make. I can only guess that some teammate of Canseco’s had a career extending back this far, but I still don’t agree with the choice. And why not extend the sample until the present?
Next, the authors set the minimums for examining player performance at 50 at-bats for hitters and 10 games for pitchers. These minimums are far too low even when stats are normalized for playing time, and the problem is much worse for absolute statistics like total home runs, which the authors also use. For pitchers (whom I will not examine here), a 10-game cutoff admits pitchers who threw very few innings.
The authors also make the strange choice of splitting hitters into two classes: power and skilled players. The idea is that steroids might have different effects on the two styles of play. I don’t agree with this, but that is not the weird part. The authors separate power and skilled players by position played, which is odd but moderately defensible. The power positions are first base, designated hitter, outfield, and catcher; the skilled positions are second base, shortstop, and third base. Here it becomes clear that the authors are not all that familiar with baseball. Catcher is a “power” position? Third base is a skill position? I suspect that catcher and shortstop produce the least offense of all the positions. Sure, you can point to a power-hitting catcher like Mike Piazza, but you can also point to a punchless first baseman like Doug Mientkiewicz; in general, catcher and first base sit at opposite ends of the defensive spectrum, with very different offensive expectations. Center field is also a defensive position that should not be lumped in with the corner outfield spots. All of this highlights the problem of inferring power potential from position. And it’s not so much the way the sample is split, which I don’t like, as the fact that it is split at all that makes me suspicious.
The choice of dependent variables is also a bit strange. While the authors are mainly looking for changes in power, they pick only a few metrics that measure power: HR, SLG, and HR/AB. The other statistics include AVG, RBI, K, BB, IBB, at-bats, fielding percentage, errors, and steals. I have no problem with AVG. RBI is completely useless, since it depends largely on teammates. K, BB, and IBB are chosen because they correlate with home run hitting. But performance in these areas also reflects other things, such as plate discipline, and the authors are already looking at home runs directly. These variables just add columns to the regression table that would have been better used for robustness checks on the sample and the control variables. I would have liked to see isolated power (SLG–AVG), HR/H, OBP, and OPS.
As for the control variables, many of the choices are not intuitive. The controls are the batting average of the division (subtracting out own-team performance), the manager’s lifetime winning percentage, the batting park factor, years of experience (described as a continuous variable in the text, but reported as a set of dummies in the regression tables), year effects, and dummies for each division. The equation is also estimated with fixed effects to control for individual player attributes.
I wouldn’t have chosen some of these variables, but I don’t think they make much difference. However, I am perplexed by the inclusion of the manager’s winning percentage and the division dummies. I don’t see any obvious bias that the quality of the manager would introduce, and in any event, manager dummies are probably the better choice. Managers whose players perform better will have higher winning percentages, so a positive correlation is to be expected, but the direction of causality is difficult to determine. Still, this isn’t a huge issue.
The division dummies make no sense. Division membership changed at several points during the sample (the most extreme change came in 1994, when a Central Division was added to both leagues), and there are no rules or styles of play that are unique to any division. If there were such an effect, the division batting average and the year effects should capture it. It would have made more sense to include league dummies, because of the significant differences in play between the leagues after the American League introduced the DH in 1973. In any event, the authors state that the control variables do not alter the results. I would have liked to see some results with different controls.
Now, to the variable(s) of interest. When I initially looked at the study, I flipped to the regression tables first and noticed that there did not appear to be a “Canseco effect,” because the estimate on playing with Canseco was not statistically significant. But that is not what the authors use to quantify Canseco’s impact; we are supposed to look at a second variable that identifies the seasons after playing with Canseco. The intuition is that “even if he did learn steroids from Canseco, we do not know when he learned about it during his time with Canseco, but we can be sure that he already acquired the knowledge after player with Canseco” (p. 10). I just don’t buy this, for two reasons. First, I understand that it might take a while for the effect to kick in, but it should still show up in the “played with” variable, especially because many players played with Canseco for multiple seasons. At best, this story makes sense only for players who spent a single season with Canseco (more on this below). Second, anabolic steroids work quickly, so a delayed effect is unlikely.
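For reference, the alternative power measures mentioned above are simple functions of a standard batting line. The sketch below is a minimal illustration; the BattingLine fields and the sample numbers are hypothetical, and OBP uses the usual definition that counts hit-by-pitches and sacrifice flies.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One season's counting stats (hypothetical example fields)."""
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies


def power_metrics(s: BattingLine) -> dict:
    """Rate stats discussed above, including isolated power (SLG - AVG)."""
    singles = s.h - s.doubles - s.triples - s.hr
    total_bases = singles + 2 * s.doubles + 3 * s.triples + 4 * s.hr
    avg = s.h / s.ab
    slg = total_bases / s.ab
    obp = (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)
    return {
        "AVG": avg,
        "SLG": slg,
        "ISO": slg - avg,      # isolated power: extra bases per at-bat
        "HR/H": s.hr / s.h,    # share of hits that are home runs
        "OBP": obp,
        "OPS": obp + slg,
    }


line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=30, bb=60, hbp=5, sf=5)
for name, value in power_metrics(line).items():
    print(f"{name}: {value:.3f}")
```

Each of these is normalized for playing time, though, as noted above, a 50 at-bat minimum still leaves such rates noisy.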
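Since the paper relies on player fixed effects, here is a minimal sketch of the standard within (fixed-effects) estimator, assuming numpy and a toy panel. It is a generic illustration, not the authors’ code; the controls listed above (year effects, park factor, and so on) would simply be extra columns of X, and standard errors are omitted.

```python
import numpy as np


def within_estimator(player_ids, X, y):
    """Player fixed-effects (within) estimator.

    Subtracts each player's own mean from every regressor and from the
    outcome, then runs OLS on the demeaned data, so anything constant
    within a player drops out.
    """
    player_ids = np.asarray(player_ids)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_dm = X.copy()
    y_dm = y.copy()
    for pid in np.unique(player_ids):
        rows = player_ids == pid
        X_dm[rows] -= X[rows].mean(axis=0)
        y_dm[rows] -= y[rows].mean()
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta


# Toy panel: three hypothetical players, four seasons each, two regressors.
ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
X = [[1, 0], [2, 1], [3, 1], [5, 0],
     [2, 1], [4, 0], [4, 1], [7, 1],
     [0, 2], [1, 0], [3, 0], [3, 1]]
player_effect = {0: 5.0, 1: -2.0, 2: 10.0}
y = [player_effect[i] + 1.5 * x1 - 0.5 * x2 for i, (x1, x2) in zip(ids, X)]
print(within_estimator(ids, X, y))  # approximately [1.5, -0.5]
```

The player-specific intercepts (5, -2, 10) never need to be estimated; demeaning removes them, which is why fixed effects control for time-invariant player attributes.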
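To make the distinction between the two variables concrete, here is one plausible way to code them from player-season data. This is a hypothetical sketch, not the authors’ code: the team labels are placeholders, and it assumes a player who rejoins Canseco is coded “with” rather than “after” in that season; the paper’s exact rule may differ.

```python
def canseco_indicators(player_seasons, canseco_teams):
    """Code "with Canseco" and "after Canseco" dummies for one player.

    player_seasons: iterable of (year, team) pairs, one per season.
    canseco_teams:  dict mapping year -> set of teams Canseco played for.
    Returns (year, team, with_canseco, after_canseco) tuples sorted by year.
    """
    rows = []
    has_played_with = False
    for year, team in sorted(player_seasons):
        with_c = team in canseco_teams.get(year, set())
        # "After" = a prior season as a teammate and not a teammate now.
        after_c = has_played_with and not with_c
        rows.append((year, team, int(with_c), int(after_c)))
        has_played_with = has_played_with or with_c
    return rows


# Hypothetical teams and seasons, for illustration only.
canseco = {1990: {"TEAM_A"}, 1991: {"TEAM_A"}, 1992: {"TEAM_B"}}
player = [(1991, "TEAM_B"), (1990, "TEAM_A"), (1992, "TEAM_C")]
for row in canseco_indicators(player, canseco):
    print(row)
```

Under this coding, every season following a player’s first stint with Canseco carries the “after” flag, which is why the variable can pick up any later change in a player’s career, steroid-related or not.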
After reading the paper, I came to the conclusion that the results are probably fragile. So, I designed a similar, but not identical, dataset. I did almost everything the authors did, except I did not break the sample into power and skilled players, and I included league dummies instead of division dummies, because I feel this is a superior choice. I also kicked out some partial seasons when guys switched teams to make life easier in developing the dataset. Thus, what I am doing is “replication” in the sense of looking for a similar result in the data, rather than trying to recreate the previous estimates. If the result is real, then I should find something similar. Here is what I found looking at raw home run totals (control variable estimates not reported).
                HR         HR         HR/AB       HR/AB       AR(1)
                50 AB      200 AB     50 AB       200 AB      Corrected
With Canseco    -0.297     -0.199     -1.28E-03   -9.39E-04   -0.449
                [0.66]     [0.35]     [1.41]      [0.93]      [0.87]
After Canseco    0.667      0.737      3.49E-04    6.28E-04   -0.204
                [1.58]     [1.34]     [0.41]      [0.65]      [0.34]
Observations    15,644     9,234      15,644      9,234       12,759
Players         2,885      1,717      2,885       1,717       2,265
R-squared       0.13       0.14       0.09        0.13        0.08
Absolute value of t statistics in brackets
The coefficient on playing with Canseco is negative and insignificant, and the after-Canseco coefficient is positive with a p-value of 0.12, which is above both the standard (0.05) and the lenient (0.10) thresholds for statistical significance. That is the best I could get. When I raise the at-bat minimum to the more appropriate 200, normalize home runs by at-bats, or do both, the “played with” coefficient is negative and never significant, and the “after” coefficient’s p-value is never as low as in the specification that most closely resembles the study. Another potential problem I encountered was serial correlation in the data. This is sometimes difficult to detect, and it is possible that the problem is unique to my sample. However, when I correct for it, both Canseco variables consistently have high, insignificant p-values. So, though the authors find some evidence of an effect in the “after” variable in their sample, the finding does not appear to be very robust.
The thing that bothers me most about this study is that we have to explain why the “after Canseco” variable is important while the “during” variable is not. And I think the authors’ story really applies only to players who were with Canseco for one season. So, I ran some regressions using only players who played with Canseco for a single year.
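The p-values quoted here follow from the t statistics in the table. A minimal sketch, assuming a normal approximation to the t distribution (reasonable with 9,000+ observations); the function name is hypothetical:

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value for a t statistic via the normal approximation.

    With thousands of player-seasons, the t distribution is essentially normal.
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


# t statistics for "After Canseco" in the table (50 AB and 200 AB minimums).
for t in (1.58, 1.34):
    print(f"t = {t:.2f} -> p = {two_sided_p(t):.3f}")
```

A t statistic of 1.58 gives a p-value of about 0.11, which the text rounds to 0.12; the 200 at-bat version is further from significance.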
                One-year    One-year, 10+ Career
With Canseco    -2.656      -3.450
                [3.02]**    [3.17]**
After Canseco   -2.562      -3.027
                [2.84]**    [2.95]**
Observations    1,200       940
Players         186         100
R-squared       0.18        0.23
Absolute value of t statistics in brackets
* significant at 5%; ** significant at 1%
The effects of playing with Canseco, both during and after, are strongly negative: roughly two and a half fewer home runs. However, if these players spent only one year with him, the result could reflect that they were not very good and were on their way out of the league. So, I limited the sample to players with careers of 10 or more seasons, and the result is a decline of about 3 home runs both during and after.
My point in offering this “replication” isn’t so much to say that my specifications are superior. I just want to show that the findings do not appear to be robust. To concur with the conclusions presented in the study, you have to interpret the findings in a way that I do not believe is correct. Upon further examination, I believe the significant effect on home runs after playing with Canseco identified in the Gould and Kaplan study is a product of spurious correlation, and thus it tells us little about Canseco’s role in spreading steroids throughout baseball.
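The two sample restrictions described above amount to a simple filter on player careers. A minimal sketch with hypothetical player names and data; here I assume “career length” means the number of seasons in the sample.

```python
def one_year_teammates(careers, min_seasons=0):
    """Players with exactly one season as Canseco's teammate.

    careers: dict mapping player -> list of (year, with_canseco) pairs,
             one pair per season in the sample (with_canseco is 0 or 1).
    min_seasons: optional career-length floor (e.g. 10).
    """
    selected = []
    for player, seasons in careers.items():
        seasons_with = sum(1 for _, with_c in seasons if with_c)
        if seasons_with == 1 and len(seasons) >= min_seasons:
            selected.append(player)
    return sorted(selected)


# Hypothetical careers for illustration.
careers = {
    "short_stint": [(1990, 1), (1991, 0)],
    "two_years": [(1990, 1), (1991, 1), (1992, 0)],
    "veteran": [(y, int(y == 1995)) for y in range(1990, 2002)],
}
print(one_year_teammates(careers))                  # ['short_stint', 'veteran']
print(one_year_teammates(careers, min_seasons=10))  # ['veteran']
```

The career-length floor is what separates a genuine post-Canseco decline from the simple fact that marginal players tend to leave the league.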
When word of the Gwinnett Braves deal broke, I asked my friend Frank Stephenson what he thought of the deal compared to the deal in Rome. Frank is the chair of the economics department at Berry College, which is located in Rome. You may also know Frank from his blog contributions at Division of Labour.
Frank did some digging and now has a post on the Rome Braves (Low-A) stadium agreement.
— — —
The interest in the Gwinnett Giveaway raises an obvious question: How has the Macon Braves’ 2003 move to Rome worked out? That’s the question I’ll address in this guest post.
The Rome Braves play in a $14.9 million stadium (including $2.2 million for the land upon which it sits). Construction was financed by a special purpose local option sales tax (SPLOST) in effect from April 1, 2002 to June 30, 2003. (The SPLOST referendum passed with 50.5% support.) The Braves contributed an additional $850,000 to pay for upgrades including 6 additional suites, the restaurant and kitchen build-out, fixed seats, and additional concession areas. Hence, we see an immediate contrast between the Rome and Gwinnett stadiums: the Rome stadium was much less expensive.
Not surprisingly, one factor in the cost difference is the higher land price ($5 million vs. $2.2 million) in more densely populated Gwinnett County. The primary difference, however, is the construction cost of the stadium: approximately $13.5 million in Rome (including the add-ons paid for by the Braves) versus an anticipated $40 million in Gwinnett. Some of the cost difference arises from the Gwinnett stadium having 10,000 seats (plus an outfield area) while the Rome stadium seats 4,000 (plus an outfield berm). However, the number of suites is similar (16 for Gwinnett vs. 14 in Rome), and other stadium features (e.g., team offices) should also be similar for the two projects. Part of the cost difference for the Gwinnett deal is also the increased price of construction materials such as steel and cement over the past five years (see the Producer Price Index data for these materials).
Let’s turn now to the Braves’ lease provisions. The team agreed to an 18-year lease with three options to extend the agreement, each for four additional years, for a maximum term of 30 years. The lease has no provisions for a 31st season; presumably the team and Floyd County would negotiate an extension at or near the end of the 30 years if there is mutual interest in having the team continue to play in Rome.
All payments made by the Braves go into a capital maintenance fund for stadium repairs (e.g., HVAC, plumbing, etc.). The Braves’ payments for use of the stadium consist of a $15,000 per year fee, a share of their season ticket revenue, and a share of naming rights revenue. The season ticket share payment is calculated as 35% of any season ticket revenue between $175,000 and $350,000 per year and 25% of any season ticket revenue between $350,000 and $525,000 per year. (The thresholds are adjusted annually using the CPI.) The Braves keep all season ticket revenue below $175,000 and above $525,000. The season ticket share payments for the team’s first five years in Rome have ranged from $105,000 to $117,003, with a total of $554,999. Note that there is a potential loophole in the season ticket revenue sharing provision of the lease: the Braves could game the system by enticing customers to buy single-game rather than season tickets. This possibility, however, seems rather unlikely, as the team probably has a strong preference for the stable revenue streams generated by season ticket commitments.
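The tiered revenue split is easier to see as a function. A minimal sketch using the nominal thresholds from the lease; the annual CPI adjustment is not modeled, and the function name is hypothetical. Percentages are kept as integers so the dollar amounts come out exact.

```python
def season_ticket_share(revenue, lower=175_000, middle=350_000, upper=525_000):
    """County's share of Braves season-ticket revenue under the Rome lease.

    35% of revenue between lower and middle, 25% between middle and upper,
    nothing below lower or above upper. Thresholds are nominal here; the
    lease adjusts them annually by the CPI.
    """
    tier1 = max(0, min(revenue, middle) - lower)  # dollars taxed at 35%
    tier2 = max(0, min(revenue, upper) - middle)  # dollars taxed at 25%
    return (35 * tier1 + 25 * tier2) / 100


for revenue in (150_000, 250_000, 400_000, 600_000):
    print(f"${revenue:,} in season tickets -> ${season_ticket_share(revenue):,.0f} to the fund")
```

With unadjusted thresholds, the share tops out at $105,000 (35% of $175,000 plus 25% of $175,000), which matches the low end of the reported payments; payments above that presumably reflect the CPI-adjusted thresholds.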
Regarding naming rights, the stadium name was purchased by locally-headquartered State Mutual Insurance Company. The lease calls for naming rights revenue, net of any perks such as tickets and advertising provided by the Braves to State Mutual, to be shared 60/40 between the team and the county’s maintenance fund. The county’s 40% share of the net has been $8,286 for each of the first five years. Media reports (the Rome News-Tribune, July 1, 2003) indicate that State Mutual pays $65,000 per year for naming rights with a net of $20,714 after perks. The $65,000 per year amount seems reasonable in comparison to a $100,000 per year rights deal for Chattanooga’s stadium; it even seems lucrative compared to a $20,000 per year deal for the naming rights for the Kannapolis NC ballpark. (Note that these figures suggest that Gwinnett’s hope for $850,000 annually in naming rights might be overly optimistic.) However, the provision for net revenues to be split 60/40 is problematic. This provision essentially allows the Braves to capture a large portion of the naming rights by providing State Mutual with complimentary tickets and other items. While the team does incur costs for some of these items (e.g., $5,000 per year of giveaways and $1,486 for signage), others are pure profit (e.g., tickets that would have been unused otherwise). Stated differently, the naming rights deal can be viewed as naming the stadium for State Mutual in exchange for State Mutual making a large, multi-year purchase of tickets, advertising, and a suite. Floyd County might have been better served by a 60/40 split of gross revenue.
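The cost of splitting net rather than gross revenue can be checked with the reported figures. A minimal sketch; the perk value is implied by the reported $65,000 gross and $20,714 net, and the function name is hypothetical:

```python
def county_naming_share(gross, perks, county_pct=40, split_gross=False):
    """County's cut of naming-rights revenue under a 60/40 split.

    The Rome lease splits revenue net of perks; split_gross=True shows
    the alternative of splitting gross revenue instead.
    """
    base = gross if split_gross else gross - perks
    return base * county_pct / 100


gross = 65_000           # reported annual naming-rights payment
perks = 65_000 - 20_714  # implied value of perks, from the reported net
print(county_naming_share(gross, perks))                    # about 8,286
print(county_naming_share(gross, perks, split_gross=True))  # 26,000
```

On these numbers, a gross split would have roughly tripled the county’s annual take from naming rights.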
Adding together the $15,000 annual fee, the season ticket revenue split, and the naming rights, the Braves have paid $671,429 for their first five years in the stadium. A sensible way to compare this figure to the $14.9 million cost of the stadium is to think about the opportunity cost that taxpayers incur by having their $14.9 million tied up in a stadium rather than available for other uses. Economists typically measure such opportunity costs via an interest rate. At a conservative interest rate of 3%, taxpayers are sacrificing about $450,000 per year in interest in order to construct the baseball stadium. (Higher interest rates would imply a larger opportunity cost.) Since the Braves’ annual payments have averaged about $134,000 per year, they’ve paid a bit under one-third of the taxpayers’ annual cost of the stadium. Of course, the taxpayers aren’t really being repaid, since the Braves’ payments go into the capital maintenance fund.
Let me touch on a few other issues. First, Floyd County is allowed to use the stadium up to 30 days per year (vs. 10 days for Gwinnett); however, all net proceeds of county-sponsored events are deposited into the capital maintenance fund. (My impression is that Floyd County has used the stadium much less than 30 days per year and that few of the events had significant revenue potential.) Second, what about the stadium’s economic impact? Anecdotes about Camden Yards and other ballparks notwithstanding, the sports economics literature provides strong evidence that stadiums are not engines of economic development. (Phil Porter of the University of South Florida, in a recent interview on NPR, noted that the Super Bowl leaves “no footprint … in any of the measurable statistics.”) There’s no reason to think Rome’s experience has been any different; a Fuddruckers has been built across from the stadium, but all of the outparcels at the stadium site remain empty (the local Hooters plans to relocate to one of the stadium parcels). Finally, what about the benefits Romans derive from baseball consumption? Measuring such benefits is notoriously speculative, but it’s possible that constructing the stadium and bringing the Braves to Rome has generated consumption benefits greater than the $316,000 average annual subsidy to the team. For example, the team has averaged over 225,000 in annual attendance. If each of these fans received about a dollar and a half in consumer surplus from attending, their aggregate benefit would outweigh the annual subsidy. Such reasoning, of course, raises the question of why taxpayers should fund a stadium that customers would be willing to pay for in the form of higher ticket prices.
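The arithmetic in this paragraph can be reproduced directly. A minimal sketch using the figures reported above (five seasons, a 3% interest rate, the county’s $8,286 annual naming-rights share); the function name is hypothetical:

```python
def rome_lease_summary(annual_fee=15_000, years=5,
                       season_ticket_total=554_999,
                       county_naming_per_year=8_286,
                       stadium_cost=14_900_000, rate_pct=3):
    """Braves' payments vs. taxpayers' opportunity cost of the stadium."""
    total_paid = (annual_fee * years + season_ticket_total
                  + county_naming_per_year * years)
    avg_paid = total_paid / years
    opportunity_cost = stadium_cost * rate_pct / 100  # forgone interest per year
    return {
        "total_paid": total_paid,
        "avg_paid": avg_paid,
        "opportunity_cost": opportunity_cost,
        "share_covered": avg_paid / opportunity_cost,
        "implied_subsidy": opportunity_cost - avg_paid,
    }


for key, value in rome_lease_summary().items():
    print(f"{key}: {value:,.2f}")
```

The components sum exactly to the $671,429 total. Unrounded, the implied subsidy is about $313,000 per year; the $316,000 figure in the next paragraph comes from the rounded $450,000 and $134,000.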
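The break-even consumer surplus is a single division. A minimal sketch with the numbers above; since attendance has averaged over 225,000, the true break-even is slightly lower. The function name is hypothetical.

```python
def breakeven_surplus_per_fan(annual_subsidy, annual_attendance):
    """Consumer surplus each fan would need for total benefits to match the subsidy."""
    return annual_subsidy / annual_attendance


print(f"${breakeven_surplus_per_fan(316_000, 225_000):.2f} per fan")  # $1.40 per fan
```

A dollar and a half per fan therefore clears the bar with a little room to spare.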
— — —
Thanks to Frank for an informative comparison.
I still haven’t completely formed my thoughts on everything, so here are my jumbled impressions from the hearing.
— Brian McNamee is a worm. There is no way Roger Clemens will ever be convicted of perjury. The guy wouldn’t even admit to being a drug dealer. “That’s your opinion,” was his response when one congressman called him that. He’s a liar and con man. This doesn’t mean he’s lying in this instance, but the government can’t go forward with a perjury case with this guy as the star witness.
— The committee did not handle the hearing well, and Henry Waxman did a horrible job. He was rude, partisan, and injected far too much opinion. When I see grandstanding, it’s very hard for me to sympathize with the grandstander’s point of view. On several occasions, Tom Davis (my former representative, of whom I have never been a big fan) was left to clean up Waxman’s mess, adding to the partisan tone of the hearing. Seriously, who votes for Waxman?
— Clemens did a good job. He was confident, and adeptly balanced emotion and restraint. He answered many tough questions and never seemed to stumble.
— I expected more discussion with Scheeler. Mitchell should have been there to defend his report. I certainly wouldn’t feel comfortable allowing anyone else to defend a report with my name on it.
— The committee was wrong to let Andy Pettitte skip the hearing, and this should have been obvious. I don’t think Pettitte came off as a bad witness in his deposition, as reports have stated. He did seem shy and quiet. My guess is that Pettitte is not a talkative fellow, and I got the impression that he lacks confidence. His relationship with McNamee appeared to be very different from Clemens’s, with McNamee being the dominant personality and Pettitte being a bit too trusting.
— The partisan nature of the hearing was annoying. I guess it’s hard to prevent that from happening, though.
— Though a lot of my comments may seem pro-Clemens, I think the hearing was damaging for Clemens, overall. It goes to show why you should never want to testify in front of Congress. We really don’t have much more information to confirm guilt or innocence, but the media reaction seems to be leaning against Clemens.
— What’s next? This feels like the day after the 2000 presidential election, except that we knew that the conflict was going to have a resolution. But, I have a feeling that there is more to come.
I’ll have some more comments on the hearing later, but I have one thing I want to put out there. I really wish Andy Pettitte had testified today. And in light of how much weight several representatives put on Pettitte’s deposition, especially Elijah Cummings, he should not have been excused.
I just read through the entire deposition, and Pettitte’s recollection, while not 100% supportive of Roger Clemens, is not totally damning.
Q What was your reaction to what he said?
A Well, obviously I was a little confused and flustered. But after that, I was like, well, obviously I must have misunderstood him.
Q But he had never told you before that his wife had used HGH, that was the first you’d heard of that, is that right?
Q Did you understand that he was saying that as a way or sort of a strategy to handle the press inquiries? I mean, was that the nature of your conversation?
A Not really. The conversation wasn’t very long. That was really the end of the conversation. Just when he said that, I was like, oh, just kind of walked out. I wasn’t going to argue with him over it. You know.
Q It sounds like when you — it sounds like your recollection of the conversation you had with him in 1999, you are fairly certain about that, that he told you he used it. Do you think it’s likely that you did misunderstand what Clemens had told you then? Are you saying you just didn’t want to get into a dispute with him about it so you dropped the subject?
A I’m saying that I was under the impression that he told me that he had taken it. And then when Roger told me that he didn’t take it, and I misunderstood him, I took it for that, that I misunderstood him. (p. 27–28, emphasis added)
Later in the deposition, he was asked about the events again.
Q And you said when we were talking this morning that you thought maybe you misunderstood —
Q — and I thought that was almost another word for being polite. Do you — today, as you look back, do you think you misunderstood?
A I don’t think I misunderstood him. Just to answer that question for you when it was brought up to me, I don’t think I misunderstood him. I went to Mac immediately after that. But then, 6 years later when he told me that I did misunderstand him, you know, since ’05 to this day, you know, I kind of felt that I might have misunderstood him. I’m sure you can understand, you know, where I’m coming from with that conversation. (p. 90-91, emphasis added)
It sounds like he is firm in what he remembers (he thought Clemens said he used HGH in ’99/’00) but is satisfied that his memory of the event is hazy enough that Clemens could be correct. I think he more or less grants that Clemens’s version of the conversation is no less credible than his own, and possibly superior.
I would have liked him to clarify his view of Clemens’s claim that Pettitte misunderstood him. Had Pettitte been at the hearing, he could have commented on Clemens’s character and on why he might be willing to believe his own recollection is mistaken.
Addendum: A few further thoughts on Pettitte.
Andy Pettitte’s wife’s affidavit confirms what her husband told her. This isn’t useless information, but it isn’t strong support either. If Andy misunderstood Clemens, then when talking to his wife he would have told her what he thought he heard. So, I don’t think her testimony corroborates much beyond her husband’s recollection.
Also, the notion that Clemens contradicted himself by saying that his wife used HGH when Pettitte revisited their ’99/’00 conversation isn’t necessarily right. Debbie Clemens admitted using HGH in 2003. In 2005, Pettitte broached the subject with Clemens, who did not remember the original conversation. If he did not remember the conversation but did know that his wife used the drug in 2003, it is not surprising that he would say this.
Update: Apparently, Pettitte’s motive for finally revealing his 2004 HGH use was not so innocent. It looks like the story was going to come out anyway.
A month-long investigation by the Daily News has found that Tom Pettitte received performance-enhancing drugs from a trainer at a gym near Deer Park, and provided them to his son as recently as 2004. In numerous interviews with associates of the gym, on several trips to the Deer Park area, reporters from the Daily News discovered that Tom Pettitte, who has serious medical problems, obtained the human growth hormone from the muscle-bound owner of the gym, who is close to the Pettitte family. Based on information from two sources, Koby Clemens, Roger Clemens’ oldest son, also has worked out at the same gym.
Stadium naysayer needs reality check
The op-ed piece questioning the Gwinnett County administration’s fiscal stance in building a minor-league baseball stadium with public funds is one of the best examples of ivory tower academia passing for expertise (“Braves stadium deal taxes fiscal realities,”).
Those of us who have run businesses understand the value of such a venture in creating a small-business ripple effect that will generate a tax-revenue stream to the benefit of all. It wasn’t so long ago that the same kind of naysaying surrounded construction of the now hugely successful Gwinnett Center and Arena. In other words, a classroom construct doesn’t approach the real world.
AVRUM FINE, Lawrenceville
It’s hard to be a naysayer when you make your case using the other side’s most-optimistic estimates and the projected revenue still covers only half of the projected cost. And if the Arena at Gwinnett is such a success, why were two-thirds of its construction costs funded through hotel taxes? When you throw out a gigantic subsidy and the interest on the debt, it is not difficult to make something look profitable. Of course, it’s easy to get away with this when the local paper does nothing but run puff pieces that spew their own representative’s propaganda (here, here, and here).
Today is the day, and I am intrigued as to how this is all going to go down. There are two late-breaking developments in the case.
From the New York Times:
Roger Clemens will be confronted with a new and damaging affidavit from Andy Pettitte when he appears before the House Committee on Oversight and Government Reform on Wednesday to testify about allegations that he used performance-enhancing drugs, two lawyers familiar with the matter said late Tuesday.
Clemens will also be asked about corroborating information that committee staff members developed on their own that ties Clemens to such drugs, the lawyers said. That information, they said, stands separate and apart from the assertions made about Clemens by his former personal trainer, Brian McNamee, who contends that he injected Clemens with steroids and human growth hormone from 1998 to 2001.
Given the way news breaks, I would not be surprised if the two developments are closely connected if not the same. Supposedly, this is the revelation from Andy Pettitte’s testimony.
From the Associated Press:
Roger Clemens told Yankees teammate Andy Pettitte nearly 10 years ago that he used human growth hormone, Pettitte said in a sworn affidavit to Congress, the Associated Press learned Tuesday.
Pettitte disclosed the conversation to the congressional committee holding Wednesday’s hearings on drug use in baseball, a person familiar with the affidavit said. The person spoke to the AP on condition of anonymity because the document had not been made public.
According to the person familiar with the affidavit, who said it was signed Friday night, Pettitte also said Clemens backtracked when the subject of HGH came up again in conversation in 2005, before the same House committee held the first hearing on steroids in baseball.
Pettitte said in the affidavit that he asked Clemens in 2005 what he would do if asked by the media about HGH, given his admission years earlier. According to the account told to the AP, the affidavit said Clemens responded by saying Pettitte misunderstood the previous exchange in 1999 or 2000 and that, in fact, Clemens had been talking about HGH use by his wife in the original conversation.
If you thought Clemens showed his anger before, imagine what his demeanor will be after having his wife dragged into this whole mess.
I predict that Mitchell’s representative Charles P. Scheeler is going to get a good deal of attention from the committee.
I may “live blog” this, but depending on other factors I may not be able to do so. Please, check back later in the day. At the minimum I’ll have some comments after the hearing.
It’s nice to see the scientific consensus on human growth hormone (HGH) finally reach the general public.
The House Committee that on Wednesday is expected to hear the differing viewpoints of Roger Clemens and Brian McNamee did its pharmacology homework Tuesday, holding a hearing on the “Myths and Facts about Human Growth Hormone, B-12, and Other Substances.”
The consensus from the four doctors who testified: Neither HGH nor vitamin B-12 appears to help athletic performance very much, although much more research is needed on HGH, which also has a litany of unappealing side effects.
“There is no credible scientific evidence that growth hormone substantively increases muscle strength or aerobic exercise capacity in normal individuals,” said Dr. Thomas Perls, director of the New England Centenarian Study at Boston University School of Medicine.
It’s only been ten months since I started my campaign.
In an effort to further our debate over what the statistics say about Roger Clemens’s possible steroid use, Dave Berri asked Justin Wolfers and me to address our disagreement on Wages of Wins.
So here we have two of my friends appearing to have a very public disagreement. And this led me to think of my role in life as a uniter (yes, I have always thought of myself as a uniter, not a divider). So last night I sent the following e-mail to both Bradbury and Wolfers.
Would each of you agree with the following statements?
Justin and company are arguing that the statistics do not show Clemens is innocent.
JC is arguing that the statistics do not show that Clemens is guilty.
Both Bradbury and Wolfers graciously responded to my inquiry. And each also agreed to let me post their responses.
Here is an excerpt from Justin.
Beyond what the data don’t “prove” (both guilt and innocence), there is a tougher intermediate question: Are Clemens’ career statistics better thought of as evidence for the prosecution, or evidence for the defense? We see enough unusual patterns in his career trajectory that we think of them as being more persuasive for the prosecution than for the defense. Different approaches yield slightly different conclusions, but enough of them look somewhat odd that it is hard to see an honest presentation of the data helping Clemens’ case.
Here is an excerpt from me.
I agree that the statistics cannot exonerate Roger Clemens or any other baseball player accused of using steroids. I also think they cannot convict…. In Clemens’s case, especially considering the specificity of Brian McNamee’s allegations, I don’t think swings in the data support the current allegations…. So, to put it in Justin’s terms, I think the evidence supports the defense.
Justin has also added another post at Freakonomics (here is the initial post), where he walks through the data analysis process. I think this analysis puts too much weight on the WHIP statistic. WHIP suffers from the same malady that ERA does: it is highly variable because it includes fielder contributions from hits on balls in play.
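For readers who want the arithmetic behind the two statistics being compared, here is a minimal Python sketch using the standard definitions: WHIP is walks plus hits divided by innings pitched, and ERA is earned runs per nine innings. The function names and the sample season line are hypothetical, not Clemens’s numbers or anything taken from Justin’s post. Innings are assumed to be true fractions, so a box-score “200.1” innings should be passed as 200 + 1/3.

```python
def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched (standard definition)."""
    return (walks + hits) / innings


def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings


# Hypothetical season line: 220 innings, 180 hits, 50 walks, 66 earned runs.
print(round(whip(50, 180, 220), 3))  # 1.045
print(round(era(66, 220), 2))        # 2.7
```

Hits enter WHIP directly and feed ERA indirectly, which is why both statistics pick up the fielders’ contribution on balls in play.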
Generally, one way we can look at metrics to see if they measure skill or if they are just reflecting random fluctuations is to see how individuals perform over time. If the skill is real, then pitchers ought to perform similarly from season to season. Here are the year-to-year correlations for pitchers throwing back-to-back 100+ innings seasons from 1980–2006.
Metric          Year-to-year correlation
Strikeout rate  0.79
Walk rate       0.64
WHIP            0.42
ERA             0.37
All measures are correlated, but the correlation is lower for the metrics that include fielder contributions. The season-to-season correlations of individual pitchers’ WHIP and ERA are quite similar (0.42 and 0.37). Both metrics also vary similarly: the average ratio of mean to standard deviation (the inverse of the coefficient of variation) for the pitchers in the sample is 2.46 for WHIP and 1.99 for ERA.
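The dispersion comparison can be sketched the same way, using the ratio the post states: each pitcher’s mean divided by the standard deviation of his season values, averaged across pitchers. This is a minimal sketch under assumptions: the sample (n − 1) standard deviation, a hypothetical dict mapping each pitcher to a list of season values, and made-up ERA numbers rather than the post’s sample.

```python
from statistics import mean, stdev


def mean_to_sd_ratio(values):
    """Mean divided by sample standard deviation of one pitcher's season values.
    Higher means steadier relative to the pitcher's typical level."""
    # stdev needs two or more values; identical values would divide by zero.
    return mean(values) / stdev(values)


def average_ratio(careers, min_seasons=2):
    """Average the per-pitcher ratio over pitchers with at least min_seasons seasons.
    `careers` maps a pitcher's name to a list of his season values for one metric."""
    min_seasons = max(min_seasons, 2)  # stdev is undefined for a single season
    ratios = [mean_to_sd_ratio(v) for v in careers.values() if len(v) >= min_seasons]
    return mean(ratios)


# Hypothetical ERA values; "C" has only one season and is skipped.
careers = {"A": [2.0, 2.4, 2.8], "B": [3.0, 4.0, 5.0], "C": [3.1]}
print(round(average_ratio(careers), 2))  # 5.0
```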
Here is a graph of ERA and WHIP by age for Roger Clemens, using connected scatter plots and quadratic fit curves.
The metrics tend to move in concert (correlation = 0.9), and the small difference in quadratic fit seems to be explained by a few more-extreme deviations in WHIP.
Thus, if WHIP has any advantage over ERA, it is slight, and I prefer to concentrate on the individual components, such as strikeout and walk rates. I think using WHIP to examine Clemens’s career is especially problematic because the reduction in walks was largely responsible for his late-career success, and it is his walks that give his career WHIP its upside-down shape. I don’t view walks as a good marker for steroid use. So I interpret the same data to support rather than damage the case for Clemens’s performance being natural.
Thanks to Dave for setting this up, and thanks to Justin for participating. It is a pleasure to discuss a disagreement cordially—a rarity on the internet.
If you haven’t followed the debate, here are some relevant links.
Bradbury: What Do the Statistics Say about Roger Clemens’s Steroid Use?
Wolfers et al.: Report Backing Clemens Chooses Its Facts Carefully
Bradbury: A Critique of the Clemens Report
Wolfers: Breaking Down the Clemens Report: A Guest Post
Hendricks: Official Clemens Response to the NY Times Article
Wolfers: Analyzing Roger Clemens: A Step-by-Step Guide
I have received the official response to the NY Times study I discuss below.
— — —
Hendricks Sports Management Response to New York Times Article
Regarding the article dated February 10, 2008, by Bradlow, Jensen, Wolfers, and Wyner
The most important statements made by the four professors who authored the New York Times article are these: “Our reading is that the available data on Clemens’s career strongly hint that some unusual factors may have been at play in producing his excellent late-career statistics. In any analysis of his career statistics, it is impossible to say whether this unusual factor was performance-enhancing drugs.”
The Clemens Report does not state that the statistics “prove” anything, something missed by the four professors. The purpose of the report is to provide the statistical background of Roger Clemens’ career and to correct misconceptions about his career in the public forum. For example, it was being widely reported that Clemens was “washed up” when he left Boston in 1996. In fact, Clemens led the American League in strikeouts in 1996, tied his record of 20 strikeouts in a single game, and was a leader in many pitching categories.
* Criteria: The criteria the authors of the Clemens Report used to select pitchers for comparison were 2,000 innings pitched, high strikeout rates and high-quality performance as a starting pitcher. The Wharton professors, in their selection of pitchers to analyze, make the fundamental assumption that all pitchers with 10 or more starts for 15 years and 3000 innings pitched are roughly equal. Roger Clemens is not like every other pitcher in this group. He is considered perhaps the best pitcher of his generation. The professors make the mistake of thinking that his career arc should look like the arc of every other pitcher in their selected group.
Clemens, Curt Schilling, Randy Johnson, and Nolan Ryan were all highly successful in the second halves of their careers, and cannot properly be compared to pitchers who did not pitch effectively into their late 30’s and 40’s. The professors readily admit that Schilling, Johnson, and Ryan pitched well late in their careers. The professors state that there is no way to relate career performance trends to performance enhancing drugs. But they state that Clemens’ late-career performance ‘raises suspicion’. Therefore these ‘statisticians’ are engaging in precisely the kind of insinuation with their words that they say cannot be proven by statistics.
* Variables: There are many variables at work that affect a starting pitcher’s longevity. For example, Roger Clemens’ workout regimen, which has been often cited as a significant factor in his success, has certainly extended his career. Nolan Ryan was also known for his dedication to a challenging workout regimen, and, like Clemens, he enjoyed late career success. Just because it is difficult to measure the impact of a challenging workout regimen does not mean it does not favorably impact performance. Another factor that helped Clemens remain effective was his ability to adjust his pitching style over time, something the professors choose to disregard because pitch selection is not quantified in the report. Factors like Clemens’ workout regimen and his effective use of the split-finger fastball are not subject to easy statistical analysis. This does not mean that these factors should be ignored. Clemens’ intense workout regimen and his use of the split-finger fastball have been extensively observed and commented on over the course of his career. This is why baseball clubs employ scouts in addition to statisticians – because there are elements of the game of baseball that are extremely relevant to performance, even if they are not easily reduced to statistics.
* ERA: The professors say ERA can be unreliable as a basis for analysis because of the impact defense has on ERA. First, the Clemens Report uses ERA Margin, which is an advanced version of ERA that takes into account league differences. Second, ERA Margin and similar versions of ERA are widely accepted throughout baseball as superior measures of the quality of starting pitchers, something ignored by the Wharton professors. Third, the Clemens Report additionally provides thorough analyses of strikeouts, innings pitched, and pitch counts.
In using hits plus walks per innings pitched, the professors substitute a less comprehensive measure for ERA-based statistics by choosing to analyze just one of the many subcomponents of ERA. Furthermore, they make the mistake of not recognizing that hits are more dependent on defense than any other subcomponent of ERA. Hits are heavily dependent on the skills of the fielders, especially their range in the field. A shortstop with more range will reach more balls and prevent more hits than a fielder with poor range. So the statistic the professors choose to apply in their analysis is, ironically, more affected by the very factor they criticize in ERA.
Additionally, ERA Margin adjusts for the changes in the game over time by comparing a pitcher’s performance to the rest of the league at the time he played. The professors make no adjustments for any of the changes that have taken place in baseball over the last forty years, treating every hit and walk exactly the same, despite the lowering of the pitching mound, the tightening of the strike zone, the changes in equipment, the addition of the designated hitter, the introduction of modern ballparks, and other factors that have affected the game over the years. As a result, the professors are not correctly evaluating the statistics they have chosen to use for their comparisons.
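The response does not spell out how ERA Margin is computed, so the snippet below is not ERA Margin. As a hypothetical stand-in, it shows the general idea of a league adjustment with a simple ratio of league ERA to the pitcher’s ERA, scaled so that 100 is league average. This is similar in spirit to ERA+, minus its park adjustment.

```python
def league_relative_era(pitcher_era: float, league_era: float) -> float:
    """Hypothetical league-adjusted index: 100 * league ERA / pitcher ERA.
    100 is league average; higher is better. No park adjustment."""
    return 100 * league_era / pitcher_era


# The same 3.00 ERA rates differently in a high- and a low-scoring league.
print(round(league_relative_era(3.00, 4.50)))  # 150
print(round(league_relative_era(3.00, 3.60)))  # 120
```

Because the yardstick is the league in the same season, a ratio like this absorbs league-wide shifts in run scoring that raw hit and walk totals do not.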
* Roger’s age: The Wharton professors state that “his performance declines as he enters his late 20’s.” This statement is demonstrably false. After the 1990 season, at age 28, Clemens was second in voting for the A.L. Cy Young Award behind Bob Welch, a season in which Clemens’ ERA was 1.93. The next year, at age 29, he won the Cy Young Award. Pitching from the age of 27 to the age of 30, Clemens was an All Star in 1990, 1991 and 1992, and he achieved an ERA below 3.00 in each year. He turned 30 in August of 1992. These are clear indications that Clemens was not in a ‘decline’ in his late 20’s, as asserted by the professors.
As Bill James stated in a salary arbitration case while working with Hendricks Sports Management, “Anyone can make a chart.” The professors have proven this axiom, but they have not added anything substantive to a discussion of Roger Clemens’ career.