Archive for Pitching
More Testing of the Verducci Effect
After doing my analysis of the Verducci Effect yesterday, I became aware of Jeremy Greenhouse’s analysis on the subject. He uses a different method, but also finds little support for the Verducci Effect. His analysis pointed me to Josh Hermsmeyer’s Free Player Injury Database, which is a valuable new resource. The database contains injury information dating back to the 2002 season. Because the Verducci Effect is largely about predicting injuries, I wanted to see how player workloads predicted time on the Disabled List (DL). If significantly increasing pitcher workloads raises the incidence of future injuries, then pitchers who meet Verducci’s criteria should be more likely to get injured.
The table below lists the estimates for the impact of the Verducci Effect on DL stints. I estimated several models (including continuous estimates of pitcher workload), but I report only four specifications below because the results are consistent with the unreported estimates. I looked at the number of days on the DL (continuous) and whether or not a player ended up on the DL (discrete) using random-effects estimation models, least-squares for the former and logit for the latter. I also included the number of days on the DL in the preceding seasons in two specifications to control for the natural injury propensity of players.
                 DL Days     DL Days     On DL       On DL
Verducci         4.27        -1.89       0.28        0.06
                 [0.76]      [0.59]      [0.66]      [0.12]
Mean IP          -0.19       -0.16       -0.006      -0.003
                 [9.33]**    [12.75]**   [4.69]**    [2.01]*
DL Days (t-1)                0.64                    0.10
                             [54.16]**               [14.06]**
Constant         37.67       29.48       -0.50       -1.68
                 [14.77]**   [18.60]**   [3.23]**    [8.69]**
Observations     1428        1428        1428        1428
Overall R2       0.04        0.63        --          --
Absolute value of z statistics in brackets
* significant at 5%; ** significant at 1%
Again, the results do not support the existence of the Verducci Effect. The estimates are small and not statistically significant. A change in workload by more than 30 innings for pitchers under 26 is not associated with more days on or trips to the DL. I would like to reiterate that there needs to be further testing of the Verducci Effect, but so far there doesn’t appear to be much empirical support for the hypothesis.
Testing the Verducci Effect
For some reason, the Verducci Effect seems to be getting a lot of attention right now. I recall it being mentioned in the past, but I haven’t paid much attention to it. The effect is named for Sports Illustrated writer Tom Verducci, who came up with the concept but didn’t pick the name. Verducci uses a set of criteria to identify pitchers who are at risk for injury due to a significant increase in workload. He describes the criteria for selection and rationale in an article published this week.
More than a decade ago, with the help of then-Oakland pitching coach Rick Peterson, I began tracking one element of overuse which seemed entirely avoidable: working young pitchers too much too soon. Pitchers not yet fully conditioned and physically matured were at risk if clubs asked them to pitch far more innings than they did the previous season — like asking a 10K runner to crank out a marathon. The task wasn’t impossible, but the after-effects were debilitating. I defined an at-risk pitcher as any 25-and-under pitcher who increased his innings log by more than 30 in a year in which he pitched in the big leagues. Each year the breakdown rate of such red-flagged pitchers — either by injury or drop in performance — was staggering.
I figured now would be as good a time as any to put off the other important things I should be doing in order to find out if the Verducci Effect is real. I used a sample of major-league pitchers from 1998–2007 to estimate the impact of ratcheting up pitching loads on innings pitched and ERA, using both their recent major-league and minor-league workloads to predict performance. In some specifications I included the average of the present and past seasons’ performances (Mean IP or mean ERA) to peg a typical performance level for each pitcher. The Verducci Effect was considered to be in force if a pitcher was under 26 and had increased his workload by more than 30 innings in the previous year. I also measured the Verducci Effect continuously, using the actual increase in innings pitched in the preceding season. I only looked at performance in the majors, but minor-league workload totals counted toward the Verducci Effect. I estimated the impact using a random-effects estimation technique that controlled for detected serial correlation. The regression estimates are below, but if you’re not familiar with reading such tables you can skip over them and read my write-up that follows.
                       IP Change   IP Change   IP Change   IP Change
Verducci               19.07       22.17
                       [3.18]**    [3.73]**
IP Change * Under 26                           0.23        0.21
                                               [3.37]**    [3.15]**
IP Change                                      -0.25       -0.17
                                               [10.41]**   [7.22]**
Under 26                                       14.89       17.04
                                               [4.46]**    [5.29]**
Mean IP                0.06                    0.13
                       [3.96]**                [6.98]**
Constant               -12.23      -4.83       -21.97      -6.61
                       [5.78]**    [4.90]**    [8.83]**    [5.98]**
Observations           2383        2383        2316        2316
Overall R2             0.0122      0.0058      0.0379      0.0257
Absolute value of z statistics in brackets
* significant at 5%; ** significant at 1%
                       ERA Change  ERA Change  ERA Change  ERA Change
Verducci               -0.09600    -0.10295
                       [0.21]      [0.22]
IP Change * Under 26                           -0.00391    -0.00386
                                               [0.78]      [0.77]
IP Change                                      0.00611     0.00609
                                               [3.71]**    [3.74]**
Under 26                                       -0.24738    -0.25085
                                               [0.93]      [0.95]
Mean IP                0.47554                 0.00684
                       [13.67]**               [0.17]
Constant               -1.90261    0.49064     0.36013     0.39538
                       [8.05]**    [2.98]**    [1.50]      [2.86]**
Observations           2380        2380        2313        2313
Overall R2             0.0707      0.0000      0.0034      0.0038
Absolute value of z statistics in brackets
* significant at 5%; ** significant at 1%
The first row of each table measures the straight-up Verducci Effect. If you increased your workload by more than 30 innings in the preceding season and are under the age of 26, then we should expect to see a decline in innings pitched and a worse ERA. However, it turns out that this is not the case. In terms of workload, Verducci Effect pitchers actually increased their innings pitched by between 19 and 22 innings. In terms of performance quality, pitcher ERAs declined (that is, improved) by an average of 0.1 runs; however, the effect was not statistically significant, which means it’s probably best to say there is no effect.
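For readers who want to apply the discrete criterion to their own data, here is a minimal sketch of the flag as I have described it: 25 or younger, with an innings increase of more than 30. The function name, its parameters, and the example pitchers are hypothetical, and the sketch ignores details such as how fractional innings are recorded.

```python
def is_verducci_candidate(age, prev_ip, curr_ip, max_age=25, min_increase=30.0):
    """Return True if a pitcher meets the at-risk criteria described in the text.

    prev_ip and curr_ip are season innings totals. Minor-league innings count
    toward the workload, so pass combined totals (an assumption about how the
    caller prepares the data).
    """
    ip_increase = curr_ip - prev_ip
    return age <= max_age and ip_increase > min_increase


# Hypothetical pitchers, for illustration only (not drawn from the sample).
print(is_verducci_candidate(age=24, prev_ip=120.0, curr_ip=155.0))  # True
print(is_verducci_candidate(age=24, prev_ip=120.0, curr_ip=150.0))  # False: exactly 30 is not "more than 30"
print(is_verducci_candidate(age=27, prev_ip=120.0, curr_ip=200.0))  # False: over the age cutoff
```

The thresholds are keyword arguments so that anyone testing a different cutoff (say, 40 innings) can do so without rewriting the function.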
The last two columns of the tables represent attempts to quantify the Verducci effect as a continuous phenomenon; that is, the more your workload increases the stronger the effect ought to be. To do this I used three variables: the change in workload (measured by innings pitched), an indicator of whether or not the player was under 26, and an interaction term that multiplies the change in workload times the under 26 indicator. The interaction term (listed on the second row of each table) captures any difference in performance from workload by Verducci Effect pitchers. For innings pitched, Verducci Effect pitchers increased the number of innings pitched by about 7 innings for every 30 innings pitched. In addition, being under 26 increased expected innings by 15 innings, while the change in workload tended to lower innings pitched for all pitchers by about 8 innings. Thus, the net result for an under 26 pitcher increasing his workload by 30 innings is an increase of about 7 innings pitched. Note these results are all statistically significant, but this was not the case for ERA.
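The arithmetic behind those per-30-innings figures can be sketched as follows. The coefficients are the first values listed in the IP Change table for the interaction, IP Change, and Under 26 rows; the function name is hypothetical, and the constant and Mean IP terms are deliberately left out, so this is a breakdown of the individual terms and not a full prediction.

```python
# First-listed coefficients from the IP Change table.
B_INTERACTION = 0.23  # IP Change * Under 26
B_IP_CHANGE = -0.25   # IP Change
B_UNDER_26 = 14.89    # Under 26


def workload_terms(ip_increase, under_26):
    """Contribution of each workload-related term to next season's IP change."""
    flag = 1 if under_26 else 0
    return {
        "interaction": B_INTERACTION * ip_increase * flag,
        "ip_change": B_IP_CHANGE * ip_increase,
        "under_26": B_UNDER_26 * flag,
    }


terms = workload_terms(30, under_26=True)
for name, value in terms.items():
    print(f"{name}: {value:+.2f}")
```

For a 30-inning increase by an under-26 pitcher this prints +6.90, -7.50, and +14.89, which are the "about 7", "about 8", and "15" figures quoted above.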
So, where are we? The results do not bode well for the Verducci Effect. Pitchers who were predicted to decline actually improved. One potential problem with this study is that pitchers who pitched no innings at all in a season were not included; however, I think this bias is slight since this number is small, as even injured pitchers normally get in a few innings every season. Frankly, this is about as quick and dirty as you can get with a test, but it’s a starting point, and I’d like to see others examine the effect further. While I appreciate the intuition behind the Verducci Effect, I don’t see much evidence for it.
Tim Hudson’s Hometown Discount
The long-awaited announcement of Tim Hudson’s new contract with the Braves has finally come. The terms guarantee Hudson $9 million a year over the next three seasons, plus a $1 million buyout of a team option for a fourth year. The fourth-year option also pays out $9 million, so the total value that could be paid out is $36 million over four years. The contract voids a $12 million option for 2010, which the Braves were likely going to buy out for $1 million.
Hudson is an interesting player. He’s ranged from good to dominant. He was really pitching some of his best baseball as a Brave right before his injury. The good news is that he pitched well in his return through 42 innings. With a full offseason to recover, I think there is good reason to believe that he will be back to normal; however, the injury risk may have reduced his value somewhat. I proceed to my valuation with this caveat.
If Hudson pitches as he did in 2007 and 2008 over the course of a full season, then he’ll be worth about $12.5 million per year over the next three seasons. Thus, it appears that Hudson is giving the hometown discount that he promised. Smart move by Frank Wren and the Braves. This allows the Braves to trade one of their other starters (who will it be?) and still have pitching stability going into the future.
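To make the size of the discount concrete, here is the contract arithmetic as a short sketch. All dollar figures come from the terms and valuation above; the variable names are mine, and the comparison is in nominal dollars with no discounting for time, which is a simplifying assumption.

```python
# Contract terms, in millions of dollars.
ANNUAL_SALARY = 9.0
GUARANTEED_YEARS = 3
OPTION_BUYOUT = 1.0
OPTION_SALARY = 9.0

# Estimated annual value if he pitches as he did in 2007 and 2008.
ESTIMATED_ANNUAL_VALUE = 12.5

guaranteed_total = ANNUAL_SALARY * GUARANTEED_YEARS + OPTION_BUYOUT
max_payout = ANNUAL_SALARY * GUARANTEED_YEARS + OPTION_SALARY
annual_discount = ESTIMATED_ANNUAL_VALUE - ANNUAL_SALARY

print(guaranteed_total)  # 28.0
print(max_payout)        # 36.0
print(annual_discount)   # 3.5
```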
If you see Hudson out and about in the Atlanta area, be sure to say “thanks”—but, please, don’t pester him. Or, maybe throw a little support to the Hudson Family Foundation. He wants to be in Atlanta, and he has strengthened his club by doing so. It’s nice to have you on board for the long haul, Tim.
Question of the Day
If the Yankees end up losing the World Series because they can’t get good production out of a starter for the final three games, how will this affect the machismo argument regarding pitcher rest?
Even if the Phillies come up short, Charlie Manuel made the right call to give his pitchers four days of rest. It’s an issue of physiology: the body needs time to recover from strenuous activity.
The Duncan Effect
At Inside Pulse Sports, Pip of Fungoes asks:
it’s time for a comprehensive study of whether there is a “Duncan Effect” on pitchers, like the one that JC Bradbury did on Leo Mazzone. Until then, no one knows for certain what kind of an impact (if any) Duncan has on pitchers.
Well, because you ask so nicely, I’d be happy to oblige.
Actually, it’s easy because I already did the study.
Two years ago, Sports Illustrated asked me to look into the question, and I ran a study similar to the one I did for Leo Mazzone in The Baseball Economist. I looked at how pitchers performed with and without Duncan, controlling for factors such as age, parks, and pitcher quality. I found that Dave Duncan’s pitchers improved their ERAs by about 0.35 runs—yeah, he’s pretty darn good.
If you haven’t seen this before, it’s because the estimate was buried on page 60 of the September 27, 2007 issue of SI in a story about Duncan and his sons. I meant to write about it at the time, but I never got around to it.
UPDATE: Pitchers Hit Eighth posts the link to the SI article in the comments. Thanks!
Does Clutch Pitching Exist?
As a follow-up to my little clutch hitting study, I thought it would be interesting to look at clutch pitching using the same methodology. Though I don’t believe there is good reason to expect clutch performance among hitters, I think it’s plausible that pitchers may have some clutch skill. Pitchers have to regulate their effort throughout the game and often change the way they pitch with runners on base (employing the stretch). These factors leave room for pitchers to perform differently when the stakes of the game change. Pitching better with runners in scoring position (RISP) may not be “clutch” in the Platonic sense of rising to the occasion, but it’s a skill worthy of examination.
I looked at individual RISP plate appearances in 1992 and estimated the impact of past clutch performance controlling for the overall pitcher performance in each area (allowed AVG, OBP, SLG, strikeout rate, walk rate, home run rate), the skill of the batter in each area, and the platoon effect (platoon = 1; 0 = otherwise). I used RISP performance in 1989–1991 to proxy clutch ability—if pitchers have clutch skill, past clutch performance should correlate with present clutch performance.
The table below lists the coefficients (reported as marginal effects) and robust z-statistics of regression estimates in seven performance areas. I used the probit method to estimate binary outcomes (outcome = 1 if an event occurred, and 0 otherwise) of individual plate appearances for hits, on-base (hits + walks + hbp), strikeouts, walks, non-intentional walks, and home runs. I used the negative binomial method to estimate the impact of the variables on the number of total bases resulting from a plate appearance.
          Hit       On Base   TB        K         BB        BB-IBB    HR
Overall   1.1801    0.8815    0.9702    0.8695    0.7657    0.6890    0.5808
          [8.61]    [7.20]    [8.06]    [12.34]   [8.78]    [8.66]    [6.31]
RISP      -0.0239   -0.0022   -0.1192   0.0845    0.1305    0.0797    -0.0383
          [0.29]    [0.03]    [-1.69]   [1.39]    [2.16]    [1.43]    [0.73]
Batter    1.0148    1.0737    0.9816    0.9242    1.1108    0.9558    0.5831
          [13.61]   [17.42]   [15.62]   [25.56]   [23.54]   [22.49]   [16.13]
Platoon   0.0165    0.0332    0.0428    -0.0246   0.0266    0.0039    0.0071
          [2.77]    [5.35]    [4.29]    [5.30]    [6.73]    [2.79]    [1.96]
Obs.      21,096    23,872    21,096    23,872    23,872    23,872    23,329
Method    Probit    Probit    Neg Bin   Probit    Probit    Probit    Probit
Past RISP performance is not a statistically significant predictor of 1992 RISP performance. Walk rate appears to be an exception, with pitchers consistently performing worse with runners on base (and having a z-stat > 2), but the higher probability of walks seems to be caused by the increase in intentional walks issued in the hope of setting up a double play. When IBBs are removed, pitcher RISP walk performance loses its statistical significance.
The results do hide one thing: pitchers perform better in RISP than non-RISP situations, except when walks are involved. The table below shows the average of outcomes for all events. All differences are statistically significant.
          No RISP   RISP
Hit       0.255     0.249
TB        0.380     0.368
BB        0.075     0.089
On Base   0.315     0.321
K         0.150     0.157
HR        0.021     0.019
The numbers remove intentional walks; therefore, the worse performance in preventing walks, which also shows up for on-base probability, could represent “intentional unintentional-walks” or pitchers losing control a bit when runners are in scoring position. But if the latter were true, I would expect the numbers to be worse in the other areas. Also, because the numbers above are the percentages of all outcomes, the better numbers in RISP may also reflect better relievers entering the game for such situations.
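The pattern in the averages is easier to see as differences. The sketch below copies the rates from the table above and marks, for each outcome, whether the RISP rate is better from the pitcher’s point of view. The names are hypothetical, and because the underlying plate-appearance counts are not reproduced here, it does not re-test statistical significance.

```python
# Average outcome rates copied from the table: outcome -> (no RISP, RISP).
RATES = {
    "Hit": (0.255, 0.249),
    "TB": (0.380, 0.368),
    "BB": (0.075, 0.089),
    "On Base": (0.315, 0.321),
    "K": (0.150, 0.157),
    "HR": (0.021, 0.019),
}

# Strikeouts are the only outcome here where a higher rate favors the pitcher.
HIGHER_IS_BETTER = {"K"}


def risp_differences(rates):
    """Return outcome -> (RISP minus no-RISP rate, True if better for the pitcher)."""
    result = {}
    for outcome, (no_risp, risp) in rates.items():
        diff = round(risp - no_risp, 3)
        better = diff > 0 if outcome in HIGHER_IS_BETTER else diff < 0
        result[outcome] = (diff, better)
    return result


for outcome, (diff, better) in risp_differences(RATES).items():
    print(f"{outcome:8s} {diff:+.3f} {'better' if better else 'worse'}")
```

Only walks and on-base come out worse with RISP, which matches the reading above.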
The main story here is that the regression estimates indicate that after controlling for several relevant factors pitchers don’t appear to have any special skill over other pitchers in performing in RISP situations. A pitcher’s overall performance level does a fine job of predicting performance, and knowing past clutch performance doesn’t appear to add useful information.
Rebunking “Wins”
I happened to run across a BTF link to a post at Rays Index that I had to read because of its curious conclusion.
In fact, in the absence of other stats, Wins is a very good, if not great, indicator of a pitcher’s value. So next time you hear somebody say Wins is a crappy way to evaluate a pitcher, throw a drink in their face and then make them read this post.
As someone who would need a towel if readers followed this advice, I believe a response is in order. Now, the author Cork Gaines (“The Professor”) does acknowledge that Wins is not the best statistic to use for evaluating pitchers, but that’s not really news. When is there ever a situation in which anyone has to choose between using Wins and nothing to value a pitcher? After reading the post, I maintain that Wins is a poor statistic to use for valuing pitchers. In fact, the statistical evidence used in the article shows the opposite of what the author thinks it shows.
Gaines uses regression estimates of Wins and Win% on ERA+, finding R2 values of 0.51 and 0.54 to justify the usefulness of Wins.* Those values are indeed statistically significant and reveal a real positive correlation between Wins and run prevention. But more so, they reveal why Wins is such a bad statistic to use for valuing pitching quality. How is showing that good pitchers get more Wins than bad pitchers busting a myth? Greg Maddux didn’t luck his way to 355 Wins, and no one who pooh-poohs Wins thinks his Win total is a result of randomness, unrelated to his ability. It’s the magnitude of the correlation that is important here.
The R2 reveals the percentage of the change in the dependent variable (ERA+ in this example) explained by changes in the independent variable(s) (Wins or Win%). The remainder is due to explanatory factors not included in the model. Now, R2 can be tricky to interpret and it is sensitive to sample size; but, in general, the results indicate that 50% of the difference in ERA+ across pitchers can be explained by differences in Wins. That’s the problem, not evidence to the contrary. The main knock against Wins is that pitchers have control over only one half of the game: half the game is defense (50%) and the other half is offense (50%). An R2 of close to 0.50 confirms rather than debunks this notion.
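For anyone who wants to see what R2 measures, here is a minimal sketch for a one-variable regression, where R2 is simply the square of the correlation coefficient r. The win totals and ERA+ values are made-up numbers for illustration only, not Gaines’s data, and the function names are mine.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r, which ranges from -1 to 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of the variance in ys explained by a least-squares line on xs (0 to 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
wins = [8, 10, 12, 15, 18]
era_plus = [95, 90, 110, 105, 130]

r = pearson_r(wins, era_plus)
print(abs(r_squared(wins, era_plus) - r ** 2) < 1e-9)  # True: R2 equals r squared

# The reported R2 values correspond to correlations of roughly 0.71 and 0.73.
print(round(math.sqrt(0.51), 2), round(math.sqrt(0.54), 2))  # 0.71 0.73
```

So a correlation that sounds strong still leaves about half of the variation in ERA+ unexplained by Wins.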
When choosing performance metrics, it is important to use three criteria:
1) How well does it correlate with output? Wins doesn’t do so badly here: Wins are correlated with run prevention. Still, other metrics of pitcher performance are far superior, and the life-boat circumstances when someone might need Wins to value a pitcher don’t happen. Why bring this up? No one has suggested that Wins and ability are uncorrelated.
2) How well does it measure ability? It measures ability, but it is heavily polluted by outside factors (offense and fielding). This is the criterion used to justify using DIPS over ERA. If you want to know the statistic that most strongly correlates with run prevention for pitchers, it’s ERA by a long shot. It is almost a pure recording of the runs pitchers give up, so of course the correlation will be strong. The problem is that pitchers themselves don’t have much control over a major component of ERA: balls that are put into play. ERA fluctuates significantly from season to season for pitchers because it is so dependent on balls in play. DIPS measures are preferred over ERA because they more accurately capture actual pitcher contributions to run prevention, not because they correlate more strongly with run prevention. Similarly, Wins capture some aspects of pitcher ability, but a huge chunk of the contribution is determined by something beyond pitcher control. And the regression estimates of the explained variance of ERA+ are consistent with Wins reflecting half of what pitchers contribute to generating this metric.
3) How well does it match our intuition as to what matters? This criterion isn’t all that relevant in this case, and is reflected in the analysis in criterion 2. I use this rule in situations where correlations yield counterintuitive values. For example, strikeouts and home-run hitting are positively correlated; however, suggesting that a hitter should strike out more to increase his power would be wrong.
Gaines is right that Wins includes some useful information regarding pitchers, but the pollution from outside factors is so large that in cases where Wins deviates from ERA or DIPS performance expectations, it is Wins that contains the misleading information. There is no reason to use Wins to evaluate pitcher ability. It is neither a very good nor great indicator of a pitcher’s value.
* A footnote to the article states that R2 ranges from -1 to 1 with greater positive (negative) values indicating a stronger correlation. This is incorrect. R2 ranges from 0 to 1. I was curious if the author was using a correlation coefficient R, which does range from -1 to 1 but has a different interpretation in terms of measuring explained variance. However, the graphs and intuition make it look as though the descriptive footnote is incorrect, not the main text of the analysis.
This Is Getting Ridiculous
Kerry Wood was in Cleveland on Wednesday night to take the physical needed before he can finalize a two-year deal with the Indians worth about $20 million.
$10 million a year for Kerry Wood?! And I thought the K-Rod contract was excessive. Kerry Wood was a more valuable pitcher ($5.72 million) than Francisco Rodriguez ($4.62 million) in 2008, but he was limited by injuries during the four previous seasons. Assuming that Wood pitches exactly as he pitched in 2008, I estimate he will be worth approximately $6.5 million per season for the next two years. I reiterate: this assumes that one of the most injury-prone players in the league performs as he did last year. He made $4.2 million pitching for the Cubs last season, and the Cubs didn’t even offer him arbitration. Given his injury history, I think that was probably the right call.
I consider the Indians to be one of the smartest organizations in baseball—in my book I rate Cleveland to be the best-managed franchise in the American League—therefore, this move shocks me even more. What could be going on? I can think of only one explanation: Kerry Wood has been able to demonstrate such good health that teams think he can start. If Kerry Wood can get back to his 2002–2003 form he would be a $14 million/year pitcher over the next two seasons. Maybe his agent has been shopping his potential as a starter.
Wisely Spending on CC
Apparently, CC Sabathia and the New York Yankees have agreed to a seven-year $160 million contract, which is just under $23 million per season. [Update: Tim Brown is reporting the deal is for $161 million (exactly $23 million per year) and has an opt-out clause after three years.] Previously, I had projected Sabathia to be worth $24 million in a six-year deal. (Since I made that initial estimate, I corrected a minor error in my model that resulted in a very slight undervaluing of Sabathia.)
For the next seven years, I have Sabathia valued at just under $26 million a season, so the Yankees are paying about what he is worth.
Wasting Money on K-Rod
I continue to be amazed by the over-valuing of closers in the baseball labor market. Yesterday, the New York Mets and Francisco Rodriguez agreed to a 3-year, $37 million contract. The deal also includes an option for a fourth year for $13-$14 million based on easily-attainable criteria. What an absolute waste of money. I have K-Rod valued at $6 million per season over the next three years.
I’ve been saying for a while that closers are overpaid. Rodriguez has been a very good closer, but the problem is that closers don’t pitch much. Over the past three seasons, K-Rod has faced 4.7% of the team’s opposing batters; a decent starter will face three times as many batters. While we see K-Rod pitch at the end of games, often when games are on the line, he’s not pitching much. The Mets would have been better off spending that kind of money on a good starter who would prevent run scoring over many more batters. A few more million a year could have brought in A.J. Burnett or Derek Lowe, whose superior pitching would prevent situations that closers can rectify.
Addendum: I received a question about the role of leverage—the difference in the importance of when a pitcher typically appears within a game—in determining values. I’ve been asked it before, and my answers have been scattered over several different locations. So, here is my e-mail reply explaining why I value all innings pitched the same.
I have considered the impact of leverage, but I don’t think leverage can explain the vast differences between my estimates and what is happening in the market. Leverage is a product of outside factors, while a pitcher faces the same rules at all times of the game. The quality of his pitching is the same in the 5th inning as it is in the 9th. (There is the argument about pressure, but I don’t buy this explanation at this level of competition.) Now, the fact that he is good enough to pitch in a high-leverage situation is worth something; however, I don’t believe the value is twice the average. And the fact that a pitcher has pitched in high or low leverage situations doesn’t mean he ought to get all the credit for it.
For example, take Scott Linebrink and Francisco Cordero. Last year, the two pitchers signed four-year deals for $19 million and $46 million, respectively. I estimated that Cordero was worth about $2 million more than Linebrink, yet he was paid more than twice what Linebrink got. The only difference in their pitching histories is that one is considered to be a middle reliever and the other is considered a closer. It’s the performance that matters and ought to determine their salaries, not when they pitch. If Cordero is worth $46 million because he pitches in high-leverage situations, then Linebrink should have received a similar salary to reflect his opportunity cost: he could have pitched in high-leverage situations, but he didn’t. I think the market is putting too much value on the “Closer” label.
Another factor is that better pitchers in earlier innings affect the leverage in later innings. So, a good starter preventing runs has an impact on reducing leverage later in the game by creating bigger leads. I’m not sure exactly how to value that. So, I believe that the proper method is to treat all pitcher innings the same, while acknowledging that some elite relievers have some extra value in that they could be used in more valuable spots. But this value doesn’t necessarily come from when they pitched in the past.
I’m also a believer in patchwork bullpens. Take a bunch of bad castoff starters, platoon them, and tell them to pitch as hard as they can.





