Rebunking “Wins”

I happened to run across a BTF link to a post at Rays Index that I had to read because of its curious conclusion.

In fact, in the absence of other stats, Wins is a very good, if not great, indicator of a pitcher’s value. So next time you hear somebody say Wins is a crappy way to evaluate a pitcher, throw a drink in their face and then make them read this post.

As someone who would need a towel if readers followed this advice, I believe a response is in order. Now, the author Cork Gaines (“The Professor”) does acknowledge that Wins is not the best statistic to use for evaluating pitchers, but that’s not really news. When is anyone ever going to have to choose between using Wins and using nothing to value a pitcher? After reading the post, I maintain that Wins is a poor statistic to use for valuing pitchers. In fact, the statistical evidence used in the article shows the opposite of what the author thinks it shows.

Gaines uses regression estimates of Wins and Win% on ERA+, finding R2 values of 0.51 and 0.54 to justify the usefulness of Wins.* Those values are indeed statistically significant and reveal a real positive correlation between Wins and run prevention. But more than that, they reveal why Wins is such a bad statistic to use for valuing pitching quality. How is showing that good pitchers get more Wins than bad pitchers busting a myth? Greg Maddux didn’t luck his way to 355 Wins, and no one who pooh-poohs Wins thinks his Win total is a result of randomness, unrelated to his ability. It’s the magnitude of the correlation that is important here.

The R2 reveals the share of the variation in the dependent variable (ERA+ in this example) explained by variation in the independent variable(s) (Wins or Win%). The remainder is due to explanatory factors not included in the model. Now, R2 can be tricky to interpret and it is sensitive to sample size; but, in general, the results indicate that about 50% of the difference in ERA+ across pitchers can be explained by differences in Wins. That’s the problem, not evidence to the contrary. The main knock against Wins is that pitchers have control over only one half of the game: half the game is defense (50%) and the other half is offense (50%). An R2 of close to 0.50 confirms rather than debunks this notion.
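To make that arithmetic concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions, not a re-creation of Gaines’s regressions: the names `skill`, `support`, and `win_score` are hypothetical, the data are synthetic, and I assume that pitcher skill and outside support (offense and fielding) are independent and equally variable. Under those assumptions, a Wins-like score that is half pitcher and half everything else should produce an R2 near 0.50 when compared with pitcher skill.

```python
import random


def r_squared(xs, ys):
    """R2 of a one-variable least-squares fit (the squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def simulate(n_pitchers=20000, seed=0):
    """Return the R2 between a synthetic Wins-like score and pitcher skill."""
    rng = random.Random(seed)
    # Assumption: skill and outside support are independent, with equal variance.
    skill = [rng.gauss(0, 1) for _ in range(n_pitchers)]
    support = [rng.gauss(0, 1) for _ in range(n_pitchers)]
    # Hypothetical stand-in for Wins: half pitcher, half offense and fielding.
    win_score = [s + t for s, t in zip(skill, support)]
    return r_squared(win_score, skill)


if __name__ == "__main__":
    print(round(simulate(), 2))
```

If the outside factors are made more variable than skill, the R2 drops below one half; if they are made less variable, it rises. The point is only that an R2 around 0.50 is what a half-polluted measure looks like.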

When choosing performance metrics, it is important to use three criteria:
1) How well does it correlate with output? Wins doesn’t do so badly here: Wins is correlated with run prevention. Still, other metrics of pitcher performance are far superior, and the life-boat circumstances when someone might need Wins to value a pitcher don’t happen. Why bring this up? No one has suggested that Wins and ability are uncorrelated.

2) How well does it measure ability? It measures ability, but it is heavily polluted by outside factors (offense and fielding). This is the criterion used to justify using DIPS over ERA. If you want to know the statistic that most strongly correlates with run prevention for pitchers, it’s ERA by a long shot. It is almost a pure recording of the runs pitchers give up, so of course the correlation will be strong. The problem is that pitchers themselves don’t have much control over a major component of ERA: balls that are put into play. ERA fluctuates significantly from season to season for pitchers because it is so dependent on balls in play. DIPS measures are preferred over ERA because they more accurately capture actual pitcher contributions to run prevention, not because they correlate more strongly with run prevention. Similarly, Wins captures some aspects of pitcher ability, but a huge chunk of the contribution is determined by something beyond pitcher control. And the regression estimates of the explained variance of ERA+ are consistent with Wins reflecting about half of what pitchers contribute to generating this metric.

3) How well does it match our intuition as to what matters? This criterion isn’t all that relevant in this case, and is reflected in the analysis in criterion 2. I use this rule in situations where correlations yield counterintuitive values. For example, strikeouts and home-run hitting are positively correlated; however, suggesting that a hitter should strike out more to increase his power would be wrong.

Gaines is right that Wins includes some useful information regarding pitchers, but the pollution from outside factors is so large that when Wins deviates from what ERA or DIPS would lead us to expect, it is Wins that contains the misleading information. There is no reason to use Wins to evaluate pitcher ability. It is neither a very good nor great indicator of a pitcher’s value.

* A footnote to the article states that R2 ranges from -1 to 1 with greater positive (negative) values indicating a stronger correlation. This is incorrect. R2 ranges from 0 to 1. I was curious if the author was using a correlation coefficient R, which does range from -1 to 1 but has a different interpretation in terms of measuring explained variance. However, the graphs and intuition make it look as though the descriptive footnote is incorrect, not the main text of the analysis.
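The difference between the two ranges is easy to check numerically. The sketch below uses a small made-up data set (the `xs` and `ys` values are hypothetical, chosen only so that the relationship slopes downward) and computes both the correlation coefficient R and the R2 of a one-variable least-squares fit. R comes out negative, while R2 stays between 0 and 1 and, for a regression with a single explanatory variable and an intercept, equals R squared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; ranges from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_r_squared(xs, ys):
    """R2 of the least-squares line y = a + b*x, computed as 1 - SSE/SST."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data with a downward-sloping relationship.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 3]

r = pearson_r(xs, ys)
r2 = fit_r_squared(xs, ys)

if __name__ == "__main__":
    print("R is negative:", r < 0)
    print("R2 is between 0 and 1:", 0 <= r2 <= 1)
    print("R2 equals R squared:", math.isclose(r2, r * r))
```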

3 Responses to “Rebunking ‘Wins’”

1. Ken Houghton says:

“A footnote to the article states that R2 ranges from -1 to 1 with greater positive (negative) values indicating a stronger correlation.”

Half of the regressions run should be multiplied by i? I knew I was doing something wrong?

I guess I misremembered that season where Nolan Ryan went something like 6-16 for Houston and baseball writers were talking about how sad they were that he wouldn’t win the Cy Young.

Or, for the other great use of Wins, how great an investment Rick Sutcliffe was for the Cubs in the years following his 16-1 in the NL.

2. Devon says:

Excellently explained. I noticed immediately that he was allowing confounding variables into the equation, resulting in a misleading result. I didn’t see the value part being off ’til now.