I am really enjoying the stats section over at The Hardball Times. For example, Studes has an interesting article on the effect of different types of balls in play on offense. The most intriguing statistic available is the percent of line drives allowed by pitchers. Now that is a tough statistic to find. So I was inspired to grab some of their numbers to play around with how pitchers are doing at preventing statistical runs instead of actual runs. It is possible for a pitcher to walk 27 batters in a game, strike out none, and still not give up any runs. Possible, yes, but highly unlikely. Such a bizarre outcome would require a lot of luck. Most pitchers who walk three batters an inning give up lots of runs. In the language of statistics, walking batters is correlated with allowing runs, so walks can be used to form a statistical estimate of runs allowed. Sometimes a pitcher will give up more than the estimate, sometimes less, but over a large set of observations we can make an average prediction that cancels out the extremes. I am curious which pitchers have been lucky and unlucky relative to their fellow players. How much is luck distorting ERAs and our perceptions of individual pitchers, especially this early in the season? Since I watch the Braves a good bit, I thought I would use them as an example.
First, I want to start out by estimating a model of pitcher performance. Using DIPS theory as a baseline, I estimated linear regressions of Earned Run Average (ERA) and Total Runs Allowed per 9 innings (RA/9) as a function of the factors that pitchers can control: walks, strikeouts, home runs, and the percent of balls hit that are line drives. While the first three metrics are the three main components of the original DIPS theory, MGL has found some evidence that pitchers may have some control over line drives, which are likely to generate runs (as Studes finds). The results are:
RA/9 = 2.37 – 0.46 (K/9) + 0.74 (BB/9) + 1.25 (HR/9) + 8.75 (LD%)
ERA = 2.27 – 0.44 (K/9) + 0.74 (BB/9) + 1.15 (HR/9) + 6.98 (LD%)
The R-squared values for both estimates were around .55, with a total of 211 observations of NL pitchers.
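To make the two equations concrete, here is a minimal Python sketch that plugs a pitcher's rates into them. The function names and the sample pitcher are hypothetical, and the sketch assumes LD% enters as a fraction (0.18 rather than 18), which is a guess based on the size of the coefficients and not something stated above.

```python
def predict_ra9(k9, bb9, hr9, ld_pct):
    """Predicted RA/9 from the fitted equation above.

    Assumption: ld_pct is a fraction of balls hit (e.g. 0.18), not a percent.
    """
    return 2.37 - 0.46 * k9 + 0.74 * bb9 + 1.25 * hr9 + 8.75 * ld_pct


def predict_era(k9, bb9, hr9, ld_pct):
    """Predicted ERA from the fitted equation above (same assumption on ld_pct)."""
    return 2.27 - 0.44 * k9 + 0.74 * bb9 + 1.15 * hr9 + 6.98 * ld_pct


if __name__ == "__main__":
    # Hypothetical pitcher, not taken from the data set:
    # 7 K/9, 3 BB/9, 1 HR/9, 18% line drives.
    print(round(predict_ra9(7.0, 3.0, 1.0, 0.18), 3))  # 4.195
    print(round(predict_era(7.0, 3.0, 1.0, 0.18), 3))  # 3.816
```

Note the signs: strikeouts pull the prediction down, while walks, home runs, and line drives push it up.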
Now that I have my model for estimating ERA and RA/9, I can generate predicted values for all pitchers based on their performance in these four variables. So how well do the predicted values compare to actual values for individual pitchers? Here are the actual, predicted, and residual differences of ERA and RA/9 for Braves pitchers.
I have ordered the pitchers by residual ERA (rERA), with a higher residual indicating more luck and a lower residual indicating less luck. The standard deviation of the residual difference for ERA and RA/9 is about 3.5 for both metrics. Thus, considering how Gryboski and Ramirez have pitched, they have been the luckiest in terms of their ERAs. Smoltz and Cunnane have been the unluckiest.
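The ordering described above can be sketched in a few lines of Python. This is only an illustration: it assumes the residual is computed as predicted minus actual, which is what makes a higher residual mean more luck (an actual ERA lower than the model expects). The function names and the three pitchers are hypothetical placeholders, not the Braves numbers.

```python
def luck_residual(predicted, actual):
    """Residual under the assumed convention: predicted minus actual.

    A positive value means the pitcher allowed fewer runs than the
    model expected, i.e. was "lucky".
    """
    return predicted - actual


def rank_by_luck(rows):
    """Sort (name, actual, predicted) rows from luckiest to unluckiest."""
    scored = [(name, luck_residual(pred, act)) for name, act, pred in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder rows: (name, actual ERA, predicted ERA).
    sample = [
        ("Pitcher A", 2.50, 4.00),
        ("Pitcher B", 5.00, 3.50),
        ("Pitcher C", 4.00, 4.00),
    ]
    for name, resid in rank_by_luck(sample):
        print(f"{name}: rERA = {resid:+.2f}")
```

The same function works for RA/9 by passing actual and predicted RA/9 in place of ERA.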
So there you have it, fun with statistics.