Introducing the Sabernomics Simple Projection System
Well, I didn’t start out trying to do this, but I have developed a projection system. Last week, lost in an attempt to answer some question that I have since forgotten, I accidentally came close to generating projections for both hitters and pitchers based solely on player statistics from the previous season. I decided to put in a little more effort to complete it for hitters. Pitchers will come later. While my method is simple, I was surprised at the predictive power it appears to have using only the previous season’s data for each hitter. Here are the factors I used to generate the predictions:
Walk rate
Strikeout rate
Home run rate
Extra-base hits per hit
Batting average on balls in play
I also controlled for the age, league, and park of the player. I used these factors in some form or another to estimate the batting average, on-base percentage, and isolated power for each player. From these I generated SLG and OPS. Now, this system is very simple, and it has several limitations:
I have only projected seasons for players with a minimum of 150 plate appearances in 2004.
I assume that every player is playing for the same team as last year (adjusting for new teams is a pain; I may do this later).
I only project players who stayed on the same team for the entire season, because it’s a pain to look at split seasons (sorry, no Beltran projection).
I do not look at minor league stats.
I only project the big four stats: AVG, OBP, SLG, and OPS.
I would like to fix all of these problems, but time is scarce right now. Hopefully I can make changes at a later time.
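For readers who want to see the arithmetic, here is a minimal Python sketch of the step from the three estimated rates (AVG, OBP, and isolated power) to SLG and OPS. It relies only on the standard definitions: isolated power is SLG minus AVG, and OPS is OBP plus SLG. The function names and the input values are hypothetical, chosen for illustration, and are not taken from the actual projections.

```python
def slg_from_avg_iso(avg: float, iso: float) -> float:
    """Isolated power is defined as SLG - AVG, so SLG = AVG + ISO."""
    return avg + iso


def ops_from_obp_slg(obp: float, slg: float) -> float:
    """OPS is on-base percentage plus slugging percentage."""
    return obp + slg


# Hypothetical projected rates for one hitter (illustration only).
avg, obp, iso = 0.280, 0.360, 0.200

slg = slg_from_avg_iso(avg, iso)
ops = ops_from_obp_slg(obp, slg)

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Because SLG and OPS are sums of the estimated rates, any error in the AVG estimate feeds into both of them.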
So, how well does the model predict? Well, using player seasons from 1998-2004, the predicted OPS explains about 50% of the variance of the actual OPS. The root-mean-squared error (RMSE) is 0.086, which means that roughly two-thirds of the observations lie within 86 OPS “points” of the prediction, assuming the errors are approximately normally distributed. In the 2004 Baseball Prospectus, Nate Silver compares six projection systems for 2003. The systems ranged from explaining 42% to 50% of the variance of OPS, with RMSEs from 0.085 to 0.098. When I apply my system to 2003, it explains 53% of the variance with an RMSE of 0.086. In fairness to these other systems, I am only looking at players with at least 150 PAs in the major leagues, not minor leaguers, foreign players, or previously injured players. But I think it is interesting that the system seems to be performing at a similar level.
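The two accuracy measures quoted above can be computed in a few lines. The sketch below is in Python and assumes that “variance explained” means the squared correlation between predicted and actual OPS, which is one common reading and may not be the exact calculation behind the figures above. The function names and the four sample OPS values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def variance_explained(actual, predicted):
    """Squared Pearson correlation (assumed definition of variance explained)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Made-up OPS values for four hypothetical hitters (illustration only).
actual_ops = [0.700, 0.850, 0.780, 0.920]
predicted_ops = [0.740, 0.800, 0.790, 0.870]

print(f"RMSE: {rmse(actual_ops, predicted_ops):.3f}")
print(f"Variance explained: {variance_explained(actual_ops, predicted_ops):.2f}")
```

An RMSE of 0.086 on this scale corresponds to the 86 OPS “points” mentioned above.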
One of the system’s most obvious weaknesses actually looks like a strength, which I did not expect to find. Looking only at one previous season seems to tell us a lot about a player, even a player who deviated from his career norm. Look at Chipper Jones’s 2004 actual, 2005 projected, and career lines:
Year     AVG   OBP   SLG   OPS
2004     .248  .362  .485  .847
2005     .289  .386  .543  .928
Career   .304  .401  .537  .937
As Braves fans know, Chipper had a horrible year last year relative to his career stats, which we assumed was a product of his injury. But now I’m not so sure he didn’t just have some bad luck. The system I employed has no way of knowing that Chipper was a career .937 OPS guy. But from the information I included from last year, and knowing his age and park, it concluded Chipper would do about 80 points better in 2005. In fact, PECOTA (which I will not post here) projects Chipper’s numbers to be worse than mine, even though it includes Chipper’s past good performances. Now, that doesn’t mean PECOTA is wrong, but it is interesting that the system I developed has projected such a huge jump for a 33-year-old player, and one that is consistent with his career output.
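As a quick check on the size of that jump, the short Python snippet below uses only the three OPS values from Chipper’s lines above. The variable names are my own, and the “share of the gap” figure is simple arithmetic on the rounded published numbers, not an output of the projection system.

```python
# OPS values copied from Chipper Jones's lines above.
ops_2004 = 0.847
ops_2005_projected = 0.928
ops_career = 0.937

# Projected improvement over 2004, and how far 2004 fell below his career mark.
projected_jump = ops_2005_projected - ops_2004
career_gap = ops_career - ops_2004

# Fraction of the 2004 shortfall that the projection recovers.
share_recovered = projected_jump / career_gap

print(f"Projected jump: {projected_jump * 1000:.0f} points")
print(f"Gap to career:  {career_gap * 1000:.0f} points")
print(f"Share of gap recovered: {share_recovered:.0%}")
```

On these figures the projection recovers most of the distance between his 2004 season and his career OPS, without ever having seen the career line.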
Anyway, here are the 2005 SSPS projections. If you have any thoughts or suggestions, please feel free to pass them along to me. I may or may not make updates, but I hope to post pitcher projections shortly.
8 Responses to “Introducing the Sabernomics Simple Projection System”
Kim
Looks like your page was heavily hit by spam.

Looking forward to more on this, JC.
Your system really likes Adam Dunn (.600+ SLG) and hates Ichiro (.292/.343/.372). Obviously this is self-selection, but does this system prefer the TTO mashers to the slap hitters (Ichiro, and maybe to a lesser extent Nomar, who has no projection despite 354 PAs between BOS/CHC)?
It’d be interesting to see what your system projected for Darin Erstad after his 2000 season.
I assume that JC could run the projection on Erstad’s 2000 season to generate an expected 2001 and compare it with his real 2001. However, that’s probably not a fair test, since both Erstad’s 2000 and 2001 seasons were likely used to generate the model. How much out-of-sample testing have you done, JC?
If this system regresses hitter H/BIP rates heavily (as some on primer have suggested), I guess it would penalize line drive hitters heavily, since they would be more likely to sustain high H/BIP rates long term. Still, very interesting stuff, as always!
I like your hitting projections a lot more than the pitching projections. The big weakness of the pitching projections is that they only predict ERA, not innings pitched, so one does not get a good idea of how long the pitcher will maintain that ERA.