Well, I didn’t start out trying to do this, but I have developed a projection system. Last week, while lost in an attempt to answer some question that I have since forgotten, I accidentally came close to generating projections for both hitters and pitchers based solely on player statistics from the previous season. I decided to put in a little more effort to complete it for hitters. Pitchers will come later. While my method is simple, I was surprised at the predictive power it appears to have, given that it uses only each hitter’s previous season. Here are the factors I used to generate the predictions:
Home run rate
Extra-base-hits per hit
Batting average on balls in play
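One way to compute these three inputs from a standard stat line is sketched below in Python. The definitions are assumptions for illustration: home run rate is taken as HR per at-bat, and BABIP uses the common formula (H - HR) / (AB - K - HR + SF). The function name and the sample stat line are hypothetical.

```python
def rate_inputs(ab, h, doubles, triples, hr, so, sf):
    """Return (hr_rate, xbh_per_hit, babip) for one season's stat line.

    Assumed definitions (illustrative, not confirmed by the post):
      hr_rate     = HR / AB
      xbh_per_hit = (2B + 3B + HR) / H
      babip       = (H - HR) / (AB - K - HR + SF)
    """
    hr_rate = hr / ab
    xbh_per_hit = (doubles + triples + hr) / h
    babip = (h - hr) / (ab - so - hr + sf)
    return hr_rate, xbh_per_hit, babip


# Hypothetical stat line, not a real player
hr_rate, xbh_per_hit, babip = rate_inputs(
    ab=500, h=150, doubles=30, triples=2, hr=25, so=100, sf=5
)
print(f"HR rate {hr_rate:.3f}, XBH/H {xbh_per_hit:.3f}, BABIP {babip:.3f}")
```

Swapping in a different denominator (plate appearances instead of at-bats, say) only changes the first line of the function body.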
I also controlled for the age, league, and park of the player. I used these factors in some form or another to estimate the batting average, on-base percentage, and isolated power for each player. From these I generated SLG and OPS. Now, this system is very simple, and it has some limitations:
I have only projected seasons for players with a minimum of 150 plate appearances in 2004.
I assume that every player is playing for the same team as last year (adjusting for new teams is a pain; I may do this later).
I only project players who stayed on the same team for the entire season, because it’s a pain to look at split seasons (sorry, no Beltran projection).
I do not look at minor league stats.
I only project the big-4 stats: AVG, OBP, SLG, and OPS.
I would like to fix all of these problems, but time is scarce right now. Hopefully, I can make changes at a later time.
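The last step mentioned earlier, deriving SLG and OPS from the projected AVG, OBP, and isolated power, follows from the standard definitions: isolated power is SLG minus AVG, and OPS is OBP plus SLG. A minimal Python sketch, with a hypothetical function name and made-up inputs:

```python
def slg_and_ops(avg, obp, iso):
    """Derive SLG and OPS from projected AVG, OBP, and isolated power."""
    slg = avg + iso   # isolated power is defined as SLG - AVG
    ops = obp + slg   # OPS is on-base percentage plus slugging
    return round(slg, 3), round(ops, 3)


# Made-up projection inputs, for illustration only
slg, ops = slg_and_ops(avg=0.300, obp=0.380, iso=0.200)
print(slg, ops)
```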
So, how does the model predict? Well, using player seasons from 1998-2004, the predicted OPS explains about 50% of the variance of the actual OPS. The root-mean-squared error (RMSE) is 0.086, which means that roughly two-thirds of the observations lie within 86 OPS “points” of the prediction, assuming the errors are approximately normally distributed. In the 2004 Baseball Prospectus, Nate Silver compares six projection systems for 2003. The systems ranged from explaining 42-50% of the variance of OPS, with RMSEs from 0.085-0.098. When I apply my system to 2003, it explains 53% of the variance with an RMSE of 0.086. In fairness to these other systems, I am only looking at players with more than 150 PAs in the major leagues, not minor leaguers, foreign players, or previously injured players. But I think it is interesting that the system seems to be projecting similar numbers.
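For anyone who wants to run the same kind of check on their own projections, here is a minimal Python sketch of the two accuracy measures. It assumes "variance explained" means the squared correlation between predicted and actual OPS, which is one common reading; the function names and the sample numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Squared Pearson correlation (assumed meaning of 'variance explained')."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Made-up OPS values, not real player data
actual_ops = [0.700, 0.800, 0.900, 1.000]
predicted_ops = [0.750, 0.780, 0.880, 0.950]

print(f"RMSE = {rmse(actual_ops, predicted_ops):.3f}")
print(f"R^2  = {r_squared(actual_ops, predicted_ops):.3f}")
```

An RMSE of 0.086 on this scale corresponds to the 86 OPS points quoted above.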
One of the most obvious weaknesses of the system actually looks to be a strength, which I did not expect to find. Looking only at one previous season seems to tell us a lot about a player, even a player who deviates from his career norm. Look at Chipper Jones’s 2004 actual, 2005 projected, and career lines:
Year     AVG    OBP    SLG    OPS
2004     .248   .362   .485   .847
2005     .289   .386   .543   .928
Career   .304   .401   .537   .937
As Braves fans know, Chipper had a horrible year last year by the standard of his career stats, which we assumed was a product of his injury. But now I’m not so sure he didn’t just have some bad luck. The system I employed has no way of knowing that Chipper was a career .937 OPS guy. But from the information that I included from last year, and knowing his age and park, it concluded Chipper would do about 80 points better in 2005. In fact, PECOTA (which I will not post here) projects Chipper’s numbers to be worse than mine, even though it includes Chipper’s past good performances. Now, that doesn’t mean PECOTA is wrong, but it is interesting that the system I developed has projected such a huge jump for a 33-year-old player, one that is consistent with his career output.
Anyway, here are the 2005 SSPS projections. If you have any thoughts or suggestions, please feel free to pass them along to me. I may or may not make updates, but I hope to post pitcher projections shortly.