What we really need, is for the amateurs to clear the floor.
— Bill James
What do you get when people actually listen to Bill James? Michael Schell’s book Baseball’s All-Time Best Sluggers: Adjusted Batting Performance from Strikeouts to Home Runs. Schell is certainly no amateur: he’s a professor of biostatistics at UNC and an example of what can happen when you design your own table-top baseball game as a child. I was nothing short of blown away by Schell’s book. This is serious sabermetrics. It’s complicated, but only because the complications are necessary to find the answers Schell is looking for. He does a magnificent job of making the case for the adjustments he chooses. If you are a bit wary of statistics, you probably won’t want to read about the minutiae of the adjustments, but you don’t need to understand the statistical techniques to know why he’s doing what he’s doing. And trust me, if he were doing it wrong, Princeton University Press wouldn’t be the publisher.
Schell begins by laying out his plan for the book. The goal is to compile rankings of the 100 greatest career and individual-season “sluggers” in baseball history for hits, doubles, triples, doubles-plus-triples (DPT), home runs, runs, RBI, walks, strikeouts, stolen bases, OBP, SLG, and OPS. The culmination of Schell’s analysis, however, is the development of Event-Specific Batting Runs (ESBR) and Career Batter Rating (CBR), which, with positional adjustments, generate the final rankings.
The adjustments that Schell makes are obvious to those familiar with “best ever” debates: seasonal variations, park effects, the talent pool of the leagues, and aging. However, the methods for making these adjustments are not so obvious, and they are the key to making this book an important contribution to sabermetrics. While the simplest transformation for comparing players across eras is a comparison to the mean, Schell also looks at the variance, skewness, and kurtosis of the events to compare hitters across playing conditions. The most technical tools Schell uses are piecewise linear regression (to evaluate playing eras) and multiple changepoint regression (for park effects). Both methods are clearly explained in the appendices, with the focus on how to interpret them rather than on the technical details. One of the best features of the book is the tables of park effects by event for all ballparks in different eras. The park effects are not just for runs or home runs; Schell breaks them out by offensive category: runs, batting average, DPT, doubles, triples, walks, strikeouts, and home runs by handedness of the batter are all included. The numbers for every year don’t exist for all parks, but this is the most thorough record of these effects available. These tables alone justify purchasing the book.
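To make the idea of comparing hitters to their league's mean and spread a little more concrete, here is a minimal Python sketch. It is not Schell's method, which also accounts for skewness, kurtosis, park effects, and the talent pool; it only shows the basic mean-and-variance step. The function name `standardize` and all of the batting averages are invented for illustration.

```python
from statistics import mean, pstdev


def standardize(value, league_values):
    """Return how many league standard deviations `value` sits above the league mean."""
    mu = mean(league_values)
    sigma = pstdev(league_values)  # population standard deviation of the league
    if sigma == 0:
        raise ValueError("league values have no spread")
    return (value - mu) / sigma


# Hypothetical batting averages for two invented league-seasons.
low_offense_league = [0.230, 0.240, 0.250, 0.260, 0.270]
high_offense_league = [0.270, 0.285, 0.300, 0.315, 0.330]

# A .270 hitter in the low-offense league vs. a .315 hitter in the high-offense league.
z_low = standardize(0.270, low_offense_league)
z_high = standardize(0.315, high_offense_league)

print(f"low-offense league:  z = {z_low:.2f}")
print(f"high-offense league: z = {z_high:.2f}")
```

With these made-up numbers, the .270 hitter rates further above his peers than the .315 hitter does, even though his raw average is lower. That is the kind of cross-era comparison a raw leaderboard hides.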
The payoff for all of this hard work is the group of tables that closes the book. These last 80 pages present some answers to the “best ever” questions we’ve all asked. While this won’t end these debates, it’s certainly a huge step in the right direction. In the conclusion, Schell even goes so far as to suggest where his analysis could be improved. Schell has done a great service to the baseball analysis community, and those who are interested in sabermetrics ought to read this book. I will add that the book is more of an encyclopedia than a beach read. It’s something you ought to keep close by while doing research, but it’s not something you want to consume cover to cover. I’ll admit that I haven’t read every word in it yet. Keep it on a bookshelf nearby, and when a question arises, pick it up to see what Schell has to say on the issue.