Archive for Sabermetrics

Baseball Hacks: A Review

Baseball Hacks : Tips & Tools for Analyzing and Winning with Statistics
Joseph Adler
O’Reilly Media, Inc.

This book started popping up for me on Amazon about six months ago. Every time I searched for a book on baseball, my cookies kept telling the folks at Amazon that I’d like this book. But why? I wondered. O’Reilly publishes computer books; what could they be doing publishing a baseball book? In fact, Baseball Hacks is a guide complete with 72 “hacks” for analyzing baseball data. Not only does it show you where to find data, but it also provides in-depth tools for extracting and analyzing what you get.

A few skills presented include:
— Spidering baseball data off the web, including MLB Gameday data.
— Writing Perl scripts to organize and extract play-by-play data.
— Using MySQL to make tables and extract information.
— Using R to analyze data with linear regression and graphics tools.

Even if you are not skilled in any of these areas, the book walks you through them. The book is well-written with plenty of examples. Adler is clearly an accomplished sabermetrician with a vast knowledge of the practical tools of the sabermetrician. The book also includes small contributions from Tom Dierickx, Mark Johnson, Matthew Johnson, Ari Kaplan, Pete Palmer, and Brendan Roberts. Additionally, the book’s website has many other examples that are not in the book.

If you want to get into serious sabermetrics research, this is the book you need. And even if you already know a good bit about baseball analysis, there is plenty in the book that will be new to you. I’ve already recommended the book to several people, and I’m sure I’ll have an opportunity to do so again soon.

The Great Debate Continues

Rich Lederer moderates a discussion at the Baseball Analysts between three sabermetric consultants: Tom Tango (Tangotiger), Mitchel Lichtman (MGL), and Eric Van. They discuss many things of interest, including unionizing sabermetric consultants.

Braves Lineup Analysis

OK, this is really cool. Cyril Morong analyzes run-maximizing lineups based off player OBP and SLG. Ken Arneson “perls it up” for the A’s and posts the script. Then Dave Pinto writes a program that allows you to enter in player stats to generate an optimum lineup for any team of real or hypothetical players. The results are fully linkable, and with the help of, you can e-mail and post on the web these lineups with ease.

So, in a matter of moments, I used my 2006 SSPS estimates for the Braves to generate the 2006 Optimal Braves Lineup.

Pitcher (Based on Smoltz’s career OBP and SLG)

Pretty cool! I love the internet.

Update: I made a slight change to the above analysis, by giving half of the pitcher’s plate appearances to Langerhans, but it didn’t change the order. I wonder why it puts the worst hitter in the 8-hole? I’m going to have to review Morong’s post.

Fantasyland: A Review

Fantasyland: A Season on Baseball’s Lunatic Fringe
Sam Walker
Viking Press

In 2003 Michael Lewis fired the first shot in a culture war taking place within baseball with Moneyball. In his bestseller, Lewis investigates the success of the Oakland A’s, who employ the new ideas of sabermetrics (the scientific study of baseball) to put a winning team on the field. The protagonist is Oakland’s General Manager Billy Beane, a failed top prospect but a successful front office manager. While Beane’s explosive personality and some condescending arrogance documented in the book caused some baseball insiders to grumble upon the book’s publication, it was the methods the A’s employed that created most of the establishment’s backlash. Baseball’s establishment (or “The Club,” as Lewis later labeled his critics) was an old-boys network of GMs, scouts, and media who didn’t take kindly to the implication that the old way of doing things might not be the best way.

Moneyball was an indictment of the old guard. Lewis documented how Ivy League kids with knowledge of computers and mathematics could exploit the mistakes of leather-skinned scouts with one-armed tans. The new methods of measuring player performance, which replaced experienced eyes, grizzled wisdom, and a radar gun with sample sizes and confidence intervals, were too much. This conflict can be described as jocks versus nerds, experience versus youth, or man versus machine. These are classic human themes, which is why Lewis’s story is so compelling. Few members of The Club would deny the usefulness of stats, but they will always remind you that it’s the human scouts who can pick up things not reflected in numbers. To them, these were methods acceptable for “fantasy baseball,” not the real thing.

The thing is, though Moneyball isn’t the most popular book in MLB front offices, Sam Walker acknowledges in Fantasyland that there has been no rebuttal of the ideas presented. If anything, the Moneyball™ philosophy seems to have extended its influence further into the game. This is where the story begins. Walker, who writes about sports for The Wall Street Journal, runs the reverse experiment of Moneyball to weigh in on the “scouts versus stat-heads” debate. Whereas Lewis documents stat-geeks invading the big leagues, Walker brings the human element to the fantasy sports world, where the sabermetric revolution has strong roots.

Tradition has it that fantasy baseball traces its origins to a few prominent New York publishers who met in a now-defunct New York restaurant, La Rotisserie Française, which gave rise to the original name of fantasy baseball: Rotisserie. However, it turns out that the founding father of the league, Dan Okrent, actually got the idea from a group of professors at the University of Michigan, who played a similar game in the 1960s. So, the game that is played over beer and hot wings to fulfill the unmet dreams of sports fanatics actually has ivory tower origins.

Fantasy sports are now big business in the US, with an estimated 15 million players of fantasy games. While only those with the most extreme athletic gifts can hope to be a part of real professional sports, fantasy games provide an outlet for the masses to participate. Players select real professional players to play on individual teams, where “owners” reap the benefits of on-field production in several statistical categories. Fantasy players must gather as much information as possible in order to win their leagues. It is the drive of fantasy players to win these contests that has generated new knowledge in projecting player performances based solely on categories in box scores. Many of these ideas fit with the strategy employed by the A’s in Moneyball.

But, Walker wants to challenge the roto-nerds on their turf with information that few fantasy players can get their hands on. Though Walker doesn’t have the knowledge or wisdom of experienced front office personnel, he does have a press pass, which gives him access to information that no fantasy player has. And he’s not just jumping into any fantasy baseball league. He’s hoping to test out his ideas in Tout Wars, an advanced league in which only the top Rotisserie experts can play. Walker’s plan for success is simple, “I’m going to be good at this Rotisserie game, because I know people.”

Walker’s quest begins by building his own “front office” that he uses to compile the vast amounts of information he collects. If big-league teams need a group of people to win, so will Walker. The front office is composed of two diverse personalities with slightly differing skill sets. Nando, the general manager of the Walker team (the Streetwalkers), is a young fantasy buff. Though he is wise to the statistical bent of the fantasy community, he’s open to and intrigued by Walker’s idea. Nando will be the human voice, many times telling the owner what he wants to hear. Sig is a NASA biomathematician working on a Ph.D. in statistics. Everything is in the numbers for Sig, and he’s less impressed with Walker’s plan, but still intrigued. Sig will clash with the owner and general manager many times over the coming months. But, both will also learn to find comfort in his numbers.

Next, Walker starts his personal scouting trip to spring training. There, he talks to players, scouts, managers, and GMs to try and get the inside scoop. Although he finds some things, he’s certainly disappointed with his trip. He’s gained a few hunches, but possibly he’s formed some irrational attachments to players for the wrong reasons. Walker will also prowl the clubhouses during the regular season, not just to get dirt on players, but to actually try to influence coaches and managers. From a pitching coach he learns that a pitcher he was thinking about picking up is not suffering from a rumored injury. He also uses data to convince a manager to use a Streetwalker pitcher differently. Certainly, these are things that the normal fantasy player can’t do. But, Walker is frustrated that even with this edge, his advantage doesn’t seem to be helping much.

While Walker’s intent is scientific, testing out the idea that inside information is important in sports, one of the greatest aspects of the book is how the game shaped him personally. He becomes to the fantasy baseball world what George Plimpton became to professional football with Paper Lion. Walker meets all of the fantasy players. The Tout Wars participants form a unique bond of friendship, or maybe “acquaintanceship” is the better term. Tempers erupt, tensions are real, and feelings do get hurt. There is no monetary reward for winning the game, but approbation from the fantasy world means a lot to these men. The players know that whatever they do in life, in the fantasy world these are the only other people who understand them.

Just as Bill James was the person bearing the intellectual responsibility for the subject matter of Moneyball, Ron Shandler, author of Baseball Forecaster, plays the part in Fantasyland. Shandler also happens to be as much a participant in the book as any other player, because he is playing in Tout Wars. Shandler is bothered by many things in both real and fantasy baseball. He sees the game changing, and doesn’t necessarily care for the new breed of baseball analyst. He cares about Tout Wars partly because it’s an arena to prove his place in the fantasy baseball world. He doesn’t explicitly say it, but I feel it. And this means more to Shandler than the real game, as he walks away from major league employment (the ultimate goal of most serious fantasy players) to concentrate on his fantasy empire. I don’t know enough about Shandler to judge the accuracy of Walker’s portrayal; but, assuming it is correct, I really like the man.

As a player, Walker loses himself in the experiment as he experiences the pangs of the competition, watching his team of players (each of whom received a Streetwalkers t-shirt as a sign of devotion) fail and succeed. He stays up late cheering them, lets himself go when they slump, and celebrates when they succeed. He learns about the human side of players, and gains an appreciation for what he has. “Man, I love my wife,” Walker says on one occasion. Even if you can reduce a player’s performance down to his numbers, it’s important not to forget about the person. The point ought to be a cliché, but Walker has the writing skill to convey it without getting sappy.

I don’t want to spoil too much, but I don’t think you will be surprised by the fact that the Streetwalkers don’t end up in first place. Season number one ends up as a learning experience. The real joy in reading the book is the documenting of experiences that every fantasy player feels: injuries causing panic-trades, competitors riding fluke performances, and angst over ethically questionable moves. The book really moves, and I enjoyed reading it quite a bit. In fact, the book caused me to join a baseball fantasy league this year for the first time in a long time.

Though it’s not chronicled in the book, Walker and his front office certainly learned from their mistakes in 2004. I didn’t follow the 2005 Tout Wars season to know how they did it, but both Walker and Nando (on his own team) finished first and second, respectively, in different leagues. Sig moved to the real world of baseball working for the St. Louis Cardinals. These movements seem to mirror Walker’s intuition: it’s not just stats or human observation, it’s both. And both the real and fantasy worlds of baseball need a little more of what the other has got.

JC on Mazzone in the New York Times

I’d like to thank Jack Curry for mentioning my research in today’s New York Times.

Atlanta’s Pitching Wizard Takes His Magic to Baltimore

Smoltz said Atlanta’s perennial pitching success increased expectations among inexperienced pitchers and sometimes rattled them. Because the Braves had six Cy Young Award winners and nine 20-game winners under Mazzone’s tutelage, Smoltz said some pitchers tried to become the next Maddux or Glavine to impress Mazzone. That pressure, Smoltz said, could be detrimental.

Still, a detailed statistical analysis showed that Mazzone’s coaching makes a major difference. J. C. Bradbury, an economics professor at Sewanee: The University of the South, in Sewanee, Tenn., determined that Mazzone helped pitchers decrease their E.R.A.’s by slightly more than half a run per season.

“That’s a huge number,” Bradbury said.

In Bradbury’s 2004 study, he researched every pitcher who had pitched at least one season for Mazzone and compared their yearly E.R.A.’s with Mazzone and without him. Bradbury, a Braves fan who was skeptical of Mazzone’s effect on pitchers, was surprised by the results. His research is on the Web site

I enjoyed the article quite a bit.

Pinto’s 2005 PMR

Dave Pinto is starting to post the results of his Probabilistic Model of Range over at Baseball Musings. People often e-mail me to ask what defensive metric I think is the best. The answer is easy: PMR.

Not only will Dave be rating players by positions, he occasionally adds a few other bits of relevant commentary. For example, he finds Horacio Ramirez to be the 8th-luckiest pitcher on balls in play in 2005.

Lucky Ellis

I see that Mark Ellis and the A’s are heading for arbitration. Using PrOPS I found that Ellis had the luckiest season in the majors last year. It’s the second luckiest season in the past four years (the luckiest was Scott Podsednik’s 2003). Read more about it in The Hardball Times Baseball Annual 2006.

Why Are HOF Voters Ignoring Murphy?

The Hall of Fame voting this year reveals some information about the voters. Mainly, they’re not very consistent. For one, Bruce Sutter gets in but Goose Gossage does not. Have any voters posted their reasons for voting for Sutter but not Gossage? Goose received 100% of ESPN writers’ votes, while Sutter only got 80%. Well, enough people have pointed this out, but I want to look at how voters treated the hitters.

Jim Rice and Andre Dawson were the only position players with a majority of the votes. Dale Murphy comes in with a measly 10.8% of the vote. The thing is, all three of these guys are very similar.

Player        Seasons  Gold Gloves  MVPs  OPS+  HOF Votes
Jim Rice      16       0            1     128   337
Andre Dawson  21       8            1     119   317
Dale Murphy   19       5            2     121   56
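
For anyone who wants to check the vote shares, the arithmetic is easy to reproduce. This is only a minimal sketch: the vote totals come from the table above, but the number of ballots cast (520) is my assumption, not something stated in this post, chosen because it is consistent with Murphy's 56 votes working out to 10.8%. The 75% figure is the share of ballots needed for election.

```python
# Vote totals are taken from the table above.
# BALLOTS is an ASSUMPTION (not stated in the post); it is consistent with
# 56 votes being reported as 10.8%.
BALLOTS = 520
THRESHOLD = 0.75  # share of ballots needed for election

votes = {"Jim Rice": 337, "Andre Dawson": 317, "Dale Murphy": 56}


def vote_share(n_votes, ballots=BALLOTS):
    """Return the fraction of ballots that named a player."""
    return n_votes / ballots


for player, n in votes.items():
    share = vote_share(n)
    status = "elected" if share >= THRESHOLD else "short of 75%"
    print(f"{player:<14}{share:6.1%}  {status}")
```

Under that assumption, Rice and Dawson clear a simple majority while Murphy sits far below both of them, which is the gap the rest of this post complains about.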

They all played outfield. They had similar offensive numbers. They all had a few spectacular years, each winning an MVP. All had few post-season opportunities. They played in the same era. Both Dawson and Murphy won several gold gloves. So, come on Braves fans, it’s time to start lobbying. Dawson and Rice have Boston and Chicago fans going for them. It’s time to start up the comparisons to guys who are receiving serious HOF consideration. It may be that none of these guys ever make it, but I think they all should be treated the same by the voters.

Future Hall of Famers

With the announcement that Bruce Sutter will be the only inductee into the Hall of Fame in 2006, I thought I’d post my list of hitters still playing or too recently retired to be eligible whom I predict will be in the Hall of Fame. The methodology I use is the same one I used to examine which eligible players should be in the HOF. Here they are, with their probabilities of getting in. This list is only for hitters.

Player			P(in HOF)
Barry Bonds		100.00%
Rickey Henderson	99.80%
Frank Thomas		97.44%
Ken Griffey		95.63%
Larry Walker		95.03%
Cal Ripken		91.22%
Roberto Alomar		88.01%
Jeff Bagwell		86.85%
Rafael Palmeiro		83.96%
Barry Larkin		81.51%
Alex Rodriguez		74.10%
Ivan Rodriguez		66.90%
Edgar Martinez		64.03%
Tim Raines		63.32%
Fred McGriff		62.86%
Gary Sheffield		60.90%
Tony Gwynn		60.78%
Mark McGwire		58.73%
Craig Biggio		56.77%
Juan Gonzalez		55.64%
Sammy Sosa		51.77%

There you have it. I don’t think there are too many surprises here.

Who’s Missing From the Hall of Fame?

Well, I probably shouldn’t post this, but why not? If things get out of hand I can just start talking about abortion and crime to settle everybody down. Baseball fans love to talk about who ought to be in the Hall of Fame. And now that the ballot is out and the baseball diamond is empty, this is a good time to discuss who belongs and who doesn’t. It’s kind of a silly club, but the exclusiveness of its membership gives it some real integrity, which keeps our attention. Very few unworthy baseball figures have a plaque at Cooperstown.

Last week, over at The Baseball Analysts, long-time Bert Blyleven advocate Rich Lederer took his case to the people with a slew of celebrity guest columnists. Bill James also made a strong case in The Hardball Times Baseball Annual 2006. I’m convinced: Mr. Blyleven should be in. Let’s just hope that 75% of the BBWAA agrees this year. Now, I have to admit, I feel ashamed. Ashamed that I haven’t taken the time to put up a similar case for my own personal baseball hero.

To put my baseball life in context, I’m 32 years old. I grew up in Charlotte, NC, and both of my parents’ families are from the Atlanta area. When I discovered baseball after throwing out the first pitch at a AA Charlotte O’s game, I needed a major league team to follow. One Thanksgiving, I asked my uncle who his favorite team was, and he said “the Braves.” So, from that point forward, I was a Braves fan. And it just so happens that the Braves were starting to have some success. As I approached my 9th birthday the 1982 Braves had me believing that being a Braves fan was fun. Unfortunately, the following seasons were not such fun years to be a Braves fan. But there was always one thing right with the Braves: Dale Murphy. The first thing I used to look at in the paper every morning was Murphy’s standing in the NL home run race. Why even bother to look at the box score or standings? Dale Murphy was the Braves in my mind.

So, of course, I want Dale Murphy in the Hall of Fame for all the wrong reasons. But, that isn’t going to deter me from making a semi-objective inquiry as to Murphy’s HOF credentials. I want to look at all the players who are in the HOF, by any means, and see how Murphy stands up to the credentials of this club’s members. It just so happens that my method doesn’t isolate one player but looks at all eligible players to see if Murphy clears a benchmark that has been arbitrarily set by the HOF. I approached this project as James did in Whatever Happened to the Hall of Fame? (a.k.a. The Politics of Glory) on a much smaller scale. I wanted to use the data available to any jerk capable of downloading the Lahman Database (I don’t have anything against nice people, but what nice person takes joy in tooling around with the Lahman Database?) to evaluate the worthiness of baseball hitters for the Hall of Fame. I didn’t look at pitchers, because Murphy never pitched, (glances at B-R page) not even once. Let’s look at all the eligible players who played their last game prior to 1995, and try to predict the likelihood of a player being in the HOF based on several characteristics. The HOF file in the Lahman makes this a very doable task.

Now, my method for predicting who is in and who is out requires me to pick characteristics of players that likely influence HOF voters. This is where things get tricky. I’m sure there is no model I could pick that would be perfect. But, I think few will argue that the criteria I selected are unreasonable. And if you do, the Lahman is at your disposal, so get to work.

My criteria are as follows:
Career linear weights: Hey, it’s the best measure of run production.
Run environment: I include variables in the regression estimate for the average runs per game scored in the league and the average ball park factor during each player’s career.
Position: I classified players by the defensive position at which they played a plurality of games. All outfielders were treated the same.
Gold Gloves: The number of gold gloves won by a player. For players who played prior to the award, I include a variable in the regression to control for this lack of opportunity. I didn’t include defensive stats, because I HATE all publicly available defensive stats. They are stupid and tell us very little. If a guy is going to get into the HOF for his defense, he’s going to win gold gloves and/or play shortstop or catcher.
Awards: In addition to gold gloves, I include the number of MVP awards won.
Longevity: I include the number of seasons played in the league.

Using all of these factors, I employ a probit regression model to estimate the probability that a player is in the Hall of Fame. I include only those players who stopped playing prior to 1995 (arbitrary 10-year cutoff) to ensure players several opportunities to be elected. I then generated predicted probabilities of players being in the Hall of Fame.
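
For readers who haven't met a probit model before: once the model is fitted, it turns a weighted sum of a player's characteristics into a probability by passing that sum through the standard normal CDF. Here is a minimal sketch of that last step only. The coefficient values, the intercept, and the example player are all made up for illustration; they are not the estimates from the regression described above, and the sketch leaves out the position and run-environment variables.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(features, coefficients, intercept):
    """Return Phi(intercept + sum of coefficient * feature value)."""
    index = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return normal_cdf(index)


# HYPOTHETICAL coefficients and intercept, for illustration only.
coefficients = {
    "linear_weights": 0.004,
    "gold_gloves": 0.10,
    "mvps": 0.40,
    "seasons": 0.05,
}
intercept = -3.0

# A made-up player, not a real career line.
player = {"linear_weights": 250, "gold_gloves": 3, "mvps": 1, "seasons": 15}

p = probit_probability(player, coefficients, intercept)
print(f"Predicted P(in HOF): {p:.2%}")
```

Fitting the coefficients is a separate job for a statistics package; the sketch only shows how a fitted model produces the kind of percentage listed for each player.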

Here is the list of players who are not in the HOF, whose predicted probability of being in the HOF, based on the characteristics listed above, is greater than 50%.

Player            	First    Last    P(in HOF)
Bill Dahlen        	1891    1911    80.18%
Pete Rose        	1963    1986    78.39%
George Van Haltren    	1887    1903    72.86%
Keith Hernandez        	1974    1990    70.99%
Dwight Evans        	1972    1991    68.46%
Dale Murphy        	1976    1993    68.43%
Jimmy Ryan        	1885    1903    66.83%
Bob Elliott       	1939    1953    58.84%
Phil Cavarretta       	1934    1955    57.99%
Bob O'Farrell        	1915    1935    55.68%
Vern Stephens        	1941    1955    52.99%
Bob Johnson	        1933    1945    52.79%
Dolph Camilli        	1933    1945    52.59%
Cupid Childs        	1888    1901    51.64%
Larry Doyle        	1907    1920    50.56%
Deacon McGuire        	1884    1912    50.09%

I’m happy to report that Dale Murphy makes the cut! There are a lot of older players in here, but there are a few more recent names that are quite interesting. Keith Hernandez and Dwight Evans both make the list. Both of these guys James singles out in WHTTHOF? as being worthy, but not in. As for Rose, I don’t support his reinstatement or inclusion in the HOF. He knew what he was doing, and I don’t feel sorry for him.

So, if you have a ballot and you read this post, please take the time to consider Mr. Murphy, as well as the other players on this list. I know he didn’t play for many winners, but he couldn’t really help that. At least he never threw a temper tantrum to complain about it. You’re not going to damage the high standards of the Hall of Fame that have kept it interesting by electing Murphy. Most of all, I believe Murphy deserves this honor, and I think it’s time he gets the plaque he’s earned.