High-scoring games and big-money players are more important than wins in drawing fans to major-league baseball games, according to a Star-Telegram analysis of all 30 teams over the past five years.
— Every increase of 100 runs scored brought in 273,160 fans.
— Every $10 million increase in payroll brought in 130,000 fans.
— Every $10 increase in the cost of attending a game brought in 51,372 fans.
Wow! Especially interesting is the last result, which reads like a violation of the law of demand. As Skip notes, “Ahem. Higher prices ‘brought in’ fans? I believe we have a specification issue here!” Clearly, higher prices do not increase consumption, so I decided to investigate the study further. Maybe something else is wrong with it, too.
Since the article does not describe its exact methodology, I tried to reconstruct it as best I could. I gathered five years' worth of data on attendance, winning percentage, ticket prices, the Fan Cost Index (FCI), runs, ERA, home runs, and opening-day payroll by team.* All of these data are available on MLB.com and on Doug Pappas’s and Rodney Fort’s websites. I looked at these variables for all 30 teams from 1999 to 2003, for 150 total observations (although the estimation sample falls to 120 because of the technique I use to control for autocorrelation). I estimated the model with dummy variables for each team to capture team-specific factors, and I corrected for the autocorrelation I detected. If you are familiar with Stata, I used the xtregar command to estimate the model. Here are the regression results.
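For readers wondering where the 30 missing observations go, here is a minimal Python sketch of the usual explanation: an AR(1) correction works on quasi-differenced data, and each team's first season has no prior year to difference against. This is an illustration, not the code behind the results; the function name `quasi_difference`, the value of `rho`, and the attendance figures are all made up for the example.

```python
# Illustrative sketch only: shows why an AR(1) correction costs one
# observation per team. `rho` and the attendance numbers are hypothetical.

def quasi_difference(series, rho):
    """Return y[t] - rho * y[t-1] for each year after the first."""
    return [curr - rho * prev for prev, curr in zip(series, series[1:])]

TEAMS = 30
YEARS = 5  # 1999 through 2003

# Made-up attendance (in millions) for one team over the five seasons.
example_team = [2.0, 2.1, 2.3, 2.2, 2.4]
transformed = quasi_difference(example_team, rho=0.5)

raw_obs = TEAMS * YEARS                 # observations collected
usable_obs = TEAMS * len(transformed)   # observations left after the transform
print(raw_obs, usable_obs)  # 150 120
```

Each team keeps four of its five seasons, which is how 150 observations become 120.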
My results differ a little from the Star-Telegram estimates, but small differences in specification can explain that. More importantly, the effect of winning percentage on attendance is not statistically distinguishable from zero, so I feel like I am on the right track. In fact, only team payroll is statistically significant. So I decided to investigate further. Check out the bivariate relationship between attendance and winning percentage.
This seems pretty real to me, but it might not be. What could be wrong? Well, let’s take a look at the original specification. The model includes winning percentage along with runs and ERA. This is not much different from including runs scored and runs allowed, which together determine the Pythagorean win percentage, itself a close predictor of actual winning percentage. This is almost like putting winning percentage in the regression twice. This creates a problem known as multicollinearity, which inflates the standard errors and lowers the t-statistics. While I am normally hesitant to drop potentially collinear variables, I think it is the right thing to do in this instance. When these redundant variables are dropped from the analysis, the coefficient on winning percentage changes very little and its t-statistic rises. This gives me even more confidence that this is the right thing to do.
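To make the redundancy concrete, here is a small Python sketch of the Pythagorean win percentage and of the variance inflation factor (VIF), the standard measure of how much collinearity inflates a coefficient's variance. The exponent of 2 is the classic Bill James form; the run totals and the R-squared of 0.9 are hypothetical inputs chosen for illustration, not numbers from my data.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def variance_inflation_factor(r_squared):
    """VIF for a regressor whose R^2 on the other regressors is r_squared."""
    return 1.0 / (1.0 - r_squared)

# Hypothetical team: 800 runs scored, 700 runs allowed.
print(round(pythagorean_win_pct(800, 700), 3))  # 0.566

# Hypothetical case: the other regressors explain 90% of the variation
# in winning percentage.
print(round(variance_inflation_factor(0.9), 1))  # 10.0
```

A VIF of 10 means the coefficient's variance is ten times what it would be without the overlap, so its standard error is roughly three times larger. That is more than enough to make a real effect look insignificant.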
Now, that is a little more like it. To put the winning percentage estimate in perspective, a 0.1 increase in winning percentage (e.g., going from a .500 team to a .600 team) is associated with an increase in season attendance of about 170,000 fans per year, or about 2,000 fans per home game. So, I don’t think it is time to throw out the notion that winning improves attendance just yet.
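The per-game figure is simple arithmetic, sketched here in Python. The 170,000 figure is the rounded estimate from the text; the conversion to extra wins is my own added illustration and assumes a 162-game schedule with 81 home dates.

```python
SEASON_GAMES = 162
HOME_GAMES = SEASON_GAMES // 2  # 81 home dates

fans_per_tenth = 170_000  # estimated gain for a 0.1 rise in winning percentage

# Spread the seasonal gain over the home schedule.
fans_per_home_game = fans_per_tenth / HOME_GAMES
print(round(fans_per_home_game))  # 2099

# A 0.1 rise in winning percentage is about 16 extra wins over a season.
extra_wins = 0.1 * SEASON_GAMES
fans_per_extra_win = fans_per_tenth / extra_wins
print(round(fans_per_extra_win))  # 10494
```

Put differently, each additional win is associated with roughly 10,000 more fans over the season.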
*(Note: Unlike the Star-Telegram study, I did not include a dummy for whether or not a team made the playoffs, because I think this would only exacerbate the multicollinearity problem…plus it would be a pain to gather.)