Archive for November, 2009

Aging and Selective Sampling

In several recent posts, I have discussed my study of how baseball players age. It’s a study that I embarked upon with no prior expectations other than that I should observe a pattern of improvement and decline among players. But the fact that my estimates of where the typical aging function reaches its peak differ from the established Sabermetric standard has brought a strong negative response.

Phil Birnbaum has rejected my estimate of peak age for various reasons. In his latest post, he takes issue with my sample (again), which he has concluded is biasing the peak age estimates upwards. Such upward bias is possible and ought to be avoided, which is why I already addressed it in the paper and in follow-ups; however, Birnbaum continues his objection on new grounds.

I begin by focusing on his attempted replication of my model. He admits that it is not a perfect replication, but his deviations are important. I use two common panel estimators for estimating aging, random and fixed effects. What Birnbaum appears to have done is estimate a quasi-fixed effects model by including dummy variables for each player using OLS. It’s not necessarily a bad approach (indeed his results find a peak not far below my estimates), but it’s not equivalent to what I did, and I believe his method leaves out important information, especially the park and league effects.

The bigger issue is that the model, as he defines it, is impossible to estimate. He cannot have done what he claims to have done. Including the mean career performance and player dummies creates linear dependence as a player’s career performance does not change over time, which means separate coefficients cannot be calculated for both the dummies and career performance. My initial thought was that the mean was dropped from the estimation and he didn’t realize it (econometrics packages sometimes do this without much warning), but he does report a coefficient for the mean. Maybe the player dummies do not uniquely identify individuals, I don’t know. Something is going on here, but I’m not sure what it is.
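
The linear-dependence problem can be seen in a minimal sketch. The toy panel below (three players, four seasons each, invented career means) is an assumption for illustration only and is not data from either study:

```python
import numpy as np

# Hypothetical toy panel: 3 players, 4 seasons each (not data from the study).
n_players, n_seasons = 3, 4
player = np.repeat(np.arange(n_players), n_seasons)

# One dummy column per player (no separate intercept).
dummies = (player[:, None] == np.arange(n_players)).astype(float)

# Made-up career means; each value is constant across a player's seasons.
career_mean = np.array([0.750, 0.800, 0.700])[player]

# Design matrix with both the player dummies and the career mean.
X = np.column_stack([dummies, career_mean])

# The career-mean column is an exact linear combination of the dummies,
# so the matrix has fewer independent columns than it has columns.
n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)
print(n_columns, rank)
```

Because the rank is lower than the number of columns, least squares has no unique solution for all of the coefficients at once; a package has to drop a column or impose a constraint to report anything.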

Regardless of whatever the estimation problem is, I want to focus on his analysis of the sample, which is the heart of his critique. For my study, I used players with more than 5,000 career plate appearances and at least 10 years in the majors. He has already identified including good players with long careers as a potential source of bias, and I explained why I used the cutoffs that I did.

Including a list of players who played 10 years or more allows for the smoothing of random fluctuations over time, because we don’t have to worry about players being dropped in and out of the sample. More importantly, it allows for identifying a career baseline for each player from which we can observe how he progresses. It certainly shouldn’t perform worse than the average-yearly-change method. I did not make this choice lightly. My cutoffs were chosen because they fit with cutoffs used by other researchers and I tested the models for sensitivity to cutoff decisions.

The cutoffs were introduced to combat bias induced by non-random entry and exit into and from the league. I find it interesting that Bill James (1982 Abstract) was the one to notice the problem of players dropping out of the sample making it difficult to observe aging. He referred to the unfinished careers of player dropouts as “white space.”

I think that any successful statistical analysis of aging must find, and none of us has yet, some way to deal with “the white space.” The white space is the area of measurable production insufficient to support continued activity….

In the Palmer study [observing average performance by age], all the players within the white space are not counted; in the VAM study [observing total performance by age] all players in the white space are counted at zero. Neither is correct….

I am trying to write about when a ballplayer is at his peak. I am trying not to write about how one can study these things. But sabermetrics is young, and someday some smart guy will figure out a way to solve the problem. [Emphasis original (pp.196 — 197)]

In order to measure aging without having white space bias the results you have to have a sample to observe both young and old performance. All players have young performance, but not all players have old performance. Typically, early exits are caused by unpredictable injuries that can strike a player of any age or a player just cannot cut it and got to play at an early age due to roster shortages or bad talent assessments. The latter results in a team giving up on a prospect it felt was more promising or a player deciding to move on to something else—something quite common in my 86-year sample when big money was not paid to players. The big problem isn’t measuring all the young guys who play baseball, but measuring the production of old guys who don’t.

Still, in order to alleviate Birnbaum’s fear that the high standards for inclusion are biasing peak age upward, I estimated the impact of age after dropping the career minimum to 1,000 plate appearances and eliminating the 10-years-of-play requirement, and I found that the peak age remained at 29. The sample cutoffs do not bias the estimates, and I explained why the estimation procedure does not generate bias. I also showed examples of skills that peaked earlier and later than 29 in the sample. In Birnbaum’s own estimates, he finds extending the samples has only a small effect on the estimated peak; however, he then makes the following claim, “the way the study is structured, that small difference is really a big difference.”

Birnbaum thinks expanding the sample as I did isn’t the proper way to handle the potential aging effects. Instead, he believes it’s necessary to use separate samples of players who have career minimums and maximums. He looks at two sub-groups: players with 1,000 to 3,000 plate appearances, and players with 3,000 to 5,000 plate appearances, and finds that they have peaks earlier than the more-inclusive sample. How should we view information drawn from these samples?

Let’s consider a member of the latter group, Marcus Giles, who had 3,340 plate appearances in his career. What happened to Giles? He started off playing well, making the All-Star team at 25, but was out of the league by 29. What caused Giles to decline? Maybe he had some good luck early on, maybe his performance-enhancing drugs were taken away, or possibly several bizarre injuries took their toll on his body. It’s not really relevant, but I think of Giles’s career as quite odd, and I imagine that many players who play between 3,000 — 5,000 plate appearances (or less) have similar declines in their performances that cause them to leave the league. I’ve never heard anyone argue that what happened to Giles was aging.

The Giles example reveals why Birnbaum’s separate samples of players with shorter careers show earlier peaks. They don’t have the opportunity to show enough change over the course of a career. Some players will enter and progress normally, and then suddenly get knocked out by an injury. Others may jump in and perform at the extreme top of their abilities and earn further playing time, but eventually they will lose time and eventually be cut, or they get hurt and don’t get a chance to demonstrate improvement. We’re not seeing aging but fluctuations around a true-talent level in a small sample. Guys who perform above their abilities get the chance to stick around and stink up the joint, while those who perform below it get less of an opportunity to prove they are better players. The reason why players with longer careers need to be included and players with short careers don’t have to be included is that there are plenty of players who try (or are tried) but can’t cut it; but, players who don’t try but make it later don’t exist. Thus, if we want to make sure we are capturing the information by extending the sample beyond my preferred sample we need to keep the long careers in the sample, not look at the short-career guys all by themselves and conclude peak age is 27.

But Birnbaum takes his analysis one step further. Rather than arguing that players with short careers have earlier peaks—you could make that argument, but I don’t know how you identify the short from the long when in-career roster decisions must be made—he believes that the short-career estimates need to be heavily weighted to generate an average for all players. He believes that the long careers of players cause their high peaks to be overweighted when estimating an average peak because they are counted more with more career observations.

If a player has a 15-year career, with a peak at age 29, he gets fifteen “29” entries in the database. If another player has a 3-year career with a peak of 27, he gets only three “27” entries. So instead of the result working out to 28, which is truly the average peak of the two players, it works out to 28.7.

That’s not how it works. His belief is based on a misunderstanding of how least-squares generates the estimates to calculate the peak. There is no average calculated from each player, and especially not from counting multiple observations for players who play more. The procedure selects weights to minimize prediction error across the sample of observations. The additional observations add more information about how players age at older ages and do not generate additional “entries in the database” from which an average peak is calculated. We’re not drawing peaks from a sample of peaks that includes repeated observations of players, as Birnbaum implies. The expanded sample provides more information from which the procedure estimates the impact of the weights. If anything, the large number of short-career players included in the expanded sample (like Marcus Giles) pulls the estimated peak downward. If we were to take a weighted average, we would do so to reduce, not increase, the impact of players with short careers. Yet still, even with many more players holding short careers, the procedure estimates that players continue to peak in their late-20s.

Birnbaum’s misunderstanding causes him to try to correct for a bias that isn’t there by taking the weighted average of each sub-group by the number of players, which pulls the estimated peak age downward. Taking a weighted average of age groups that gives more weight to a biased sub-sample isn’t proper. Basically, he’s using a sample of early-exiters to estimate general aging effects for all players. These samples don’t remove selection bias; they suffer from it more than the larger sample. To use this calculation to conclude that the average player peak is 27 is incorrect.

In conclusion, I find human aging to be a fascinating subject to study, and I have enjoyed reading the large academic and sabermetric literature on the topic. I plan to continue my research in this area beyond baseball. It really doesn’t matter to me where peak age for baseball players is. I looked at the data using standard empirical methods to answer the question and I am merely reporting what I found. I have addressed all criticisms raised, but if you are not satisfied with my responses then feel free to continue holding your belief.

— — —

Also, in a preceding post, Birnbaum gets a few things wrong about my study. I want to correct his errors.

— He interprets the aging function as treating all players as identical.

What Bradbury’s model does is take both curves, put them in a blender, and come out with two curves that look exactly the same, peaking in the late 20s.

What I did was estimate an aging function using changes in player performances over time. It’s the same approach used in every single study of aging that I have ever seen. Human beings do differ in many respects; how much they differ when it comes to aging is a question that is difficult to study because of significant noise. So, the best we can do is to pull out the aging information from a large sample of players, which I did. It’s the exact same idea employed by the delta, mode, and total-value models that have previously been employed to measure aging. I’m just using a different estimator to pick up this effect, and it’s an estimator that allows me to control for many relevant outside factors that have previously not been taken into account. Some players peak early, others late: from this combined information I am estimating an average aging effect. If what I have done is inappropriate, then you have to throw out all previous studies on aging. We can’t say peak age is 27; we have to throw our hands in the air and say “who knows?”

— He states that my estimates are in absolute numbers.

A consequence of the curves having the same shape is that declines are denominated in absolute numbers, rather than percentages of a player’s level. If the model says you lose 5 home runs between age X and age Y, then it assumes *everyone* loses 5 home runs, everyone from Barry Bonds to Juan Pierre — even if Juan Pierre didn’t have 5 home runs a year to lose!

That’s incorrect. I’m estimating home-run rates, not raw home runs. All other stats are estimated as rates except linear weights. This is stated in the paper.

UPDATE (12/11/2009): Phil Birnbaum replies. I have nothing more to offer; you are free to reach your own conclusions.

Friday Links

— My latest Huffington Post column is now up. I discuss why Wins shouldn’t be used to evaluate pitchers.

So, kudos to Law and Carroll for using the right criteria for making their Cy Young picks. It’s fine to disagree with their choices, but the reasoning behind their decisions is much more sound than the reasoning used by the chorus of sportswriters who are condemning them.

Rob Neyer disagrees with most of my Hot Stove Myths.

Here is a list of links that expand on my original post.

The number of free agents at a position affects the price of free agents at a position

GMs can buy low and sell high

Every trade has a winner and a loser

Players peak at 27 and old players are worthless (Follow-ups here and here)

Cubs Sign Grabow

According to Gordon Wittenmyer of the Chicago Sun-Times, the Cubs have agreed to a contract extension with John Grabow. It’s being reported to be for two years and for a total of “at least $7 million.”

I’ve got him worth about $4 million over that span. I have to give this deal the thumbs down.

On Other Methods for Estimating Aging

My recent posts on my aging study have received a fair amount of criticism. There is no doubt that my study is imperfect, but all empirical researchers must make tradeoffs when selecting samples or choosing estimators. Small samples are sometimes preferred over large samples for expediency. Complicated models are sometimes necessary when simple methods mask biases. The goal of any researcher is to settle on a sample and method that has the greatest net benefits, because there is no such thing as a perfect study; and I’m satisfied with what I have found. I didn’t want the peak estimate to be 29 any more than I want it to be 27, but that is the outcome of a study that I feel was well designed.

But while I’ve presented and defended my study, I’ve said very little about why I selected the method I did over other methods. So, I want to discuss a few problems with alternative methods commonly employed to study aging that I was trying to avoid. These problems don’t necessarily doom these methods—they too have their own benefits and costs, and it would be wrong to pick on one side of the ledger—but, the fact that I could avoid these problems was a plus. So, I want to discuss three methods that are sometimes used to measure aging: the annual-change or “delta” method, the bucket method, and the mode method. I also discuss the importance of controlling for the run environment when estimating age effects.

The Annual-Change Method: This method takes a sample of players and averages how much players tend to change as they increase in age every season. For example, we look at players who played consecutive seasons and see how their performance changed. For some it will go up, some will go down. The data will have significant noise, but averaging all players’ changes reports a general trend. But, this method is subject to a bias in sample selection from who gets to play.

Playing time is a function of present performance and past performance. Because of this, past performance affects the sample in a way that highlights declines. Managers are trying to identify the best players to play. A good performance in the past will keep you in the lineup even if you slump through the short term. Bad performance in the past will prevent playing in the future. To have a two-year sample you have to reach the playing-time minimum in both seasons. To keep this simple, let’s assume that players can have two types of seasons (good and bad), generating the following combinations of seasons in a two-year sample: good-good, good-bad, bad-good, and bad-bad. We’ll get plenty of the first two types of seasons, but the latter two will happen less. The draws from year1 and year2 talent pools are not random, because the lucky-good can go from good to bad, but the lucky-bad don’t get the opportunity to go bad to good. I’ll call this phenomenon the survivor effect (Fair (2005) notes something similar).

Imagine we have two players who are both true .750 OPS hitters. PlayerA hits .775 in Year1, and PlayerB hits .725 in Year1, because of normal random fluctuations. PlayerB doesn’t get the opportunity to have a Year2 to have a corresponding upward rebound. PlayerA gets to play in Year2 and his performance falls to .725. Possibly in the next round, his Year2 and Year3 won’t be recorded because he’s deemed incapable of playing (unless you’re the Braves and you build an advertising campaign around him). Thus, when we average in the change, we will be averaging in more declines than is reflected by aging.
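
The two-player example can be scaled up in a rough simulation. Everything numeric below is an assumption chosen for illustration: every hitter has the same true .750 OPS, luck is normally distributed with a standard deviation of .025, nobody ages, and only hitters at or above .750 in Year1 are given a Year2.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
true_ops = 0.750   # every simulated hitter has the same true talent
noise_sd = 0.025   # assumed size of season-to-season luck
cutoff = 0.750     # assumed rule: only Year1 results at or above this earn a Year2

# Two seasons of pure luck around the same talent level (no aging at all).
year1 = true_ops + rng.normal(0.0, noise_sd, n)
year2 = true_ops + rng.normal(0.0, noise_sd, n)

kept = year1 >= cutoff

# Average change if every hitter were observed in both seasons (close to zero).
mean_change_all = (year2 - year1).mean()

# Average change among hitters who were allowed a second season (negative).
mean_change_kept = (year2[kept] - year1[kept]).mean()

print(round(mean_change_all, 4), round(mean_change_kept, 4))
```

With no aging built in, the full population shows essentially no average change, while the hitters who survive the cutoff show an average decline. That decline comes entirely from who is allowed to play a second year.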

So why do we see any positive improvement up to the mid-20s at all (26 is where Nate Silver finds that it ends)? The survivor effect ought to be less relevant when players are younger, because the aging function is steeper at this point (meaning improvements are larger and likely to overcome bad luck) and managers expect improvement and will be more tolerant of one bad year (“Tough year, kid. Hang in there.”). For older players the effect is the opposite. Being PlayerB at 36 may cause teams to disallow a bounce-back year because what they observe may be a sign that his career is over. [Republished from my comments where it was not particularly visible.]

The Bucket Method: This method involves looking at player performances and organizing them into age “buckets.” After doing so, we can compare which buckets have the highest level of performance. This is the main method used by Bill James in his initial study of aging in his 1982 Abstract.

The problem with the bucket method depends on what you’re dropping into buckets. If it’s total performance, then the buckets will be filled at younger ages by players who are not necessarily good. If they don’t cut it they will leave the game and be replaced by more young players. Therefore, the bucket totals are full from sheer numbers, not differences in performance by players of those ages. And this is biased towards the young, because while many players drop out of baseball, they almost never drop in. A player in his mid-to-late 20s who hasn’t made it will leave the game to start building his human capital for his non-baseball life or the team may give up on him. Older players rarely stick around for the chance to get a few at-bats in their 30s.

If you are looking at average rate performance by age bucket, so as to avoid the summation problem, you have a new issue. Baseball players have to meet a minimum threshold of performance to play. If you meet that, you normally get to stick around, young and old. 20-year-olds, 30-year-olds, and 40-year-olds who can play, will; thus, their average performances will look quite similar. What’s happening is that good players enter early and leave late, making their buckets appear more productive than they are. In his original aging article, James criticizes Pete Palmer for a study that uses such a method and finds very little aging effect at all. James points out that Palmer can’t measure the “white space” (there is no performance to measure, just white space) where players who have declined don’t get a chance to perform. Aging in the white space is what I am trying to capture by using multiple regression analysis to control for player quality.
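
A rough simulation shows how a playing-time threshold flattens average performance by age bucket. The numbers are assumptions for illustration only: ability is drawn from a standard normal distribution in arbitrary units, the true aging curve peaks at 29, and a fixed minimum level is required to play.

```python
import numpy as np

rng = np.random.default_rng(1)

n_players = 200_000
talent = rng.normal(0.0, 1.0, n_players)  # assumed spread of ability (arbitrary units)
threshold = 0.5                           # assumed minimum level needed to get playing time


def true_aging(age):
    # Assumed aging curve: peak at 29, symmetric decline on either side.
    return -0.02 * (age - 29) ** 2


def bucket_mean(age):
    level = talent + true_aging(age)  # what each player would produce at this age
    playing = level >= threshold      # only these players land in the age bucket
    return level[playing].mean()


# How much every player truly declines between 29 and 36 (0.98 by construction).
true_gap = true_aging(29) - true_aging(36)

# How much the bucket averages differ between 29 and 36.
observed_gap = bucket_mean(29) - bucket_mean(36)

print(round(true_gap, 2), round(observed_gap, 2))
```

The bucket averages differ by far less than the decline every simulated player actually experiences, because the players who would have posted the weak age-36 seasons are not in the bucket at all.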

The Mode Method: This method uses the most common age at which players typically have their best season to identify peaks. This method is used not only in sabermetrics but in academic studies of aging. For example, researchers have looked at the most common age at which track and field athletes set world records. An interesting finding from these studies is that while records tend to fall over time, the age at which the athletes break the records does not.

The mode method is imperfect as well, with estimated peak ages biased downward. The reason for this is that there are two main factors that cause players to decline: aging and random non-aging-related injuries. An example of the former is when a player’s reflexes slow and he can’t get around on a fastball. An example of the latter is when a player blows out his ACL sliding hard into a base and he never heals to reach his original potential. Players decline and leave the sport for both reasons, but the latter is definitely not aging. When we look at the mode, we are not differentiating from the cause of deterioration. Because of non-aging attrition, more players will have an opportunity to have peak ages earlier than later. The thing is, it isn’t predictable who will suffer these injuries (though some injuries are associated with age). The attrition isn’t aging, and players who avoid injuries should improve beyond the mode best season. In my study, I identified the mode being less than the mean and median for all players even for those who stayed in the sample. The reason for this is likely unpredictable injury shocks, because plenty of players have good seasons after their 30th birthdays. The mode method is also not very helpful for measuring aging rates: we can find peaks, but can’t track the path to and from them.

Another Important Factor: Aside from the problems with methods above, any study of aging using players over time must account for changes in the environment in which players perform. Baseball’s run environment can shift quite dramatically over the course of a player’s career. Colin Wyers provides an example of how not accounting for changes in run environment can lead to erroneous conclusions about aging.

Wyers looks at all players in 2008 who were 29 and sees how their performance differs from 2006, when they were 27. He finds that the players had an OPS 0.014 lower at 29 than at 27, and declares “That’s not what we should expect to see if the average peak age is in fact 29.” It’s a cherry-picked sample from which no one should draw any conclusions, but the data doesn’t reveal what Wyers thinks it does. In fact, it reveals the exact opposite of what he claims. The run environments were vastly different in 2008 and 2006. In 2006, the league-average OPS was 0.768; in 2008, league-average OPS was 0.749. Thus on average, if players didn’t age at all we would expect their OPS to decline by 0.019. That the sample only declined by 0.014 means that they improved, not declined. The improvement is also consistent with the aging estimates listed in my paper (about a 0.65% difference from the peak). Wyers also argues that from 1997–2008 the general trend was a decline from 27 to 29, but the trend of runs was also declining during this period.
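
The adjustment is simple arithmetic on the figures quoted above. A short sketch, with variable names invented for illustration:

```python
# League-average OPS in the season the players were 27 and the season they were 29,
# as quoted in the text.
league_ops_age27_season = 0.768
league_ops_age29_season = 0.749

# Raw change in the sample's OPS from age 27 to age 29, as reported by Wyers.
observed_change = -0.014

# Change we would expect from the run environment alone, with no aging.
expected_change_no_aging = league_ops_age29_season - league_ops_age27_season

# Change left over after removing the run-environment shift.
environment_adjusted_change = observed_change - expected_change_no_aging

print(round(expected_change_no_aging, 3), round(environment_adjusted_change, 3))
```

A positive adjusted change means the sample did better at 29 than the league-wide drop alone would predict.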

Runs/Game 1997--2008

But I would also like to point out that no matter what you think the true peak might be, the real finding here is the flatness of aging. Good players tend to remain good and bad players tend to remain bad over a range, and will perform slightly better and worse than expected from their late-20s to early-30s. I find that hitters play within two percent of their peaks from age 26 — 32; for an .800 OPS peak player that’s a range from about .780 to .800 for about seven years. I find it interesting that James ends his own study that started it all with a statement with which I agree:

Good hitters stay around, weak hitters don’t. Most players are declining by age 30; all players are declining by age 33. There are difference in rates of decline, but those differences are far less significant for the assessment of future value than are the differing levels of ability.

It’s best not to get caught up in thinking about exact peaks; instead, focus on peak range, which is flat over many years. A player hitting his 27th birthday isn’t starting on the downside of his career, he’s approaching his peak and will be there for many years. A guy who’s thirty may still be playing his best baseball, too. But the most important factor to consider when evaluating a player is the innate talent of the player himself.

More on Player Aging

Phil Birnbaum has a new theory as to why I’m wrong (I suspect it won’t be his last), and he links to others who think I’ve made the same mistake.

This time, my sample is the problem. By choosing a sample of players from 24 to 35 with a minimum of 10 seasons played and 5,000 plate appearances, this “biases” the estimates because I’ve chosen a sample that excludes people with short careers. To demonstrate this, Birnbaum simulates a new world to show that sample choice can affect estimated average peaks. This is irrelevant to what I have done, and shows a serious lack of understanding of the technique I employed. I’m not taking the mean of the sample to calculate a peak, I’m estimating an aging function using a common-yet-sophisticated technique designed to see how changing factors of many units over time affect an outcome. Because of the way the technique works, the sample won’t bias a peak estimate as suggested.

This fact is easily seen in the graphs presented in the paper. Below, I post the aging functions for strikeouts per nine innings and walks per nine innings for pitchers on a single graph. The functions have their peaks (denoted by vertical dotted lines) at the opposite ends of the sample—9 years apart. This is a curious finding that I discuss in the paper.

Peak K and BB

Why didn’t they center around the middle of the sample, or why weren’t strikeouts biased upwards due to the career requirements? Just because the sample is pared down doesn’t mean that the technique will be biased one way or the other. The people in the sample still age like normal human beings. The technique captures how these individuals age in accordance with the aging process by looking at how players’ performances change. Strikeouts are shown to peak early because the players in the sample strike out fewer batters as they age—the function actually peaks outside the sample range at 23.56. All of this confusion could be cleared up with an introductory econometrics course.
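
How a fitted curve can peak outside the observed ages is easy to see with a quadratic. The sketch below is not the panel model from the paper (it has no player effects or park and league controls); it fits a plain quadratic to made-up, noise-free data observed only at ages 24 to 35, with a hypothetical true peak at 23.5.

```python
import numpy as np

ages = np.arange(24, 36)                # sample window: ages 24 through 35
true_peak = 23.5                        # hypothetical peak just below the window
perf = -0.01 * (ages - true_peak) ** 2  # made-up, noise-free concave curve

# Fit performance = b2 * age^2 + b1 * age + b0 (polyfit returns highest power first).
b2, b1, b0 = np.polyfit(ages, perf, deg=2)

# A quadratic reaches its maximum where the slope 2 * b2 * age + b1 equals zero.
peak_age = -b1 / (2.0 * b2)

print(round(peak_age, 2))
```

The estimated peak comes from the shape of the fitted curve, not from the midpoint or the mean of the ages in the sample, so it can fall at either end of the window or beyond it.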

In any case, it’s always possible that estimates might be sensitive to sample selection. These critiques tend to focus on what could be rather than what is. In the paper, I explain the reasoning for my sample choice by highlighting potential selection bias problems. As I said in the comments: “Including a list of players who played 10 years or more allows for the smoothing of random fluctuations over time, because we don’t have to worry about players being dropped in and out of the sample. More importantly, it allows for identifying a career baseline for each player from which we can observe how he progresses. It certainly shouldn’t perform worse than the average-yearly-change method.” I did not make this choice lightly. My cutoffs were chosen because they fit with cutoffs used by other researchers and I tested the models for sensitivity to cutoff decisions.

If you’re not convinced, why don’t we move from the hypothetical to reality? The graph below maps the aging function estimated on the low threshold of 1,000 career plate appearances and 300 plate appearances in a season. No more age range, no career-length limit, and a vastly-reduced history of performance. And guess what? Peak age is 29.

Peak age

How Do Players Age?

The last hot stove myth that I previously wrote about has to do with player aging.

Players peak at 27 and old players are worthless — Players peak at 29 — 30. And just because a guy is past his peak doesn’t mean he’s not valuable. The aging process is gradual, more like the Minneapolis Metrodome than an Egyptian pyramid. If a guy was good last year, even if he’s in his mid-30s, he’ll probably be good next year. Now, the older he gets the more dangerous long-run contracts get, but one- and two-year deals are fine.

This may not seem like much of a myth, as the conventional wisdom has long been that baseball players peak around 30. This, it turns out, is a myth of sabermetrics. Many methods have been used: aggregating performance into age-buckets, identifying the most-common (or mode) age at which players have their best season, and calculating average changes in performance from age to age. These methodologies suffer from various problems that could induce bias; therefore, I set out to conduct my own study in a way that might satisfy my concerns.

I gathered a sample of major-league baseball players over 86 seasons who had significant career lengths to track performance over time. Without a career, there is no trajectory to follow. I tracked their performances over ages 24 — 35, throwing out younger and older years when only the best players typically play. Following individual players over time allowed me to set baselines for each player according to his ability, while controlling for changes in playing environments which may fluctuate quite a bit over the course of a player’s career. For example, a hitter who started his career in the mid-1980s and finished in the late-1990s might have appeared to have continuously improved, when in fact his higher offensive statistics reflected a jump in league offense. I used z-scores to measure player performance in terms of standard deviations from the league average to measure performance relative to one another.
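
A z-score here is a player’s distance from the league average, measured in league standard deviations. A minimal sketch follows; the function name and the three OPS values are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not say which version was used.

```python
import numpy as np


def league_z_scores(values):
    """Express each value as standard deviations from the group average."""
    values = np.asarray(values, dtype=float)
    # Population standard deviation (NumPy's default); an assumption for this sketch.
    return (values - values.mean()) / values.std()


# Made-up OPS figures for one hypothetical league-season.
season_ops = [0.700, 0.750, 0.800]
z = league_z_scores(season_ops)

print(np.round(z, 3))
```

Computing the scores separately for each league-season puts a hitter from a high-offense year and a hitter from a low-offense year on the same scale.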

I poked, prodded, and tortured the data, and it screamed that peak performance occurs around age 29 since the early-1920s, possibly 30 for more-recent players, no matter what changes I made. Some aspects of performance peak earlier while others peak later, but overall players tended to gradually improve until 29 and slowly declined after that. I think it’s particularly interesting that the average pitcher in the sample continues to reduce his walks until his 32nd birthday, while strikeouts peak at 24. Batters also improve in their ability to draw walks into their early-thirties, even after their hitting and power have declined. It appears that players continue to improve mentally even after their physical skills are eroding. Maybe there is something to veteran know-how.

Age is often used as a reason to chastise GMs for picking up players past their prime. Though old players may not be what they once were, the evidence indicates they can still be valuable. According to my estimates, a hitter who has a .900 OPS at his peak would be expected to post around an .850 OPS at 35; a pitcher with a peak 3.5 ERA is expected to post around a 3.75 ERA at 35. Yes, age saps athletic skill, but the stock of skill being diminished is also important.

I should also note that previously, Phil Birnbaum argued that the quadratic shape of my estimated aging function could bias the peak. Certainly, this could happen. I looked into the possibility using higher-order polynomials and fractional polynomial methods that do not require symmetry. The results still hold, so I went with the more parsimonious functional form.

Update: Birnbaum offers a new critique, and I respond.

Defending Hakes and Sauer

It seems that I have upset a few people with one of my interview answers at Chop-n-Change, involving a paper by Jahn Hakes and Skip Sauer. Here’s a brief response that covers the criticism the paper has received (cross-posted in the comments).

— —
1) The goal of Hakes and Sauer was to test the Moneyball hypothesis that OBP was undervalued relative to SLG; hence, the title of the paper.

2) This test must include OBP and SLG in the model. The concept can be broken down and tested further, which they did, but what is interesting is whether this central tenet of Moneyball is true. The exercise is not about designing the perfect model for predicting salaries. I vividly recall discussing this fact with the authors at the time the paper was written when I asked them about alternate specifications of the model. They responded that they had done this and this analysis would be a part of another paper, which it was, but they were focused on Moneyball itself for this exercise. This then creates the problem of adjusting for playing time. This could be controlled for in ways other than plate appearances (e.g., interaction terms), but the authors ultimately decided the parsimony of their specification made it the right choice. Adding in the impact of all sectors of the labor market is another tough issue. Ideally, you would like to separate the labor classifications, but they are trying to estimate the market price for the entire labor market—reserved and arbitration-eligible players are a part of that market. So, they include dummies to act as a control. Again, interaction terms or some other correction could have been used, but they felt that their final specification was best. And they were able to convince many other economists (colleagues, editors, and referees) at different levels of review that what they produced was the best choice.

3) The goal of the study was to identify whether the market was out of whack at the time the book was written. The findings indicate the pre-Moneyball models don’t predict as well as the post-Moneyball sample based on what we would expect them to be. That is a point in favor of the paper, not an objection. Furthermore, in 2001 the labor market was especially out of whack, and I find it odd that it was the specification chosen for close examination. The regression equation was designed to pick up information from real-world data; the values are not something presupposed by the authors. The coefficient on OBP is negative—higher OBP lowers your salary. You don’t need to plug in any values to see that this is counter-intuitive. Part of the reason why the salaries remain so stable when Tangotiger adjusts the inputs is that the higher value for OBP cuts into the impact of SLG. As Hakes and Sauer acknowledge in the text, the coefficients on OBP are not even statistically significant—the market appeared to be ignoring the relevance of OBP at the time. That’s their argument.

4) So, the Hakes and Sauer papers may be imperfect, joining the ranks of every other empirical study ever written. If you think you can do better, here is a solution. Take the freely available data and run alternate specifications. As it stands, the critique is that the perfect is the enemy of the good. If further testing reveals the labor market was not out of whack, then we have an argument.


— My radio interview on minor-league sports (softball and soccer) in Atlanta on WABE.

— Alex Remington has posted an interview at Chop-n-Change where we discuss what I’ve been up to, GM myopia, the Braves, and sabermetrics versus Sabermetrics.

— My latest Huffington Post article on the importance of Gerald Scully’s model for valuing players.

Tim Hudson’s Hometown Discount

The long-awaited announcement of Tim Hudson’s new contract with the Braves has finally come. The terms guarantee Hudson $9 million a year over the next three seasons, plus a $1 million buyout of a team option for a fourth year. The fourth-year option also pays out $9 million, so the total value that could be paid out is $36 million over four years. The contract voids a $12 million option for 2010 that the Braves were likely going to buy out for $1 million.

Hudson is an interesting player. He’s ranged from good to dominant. He was really pitching some of his best baseball as a Brave right before his injury. The good news is that he pitched well in his return through 42 innings. With a full offseason to recover, I think there is good reason to believe that he will be back to normal; however, the injury risk may have reduced his value somewhat. I proceed to my valuation with this caveat.

If Hudson pitches as he did in 2007 and 2008 over the course of a full season, then he’ll be worth about $12.5 million per year over the next three seasons. Thus, it appears that Hudson is giving the hometown discount that he promised—smart move by Frank Wren and the Braves. This allows the Braves to trade one of their other starters (who will it be?) and still have pitching stability going into the future.

If you see Hudson out and about in the Atlanta area, be sure to say “thanks”—but, please, don’t pester him. Or, maybe throw a little support to the Hudson Family Foundation. He wants to be in Atlanta, and he has strengthened his club by doing so. It’s nice to have you on board for the long haul, Tim.

Huffington Post Sports

If you read The Huffington Post, you know that one of its deficiencies has been its lack of a sports section. Well, they agreed and have just launched one. I am excited to be a part of this venture and will be posting columns there.

My first column is up (a reprint of my earlier blog post on hot stove myths). You can look for my columns in the future on my author page. Some of my columns will be cross-posted, others will be new material—I haven’t quite figured out how I’m going to do this—but I will make it known when a new column is up.