Aging and Selective Sampling

In several recent posts, I have discussed my study of how baseball players age. It’s a study that I embarked upon with no prior expectations other than that I should observe a pattern of improvement and decline among players. But the fact that my estimates of where the typical aging function reaches its peak differ from the established sabermetric standard has brought a strong negative response.

Phil Birnbaum has rejected my estimate of peak age for various reasons. In his latest post, he takes issue with my sample (again), which he has concluded is biasing the peak age estimates upwards. Such upward bias is possible and ought to be avoided, which is why I already addressed it in the paper and in follow-ups; however, Birnbaum continues his objection on new grounds.

I begin by focusing on his attempted replication of my model. He admits that it is not a perfect replication, but his deviations are important. I use two common panel estimators for estimating aging: random effects and fixed effects. What Birnbaum appears to have done is estimate a quasi-fixed-effects model using OLS with a dummy variable for each player. It’s not necessarily a bad approach (indeed, his results find a peak not far below my estimates), but it’s not equivalent to what I did, and I believe his method leaves out important information, especially the park and league effects.

The bigger issue is that the model, as he defines it, is impossible to estimate. He cannot have done what he claims to have done. Including both the mean career performance and player dummies creates perfect linear dependence, because a player’s mean career performance does not change over time. That means separate coefficients cannot be calculated for both the dummies and career performance. My initial thought was that the mean was dropped from the estimation and he didn’t realize it (econometrics packages sometimes do this without much warning), but he does report a coefficient for the mean. Maybe the player dummies do not uniquely identify individuals; I don’t know. Something is going on here, but I’m not sure what it is.
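To make the dependence problem concrete, here is a minimal sketch on an invented toy panel (three players, four seasons each, random performance values). None of the numbers or variable names come from my data or from Birnbaum's; the only point is that a career-mean column is an exact linear combination of the player dummies, so adding it does not raise the rank of the design matrix.

```python
import numpy as np

# Invented toy panel: 3 players observed for 4 seasons each.
n_players, n_seasons = 3, 4
player = np.repeat(np.arange(n_players), n_seasons)
age = np.tile(np.arange(26.0, 26.0 + n_seasons), n_players)

rng = np.random.default_rng(0)
perf = rng.normal(size=player.size)  # made-up performance values

# One dummy column per player (so no separate intercept is needed).
dummies = (player[:, None] == np.arange(n_players)).astype(float)

# Career mean performance: the same value on every row of a given player.
means = np.array([perf[player == p].mean() for p in range(n_players)])
career_mean = means[player]

# Design matrix with player dummies only, and with the career mean added.
X_dummies = np.column_stack([age, dummies])
X_both = np.column_stack([age, dummies, career_mean])

rank_dummies = np.linalg.matrix_rank(X_dummies)
rank_both = np.linalg.matrix_rank(X_both)

print("columns / rank with dummies only:", X_dummies.shape[1], rank_dummies)
print("columns / rank with career mean: ", X_both.shape[1], rank_both)
```

If the second matrix has more columns than rank, least squares cannot pin down a unique coefficient for every column, which is why a package would normally have to drop one of them.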

Regardless of what the estimation problem is, I want to focus on his analysis of the sample, which is the heart of his critique. For my study, I used players with more than 5,000 career plate appearances and at least 10 years in the majors. He has already identified the inclusion of good players with long careers as a potential source of bias, and I explained why I used the cutoffs that I did.

Including a list of players who played 10 years or more allows for the smoothing of random fluctuations over time, because we don’t have to worry about players being dropped in and out of the sample. More importantly, it allows for identifying a career baseline for each player from which we can observe how he progresses. It certainly shouldn’t perform worse than the average-yearly-change method. I did not make this choice lightly. My cutoffs were chosen because they fit with cutoffs used by other researchers and I tested the models for sensitivity to cutoff decisions.

The cutoffs were introduced to combat bias induced by non-random entry and exit into and from the league. I find it interesting that Bill James (1982 Abstract) was the one to notice the problem of players dropping out of the sample making it difficult to observe aging. He referred to the unfinished careers of player dropouts as “white space.”

I think that any successful statistical analysis of aging must find, and none of us has yet, some way to deal with “the white space.” The white space is the area of measurable production insufficient to support continued activity….

In the Palmer study [observing average performance by age], all the players within the white space are not counted; in the VAM study [observing total performance by age] all players in the white space are counted at zero. Neither is correct….

I am trying to write about when a ballplayer is at his peak. I am trying not to write about how one can study these things. But sabermetrics is young, and someday some smart guy will figure out a way to solve the problem. [Emphasis original (pp.196 — 197)]

To measure aging without white-space bias, you need a sample in which both young and old performance can be observed. All players have young performance, but not all players have old performance. Typically, early exits are caused either by unpredictable injuries, which can strike a player of any age, or by a player who simply cannot cut it and got to play at an early age only because of roster shortages or bad talent assessments. The latter results in a team giving up on a prospect it once felt was more promising, or in a player deciding to move on to something else, which was quite common in my 86-year sample when big money was not paid to players. The big problem isn’t measuring all the young guys who play baseball, but measuring the production of old guys who don’t.

Still, in order to alleviate Birnbaum’s fear that the high standards for inclusion are biasing peak age upward, I estimated the impact of age after dropping the minimum to 1,000 plate appearances and eliminating the 10-year requirement, and I found that the peak age remained at 29. The sample cutoffs do not bias the estimates, and I explained why the estimation procedure does not generate bias. I also showed examples of skills that peaked earlier and later than 29 in the sample. In Birnbaum’s own estimates, he finds that extending the sample has only a small effect on the estimated peak; however, he then makes the following claim: “the way the study is structured, that small difference is really a big difference.”

Birnbaum thinks expanding the sample as I did isn’t the proper way to handle potential aging effects. Instead, he believes it’s necessary to use separate samples of players who have career minimums and maximums. He looks at two sub-groups, players with 1,000 to 3,000 plate appearances and players with 3,000 to 5,000 plate appearances, and finds that they have peaks earlier than the more-inclusive sample. How should we view information drawn from these samples?

Let’s consider a member of the latter group, Marcus Giles, who had 3,340 plate appearances in his career. What happened to Giles? He started off playing well, making the All-Star team at 25, but was out of the league by 29. What caused Giles to decline? Maybe he had some good luck early on, maybe his performance-enhancing drugs were taken away, or possibly several bizarre injuries took their toll on his body. It’s not really relevant, but I think of Giles’s career as quite odd, and I imagine that many players with 3,000 to 5,000 plate appearances (or fewer) have similar declines in their performances that cause them to leave the league. I’ve never heard anyone argue that what happened to Giles was aging.

The Giles example reveals why Birnbaum’s separate samples of players with shorter careers show earlier peaks. They don’t have the opportunity to show enough change over the course of a career. Some players will enter and progress normally, and then suddenly get knocked out by an injury. Others may jump in, perform at the extreme top of their abilities, and earn further playing time, but eventually they lose that time and are cut, or they get hurt and don’t get a chance to demonstrate improvement. We’re not seeing aging but fluctuations around a true-talent level in a small sample. Guys who perform above their abilities get the chance to stick around and stink up the joint, while those who perform below them get less of an opportunity to prove they are better players. The reason why players with longer careers need to be included and players with short careers don’t have to be included is that there are plenty of players who try (or are tried) but can’t cut it, whereas players who don’t try but make it later don’t exist. Thus, if we want to make sure we are capturing that information by extending the sample beyond my preferred one, we need to keep the long careers in the sample, not look at the short-career guys all by themselves and conclude peak age is 27.
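The "fluctuations around a true-talent level" point can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not part of my study: every simulated player has a fixed talent that never changes with age, each season is talent plus random noise, and a hypothetical roster rule gives a second season only to players whose first season clears a cutoff.

```python
import numpy as np

rng = np.random.default_rng(42)

n_players = 100_000

# Assumption: true talent is fixed for each player, so there is no aging at all.
talent = rng.normal(loc=0.0, scale=1.0, size=n_players)
noise_sd = 1.0  # hypothetical season-to-season noise

season1 = talent + rng.normal(scale=noise_sd, size=n_players)
season2 = talent + rng.normal(scale=noise_sd, size=n_players)

# Hypothetical roster rule: only players with a strong first season are kept.
cutoff = 1.0
kept = season1 > cutoff

mean_s1_kept = season1[kept].mean()
mean_s2_kept = season2[kept].mean()
mean_talent_kept = talent[kept].mean()

print("kept players, season 1 mean:", round(mean_s1_kept, 3))
print("kept players, season 2 mean:", round(mean_s2_kept, 3))
print("kept players, true talent:  ", round(mean_talent_kept, 3))
```

Among the players who were kept, the second season looks like a decline from the first even though nothing about them changed. A sample built only from such careers would read that drop as an early peak.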

But Birnbaum takes his analysis one step further. Rather than arguing that players with short careers have earlier peaks—you could make that argument, but I don’t know how you identify the short from the long when in-career roster decisions must be made—he believes that the short-career estimates need to be heavily weighted to generate an average for all players. He believes that the long careers of players cause their high peaks to be overweighted when estimating an average peak because they are counted more with more career observations.

If a player has a 15-year career, with a peak at age 29, he gets fifteen “29” entries in the database. If another player has a 3-year career with a peak of 27, he gets only three “27” entries. So instead of the result working out to 28, which is truly the average peak of the two players, it works out to 28.7.

That’s not how it works. His belief is based on a misunderstanding of how least-squares generates the estimates used to calculate the peak. There is no average calculated from each player, and especially not from counting multiple observations for players who play more. The procedure selects weights to minimize prediction error across the sample of observations. The additional observations add more information about how players age at older ages; they do not generate additional “entries in the database” from which an average peak is calculated. We’re not drawing peaks from a sample of peaks that includes repeated observations of players, as Birnbaum implies. The expanded sample provides more information from which the procedure estimates those weights. If anything, the large number of short-career players included in the expanded sample (like Marcus Giles) pulls the estimated peak downward. If we were to take a weighted average, we would do so to reduce, not increase, the impact of players with short careers. Yet still, even with many more players holding short careers, the procedure estimates that players continue to peak in their late-20s.
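Here is a minimal sketch of the mechanics on synthetic data. It is not my actual model: it assumes a quadratic aging function shared by all players, invents the careers and noise, and leaves out the park and league controls. Player baselines are removed by demeaning within player, the age coefficients are chosen by least squares, and the peak is read off those coefficients. No per-player peak is ever computed or averaged.

```python
import numpy as np

rng = np.random.default_rng(1)

TRUE_PEAK = 29.0   # used only to generate the synthetic data
CURVATURE = -0.5   # hypothetical curvature of the aging function
CENTER = 28.0      # age is centered here purely for numerical convenience

# Hypothetical careers as (first age, last age): 200 long and 200 short.
careers = [(22, 36)] * 200 + [(24, 27)] * 200

ids, ages, perfs = [], [], []
for pid, (first, last) in enumerate(careers):
    a = np.arange(first, last + 1, dtype=float)
    baseline = rng.normal(scale=5.0)  # player-specific level
    y = baseline + CURVATURE * (a - TRUE_PEAK) ** 2 + rng.normal(size=a.size)
    ids.append(np.full(a.size, pid))
    ages.append(a)
    perfs.append(y)

ids = np.concatenate(ids)
age = np.concatenate(ages)
perf = np.concatenate(perfs)

def demean_by(group, x):
    """Subtract each group's mean from its own rows (the within transformation)."""
    group_means = np.bincount(group, weights=x) / np.bincount(group)
    return x - group_means[group]

c = age - CENTER
X = np.column_stack([demean_by(ids, c), demean_by(ids, c ** 2)])
y = demean_by(ids, perf)

# Least squares picks the two age coefficients that minimize squared error.
(b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

# The peak of b1*c + b2*c^2 is at c = -b1 / (2*b2).
peak = CENTER - b1 / (2.0 * b2)
print("estimated peak age:", round(peak, 2))
```

Because this sketch assumes every player shares one curve, it only shows how the estimate is produced. It does not settle how much real players differ from one another.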

Birnbaum’s misunderstanding causes him to try to correct for a bias that isn’t there by taking the weighted average of each sub-group by the number of players, which pulls the estimated peak age downward. Taking a weighted average of age groups that gives more weight to a biased sub-sample isn’t proper. Basically, he’s using a sample of early-exiters to estimate general aging effects for all players. These samples don’t remove selection bias; they suffer from it more than the larger sample. To use this calculation to conclude that the average player peak is 27 is incorrect.

In conclusion, I find human aging to be a fascinating subject to study, and I have enjoyed reading the large academic and sabermetric literature on the topic. I plan to continue my research in this area beyond baseball. It really doesn’t matter to me where peak age for baseball players is. I looked at the data using standard empirical methods to answer the question and I am merely reporting what I found. I have addressed all criticisms raised, but if you are not satisfied with my responses then feel free to continue holding your belief.

— — —

Also, in a preceding post, Birnbaum gets a few things wrong about my study. I want to correct his errors.

— He interprets the aging function as treating all players as identical.

What Bradbury’s model does is take both curves, put them in a blender, and come out with two curves that look exactly the same, peaking in the late 20s.

What I did was estimate an aging function using changes in player performances over time. It’s the same approach used in every single study of aging that I have ever seen. Human beings do differ in many respects; how much they differ when it comes to aging is a question that is difficult to study because of significant noise. So, the best we can do is to pull out the aging information from a large sample of players, which I did. It’s the exact same idea employed by the delta, mode, and total-value models that have previously been used to measure aging. I’m just using a different estimator to pick up this effect, and it’s an estimator that allows me to control for many relevant outside factors that have previously not been taken into account. Some players peak early, others late: from this combined information I am estimating an average aging effect. If what I have done is inappropriate, then you have to throw out all previous studies on aging. We can’t say peak age is 27; we have to throw our hands in the air and say “who knows?”

— He states that my estimates are in absolute numbers.

A consequence of the curves having the same shape is that declines are denominated in absolute numbers, rather than percentages of a player’s level. If the model says you lose 5 home runs between age X and age Y, then it assumes *everyone* loses 5 home runs, everyone from Barry Bonds to Juan Pierre — even if Juan Pierre didn’t have 5 home runs a year to lose!

That’s incorrect. I’m estimating home-run rates, not raw home runs. All other stats are estimated as rates except linear weights. This is stated in the paper.

UPDATE (12/11/2009): Phil Birnbaum replies. I have nothing more to offer; you are free to reach your own conclusions.
