Archive for September, 2004

The Protection Externality: It Doesn’t Exist

Protection is just one of those things in baseball that is largely believed to exist. The idea is that having a good batter behind you in the line-up can help you, and a poor batter can hurt you. Now, though protection is part of the conventional baseball wisdom, the sabermetric community seems to be a little more skeptical. Here is a sample of some of the studies I know of.

David Marasco has a nice little write-up on protection in The Protection Mini-FAQ.

In the 1985 Baseball Abstract, Bill James used a case study of Dale Murphy and Bob “I’m tired” Horner. Horner’s great playing shape provided an excellent natural experiment during the 1984 season. Murphy, who batted in front of Horner, hit better with Horner in the line-up than with Horner out, but the difference was not statistically significant. The Grabiner study below has a brief summary of James’s experiment.

David Grabiner looks at 25 players from the 1991 AL who have many at-bats both with and without a slugger on-deck. He finds no evidence that a better on-deck batter helped the preceding hitter.

Dylan Wright uses a method similar to Grabiner’s for the 2002 NL. For his “protectors” and “protectees” he finds mixed results.

In general, these studies used matched pairs of hitters that reflect when we think protection should be happening. Mainly, we have two good hitters next to each other in the batting order. While this is one way to do it, I think the search for natural experiments in matched pairs unnecessarily limits the sample size. As an economist, I see “protection” as an externality. That is, the contributions of one player are spilling over onto another player to generate harm or benefit. It helps to have Barry Bonds bat behind you; it hurts to have Neifi Perez. If protection exists, I view it as a continuous concept that applies to all players. Bad players hurt preceding hitters, while good players help them. And the degree of ability affects the amount of the spillover. Why not look at the quality of the on-deck hitter to see if he is affecting the current hitter in all situations?

So, I did…or more correctly, Doug Drinen and I did. We’d been discussing the concept of protection for a while, but this summer we finally broke down and did something about it with play-by-play data. Using Retrosheet event files we were able to estimate the impact of every on-deck hitter on the current hitter from 1984-1992. The play-by-play data allowed us to control for the game situation during every plate appearance. While we were looking at protection, we were also curious about identifying another possible spillover, which we call the effort externality. While having a good hitter batting behind you might put more balls in the strike-zone, it doesn’t mean these pitches are of the same quality as those thrown with a poor hitter on-deck. It’s not that the pitcher just wants to avoid walking a batter when a good hitter follows. The pitcher wants to keep the hitter off-base any way he can. Pitchers are not dumb. They understand that putting more balls in the strike-zone increases the chance that the hitter will reach base via a hit, possibly with power. So, pitchers may reach back for a little extra gas in these situations. This means that a good on-deck hitter gives the pitcher reason to lower a current batter’s chances of reaching base via a walk AND a hit. If the effort effect is larger than the protection effect, then a good on-deck hitter can hurt rather than help the batter in front of him. Since the effect is ambiguous we need to go to the data.

The results lead us not only to reject the protection hypothesis; we also find evidence that good on-deck hitters actually harm the hit and power probabilities of the current batter. This is consistent with the effort hypothesis. However, the magnitude of the spillover is tiny and for all practical purposes the effect is zero. Even very good (bad) hitters have only a very small impact on the batters who precede them.

“But what about [insert possible excluded variable]?” Well, we controlled for a heck of a lot of potential outside influences: platoon effects of the batter and the on-deck batter, the base/out configuration, the quality of the pitcher, the score differential, the inning of the game, and the park in which the game was played. Given the number of observations we are convinced that protection is a myth; it doesn’t exist.
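For readers who want a feel for what “controlling for the game situation” looks like in practice, here is a minimal sketch of turning one plate appearance into a row of regression inputs. It is only an illustration under my own assumptions, not the code or the specification from the study: the `PlateAppearance` fields, the quality measures, and the function names are all hypothetical, and the park control is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    # All field names are hypothetical; the study's actual variables may differ.
    on_deck_quality: float      # some measure of the on-deck hitter (e.g. OPS)
    pitcher_quality: float      # some measure of the pitcher (e.g. ERA)
    outs: int                   # 0, 1, or 2
    runners: tuple              # (first, second, third) occupied flags
    score_diff: int             # batting team's runs minus fielding team's
    inning: int
    batter_platoon_adv: bool    # batter faces an opposite-handed pitcher
    on_deck_platoon_adv: bool   # same, for the on-deck hitter


def base_out_state(outs, runners):
    """Map outs and runner positions to one of the 24 base/out states (0-23)."""
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1, or 2")
    first, second, third = runners
    return outs * 8 + int(first) + 2 * int(second) + 4 * int(third)


def design_row(pa):
    """Build one row of inputs: six numeric controls plus 24 base/out dummies."""
    state_dummies = [0] * 24
    state_dummies[base_out_state(pa.outs, pa.runners)] = 1
    return [
        pa.on_deck_quality,
        pa.pitcher_quality,
        float(pa.score_diff),
        float(pa.inning),
        float(pa.batter_platoon_adv),
        float(pa.on_deck_platoon_adv),
    ] + state_dummies
```

Each row would then be paired with the outcome of the plate appearance (walk, hit, and so on) and fed to whatever estimator you prefer. The coefficient on `on_deck_quality` is the one that would speak to protection.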

If you want to read the study you can find it here. It is full of Greek symbols and confusing terminology (it’s written for an audience of academic asshats), but you can largely skip over most of this to get the big picture. Enjoy!

More on the Estimators

I just wanted to pass along that I have tried adding a few things to the MLB Salary Estimators for both position players and pitchers. Still, I cannot improve on the fit of either model. Here is what I have tried so far.

Defense: In the initial model I tried to include some of the more accessible measures of defensive ability, but nothing worked. So, after saying I wouldn’t do it earlier, I went out looking for UZRs (Ultimate Zone Rating). I tried several different measures of UZR and none of them had any impact on salaries. But, I’m not really surprised by this. Defense is so very hard to measure. And while UZR is the best thing that the average fan has access to, I suspect the defensive methods used by MLB teams are much better… Or my model sucks.

Pitching: I tried two additions to the pitcher model; one worked and one didn’t. First, I tried controlling for the team of the player. The team should capture some park effects and quality of opponents, not to mention the thriftiness of the team owner. The fit of the model did improve, but the improvement was so small that it was not worth updating the estimator. Next, I tried to control for the park effects of all of the FIP components (K, BB, and HR). It didn’t help.

I would also like to address a common criticism that is misplaced. That is, several people have criticized the estimator because it does not account for part-time players. Well, there is a very simple correction. The model assumes the player gets 500 ABs in a season. If you want to look at a guy with 250, divide by 2.
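The part-time correction above is just a proportional scaling, which can be sketched in a few lines. The function name is my own invention for illustration; the only thing taken from the estimator is the 500 AB assumption.

```python
FULL_SEASON_AB = 500  # the at-bat total the estimator assumes


def prorate_salary(full_season_estimate, at_bats):
    """Scale a full-season salary estimate down to a part-time workload."""
    if at_bats < 0:
        raise ValueError("at_bats cannot be negative")
    return full_season_estimate * at_bats / FULL_SEASON_AB
```

So a player the estimator values at $4 million over 500 ABs comes out at `prorate_salary(4_000_000, 250)`, or $2 million, for 250 ABs.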

Thanks for all of your thoughts and suggestions. Keep ’em coming!

The Sabernomics MLB Salary Estimator for Pitchers

Here is The Sabernomics MLB Salary Estimator for Pitchers. Well…sort of. I’m much less happy with this version than with the estimator for position players. So what’s wrong? Well, the model only explains about 55% of the variance of player salaries, which means 45% is due to other stuff. Maybe star-power and leadership are really important for pitchers; I just can’t say.
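For anyone wondering where a number like “explains about 55% of the variance” comes from, it is the usual R-squared of a regression: one minus the ratio of the unexplained variation to the total variation. Here is a minimal sketch of that calculation; the function is illustrative and is not lifted from the estimator itself.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total
```

A value of 1.0 means the model nails every salary, while 0.0 means it does no better than guessing the average salary for everyone. The pitcher model sits at roughly 0.55.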

So take a look at it and play around. I am sure you will find that it is nearly impossible to make a pitcher earn what Randy Johnson makes, including using Randy Johnson’s own stats. This may be a general result from the fact that pitchers appear to be overpaid relative to position players, as Studes has found. But, I’m just not sure. Send me your ideas; I would love to hear them. But please, no more “This thing is BS! There is no way [Insert name of favorite good player] ought to make [Insert insane amount of $ here] less than [Insert name of hated bad player].” You’re just embarrassing yourself, and you have missed the point of this exercise. As I said, this is somewhere between a tool and a toy. Have fun, but don’t freak out over it. If I could create a salary estimator that was perfect, I would present it. Since I can’t, this is the next best thing until someone else comes along with an improvement. AND I whole-heartedly encourage anyone to make improvements. So have at it.

The Pitcher Estimator Update

The salary estimator for pitchers is nearly ready. In fact, it’s 100% ready, but I am just not happy with it. I cannot explain pitcher salaries with the stats very well, so I am holding back the release hoping that I can think of something that might improve the fit of the model. If not, oh well, I’ll release it anyway. I just feel better putting out the warning to soften any letdown you may feel. Get those expectations way down ;).

The Sabernomics MLB Salary Estimator

Have you ever wondered “what is that player worth?” I ask this question all of the time. I can look at player stats and determine who’s good and who’s bad. I can also look at the salaries of some players and see a real steal or an albatross. But, in many cases it’s just hard to tell. I often see players with good but not great numbers. Maybe they’re not rookies but they are younger than most. And salary? Well, maybe it’s between $3 and $7 million. Is that good or bad?

In an effort to minimize my confusion I decided to estimate a statistical model of player salaries. I wanted my model to include information about player quality and salary guidelines as provided by the labor structure of the game. In the end I used a model based on OBP, Iso-Power, service years, and reserve classification. Using the results from my model, along with some PHP help from Doug Drinen, I then developed the Sabernomics MLB Salary Estimator. If you are interested in a player’s market value, simply enter in a few numbers and out pops the estimated market value of the player. Now, when I say “market value” I am referring to the market value based on the skills that I think are relevant to producing and preventing runs. Right now, the estimator can only handle position players, but I plan to add pitchers to the model shortly.
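If you need to work out the two hitting inputs yourself, both come straight from a standard stat line. The sketch below uses the usual definitions: OBP is times on base over plate appearances that count toward it, and Iso-Power is slugging percentage minus batting average, which reduces to extra bases per at-bat. The function names are mine, and this is not the estimator's PHP code.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )


def isolated_power(doubles, triples, home_runs, at_bats):
    """Iso-Power = SLG - AVG = (2B + 2*3B + 3*HR) / AB."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats
```

For example, a hitter with 30 doubles, 5 triples, and 25 home runs in 500 ABs has `isolated_power(30, 5, 25, 500)`, which is 115 extra bases over 500 ABs.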

Also, in the next few days I will publish some lists of the most overpaid and underpaid position players, so stay tuned. If you have some questions about the estimator, check out the FAQ after making an estimate. Thanks again to Doug Drinen for his help.

Addendum: Repoz opened a thread about the estimator on Baseball Primer. So you can read what others have said about the estimator and my responses.