Sunday, January 11, 2009

I recently received the 2008 annual Baseball Research Journal (volume 37) from SABR (the Society for American Baseball Research). In it, Trent McCotter has published an article entitled "Hitting Streaks Don't Obey Your Rules: Evidence That Hitting Streaks Aren't Just By-Products of Random Variation."

McCotter, a recent Phi Beta Kappa inductee at the University of North Carolina-Chapel Hill who has written extensively on baseball hitting streaks, once again offers an interesting take on the statistics of streak hitting.

McCotter and his faculty collaborator Peter Mucha began by building a large database of season-by-season, game-by-game hitting data for all players active from 1957 through 2006. A player with a 10-year career would thus have 10 separate lines of data. Each line would look something like the hypothetical example below, where H = at least one hit in a game and N = no hits in the game. In reality, a line would contain up to 162 entries, depending on how many games the player appeared in that season:

HHNNN NHNHN...

For each player-season, McCotter and Mucha then randomly re-sorted the sequence of H's and N's into an alternative order, such as the following (the numbers of H's and N's, of course, stay the same between the player's actual sequence and the random re-sorting):

NHNNH NHNNH...

In fact, each player-season was randomly re-sorted 10,000 times!

McCotter's reasoning was that, if lengthy hitting streaks were simply a product of random variation around a player's underlying hitting ability, the random re-sortings should produce as many streaks of a given length as actually occurred in the players' real-life game logs.
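To make the procedure concrete, here is a minimal Python sketch of the re-sorting idea for a single season. The function names are hypothetical, and a few details are assumptions rather than anything stated in the article: each season is shuffled on its own, a streak still alive on the last game of the season is counted, and streaks are not carried over from one season to the next.

```python
import random


def count_streaks(games, min_len):
    """Count runs of at least min_len consecutive hit games ('H')."""
    count = 0
    run = 0  # length of the current run of H's
    for game in games:
        if game == "H":
            run += 1
        else:
            if run >= min_len:
                count += 1
            run = 0
    # Assumption: a streak still alive on the season's last game counts.
    if run >= min_len:
        count += 1
    return count


def average_simulated_streaks(games, min_len, n_perms=10_000, seed=None):
    """Average number of streaks of min_len+ games over n_perms random
    re-sortings of one season. Shuffling keeps the counts of H's and
    N's fixed, as in the study; only the order changes."""
    rng = random.Random(seed)
    order = list(games)
    total = 0
    for _ in range(n_perms):
        rng.shuffle(order)
        total += count_streaks(order, min_len)
    return total / n_perms


season = "HHNNNNHNHN"  # the hypothetical line above, without the space
print(count_streaks(season, 2))  # prints 1 (only the opening HH)
print(average_simulated_streaks(season, 2, seed=42))
```

Summing these per-season averages over every player-season would give the league-wide simulated figure to set against the actual streak count; a "hot hand" would show up as the actual count running consistently above the simulated one.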

The initial results (summarized in Table 2 of the article) showed lengthy hitting streaks occurring more often in real life than in the random simulations. For example, 274 hitting streaks of 20 or more games actually occurred, whereas the simulated hitting logs generated an average of 192.43 streaks of that length. For streaks of 25 or more games, 62 actually occurred, versus 35.74 generated randomly. Similar trends held for streaks of 30+ and 35+ games, although the numbers became very small (e.g., 5 streaks of 35 or more games actually occurred, whereas 1.48 were generated randomly).
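Expressing the Table 2 figures quoted above as ratios shows that the real-life excess grows, in relative terms, as streaks get longer, although the 35+ figure rests on only five actual streaks. A small Python sketch of the arithmetic (the function name is just for illustration):

```python
def excess_ratio(actual, simulated):
    """How many times as many streaks occurred in real life as in the
    average random permutation."""
    return actual / simulated


# (minimum streak length, actual count, simulated average), as quoted
# in this post from Table 2 of the article
table2 = [(20, 274, 192.43), (25, 62, 35.74), (35, 5, 1.48)]

for min_len, actual, simulated in table2:
    print(f"{min_len}+ games: {excess_ratio(actual, simulated):.2f}x")
```

The ratios come out to roughly 1.4 for 20+ game streaks, 1.7 for 25+, and 3.4 for 35+.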

The greater number of lengthy real-life hitting streaks, relative to the random simulations, is consistent with the idea of a "hot hand" (i.e., a player systematically raising his underlying hitting ability when in the midst of a hot streak), but does not prove that one exists. As McCotter acknowledges, there could be other reasons why lengthy hitting streaks occur more often than chance alone would predict.

For example, a player could be highly aware of his hitting streak and take special action to perpetuate it, such as an aggressive pull-hitter "going with what he's given" and slapping an outside pitch to the opposite field for a single. A hitter may also benefit from a generous "hit" (vs. "error") ruling by the official scorer. (As an aside, a theory that Joe DiMaggio achieved his record 56-game hitting streak partly through such generosity has been making the rounds.)

Further, McCotter noted that his original random simulations included games the batter had not started, which could artificially reduce the number of long streaks in the simulated sequences. For example, a hitless game in which the batter appeared only once, as a pinch-hitter, could land between hit games in a random sequence and cut a simulated streak short.

A second series of simulations was therefore run, this time excluding non-start games. Indeed, much of the difference between the actual and simulated numbers of streaks disappeared. McCotter describes one example:
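The difference between the two series of simulations can be sketched in Python as a filtering step before the shuffle. This is an assumption about the mechanics, not McCotter's code: games are represented as hypothetical (result, started) pairs, and "excluding non-starts" is read as simply dropping those games from the sequence before it is re-sorted.

```python
import random


def permute_season(games, exclude_non_starts, rng):
    """Return one random re-sorting of a season as a string of H's and N's.

    games: list of (result, started) pairs, where result is "H" or "N"
    and started is True if the player was in the starting lineup.
    """
    if exclude_non_starts:
        # Second series (as assumed here): drop games the player did not start.
        results = [result for result, started in games if started]
    else:
        # First series: every game the player appeared in is shuffled.
        results = [result for result, _ in games]
    rng.shuffle(results)
    return "".join(results)


# Hypothetical stretch: five started games with hits, plus two hitless
# pinch-hitting appearances
games = [("H", True), ("H", True), ("N", False), ("H", True),
         ("N", False), ("H", True), ("H", True)]
rng = random.Random(7)
print(permute_season(games, exclude_non_starts=False, rng=rng))
print(permute_season(games, exclude_non_starts=True, rng=rng))  # prints HHHHH
```

With the two pinch-hit N's included, a shuffle can drop them between hit games and split the run; with them excluded, the five starts always form one unbroken streak, which is the effect McCotter was correcting for.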

...in real life for 1957-2006, there were 274 streaks of 20 or more games; the first permutation (including non-starts) had an average of a mere 192 such streaks; and the second permutation (leaving out non-starts) had an average of 259 such streaks. The difference between 259 and 274 may not sound like much, but it is still very significant when viewed over 10,000 permutations, especially since we still aren't quite comparing apples to apples (p. 68).
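The excerpt does not spell out how significance was assessed. One standard way to judge a permutation result, sketched here in Python as an assumption rather than as McCotter's method, is to record the league-wide streak total from each of the 10,000 permutations and ask what share of them reached the actual total. This also shows why an average of 259 alone cannot settle the question: whether a gap of 15 streaks is significant depends on how widely the individual permutation totals vary.

```python
def permutation_p_value(actual, simulated_totals):
    """One-sided empirical p-value: the share of permutations that produced
    at least as many streaks as actually occurred. The +1 in numerator and
    denominator counts the observed data as one of the permutations, which
    keeps the estimate from ever being exactly zero."""
    at_least = sum(1 for total in simulated_totals if total >= actual)
    return (at_least + 1) / (len(simulated_totals) + 1)


# Hypothetical totals: if none of 10,000 permutations reached the actual
# 274 streaks, the p-value would be 1/10,001 (about 0.0001), the smallest
# value 10,000 permutations can resolve.
print(permutation_p_value(274, [259] * 10_000))
```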

McCotter concludes his article on the following note:

This study shows that sometimes batters really may have a hot hand, or at least that they adapt their approach to try to keep a long hitting streak going -- and baseball players are nothing if not adapters (p. 69).

To the extent McCotter is claiming evidence for a relatively modest-sized hot-hand effect, subject to other possible interpretations, I would concur with him.

1 comment:

Anonymous said...

The random permutation method really helps us determine whether players are 'streakier' than we'd expect, but it doesn't help much with determining why they are so.
At first, it seems like we could explain almost all of the increase in streaks by saying that player behavior (such as not pinch-hitting during a streak or perhaps swinging at more pitches trying to get a hit) is a controlling factor. But then again, I can't think of any good reasons why we'd see increases in streaks as short as 7 or 8 games. Do players really pay attention to their streaks when it's just a week long? Maybe so, but I'm not sure that, over 50 years' worth of data, it would be so common as to make a difference.
That's what still has me thinking that, even if we try to account for player behavior changes, we still might see evidence of a hot hand.
And this doesn't even address the point that, if players really are actively trying to continue their hitting streaks, they must (on the whole) be succeeding at it. This 'make your own result' idea is kind of at the heart of clutch hitting and perhaps hot hands, too.