Several days ago, while watching some very erratic home-plate umpiring, I posted a short item to Facebook quoting myself: "The strike zone is a probability density function." I had originally made the statement during a play-by-play radio broadcast of a London Tigers baseball game over 20 years ago. The Tigers were a AA minor league team. The radio station manager asked me not to do that again and not to try to explain it on air.
What does it mean to say the strike zone is a probability density function? Basically, the closer a pitch is to the centre of the strike zone, the more likely the umpire is to call it a strike.
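For anyone who would rather see that as code, here is a rough sketch in Python. Everything in it is my own assumption for illustration: the zone dimensions, the logistic shape, the `steepness` value, and the function name `called_strike_prob` are all made up, not taken from any data.

```python
import math

# Assumed zone half-dimensions in feet (illustrative values, not measured).
HALF_WIDTH = 0.83
HALF_HEIGHT = 1.0

def called_strike_prob(x, z, steepness=6.0):
    """Chance that a taken pitch at (x, z) is called a strike.

    x and z are horizontal and vertical offsets from the centre of the
    zone, in feet. The logistic shape and steepness are assumptions.
    """
    # Scaled distance from the centre: 0 at the centre, 1 on the zone's edge.
    d = math.hypot(x / HALF_WIDTH, z / HALF_HEIGHT)
    # Smooth falloff: close to 1 at the centre, 0.5 on the edge,
    # and small but non-zero outside the zone.
    return 1.0 / (1.0 + math.exp(steepness * (d - 1.0)))

if __name__ == "__main__":
    for x in (0.0, 0.4, 0.83, 1.2):
        print(f"x = {x:4.2f} ft -> P(called strike) = {called_strike_prob(x, 0.0):.3f}")
```

The point of the smooth curve is that there is no hard boundary: a pitch just inside the edge is a coin flip in this toy version, and a pitch a few inches outside still gets called a strike now and then.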
It turns out I was even more right than I thought (if that makes sense). Consider this piece [via John Henderson, former student, colleague, and co-author]. Sure as shootin', pitches on or near the edge of the strike zone (but still inside it) are less likely to be called strikes than pitches near the centre. And pitches outside the zone still have a probability of being called strikes [as was often said about pitches from Greg Maddux].
Here is a graph from the article:
Probability a Pitch Is Called a Strike
The strike zone is indicated by the red bars along the axes. The height of the "mesh" indicates the percentage of times that a pitch in that location was called a strike.
The article then continues, presenting evidence that probably shouldn't surprise me, but it does. If a batter has two strikes, there's a lower probability that the umpire will call the next pitch a strike even if it is in the strike zone. And if the batter has three balls [delete old joke here], there is a greater chance the umpire will call a strike.
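Here is one hedged way to picture that count effect in Python: take whatever probability the pitch location alone would give, and nudge it on the log-odds scale depending on the count. The function name `strike_prob_with_count` and the sizes of the nudges (`strike_penalty`, `ball_bonus`) are hypothetical numbers I picked for illustration; the article's actual estimates are not reproduced here.

```python
import math

def strike_prob_with_count(base_prob, balls, strikes,
                           strike_penalty=0.4, ball_bonus=0.3):
    """Adjust a location-only call probability for the count.

    base_prob must be strictly between 0 and 1. The penalty and bonus
    are made-up log-odds shifts, not estimates from any data.
    """
    # Convert the probability to log-odds so shifts stay within (0, 1).
    logit = math.log(base_prob / (1.0 - base_prob))
    if strikes == 2:
        logit -= strike_penalty   # umpire less willing to ring the batter up
    if balls == 3:
        logit += ball_bonus       # umpire less willing to issue the walk
    # Convert back from log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    for balls, strikes in [(0, 0), (0, 2), (3, 0), (3, 2)]:
        p = strike_prob_with_count(0.5, balls, strikes)
        print(f"count {balls}-{strikes}: P(called strike) = {p:.3f}")
```

With these invented numbers, a borderline pitch that is a coin flip at 0-0 becomes less likely to be called a strike at 0-2 and more likely at 3-0.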
To the extent this is true, it affects a batter's (and a pitcher's) strategies. As John wrote,
You are on the mound and have me 0-2 on two 109 mph fastballs. I'm worried about my team but also my .300 average and its associated $2m bonus. The data say I should let the next pitch go by if it's close, in clear contradiction to the conventional wisdom that I "protect the plate".
My, oh my: a multivariate endogenous strike zone with serial correlation and simultaneity bias. That should keep the sabremetricians happy for a while.