
Exploring the Impact of Fielding on Wins

We’ve previously used Out of the Park Baseball (OOTP) to test out theories on hitting, such as how well OPS and Runs Created predict team win totals. The sabermetrics folk have made great strides in trying to create meaningful statistics for fielding, including Ultimate Zone Rating (UZR), Total Zone (TZ), and Defensive Runs Saved (DRS). We won’t go into great detail about what each of those does – FanGraphs does a better job than we ever could. But it’s not as easy to take the results of any of these statistics and translate them into what matters most – a player’s contributions to a team’s win total.

Baseball Reference does incorporate DRS into its WAR calculations, but there’s always a danger when we’re extrapolating one step beyond any one particular calculation. For instance, DRS provides an estimate of runs saved, which is then used in a calculation to estimate how many additional wins you might expect. But each of those calculations has an error range and is impacted by a myriad of other factors. We were looking to use OOTP for a more direct way to see how fielding impacts a team’s win total.
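To illustrate that extra step, here’s a minimal sketch (not Baseball Reference’s actual method) using the common rule of thumb of roughly 10 runs per win; the constant itself shifts with the run environment and has its own error bars.

```python
# Rough sketch: converting a runs-saved estimate (such as DRS) into wins.
# This is not Baseball Reference's actual WAR calculation, just the common
# ~10 runs-per-win rule of thumb, which itself varies with the run
# environment and carries its own error range.

RUNS_PER_WIN = 10.0  # rule-of-thumb value, an assumption rather than a fixed constant

def runs_saved_to_wins(drs_runs_saved: float) -> float:
    """Estimate the win value of a fielder's runs saved."""
    return drs_runs_saved / RUNS_PER_WIN

# Example: a fielder credited with +15 DRS projects to roughly 1.5 extra wins,
# keeping in mind both the DRS estimate and the divisor have error bars.
print(runs_saved_to_wins(15))  # 1.5
```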

Our first foray simply looked at teams with different overall fielding capabilities. OOTP uses several different ratings for fielding, available when editing player characteristics. For instance, an infielder has Infield Range, Infield Error, Infield Arm, and Turn Double Plays ratings. Each rating is on a scale of 1-250.

[Image: OOTP fielding ratings screen]

We set up an 11-team league, with every player on a given team having the same overall fielding ability, but with the ratings varying from team to team. So, for instance, one team had every player with a “1” rating for each fielding ability, while another team had every player with a “250” for each fielding ability. All players had the same league-average ratings for hitting. All pitchers were equivalent pitchers with average ratings and an average ground/fly ratio.

We simmed three seasons (with all injuries and player development turned off). Of course, the better fielding teams did better, but it was somewhat surprising how much better they did. The team made up of the highest-rated fielders averaged a record of 113-49, while the team made up of the lowest-rated fielders averaged 42-120.

What was also interesting was the number of errors committed per game. The best fielding team committed only .28 errors per game while the worst fielding team committed 1.31. We would have thought that with everyone on the team having a 1 rating for every fielding attribute, they would have kicked and thrown the ball around even more. Still, on average they gave the other team about one more out per game than the best fielding team did. By comparison, in 2014 the Reds had the fewest errors (.62 errors/game) while the Indians had the most (.72 errors/game).
The more important difference seemed to be in balls the fielders didn’t get to due to range issues. Defensive efficiency for the best fielding team was .768 while for the worst it was .606. In 2014 the best team DEF was .712 by the Reds and the worst was .672 by the Twins.
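For anyone who wants to check those DEF numbers against a real box score, here’s a minimal sketch of one common way to compute defensive efficiency; published versions of the formula differ slightly (some include reached-on-error, for example), so treat this as an approximation.

```python
# Minimal sketch of team defensive efficiency (DER): the share of balls in
# play that the defense converts into outs. Variants of the formula exist;
# this version ignores reached-on-error and other small adjustments.

def defensive_efficiency(pa: int, h: int, hr: int, bb: int, so: int, hbp: int) -> float:
    balls_in_play = pa - bb - so - hbp - hr  # plate appearances ending with a ball in play
    hits_in_play = h - hr                    # hits the defense had a chance to field
    return 1 - hits_in_play / balls_in_play

# Illustrative (made-up) team totals, not actual 2014 numbers:
print(round(defensive_efficiency(pa=6200, h=1400, hr=140, bb=500, so=1300, hbp=60), 3))  # 0.7
```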

So let’s try to extrapolate this to some meaningful MLB differences. Since the original league took fielding ratings to extremes, we created a league with teams whose defensive ratings more closely resembled MLB. In this 9-team league, fielding ratings for all players ranged between 115 and 155 (the range from the original sim that produced fielding stats closest to MLB’s).

Again we simmed three seasons, and the difference between the first and last place teams was again quite large. The top fielding team went 92-70 on average while the worst fielding team went 71-91 – a whole 21-game difference. Here are the results:

[Image: final stats]

Along with those are charts comparing errors/game to wins and team DEF to wins.

[Chart: team DEF vs. wins]

[Chart: errors per game vs. wins]
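If you’d like to reproduce those charts from your own OOTP league exports, a rough sketch along these lines would do it; the file name and column names are placeholders for whatever your export actually uses.

```python
# Sketch of charting team DEF and errors/game against wins from a CSV of
# season totals. The file name and column names ("wins", "def_eff",
# "errors_per_game") are placeholders; adjust them to match your own export.
import pandas as pd
import matplotlib.pyplot as plt

teams = pd.read_csv("league_fielding_seasons.csv")  # hypothetical export file

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(teams["def_eff"], teams["wins"])
ax1.set_xlabel("Defensive efficiency")
ax1.set_ylabel("Wins")
ax2.scatter(teams["errors_per_game"], teams["wins"])
ax2.set_xlabel("Errors per game")
ax2.set_ylabel("Wins")
plt.tight_layout()
plt.show()
```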

There are certainly many factors that can influence these results – most notably anything that changes how many balls are put in play (e.g. strikeout rate, HR rate). But this certainly does suggest that getting a good grasp on accurately rating fielders can have a big impact on a team’s win total.

How well do wOBA and RC Predict Team Performance?

Okay, so we’ve already done two posts looking at OOTP leagues filled with clones of two players: Slappy Slapstick and Sluggish Slugger. One showed that Sluggish, the low BA guy with sexy power, got walloped head to head by Slappy, the unsexy high BA no power guy. The second showed the same in an MLB environment, but only when Slappy and Sluggish both had OPS high above the league average. Sluggish was better in the MLB environment when both had league average OPS.

These sims showed the limitations of OPS – the first big sabermetric stat to make its way into national telecasts – which falls somewhat short as a robust stat for valuing all players. Being an arbitrary stat that simply combines OBP with SLG, it’s not surprising that it lacks robustness. So we went looking for something that might work better.
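To make the “arbitrary” point concrete, OPS simply adds two rate stats with different denominators (OBP is over plate appearances, SLG over at-bats), as in this toy calculation with made-up numbers:

```python
# OPS is literally OBP + SLG: two stats with different denominators
# (OBP uses plate appearances, SLG uses at-bats) simply summed together.

def obp(h, bb, hbp, ab, sf):
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab

# Illustrative line (not Slappy or Sluggish): 150 H (100 1B, 30 2B, 5 3B,
# 15 HR), 50 BB, 5 HBP, 550 AB, 5 SF.
print(round(obp(150, 50, 5, 550, 5) + slg(100, 30, 5, 15, 550), 3))  # about .763
```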

So we turned to wOBA (weighted On-Base Average). This stat, created by Tom Tango, is based on the common sense premise that all hits are not created equal. The stat uses aggregate league totals to weight the value of each method of getting on base (a good description of wOBA and how it is calculated can be found at FanGraphs).
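Here’s a sketch of the wOBA calculation with illustrative weights; the real weights are recomputed from league totals every season, so the specific numbers below are placeholders rather than official values.

```python
# Sketch of the wOBA formula with illustrative weights. The real weights are
# derived from league run values and change year to year, so treat these
# numbers as placeholders rather than official values.
WEIGHTS = {"bb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}

def woba(ubb, hbp, singles, doubles, triples, hr, ab, bb, ibb, sf):
    numerator = (WEIGHTS["bb"] * ubb + WEIGHTS["hbp"] * hbp
                 + WEIGHTS["1b"] * singles + WEIGHTS["2b"] * doubles
                 + WEIGHTS["3b"] * triples + WEIGHTS["hr"] * hr)
    denominator = ab + bb - ibb + sf + hbp  # plate appearances minus intentional walks
    return numerator / denominator

# Illustrative line: 50 unintentional BB (55 BB total, 5 intentional), 5 HBP,
# 100 1B, 30 2B, 5 3B, 15 HR, 550 AB, 5 SF.
print(round(woba(50, 5, 100, 30, 5, 15, 550, 55, 5, 5), 3))  # about .336
```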

Unfortunately, OOTP does not deal with wOBA, so transferring this to the Slappy/Sluggish universe took a little bit of work. First, we ran one season with Slappy and Sluggish and calculated the weights for wOBA using league totals, then modified the abilities of Slappy and Sluggish to make them equivalent in wOBA, equal to the wOBA from that first season. This, by the way, gave a rather sizable advantage in OPS to the Sluggers (.887 to .799). The Slappy’s attributes predicted a line of .347/.452/.799 (AVG/OBP/OPS) with no HR. The Sluggers were designed to go .253/.303/.887 with 42 HR.

Then we set them loose for 5 seasons – after each season we restored the league so as not to mess with the weights for wOBA, which change from year to year.

In this universe, the results were much closer. Teams made up of Slappy’s won an average of 85 games a year, while teams made up of Sluggish Slugger’s won an average of 77. While this still might seem like an advantage for the Slappy’s, you have to keep in mind we took two very extreme players – the Slappy’s were given the lowest possible rating (1) for gap and power attributes. Teams made up of Slappy’s never hit more than 2 home runs in any single season (and while I didn’t bother to comb through the individual box scores, I would not be surprised if they were all inside-the-park jobs). Also, creating a league made solely of these players (along with clones of the same average pitcher) greatly amplifies any differences between the two groups. In an MLB environment where there is variation in players’ skills, these differences would likely not be noticeable at all.

Then we did the same with RC (Runs Created), created by Bill James, thanks to a suggestion made by a member of the Baseball Sim Addicts!!! Facebook group. As with wOBA this took a little bit of tweaking, but both Slappy and Sluggish were made to have an equivalent RC of 99. Slappy’s stat line was created to be .371/.491/.862 (again AVG/OBP/OPS), with Sluggish’s working out to .220/.332/.868. After running 5 additional seasons we came out with nearly the same overall results: the Slappy’s teams finished with an average of 84 wins while the Sluggers finished with an average of 78.
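For reference, the basic version of Runs Created (James’s more elaborate versions add steals, HBP, and other terms) works out to a sketch like this, with a made-up Slappy-style line as the example:

```python
# Basic Runs Created (the simplest of Bill James's versions; the "technical"
# versions add stolen bases, HBP, GIDP, and so on): on-base times advancement
# divided by opportunities.

def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    return (h + bb) * tb / (ab + bb)

# Illustrative Slappy-style line (made-up numbers): lots of singles, no power,
# so total bases barely exceed hits.
print(round(runs_created_basic(h=200, bb=40, tb=220, ab=560), 1))  # 88.0
```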

wOBA and RC certainly did a lot better at evening out the two teams. One could argue that a difference of 7 or 8 games in a simulation designed to greatly exaggerate any differences goes a long way in demonstrating the robustness of the two metrics. And even with these small but consistent differences, they are the best metrics available when applied to a typical MLB team. It does lead me to wonder, though, what is behind the small (and in the real world likely meaningless) advantage the Slappy’s have. Do the formulas need some minor tweaking? Is there something in the OOTP game engine?

Update: After a night of thinking about it, it likely has to do with fielding. All players were set to equivalent fielding ratings – but they were all average. Since the Slappy’s had a greater number of balls put in play, there were more opportunities for errors. Looking back at the yearly stats, the Sluggers did consistently produce more errors, some of which would have led to runs. While I cannot say for certain at this time, it looks like that could very well be the deciding factor between the two teams.
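As a back-of-the-envelope check on that hypothesis (all numbers below are hypothetical), even a modest difference in balls in play adds up over a season:

```python
# Back-of-the-envelope check with hypothetical numbers: if facing the Slappy's
# means fielding, say, 600 more balls in play over a season, and fielders
# botch roughly 2% of those chances, that is about a dozen extra errors
# charged to the defenses facing them.
extra_balls_in_play = 600  # hypothetical season-long difference
error_rate = 0.02          # hypothetical errors per ball in play
print(extra_balls_in_play * error_rate)  # 12.0 extra errors
```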

Slappy’s vs. Sluggers Part 2

My “real” job for the past 20 years has been as a researcher. It’s a well-known saying that good research raises more questions than it answers. My previous blog post on singles hitters versus sluggers raised a few questions and comments. One comment came through Twitter from Geoff M., who suggested testing the Slappy’s and Sluggers in a league more representative of MLB rather than one made up entirely of their clones.

Another well-known fact of research is that a single study will always have inherent limitations (or flaws, if you like). Using just a league of Slappy’s and Sluggers has the shortcoming of potentially amplifying any differences between the two. Just because it shows up in a league made completely out of those types of players doesn’t mean it would have any kind of noticeable impact in a league more representative of MLB.

So I went ahead with Geoff’s suggestion.

The original Slappy vs. Slugger sim gave each player an arbitrary OPS of .800. For my initial sim, I gave each player the league-average OPS from a 2014 MLB sim, which came out to .732. With injuries and player development turned off and the AI not allowed to make any roster changes, I simmed 10 individual seasons with 1 team of Slappy’s, 1 team of Slugger’s, and 28 MLB teams. Both the Slappy and Slugger teams had Average Pitchers created with expected stats at the league average.

The first set of 10 seasons was a bit eye-opening:

[Table: average results, first set of 10 seasons]

In only 2 seasons did the Slapsticks win more games than the Sluggers, and as you can see, both teams made up of league average players were just utterly awful, losing on average more than 100 games a season.

This brought up the question of whether the OPS value used affected the outcome. So I did two additional sims: one replicated the original 4-team Slappy/Slugger league with everyone having a .732 OPS, and the other replicated the Slappy/Slugger-in-MLB setup with each Slappy and Slugger having an .800 OPS.

First, the original 4-team league. It turns out changing the OPS to .732 made no difference, with season after season having the two teams of Slappy’s well ahead of the Sluggers (I also ran several more seasons of the original experiment just to be sure). The Slappy’s consistently won 90+ games with the Sluggers winning 60+. So the lower OPS level made no difference in that sim.

The MLB sim with both Slappy’s and Sluggers having .800 OPS was different. Here is the average performance of each team over ten seasons comparing both sets of sims:

[Table: average performance over ten seasons, both sets of sims]

In this sim, the Slappy’s greatly improved their win total and beat out the Sluggers in every category (though OPS was very close). The Slappy’s even had two winning seasons. I wish I had a compelling answer for why the Sluggers outplayed the Slapsticks in an MLB environment when each had a low OPS, yet the Slapsticks won out in an MLB environment with the higher OPS, while the sims with just the 4-team league always showed a consistent Slappy advantage.

At least in the four different sims we ran, the Slappy’s outperformed the Sluggers in three of them, though in a real-life environment the outcome may depend on the value of OPS and the answer may not be very straightforward.

If you have any hypotheses feel free to comment below or send us a tweet at @BullpenByComm.