# Do MLB Playoff Odds Work?

One of the more fan-accessible advanced stats are playoff odds [technically postseason probabilities]. Playoff odds range from 0% – 100% telling the fan the probability that a certain team will reach the MLB postseason. These are determined by creating a Monte Carlo simulation which runs the baseball season thousands of times [FanGraph runs theirs 10,000 times]. In those simulations, if a team reaches the postseason 5,000 times, then the team is predicted to have a 50% probability for making the postseason. FanGraphs and Baseball Prospectus run these every day, so playoff odds can be collected every day and show the story of a team’s season if they are graphed.

Above is a composite graph of the three different types of teams. The Dodgers were identified as a good team early in the season and their playoff odds stayed high because of consistently good play. The Brewers started their season off strong but had two steep drop offs in early July and early September. Even though the Brewers had more wins than the Dodgers, the FanGraphs playoff odds never valued the Brewers more than the Dodgers. The Royals started slow and had a strong finish to secure themselves their first postseason birth since 1985. All these seasons are different and their stories are captured by the graph. Generally, this is how fans will remember their team’s season — by the storyline.

Since the playoff odds change every day and become either 100% or 0% by the end of the season, the projections need to be compared to the actual results at the end of the season. The interpretation of having a playoff probability of 85% means that 85% of the time teams with the given parameters will make the postseason.

I gathered the entire 2014 season playoff odds from FanGraphs, put their predictions in buckets containing 10% increments of playoff probability. The bucket containing all the predictions for 20% bucket means that 20% of all the predictions in that bucket will go on to postseason. This can be applied to all the buckets 0%, 10%, 20%, etc.

Above is a chart comparing the buckets to the actual results. Since this is only using one year of data and only 10 teams made the playoffs, the results don’t quite match up to the buckets. The desired pattern is encouraging, but I would insist on looking at multiple years before making any real conclusions. The results for any given year is subject to the ‘stories’ of the 30 teams that play that season. For example, the 2014 season did have a team like the 2011 Red Sox, who failed to make the postseason after having a > 95% playoff probability. This is colloquially considered an epic ‘collapse’, but the 95% probability prediction not only implies there’s chance the team might fail, but it PREDICTS that 5% of the teams will fail. So there would be nothing wrong with the playoff odds model if ‘collapses’ like the Red Sox only happened once in a while.

The playoff probability model relies on an expected winning percentage. Unlike a binary variable like making the postseason, a winning percentage has a more continuous quality to the data, so this will make the evaluation of the model easier. For the most part most teams do a good job staying around the initial predicted winning percentage coming really close to the prediction by the end of the season. Not every prediction is correct, but if there are enough good predictions the predictive model is useful. Teams also aren’t static, so bad teams can become worse by trading away players at the trade deadline or improve by acquiring those good players who were traded. There are also factors like injuries or player improvement, that the prediction system can’t account for because they are unpredictable by definition. The following line graph allows you to pick a team and check to see how they did relative to the predicted winning percentage. Some teams are spot on, but there are a few like the Orioles or Red Sox which are really far off.

The residual distribution [the actual values – the predicted values] should be a normal distribution centered around 0 wins. The following graph shows the residual distribution in numbers of wins, the teams in the middle had their actual results close to the predicted values. The values on the edges of the distribution are more extreme deviations. You would expect that improved teams would balance out the teams that got worse. However, the graph is skewed toward the teams that become much worse implying that there would be some mechanism that makes bad teams lose more often. This is where attitude, trades, and changes in strategy would come into play. I’d would go so far to say this is evidence that soft skills of a team like chemistry break down.

Since I don’t have access to more years of FanGraphs projections or other projection systems, I can’t do a full evaluation of the team projections. More years of playoff odds should yield probability buckets that reflect the expectation much better than a single year. This would allow for more than 10 different paths to the postseason to be present in the data. In the absence of this, I would say the playoff odds and predicted win expectancy are on the right track and a good predictor of how a team will perform.

# Pirates 2014 — Bullpen

All the graphs are pulled from this Fangraphs leaderboard.

The Pirates bullpen has a been a source of problems and criticisms for the Pirates this year. At the beginning of 2014, the bullpen had almost the same personnel as the 2013 season. Bullpens can vary wildly from year to year, and the Pirates relievers pitched out of their minds for most of 2013, so you’d expect there to be some fall off. Currently [August 26, 2014], the Pirates lead MLB with 22 blown saves. Personally, I abhor saves and blown saves, but I needed to get this out of the way, since it’s the stat that will get thrown around the most. And for reference Tony Watson [the All-Star] leads the team with 6 blown saves. So there’s that.

I wanted to look at some of the peripheral stats of the Pirates bullpen to understand the entire story. First, the Pirates starters have been terrible this year. They rank last in starter WAR, middle of the pack in FIP, and near the bottom in WPA. Analyzing that situation is for another day, but suffice it to say they give up a lot of runs before the bullpen gets into the game. The smaller the average lead the bullpen has to hold on to, the more often they will give up the lead [accrue a blown save]. Shutdowns and meltdowns are Fangraphs stats which are better for evaluating individual relievers than saves. They provide a broader evaluation of how a pitcher or bullpen has performed rather than just looking at save situations. For a shutdown a pitcher basically adds to the win probability while for a meltdown a pitcher subtracts from the win probability. For instance last night Jared Hughes had a meltdown allowing three runs and inverting the win probability.

The Pirates are in the middle of the pack for both of those stats. There really isn’t anything interesting here.

Finally, the Pirates’ reliever xFIP is not very good. It’s towards the lower end of MLB. xFIP is one of the better park-independent, context-independent predictors of pitching skill. It just uses BB, K, and flyballs [for HR/FB]. This will also ‘adjust’ for some of Grilli and Frieri’s HRs that they gave up when they were struggling earlier this year. Those struggles won’t affect the bullpen moving forward since they are no longer on the team.

After this quick analysis to answer my initial question about the Pirates bullpen, they aren’t good. They aren’t terrible, but they aren’t good. They do have two really good pitchers with Melancon and Watson. Two decent pitchers in Wilson and Hughes. Then the rest aren’t great. Taking this analysis, what could the Pirates do to improve? Frieri was a gamble that didn’t pay off. But honestly, I think from a management stand point, you had to get rid of Grilli to get him out of the closer role. John Axford might help. He’s been good in the 5 appearances for the Pirates so far, and his career xFIP is 3.26 which is pretty good. As far as a trade, ‘proven’ relievers are overvalued in the free agent market, and the trade market was really expensive this year. Overall, one reliever isn’t going to affect your win total dramatically.

# Predicting Baseball Wins with WAR

This is a lot of debate about the usefulness of the comprehensive baseball statistic, WAR — Wins Above Replacement. I don’t think that WAR is the end all statistic, but it is a useful tool. Why? Because it can describe relatively accurately how a player contributes to a team. It also can help fans understand the real impact of one player. I might have to refer people here once people start clamoring that a single player will change the direction of a team at the trade deadline.

If anyone wants a primer on the details of what goes into the WAR stat, check out baseball-reference.com’s comparison between systems. Basically, WAR is the number of statistical wins the player is responsible for above a replacement player. In theory the replacement is the mediocre AAA player that is not a prospect. That statistic is the middle estimate of the impact the player will have, a player can be ‘responsible’ for more wins than their WAR number, but also drastically less. Think of WAR as the average wins he’s responsible for.

For probably over a year, I’ve wanted to see if WAR actually can predict the number of wins a team will have. I forget my original methods of trying to determine this, but this time round, I used FanGraphs’ WAR numbers for both pitching and batting from the last decade of season for all 30 teams. That’s 300 data points. After assembling the data and then running it through a basic linear regression, I was quite happy with what I saw. I’ve heard that if you add 48 to the team’s WAR number that you will get their total wins, and this can be seen mathematically by looking at real data.

I’ve graphed the actual wins to WAR and actual wins to the Pythagorean predicted wins for comparison. [Pythagorean wins performed better.] The linear regression for the WAR comparison actually turns out to be incredibly powerful. The regression coefficient is almost exactly equal to one meaning that each unit increase in WAR means an equal increase in wins. The y-intercept is +48.5, which means for the last decade the number of theoretical replacement wins has been just about 48. This should make sense, since the calculation of WAR is calibrated to a 48 win replacement level. The actual implementation of WAR works really well to predict teams wins. Unfortunately, this model will have a 95% prediction interval of 20 wins. That seems like a lot but, it shows how much luck has to do with a baseball season.

Pythagorean wins are typically used to show how lucky the team has been this year or not. This is actually a slightly better predictor of a teams’ success than WAR. There is less variance since run differential is just one step away from wins. You can see from the histograms that the spread on Pythagorean wins is less than with WAR. This can also be seen in the r-square for the linear regression. Pythagorean wins linear model has an r-square of .87 while the WAR model has an r-square of .77. This ultimately means that 87% and 77% of the variance is explained by the model indicating that the Pythagorean wins is slightly more accurate. The trade off is that WAR can give you player-level detail while run differential is only team-specific.

As always, let’s look at what the Pirates did.

A theme I always harp on was that the 2013 Pirates were good and really lucky. This can be seen by the data point for 2013 falling above the linear regression trend line. If you were wondering 2012 and 2011 (the two ‘collapse’ years) also fall above this line. I don’t know if this is the best way to measure a collapse, but the in-season stats did indicate regression during all three seasons 2011-2013.

# Probability and Sunday Night Baseball

There’s nothing I like more than a bases-loaded, no-outs situation in baseball. This might be my favorite situation/stat no one realizes. There’s around a 15% chance that the team who has the bases loaded will not score at all that inning! 15% might not seem like much, but over the course of the season it happens often.

Let’s set the scene: Bottom of the ninth, down by two, the Pirates knock in a run and get McCutchen on 1st with no outs to move within one run of the Cardinals.

This is a win probability graph FanGraphs has for every game. I’m not entirely sure what all they consider when calculating a win probability, but it mirrors the data I have, so there’s not much to discuss there. Clearly, the closest they came to winning the game was after Barmes walked putting Alvarez, the winning run on 2nd.

source: FanGraphs

According my run probability calculations for 2013, the probability to score at least one run with bases loaded and no outs was lower than the Pirates batting with a runner on second/third or first/third and no outs [Probabilities –123: 77.9%, 1_3: 82.4%, _23: 90.9%] The advantage of having the bases loaded is a walk or HBP brings a runner home, but the downside is there is an easy force at home. That would hurt the Pirates in this instance because Mercer didn’t hit the ball past the pitcher’s mound making for an easy 1-2-3 double play.

# Pirates — Run Probability

Presented without much commentary or analysis. This is how the Pirates fared last year given a certain number of outs and with runners on specific bases. So for example with no ones and nobody on base the Pirates had a 26% chance of a scoring a run from that point in the inning on till the end. So that would score a run once in about every 4 innings. The stat I always reference is bases loaded and no outs. It should be the highest, for the Pirates, it’s not. Runners on 2nd and 3rd with no outs is the highest.

For a point of comparison the black reference lines on the bar graph are the MLB average for that specific base-out state.