
MLB 2013 Tree Map

2013 MLB Wins Visualized




Want to know where all the wins in MLB came from last year? This chart shows where all 2,431 wins of last season came from (30 teams x 81 wins on average, plus 1 Game 163). The chart is fully interactive. I do advise using a large screen, since the chart is so large.
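The 2,431 total is easy to verify: every regular-season game produces exactly one win and involves two teams. A quick check, assuming the standard 162-game schedule:

```python
TEAMS = 30
GAMES_PER_TEAM = 162  # standard MLB regular-season schedule

# Each game involves two teams, so halve the team-games to count games (= wins).
regular_season_wins = TEAMS * GAMES_PER_TEAM // 2
average_wins = GAMES_PER_TEAM // 2  # a .500 team wins 81
game_163_wins = 1  # the single 2013 tiebreaker game

total_wins = regular_season_wins + game_163_wins
print(regular_season_wins, average_wins, total_wins)  # prints 2430 81 2431
```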

This is called a tree map, and the area of each cell or group of cells represents wins. The teams with the most wins have the largest area. The teams are sorted left to right and then top to bottom, so Boston is in the top left while Houston is in the bottom right. Each team’s group of cells is shown in its team color. Within each group are sub-cells denoting the wins against specific opponents, and those sub-cells are also sized by wins. So if you look at the first sub-cell in Boston’s group, you’ll see NYY, because the Red Sox got the most wins against the Yankees. You’ll find that division foes usually account for the most wins, since they play each other the most. The winning percentage against each opponent is also listed in the cell, so you can evaluate how well the team actually did against that opponent. Did they win more or lose more? A number above .500 means they won more; below .500 means they lost more.
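To make the ordering and sizing rules concrete, here is a minimal sketch of how the cells could be ordered and sized. The post doesn’t say how the chart itself was built, so this is not that method: the `HeadToHead` class, the `treemap_order` function, and the records below are hypothetical, and the win-loss numbers are made up. It covers only ordering and area shares, not the rectangle-packing step a real tree map also needs.

```python
from dataclasses import dataclass


@dataclass
class HeadToHead:
    """One sub-cell: a team's record against a single opponent (hypothetical type)."""
    opponent: str
    wins: int
    losses: int

    @property
    def win_pct(self) -> float:
        # Above .500 means the team won more than it lost against this opponent.
        return self.wins / (self.wins + self.losses)


def treemap_order(records):
    """Return (team, wins, share of all wins, sub-cells) rows, largest team first.

    A team's area is proportional to its share of all wins, and its sub-cells
    are sorted by wins against each opponent, largest first.
    """
    league_wins = sum(h.wins for cells in records.values() for h in cells)
    rows = []
    for team, cells in records.items():
        team_wins = sum(h.wins for h in cells)
        sub_cells = sorted(cells, key=lambda h: h.wins, reverse=True)
        rows.append((team, team_wins, team_wins / league_wins, sub_cells))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up records for two teams (not real 2013 data).
records = {
    "HOU": [HeadToHead("TEX", 2, 17), HeadToHead("BOS", 3, 4)],
    "BOS": [HeadToHead("TB", 12, 7), HeadToHead("NYY", 13, 6), HeadToHead("HOU", 4, 3)],
}

for team, wins, share, cells in treemap_order(records):
    print(team, wins, f"{share:.1%}")
    for h in cells:
        print("   ", h.opponent, h.wins, f"{h.win_pct:.3f}")
```

Sorting the sub-cells by wins is why division rivals tend to land first in each group: they simply supply more games, and therefore more wins.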

2013 AFC Playoffs and Bayesian Statistics

How does a team like the Steelers go from 0-4 to a dark horse for the final AFC playoff spot to inches away from clinching a berth? And how do the Chargers, who were a dark horse themselves, go on to secure the 6th seed?

Philip Rivers said, in an endorphin-high interview, that no one gave them a chance and that the odds were against the Chargers. That mixes the realms of motivation and analysis in a way I want to stay away from, but how did these odds really work, and how did they change throughout the day? Week 17 of the 2013 NFL season was a great example of Bayes’ theorem.

Here’s the math for Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$


This is read: the probability of A given that B happens is equal to the probability of B given that A happens times the probability of A divided by the probability of B.  We are going to use this basic form of Bayes’ theorem to understand what happened to the Steelers and Chargers playoff odds throughout the day on December 29th.

The important point I’m going to illustrate here is that probability is not a property inherent in an event. It is rather a guess or calculation based on the known facts or frame of reference at the time. When that frame of reference changes (in this case, when the NFL schedule plays out), we can calculate new odds. I’ll be using the above formula to calculate new probabilities for the Steelers as we discover new information throughout the day. Then I’ll compare this with a similar chart for the Chargers, who made the playoffs.

The first and largest problem for this exercise is determining the win/loss probability for each game. I’ve searched online, and there is much disagreement on what the playoff odds were at the beginning of the day, let alone what any single game’s odds were. I could use Vegas odds as crowd-sourced odds; however, I’ll make this part easy and just make up numbers for illustrative purposes. I have the Steelers’ and Chargers’ win probabilities weighted high because they were playing the Browns and the Chiefs’ backups, respectively. The Ravens and Dolphins I have at even odds of 0.5. These could be endlessly debated, but let’s just assume they are correct.

Assumed Probability for Week 17 (chart)

For the Steelers to make the playoffs, four things needed to happen. First, they had to win. [P(SteelersW) = .85] Next, the Ravens and Dolphins had to lose. [P(DolphinsL) = P(RavensL) = .5] And then San Diego had to lose later in the day. [P(ChargersL) = .25, the complement of the Chargers’ .75 win probability] Treating the four games as independent and multiplying these together (.85 × .5 × .5 × .25 ≈ .053), you get roughly P(SteelersPlayoffs) = .05. So with the information available at the beginning of the day, there’s about a 5% chance that the Steelers make the playoffs.

So let’s use Bayes’ theorem to calculate what the Steelers’ playoff odds are if we know the Dolphins lost their game: P(SteelersPlayoffs | DolphinsL).
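
$$P(\text{SteelersPlayoffs} \mid \text{DolphinsL}) = \frac{P(\text{DolphinsL} \mid \text{SteelersPlayoffs})\,P(\text{SteelersPlayoffs})}{P(\text{DolphinsL})} = \frac{1 \times .05}{.5} = .10$$
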
From just knowing that the Dolphins lost their game, you can infer that the Steelers’ chance of earning a playoff berth is twice what it was before. I should also explain why P(DolphinsL | SteelersPlayoffs) is equal to 1. This term assumes the Steelers made the playoffs and asks what the probability of the Dolphins’ loss is given this information. The Dolphins must lose if the Steelers make the playoffs, so the term is equal to 1. This will be true for every game we are considering. (The problem becomes more complicated if there are multiple paths to the playoffs, because the term will no longer be 1.)
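The single Bayes update can be checked numerically. This is a minimal sketch using the post’s made-up probabilities, assuming P(ChargersL) = .25 (a .75 Chargers win chance), which is the value that makes the starting product come out near .05, and treating the games as independent. The `bayes` helper is just the formula written as a function, not part of any library.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Made-up probabilities; P(ChargersL) = .25 assumes a .75 Chargers win probability.
p_steelers_w, p_dolphins_l, p_ravens_l, p_chargers_l = 0.85, 0.50, 0.50, 0.25

prior = p_steelers_w * p_dolphins_l * p_ravens_l * p_chargers_l  # P(SteelersPlayoffs)
# P(DolphinsL | SteelersPlayoffs) = 1, because the Dolphins must lose.
posterior = bayes(1.0, prior, p_dolphins_l)  # P(SteelersPlayoffs | DolphinsL)

# Cross-check: once the Dolphins' loss is known, only the other three results are uncertain.
direct = p_steelers_w * p_ravens_l * p_chargers_l

print(f"prior {prior:.4f}, posterior {posterior:.4f}, ratio {posterior / prior:.1f}")
```

The cross-check shows why the answer doubles: dividing by P(DolphinsL) = .5 simply removes that game’s factor from the product.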

Starting at P(SteelersPlayoffs) = .05, you can calculate the conditional probability in a chain as the Steelers’ win, the Dolphins’ loss, and the Ravens’ loss occur. After that, the probability of the Steelers making the playoffs depends solely on the one remaining game, Chiefs/Chargers. In-game win probabilities are calculated on …; they depend on time remaining, score, field position, and down, and they are independent of team skill. This is the probability graph for the Chargers game. I used it for the final two Steelers calculations: just before the Chiefs missed a FG, and then in OT during the Chiefs’ final drive, after the Chargers kicked a FG to take the lead.
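The chain of updates can be sketched as a loop. This assumes the same made-up probabilities as above, with P(ChargersL) = .25, and treats the games as independent, so each observed result only removes its own factor. Because each result is required for a playoff spot, P(event | SteelersPlayoffs) = 1 and every Bayes step reduces to dividing by P(event).

```python
import math

# Made-up week 17 probabilities for each result the Steelers needed.
# P(Chargers lose) = .25 assumes the Chargers' .75 win probability.
needed = [
    ("Steelers win", 0.85),
    ("Dolphins lose", 0.50),
    ("Ravens lose", 0.50),
    ("Chargers lose", 0.25),
]

# Prior: all four results must happen; the games are treated as independent.
p_playoffs = math.prod(p for _, p in needed)
history = [("start of day", p_playoffs)]

# Observe the early results one at a time:
# P(SteelersPlayoffs | event) = 1 * P(SteelersPlayoffs) / P(event).
for event, p_event in needed[:-1]:
    p_playoffs = 1.0 * p_playoffs / p_event
    history.append((event, p_playoffs))

for label, p in history:
    print(f"{label:<14} P(SteelersPlayoffs) = {p:.3f}")
```

After the Ravens’ loss, the value equals P(ChargersL). From that point the Steelers’ probability is simply the live probability that the Chargers lose, which is why the last two points come straight from the in-game win probability graph.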

Steelers Playoff Probability (area chart)


You can see how the Steelers’ probability changed after each event, and how small the area was until the Chargers almost lost their game. I will compare this with the Chargers, who had a better chance all through the day, except when the Chiefs were threatening to win. The large green area is the probability at the end of the Chargers’ game, when they won; its value is 1.0, denoting that they had clinched the playoff berth.

Chargers Playoff Probability (area chart)


To respond to Philip Rivers’s on-field comments about the Chargers facing long odds: it’s misleading to think the Chargers somehow overcame those odds themselves, when they were the favorites to capture the final wild card spot by the time their game started. It was the other teams’ losses that shifted the odds in the Chargers’ favor before they even played.

This exercise serves as an example of how and why probabilities change over time. It illustrates how probability relies on known information, or a reference point, and how changing the reference point changes the calculated probability.

2012 Toronto Raptors Correlation

2012 Basketball Scoring Correlation


Below are correlation graphs illustrating a significant (albeit slightly weak) correlation between how many points a team scores and how many points its opponent scores in a given game.

This isn’t anything novel, but rather an illustration and confirmation of what you might surmise: teams that play faster score more, and their opponents in turn score more, because there are more possessions over the course of the game. The Pearson correlation coefficients are:

Knicks    .2362
Jazz      .3837
Raptors   .4004
Wizards   .4753

(A higher value means the two teams’ game scores are more strongly correlated with each other. Typically .30 or above is a good correlation; anything less is rather weak.)
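The pace explanation can be illustrated with a toy simulation. The model is an assumption made purely for illustration and is not fitted to any team’s data: each game has a shared number of possessions, and each team scores an independent, random number of points per possession. The 95 possessions, 1.05 points per possession, and their spreads are made-up round numbers, and `simulate_r` is a hypothetical helper.

```python
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_r(n_games, pace_sd, seed=0):
    """Toy model: both teams share a game's possessions but score independently
    per possession. Returns the correlation between the two teams' scores."""
    rng = random.Random(seed)
    team_pts, opp_pts = [], []
    for _ in range(n_games):
        possessions = rng.gauss(95, pace_sd)  # shared by both teams (made-up mean)
        team_pts.append(possessions * rng.gauss(1.05, 0.10))  # made-up points per possession
        opp_pts.append(possessions * rng.gauss(1.05, 0.10))
    return pearson(team_pts, opp_pts)


print("pace varies:   r =", round(simulate_r(5000, pace_sd=5), 3))
print("pace constant: r =", round(simulate_r(5000, pace_sd=0), 3))
```

With these parameters, the varying-pace run gives a positive but fairly weak correlation, and the constant-pace run gives one near zero. In this model the shared possession count is the only thing linking the two scores.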


For contrast, here is a sport that doesn’t exhibit this: a breakdown of the Penguins’ season two years ago (the last full season). The trend line is virtually flat, and the confidence interval on the trend line dips negative, suggesting there is no statistically significant correlation between how many goals the Penguins score and how many goals their opponents score in a given game.

2011 Penguins Scatter Plot


The correlation coefficient for the Penguins is .0572, which is not statistically significant. We can conclude that hockey offenses operate essentially independently of each other.
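Whether a coefficient is "significant" depends on both r and the number of games. Here is a minimal sketch of two standard checks: the t statistic for a Pearson correlation (n − 2 degrees of freedom) and an approximate 95% confidence interval from the Fisher z-transformation. Note that this interval is for the correlation coefficient itself, not the same thing as the trend-line band in the scatter plot. The 82-game sample size is an assumption, not a figure from the post. As a rough rule of thumb, |t| above about 2 corresponds to p < .05 at these sample sizes.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: true correlation is zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transformation:
    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


N_GAMES = 82  # assumption: one full regular season per team

coefficients = {
    "Knicks": 0.2362,
    "Jazz": 0.3837,
    "Raptors": 0.4004,
    "Wizards": 0.4753,
    "Penguins": 0.0572,
}

for team, r in coefficients.items():
    t = correlation_t_stat(r, N_GAMES)
    low, high = correlation_ci(r, N_GAMES)
    print(f"{team:<9} r={r:.4f}  t={t:5.2f}  95% CI [{low:+.3f}, {high:+.3f}]")
```

The Penguins’ interval straddles zero, in line with the conclusion above. The Knicks’ .2362 sits close to the threshold, so its verdict depends on exactly how many games went into it.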

To further analyze basketball scoring, it would be good to eliminate overtime games, and to see whether a team’s scoring correlation over its schedule is related to how good or bad the team is, since over the course of a season a team plays a rather balanced schedule. My thinking is that mediocre teams will show a stronger correlation over the course of a season than good or bad teams.