Route Two Soccer – Updating Our Priors

I’ve been traveling this week and haven’t had a chance to catch most of the games yet. So in lieu of diagramming a specific match, I wanted to take a broader perspective—taking stock of the league and the teams now that we’re just about 20% through the season.

The NWSL season so far: we haven’t learned as much as we think we have

The persistent problem here (as with every attempt to analyze a complicated system) is the overwhelming force of randomness. Even when probabilities are set in stone, the actual distribution of results is subject to significant fluctuation. For example, if I flip a fair coin 10 times, I’d expect to get an equal number of heads and tails. And indeed, that’s the most likely result. But I’ll only actually get that specific result about one time in four. About 40% of the time I’ll get a 6/4 distribution, one way or the other. And about a third of the time, I’ll get something more lopsided than that.
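The coin-flip figures above can be checked directly from the binomial distribution. This is a minimal sketch, assuming a fair coin and independent flips; the function name `prob_heads` is just an illustrative label.

```python
from math import comb


def prob_heads(n_flips: int, n_heads: int) -> float:
    """Probability of exactly n_heads heads in n_flips flips of a fair coin."""
    return comb(n_flips, n_heads) / 2 ** n_flips


even_split = prob_heads(10, 5)                     # exactly 5 heads and 5 tails
six_four = prob_heads(10, 6) + prob_heads(10, 4)   # a 6/4 split, either way
more_lopsided = 1 - even_split - six_four          # 7/3 or more extreme

print(f"5/5 split:          {even_split:.1%}")
print(f"6/4 either way:     {six_four:.1%}")
print(f"more lopsided:      {more_lopsided:.1%}")
```

The three shares come out to roughly 25%, 41%, and 34%, so the "expected" even split is the single most likely outcome while still being the exception rather than the rule.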

Point being: you’re often going to get results that look wildly out of line from your expectation. The issue is how to explain this effect. And there are (broadly speaking) three different possibilities:

  1. Sheer random variance. Perhaps we’re just in one of the roughly one-in-three worlds where one side of the coin came up at a disproportionate rate.
  2. Something has changed. The coin used to be weighted evenly, but due to some unanticipated effect, it has changed. In this case, we should expect results to remain on this new course.
  3. The initial prediction was wrong. Maybe the results are entirely in line with the true probability. It was simply our own misperception that led us to assign the wrong chance to the event.

Depending on which of these is correct, our expectations going forward will shift pretty significantly. So it’s actually quite important to put new information into context and assess where it leaves us now. And the unfortunate reality is that, as human beings, we are often desperate to impose narrative meaning onto randomness. We might know intellectually that it’s perfectly plausible for a coin to come up heads 8 times out of 10, but in our guts we’ll start to wonder if maybe the coin is lucky.

There have been countless studies of this effect, in everything from sports to weather to financial portfolios. Our natural inclination is to over-interpret the significance of the most recent data points and assume that they create a new trendline which will continue indefinitely.

Far more likely, though, is that unlikely outliers are just that: outliers. In that case, we should expect reversion to the mean. As time goes on, as we collect more data, results will trend back toward their expected performance and the outliers will be washed out by the accumulation of data.

By way of example, look to North Carolina, who appeared to be unbeatable right up until they lost to Orlando. Going into the game, with Carolina riding high and Orlando stuck at the bottom of the table, that result may have seemed unlikely. But you only have to go back a month to find quite a few people predicting that Orlando and Carolina would be in close competition for a playoff spot. Based on that, Orlando winning at home would be thoroughly unsurprising.

But we have learned some things

All that said, while it’s important to not treat recent results as fully dispositive, we also don’t want to dig in too aggressively. After all, even if reversion to the mean is the most likely explanation for an outlier, that doesn’t mean that we know what the mean is.

The point, after all, isn’t that every result is literally random (that in a given match, every team is as likely to win as to lose). The ‘mean’ is simply the most likely result for a given team. For a good team, over time that might stabilize around 2 points per game. For a terrible team, it might stabilize at 0.5 points per game.

The question is how much five games should change our expectations. And this is where qualitative work becomes more important.

When you’ve got a well-designed model that has been rigorously tested and analyzed, it will often beat expert predictions simply by virtue of processing power, even without the ability to draw ‘thick’ qualitative inferences.

But, as we all know, soccer is a complicated game, involving a lot of moving parts. And beyond that, the sort of complex modeling that has been developed in some men’s sports simply doesn’t exist for women’s soccer.

The closest we’ve got for the NWSL is the prediction system at FiveThirtyEight, which appears to be a relatively ‘dumb’ model. That is to say: it knows baseline results but not much else. That’s not a terrible thing, since even with a ‘dumb’ model, you’ll generally get a reasonable assessment. It may be dumb, but that is precisely what keeps it from over-correcting.

Still, while regressing to the mean is a good starting point, you don’t want to completely ignore the information that you can glean from actually watching the games. After all, we’re all familiar with games where one team dominates but ends up losing from one unlucky bounce, or games when a team creates a ton of chances and just can’t manage to finish. The result is ultimately what matters the most, of course. But for predictive purposes, there is a lot more to a game than just the final scoreline. This is one of the key insights of expected goals.

Alright, so how should we interpret events so far?

My default is to approach things from the perspective of Bayesian inference. We build initial predictions based on the best available evidence and then determine how confident we are in those guesses. These are our priors.

As new information filters in, we assess how it comports with our priors. If our priors were strong, we can regard a modest disconnect as perfectly acceptable, requiring no meaningful update of the prediction. Even very good teams play poorly now and again, and we can safely regard this as just the sort of normal variation that comes with a game that includes significant elements of chance.

In cases of weak priors, new information will be more highly valued, since it can help to ease the fog of uncertainty. However, even here it’s important to remember that small sample sizes are inherently unstable. If you were unsure about the quality of a team a month ago, that should likely remain the dominant theme of your analysis.

The key point here is: if your perspective on a team has shifted significantly after five games, you’re probably overestimating the significance of those games and underplaying the importance of all the work that went into the initial prediction. Over the long term, good predictions should be pretty sticky, not shifting too quickly except in relatively rare cases of genuine major transformation.
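To make the difference between strong and weak priors concrete, here is a rough sketch of one simple way to update a points-per-game estimate. It treats the preseason prediction as if it were worth some number of games of evidence and blends it with actual results, which is the form the updated estimate takes in simple conjugate Bayesian models. Everything numeric here is an assumption for illustration: the 1.0 points-per-game prior and the prior weights of 20 and 5 games are hypothetical, and the seven points from five games is borrowed from the Boston discussion later in this piece.

```python
def updated_points_per_game(prior_ppg: float, prior_weight_games: float,
                            points_won: int, games_played: int) -> float:
    """Blend a preseason estimate with observed results.

    prior_weight_games is how many games of evidence we treat the prior as
    being worth: a large value means a strong prior, a small value a weak one.
    """
    total_points = prior_ppg * prior_weight_games + points_won
    total_games = prior_weight_games + games_played
    return total_points / total_games


prior_ppg = 1.0                    # hypothetical preseason estimate
points_won, games_played = 7, 5    # seven points from five games

strong = updated_points_per_game(prior_ppg, 20, points_won, games_played)
weak = updated_points_per_game(prior_ppg, 5, points_won, games_played)

print(f"strong prior (worth 20 games): {strong:.2f} points per game")
print(f"weak prior (worth 5 games):    {weak:.2f} points per game")
```

With the strong prior, the estimate barely moves (1.00 to 1.08). With the weak prior it moves more (1.00 to 1.20), but still sits well short of the 1.40 pace of the five games themselves. The hard part in practice is choosing the prior weight honestly, and that is a judgment call rather than something this sketch can settle.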

Updating our priors

Taking all that into account, let me walk through a few of the main priors that were widely (but by no means universally) shared going into this season, to see how they’re faring.

  1. Portland, Chicago, and North Carolina as likely playoff teams

Everything still looks good on this front. Neither Portland nor Chicago has yet played particularly well, but they remain at the top of the table. It would be a decent bet to assume both will play better going forward and draw a bit further from the crowd.

Meanwhile, North Carolina has outperformed the other two, and has been widely regarded as the class of the league so far. And through five games, that has been true. Whether we expect that to continue for the next 19 is more of an open question. The weaknesses diagnosed in them before the season haven’t gone away, so it would be at least a little bit surprising if they continued to pace the league by such a large margin.

  2. Washington, Boston, and Houston as challengers for the bottom

These three were generally regarded as the weakest of the league. So far, nothing we’ve seen from Washington or Houston argues strongly against that premise. Both have shown flashes of quality, but both have also struggled mightily.

Boston, however, have been the darlings of the league so far, and are being discussed as a legitimate playoff contender. And they are one of the key points of conflict as we attempt to update our predictions. Just how much should one make of their performance so far? Seven points from five games is good, and clearly shows that they are miles better than they were in 2016. On the other hand, any run-of-the-mill bad team will have stretches like that in a season.

Those results, therefore, are perfectly consistent with the prior that said: ‘Boston will be much improved, turning from a dreadful team into a mediocre one.’
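As a rough check on how ordinary a stretch like that is, the sketch below enumerates every possible sequence of five results for a mediocre team and adds up the probability of finishing with seven or more points. The per-game probabilities are purely hypothetical (25% win, 25% draw, 50% loss, which averages 1.0 points per game), and games are assumed to be independent with the same odds each time, which real fixtures are not.

```python
from itertools import product


def prob_at_least(points_target: int, games: int,
                  p_win: float, p_draw: float) -> float:
    """Probability of earning at least points_target points over `games`
    independent matches with fixed win/draw/loss probabilities."""
    p_loss = 1 - p_win - p_draw
    outcomes = [(3, p_win), (1, p_draw), (0, p_loss)]  # (points, probability)
    total = 0.0
    # Walk through every possible sequence of results (3**games of them).
    for sequence in product(outcomes, repeat=games):
        points = sum(pts for pts, _ in sequence)
        if points >= points_target:
            sequence_prob = 1.0
            for _, p in sequence:
                sequence_prob *= p
            total += sequence_prob
    return total


# Hypothetical mediocre team: 25% win, 25% draw, 50% loss.
p_seven_plus = prob_at_least(7, 5, p_win=0.25, p_draw=0.25)
print(f"Chance of 7+ points in a given five-game stretch: {p_seven_plus:.1%}")
```

Under those assumptions, a team averaging a point a game takes seven or more from any particular five-game stretch a bit under 30% of the time. Since a full season contains many such stretches, seeing at least one of them is more likely still.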

The question is whether Boston’s performances have been good enough to challenge that assumption. After all, they thrashed Seattle (who have been very good in their other three recent games), and played very evenly with two expected playoff contenders (NC and Chicago), even if they only got a solitary point from those games.

From my perspective, this is a case where new information has only increased the uncertainty. It is still quite possible that Boston could drift back down toward the bottom soon. It’s also quite possible that they continue to play at this level and hang around in the playoff race all season. I haven’t seen enough yet to feel confident in either direction. A month from now, we’ll likely be in a far better position to assess their true talent.

  3. Parity

This was the mantra going into the season, and everything so far has supported the idea. While North Carolina remains a full length ahead of the field, everyone else is packed close, with just four points separating 2nd from 10th.

It’s been a season full of surprising results. But that’s hardly surprising in the broader sense. Because when everyone is reasonably close in quality, you should expect a lot of strange results from game to game, while also expecting those to even out over the long term.

  4. Seattle???

One of the biggest peculiarities this year is Seattle, who have performed exceptionally against two weak teams and played a tough draw against Portland, while also looking awful against Sky Blue and Boston. But again, this shouldn’t necessarily be too surprising, as it fits fairly well with the consensus preseason opinion that Seattle was a flawed team with enough talent to beat anyone but enough weaknesses to fall flat against anyone.

Like Boston, they are tough to lock down. But unlike Boston, there’s no particular reason to expect all that much more clarity. Chances are decent that they’ll simply remain like this all year—mixing good and bad performances evenly enough to stay in the playoff hunt without ever giving their supporters much reason to feel safe.