
What I'm Up To - March Madness Part 4

  • Writer: Joe
  • Jan 24, 2019
  • 8 min read

Winter break has come to a close, and so it’s time to put a bow on my little March Madness experiment. Since my last post, I took a closer look at the probabilities, went back and updated some of my modeling choices, developed some heuristics for picking game outcomes, and scored them. I’ll go through each of these step by step, then give a conclusion to pull it all together.

After looking at the code, I realized I had made some mistakes in previous files. The stochastic analysis was done with 2017 data, even though it was applied to 2018 teams. Correcting that just required changing a single variable, so it wasn’t too difficult. The next step was to pick teams.

To pick my teams, I used a few techniques. The first was to make picks purely on matchups (henceforth called the “Matchup” heuristic). This one is pretty straightforward; using the same matchup dictionary from the stochastic analysis, this heuristic picked the team with the higher likelihood of winning each game. With the teams decided, it moved to the next round and did the same thing, until there was only one. This essentially mirrors the way actual tournaments progress chronologically.
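To make the idea concrete, here's a minimal sketch of the Matchup heuristic. The names are hypothetical stand-ins for the structures in my code: `matchup_probs` is assumed to be a dictionary mapping a pair of teams to the first team's win probability.

```python
def matchup_bracket(field, matchup_probs):
    """Advance the likelier winner of each game, round by round.

    field: teams in bracket order for the round of 64.
    matchup_probs: dict mapping (team_a, team_b) to P(team_a beats team_b).
    Returns a list of rounds, each a list of surviving picks.
    """
    rounds = [list(field)]
    current = list(field)
    while len(current) > 1:
        # Adjacent teams in bracket order play each other.
        current = [a if matchup_probs[(a, b)] >= 0.5 else b
                   for a, b in zip(current[::2], current[1::2])]
        rounds.append(current)
    return rounds
```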

For the next heuristic, I picked the team in each matchup with the higher probability of getting to the next round (“Next Round” heuristic). This gave me a little more insulation from the possibility that I’m wrong. To explain, let’s use a smaller version of the tournament with eight teams; the numbers correspond to their order on the bracket (so in the first round, team 1 plays team 2, team 3 plays team 4, and so on), not their seed. If the second round is supposed to include teams 1, 3, 5, and 7 (based on the first-round matchup probabilities), then I may choose team 3 to beat team 1, team 7 to beat team 5, and then team 3 to win it all. That would be determined by the matchup probabilities; team 3 has a greater than 50% chance of beating (separately) team 4, team 1, and team 7. But what if the probability between team 3 and team 4 is really slim, almost 50/50? Then there’s a good chance that team 3 doesn’t even make it to the next round, let alone the semifinals or championship. Choosing based on the larger probability of advancing lets the model account for the uncertainty in previous matchups, and should give a higher expected score. It also takes advantage of all the Law of Total Probability work I described in my last post.
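Here's a sketch of the Next Round heuristic under the same illustrative naming, assuming `round_probs[team][r]` holds the probability (from the Law of Total Probability work) that a team is still alive entering round r:

```python
def next_round_bracket(field, round_probs, n_rounds=6):
    """In each slot, advance whichever team is likelier to reach the next round.

    round_probs: dict mapping team -> list, where round_probs[team][r] is
    P(team survives into round r), marginalized over all possible opponents.
    """
    rounds = [list(field)]
    current = list(field)
    for r in range(1, n_rounds + 1):
        current = [a if round_probs[a][r] >= round_probs[b][r] else b
                   for a, b in zip(current[::2], current[1::2])]
        rounds.append(current)
    return rounds
```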

The other heuristics involve “anchoring” on a particular round (“Anchoring” heuristics). The heuristic picks a round (like the Final Four or the Elite Eight) and chooses the teams most likely to get to that point. Those teams are then selected as winners through all of the previous rounds. That leaves some empty spaces, which are filled by picking the team with the higher probability of reaching the next round. For the rounds after the “anchoring” round, the heuristic just uses the matchup probabilities. With this, you’d expect the bracket anchored on the round of 32 to be the same as the first heuristic, which uses matchup probabilities (spoiler alert: it is).
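The core anchoring step might look like the sketch below. It only shows how the anchored teams are chosen; propagating them back through earlier rounds, filling the gaps with next-round probabilities, and finishing later rounds with matchup picks would follow as described above. `round_probs` is the same hypothetical structure as before.

```python
def anchored_teams(field, round_probs, anchor):
    """Pick, for each slot in the anchored round, the team most likely
    to get that far.

    anchor: round index (1 = round of 32, ..., 4 = Final Four in a
    64-team field). Each slot in round r is fed by a block of 2**r
    consecutive teams in bracket order.
    """
    block = 2 ** anchor
    return [max(field[i:i + block], key=lambda t: round_probs[t][anchor])
            for i in range(0, len(field), block)]
```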

For the sake of completeness, I reversed the anchoring heuristic. The anchoring concept stays the same, but it uses the matchup probabilities first and the next-round probabilities after the anchoring round.

After making these heuristics, I created a function to calculate the expected score of a bracket. It does this using the ESPN scoring method, multiplying the probability of a correct pick (or the probability of a given team getting to a given round) by the value of that correct pick. The function finds the scores for each round, then the total score. This is simply an application of “expected value” from probability.
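The calculation itself is short. Under ESPN’s standard scoring (as I understand it: 10 points per correct round-of-64 pick, doubling each round to 320 for the champion), the expected score is just each pick’s probability of panning out times its point value. A sketch, using the same hypothetical `round_probs` structure as before:

```python
ROUND_POINTS = [10, 20, 40, 80, 160, 320]  # ESPN: per-pick value doubles each round

def expected_score(rounds, round_probs):
    """Expected ESPN score of a bracket.

    rounds: output of a picking heuristic; rounds[0] is the full field and
    rounds[r] holds the teams picked to be alive entering round r.
    round_probs[team][r]: probability the team actually gets that far.
    """
    return sum(ROUND_POINTS[r - 1] * round_probs[team][r]
               for r in range(1, len(rounds))
               for team in rounds[r])
```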

Then, just for fun, I tested them on the real tournament. Filling out the bracket manually would mean showing results for 127 games, which I didn’t want to do, so I created a function to do it for me. In essence, it iterates through every team in the seed list (in order of bracket appearance, top down) and, for each team, iterates through its NCAA games. It starts on the first game entry in the schedule (or the second, for teams that had a First Four matchup) and adds the team’s name to the Round of 64 list. If there’s another game after the first, the function adds the name to the Round of 32 list. It keeps iterating until the last game.
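A sketch of that results-building function, assuming a hypothetical `schedules` dict that maps each team to its ordered list of NCAA tournament games (with any First Four game already skipped):

```python
def actual_rounds(seed_order, schedules, n_rounds=6):
    """Rebuild the real bracket from each team's tournament schedule.

    seed_order: teams in bracket order, top down.
    schedules: dict mapping team -> its NCAA games in order. Playing a
    k-th game means the team survived into the k-th round.
    """
    rounds = [[] for _ in range(n_rounds)]  # round of 64 ... championship game
    for team in seed_order:
        for k, _game in enumerate(schedules[team]):
            rounds[k].append(team)          # k-th game played => alive in round k
    return rounds
```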

If you run either of the programs (BuildBrackets.py or its tournament-only-trained cousin, BuildBrackets_TOURNEY.py), you’ll get the results in a CSV file (named “heuristic scores” or “heuristic scores TOURNEY ONLY”).

Intuitively, I expected the Next Round heuristic to be the most effective. Like I mentioned earlier, it allows the program to account for the possibility of making mistakes, so its expected value should be higher. Of course, the expected value isn’t a great predictor of the actual tournament results, so while it performed better than average, it wasn’t the best heuristic to use.

All in all, I’m pretty satisfied with the project. My goal was to continue familiarizing myself with Python and a host of different libraries while solving an open-ended problem with my own solution. While I drew a lot from tutorials, many of the solutions were my own, built on knowledge I’ve picked up over my young programming “career.”

That said, there’s certainly some room for improvement. I’m sure my machine learning process could use work; for the purposes of this project, I just treated it as a black box. I put the data in, and it spit out a model.

I’ve also heard the phrase “feature selection” thrown around, and I imagine I could have benefited from some preliminary analysis of my “custom” stats. That way, I could eliminate some of the unnecessary stats and choose ones that are more predictive.

I am, however, quite satisfied with the stats I developed. I rarely see analyses that closely model the way games actually play out (for example, comparing one team’s offensive efficiency to the other’s defensive efficiency), and I find many sports analyses to be overly simplistic.
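For the curious, a feature of that flavor might look like this snippet; the `stats` dictionary and key names are made up for illustration, not the ones in my files:

```python
def matchup_features(team_a, team_b, stats):
    """Hypothetical matchup-aware features: pit one side's offense against
    the other side's defense instead of comparing raw season averages."""
    return {
        "a_off_vs_b_def": stats[team_a]["off_eff"] - stats[team_b]["def_eff"],
        "b_off_vs_a_def": stats[team_b]["off_eff"] - stats[team_a]["def_eff"],
    }
```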

A lot of the work I did was done by hand. For example, I had to type out all of the abbreviations for the seeds in the Round of 64. This program also did not consider the “First Four” games for teams seeking to earn a spot in the tournament.

There are some next steps I could take (and that I may do later on). For one, I could simulate the tournament a bunch of times to get a better sense of how the brackets perform. While expected score is a nice metric, picking from a set of alternatives requires more analysis. It’s common to use a 10-50-90 rule, which also considers the bottom and top 10% of results.
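A simulation along those lines could be as simple as this sketch, which replays the tournament from the matchup probabilities and reads off the 10th/50th/90th percentile scores of a picked bracket. It reuses the ESPN point values from earlier and assumes the `matchup_probs` dictionary covers every possible pairing:

```python
import random

ROUND_POINTS = [10, 20, 40, 80, 160, 320]

def simulate_once(field, matchup_probs):
    """Play out one random tournament from the matchup probabilities."""
    rounds, current = [list(field)], list(field)
    while len(current) > 1:
        current = [a if random.random() < matchup_probs[(a, b)] else b
                   for a, b in zip(current[::2], current[1::2])]
        rounds.append(current)
    return rounds

def score_against(picked, outcome):
    """ESPN score of a picked bracket against one simulated outcome."""
    return sum(ROUND_POINTS[r - 1]
               for r in range(1, len(outcome))
               for team in picked[r] if team in outcome[r])

def percentiles_10_50_90(picked, field, matchup_probs, n=10_000):
    scores = sorted(score_against(picked, simulate_once(field, matchup_probs))
                    for _ in range(n))
    return scores[n // 10], scores[n // 2], scores[9 * n // 10]
```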

In addition, the brackets I’ve created are just constructs in my Python program. To actually implement them (as of now), I’ll have to manually fill out every bracket. With so many different brackets generated automatically, it would be nice to make the program fill them out for me.

As a final addendum, I’m going to do one of my favorite things, which is explaining why I believe some other approaches are flawed.

The first flaw is in the big picture strategy of picking teams. In my experience, your average bracket challenge participant chooses winners for each game, going round by round until the final. As I alluded to earlier with the Next Round heuristic, this approach doesn’t leave any margin for error. It’s so common to see people miss out on a lot of points because the team they have winning the whole tournament didn’t make it past the Elite Eight (something you can weigh using the Law of Total Probability). More fundamentally, this is a terrible way to pick teams. Humans almost never use rigorous methods to predict outcomes. The idea of a “gut feeling” or “passing the eye test” is almost nonsensical with 64 teams in the bracket; you’d have to watch way too much college basketball to pick every game with confidence.

That’s also a reason why you shouldn’t believe any “expert” opinions that aren’t backed up by math. If you look at the numbers, the “experts” rarely do better than average. If you’re going to trust someone else’s qualitative opinion, you may as well just pick the higher seed in every matchup.

There’s another fairly common strategy: estimate there will be “N” upsets in each round, then pick the N most likely ones. THIS STRATEGY IS TERRIBLE! In general, it’s good to predict outcomes based on the mechanics of a process. There’s not some magic force that decides how many upsets are going to occur; the number of upsets is a function of the outcome of all the games. Picking the number of upsets before you evaluate each game models the process entirely backwards.
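A quick simulation makes the point concrete. Even with fixed (entirely hypothetical) favorite win probabilities, the number of upsets in a round comes out as a whole distribution, not a single number you could dial in ahead of time:

```python
import random
from collections import Counter

# Hypothetical favorite win probabilities for the 32 round-of-64 games.
FAVORITE_PROBS = [0.98, 0.95, 0.92, 0.88, 0.83, 0.76, 0.65, 0.55] * 4

def count_upsets(probs):
    """One simulated round: an upset is any game the underdog wins."""
    return sum(random.random() > p for p in probs)

dist = Counter(count_upsets(FAVORITE_PROBS) for _ in range(10_000))
print(sorted(dist.items()))  # upsets per round spread across many values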

Another fairly common strategy I found, especially among finance folks, involves picking teams that are “underrated.” The idea here is that you get more bang for your buck; you don’t pick teams that everyone else is picking, so you have a better shot at standing out. Ideally, you’d look at some probabilities created by other people, consider the teams that are picked more than their probability would suggest, then pick the other way. For example, if Kentucky is favored by 90% of bracket fillers to win a particular matchup, but their probability is around 70%, you’d pick the other team to differentiate yourself from the crowd.

Like every persuasive strategy, this one has a nugget of truth. You should most certainly consider the probability of a team winning, not just its status as a matchup favorite. But everything else about it is wrong. The underlying premise of this strategy is, as I alluded to before, related to finance. In finance, it’s most profitable to buy undervalued investments. Later, the market should correct this difference between price and true value, and you can sell for a profit.

But the tournament doesn’t act the same way. While a rational market makes an investment’s price converge to a “true value” as people continue to evaluate and reevaluate it, the tournament isn’t affected by people adjusting their valuations of different teams. The key distinction here is that markets are dynamic; you know the current state, but that state will change over time. The March Madness tournament, on the other hand, is actually static. There’s only one state of the tournament (a bracket filled with actual results), and that state is unknown when you fill out your bracket. In a way, it’s like a multiple choice test. The “state” of the test is the set of correct answers, which you don’t know. If you want to maximize your score on an exam, you fill in the answers you believe to be correct, regardless of what your neighbor thinks.

Now, this example is probably too simple to fully encapsulate the “hedge fund” strategy. You have to consider the relative value of winning, doing just okay, or finishing near the bottom of the field (and in most formats, everything that isn’t “winning” carries the same value). But that doesn’t change the fact that the strategy’s only virtue is being different. While it minimizes your chances of performing just average, it may also increase the chance that you do worse.

As always, you can find the code and all the requisite files here. The ReadMe doc should explain how to use them, but if you have any questions whatsoever, feel free to contact me at josephzaghrini@gmail.com.
