Artifact Insights

How easily can you predict the final Olympic medal table with a simple but data-driven approach?

23 July 2021 | Food for Thought

We saw the power of simple predictions in our football #EURO2020 results prediction game, where the simple ArtiBots predicted the final scores of all matches better than 99.8% of human players. You can find more details in this Artifact Insights blog post: https://insights.artifact.swiss/2021/07/why-simple-robots-better-than-humans-predicting-football/

We now want to repeat the sports prediction experience with one of the most prestigious events there is, the Olympic Games in Tokyo. Our aim is to predict the final medal table as of 9 August, once all competitions have finished and all Gold Medal decisions have gone down in history.

To do so, I followed the widely used data science methodology CRISP-DM (see: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining):

  1. I started by analyzing the medal table, the different competitions, and the way medals are awarded (e.g., I did not know that there are more than 50 competitions in which 2 Bronze Medals are given), as well as special effects such as doping scandals, new competitions, and now also COVID-19 implications.
  2. With this business understanding, I assessed potential data sources for the necessary information. I scraped the historic medal tables back to 2000, the all-time medal table, and other inputs, such as the competitions, the new disciplines, and past doping cases, from the official IOC site, from sports magazines, and finally from Wikipedia.
  3. I then started to prepare the data, both to operationalize the information (e.g., on new competitions and doping cases by country) and to harmonize it (e.g., different country names and formats).
  4. Once I had reached a consistent data layer in which all information could be analyzed smoothly, I ran some basic analyses to understand the average, trend, and seasonality / special effects of the data. Based on this modelling, I began to understand how the data across years is connected and was able to define weighting factors for the regression analysis. Before fixing the final weightings, I experimented with the effects of different factors and tried to predict previous medal tables (i.e., backtesting). In general, the more recent Olympic Games should have a much higher weighting factor than games long ago. E.g., I weighted the 2016 Games (the last Summer Olympics) with a factor of 13, while the all-time Olympic medal table got a factor of 3 and the 2000 medal table only a factor of 1. For the geeks among us: although they came out of a bottom-up analysis, the weighting factors turned out to follow the Fibonacci sequence (see more here: https://en.wikipedia.org/wiki/Fibonacci_number).
  5. With this initial prediction, I then iterated the process and included additional data sets, such as the new competitions, potential ties (where 2 Silver Medals are given because the athletes achieved the same result), and other influencing factors, to model reality better.
  6. Bringing it all together, I applied the prediction model to the past data to predict the #TOKYO2020 Olympic Games Gold, Silver & Bronze Medals by country. The results are shown in the table below; of course I’ve added Switzerland to the Top 10 countries. (The full table can be accessed at the end of this blog post.)
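
The weighting idea from step 4 can be illustrated with a minimal Python sketch. It is not the model used for this prediction: the function name `blend_medal_tables` is hypothetical, the weights for 2004, 2008 and 2012 are assumptions that simply continue the Fibonacci pattern (only 13, 3 and 1 are stated above), and the sketch assumes every input table has already been brought to a comparable scale.

```python
from collections import defaultdict

# Only the weights for 2016 (13), the all-time table (3) and 2000 (1) are
# stated in the post. The values for 2004, 2008 and 2012 are ASSUMPTIONS
# that continue the Fibonacci pattern.
WEIGHTS = {"2000": 1, "2004": 2, "all_time": 3, "2008": 5, "2012": 8, "2016": 13}


def blend_medal_tables(tables, weights):
    """Return the weighted average medal count per country.

    `tables` maps a source label to {country: medals}. A country missing
    from a source counts as zero medals for that source.
    """
    total_weight = sum(weights[label] for label in tables)
    blended = defaultdict(float)
    for label, table in tables.items():
        for country, medals in table.items():
            blended[country] += weights[label] * medals
    return {country: value / total_weight for country, value in blended.items()}


# Toy numbers for illustration only, NOT real medal counts.
toy_tables = {
    "2000": {"A": 10, "B": 4},
    "2016": {"A": 8, "B": 6},
}
prediction = blend_medal_tables(toy_tables, WEIGHTS)
print(prediction)
```

With these toy inputs the 2016 table dominates the blend, which is the intended effect of giving recent Games a much higher weight.
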
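
| Rank | Country | 🥇 Gold Medals | 🥈 Silver Medals | 🥉 Bronze Medals | Total Medals |
|---|---|---|---|---|---|
| 1 | USA | 55 | 42 | 42 | 139 |
| 2 | China | 34 | 22 | 24 | 80 |
| 3 | Great Britain | 26 | 21 | 20 | 67 |
| 4 | Russia | 22 | 19 | 25 | 66 |
| 5 | Germany | 20 | 18 | 21 | 59 |
| 6 | France | 12 | 17 | 17 | 46 |
| 7 | Japan | 12 | 11 | 18 | 41 |
| 8 | Australia | 11 | 15 | 14 | 40 |
| 9 | Italy | 11 | 12 | 11 | 34 |
| 10 | South Korea | 11 | 7 | 9 | 27 |
| 26 | Switzerland | 3 | 3 | 3 | 9 |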

We know that this is not a perfect prediction, but we are curious whether this quite straightforward data science approach will yield results as good as those of the #EURO2020 ArtiBots. Interestingly, the algorithm sees the USA leading the medal table. Switzerland performs better than generally expected (a goal of 7 medals has been communicated). Overall, 75 nations are predicted to win at least one of the 1071 predicted medals across Gold, Silver & Bronze.

We are curious to measure our performance on 9 August, after the XXXII Olympic Games have closed.
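
One possible way to score the prediction once the real table is known is the mean absolute error of the medal totals per country. The sketch below is only an assumption about how such a check could look, not the evaluation we have committed to; the helper `mean_absolute_error` is hypothetical and the `actual` values are invented placeholders.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap in medal totals over all countries in either table."""
    countries = set(predicted) | set(actual)
    gaps = (abs(predicted.get(c, 0) - actual.get(c, 0)) for c in countries)
    return sum(gaps) / len(countries)


# Predicted totals taken from the table in this post.
predicted = {"USA": 139, "China": 80, "Great Britain": 67}
# PLACEHOLDER values for illustration only, NOT real results.
actual = {"USA": 130, "China": 85, "Great Britain": 60}

mae = mean_absolute_error(predicted, actual)
print(mae)
```

A country that appears in only one of the two tables is counted with zero medals in the other, so unexpected medal winners also affect the score.
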

In the meantime, we can already compare our predictions with others, e.g., those from @CNNSport (see http://cnn.it/3wXe6qO) and @GracenoteGold. Our forecast gives the US team about 10% more medals, while we are similar for China. Interestingly, we predict many more medals for the British team: we see them in 3rd place with 67 medals, whereas CNN has Great Britain finishing 8th with only 36 medals.
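
The size of such a gap can be expressed as a relative difference. The small sketch below uses only the two Great Britain figures quoted above; the function name `relative_difference` is just an illustrative helper.

```python
def relative_difference(ours, theirs):
    """Return how much larger `ours` is than `theirs`, as a fraction of `theirs`."""
    return (ours - theirs) / theirs


# Great Britain: 67 medals in our forecast vs. 36 in the CNN forecast.
gb_gap = relative_difference(67, 36)
print(f"{gb_gap:.0%}")
```
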

What is your assessment of these initial predictions? Do you believe in the 9 medals the Swiss team “should” win? What is your view on Russia and Great Britain? Do you think the US team will dominate the Tokyo games? Happy to discuss!

Tokyo 2020 Olympic Games Medal Table Prediction

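| Rank | Country | 🥇 Gold Medals | 🥈 Silver Medals | 🥉 Bronze Medals | Total Medals |
|---|---|---|---|---|---|
| 1 | USA | 55 | 42 | 42 | 139 |
| 2 | China | 34 | 22 | 24 | 80 |
| 3 | Great Britain | 26 | 21 | 20 | 67 |
| 4 | Russia | 22 | 19 | 25 | 66 |
| 5 | Germany | 20 | 18 | 21 | 59 |
| 6 | France | 12 | 17 | 17 | 46 |
| 7 | Japan | 12 | 11 | 18 | 41 |
| 8 | Australia | 11 | 15 | 14 | 40 |
| 9 | Italy | 11 | 12 | 11 | 34 |
| 10 | South Korea | 11 | 7 | 9 | 27 |
| 11 | Hungary | 9 | 5 | 6 | 20 |
| 12 | Netherlands | 8 | 7 | 7 | 22 |
| 13 | Spain | 6 | 7 | 5 | 18 |
| 14 | Cuba | 6 | 5 | 8 | 19 |
| 15 | Brazil | 5 | 5 | 7 | 17 |
| 16 | New Zealand | 5 | 5 | 5 | 15 |
| 17 | Kenya | 5 | 5 | 4 | 14 |
| 18 | Jamaica | 5 | 4 | 2 | 11 |
| 19 | Sweden | 4 | 7 | 5 | 16 |
| 20 | Canada | 4 | 6 | 13 | 23 |
| 21 | Ukraine | 4 | 5 | 7 | 16 |
| 22 | Romania | 4 | 4 | 4 | 12 |
| 23 | Poland | 3 | 4 | 7 | 14 |
| 24 | Kazakhstan | 3 | 4 | 7 | 14 |
| 25 | Czech Republic | 3 | 3 | 5 | 10 |
| 26 | Switzerland | 3 | 3 | 3 | 9 |
| 27 | Iran | 3 | 2 | 3 | 8 |
| 28 | Croatia | 3 | 2 | 2 | 7 |
| 29 | Denmark | 2 | 5 | 6 | 13 |
| 30 | Belarus | 2 | 4 | 5 | 11 |
| 31 | South Africa | 2 | 4 | 2 | 8 |
| 32 | Slovakia | 2 | 3 | 2 | 5 |
| 33 | Ethiopia | 2 | 2 | 4 | 8 |
| 34 | Norway | 2 | 2 | 3 | 7 |
| 35 | Georgia | 2 | 2 | 3 | 7 |
| 36 | Colombia | 2 | 2 | 3 | 7 |
| 37 | Turkey | 2 | 2 | 3 | 7 |
| 38 | Finland | 2 | 2 | 3 | 7 |
| 39 | Greece | 2 | 2 | 2 | 6 |
| 40 | North Korea | 2 | 2 | 2 | 6 |
| 41 | Belgium | 2 | 2 | 2 | 6 |
| 42 | Thailand | 2 | 2 | 2 | 6 |
| 43 | Uzbekistan | 2 | 1 | 5 | 8 |
| 44 | Argentina | 2 | 1 | 2 | 5 |
| 45 | Azerbaijan | 1 | 4 | 7 | 12 |
| 46 | Mexico | 1 | 3 | 3 | 7 |
| 47 | Bulgaria | 1 | 2 | 3 | 6 |
| 48 | Serbia | 1 | 2 | 2 | 5 |
| 49 | Slovenia | 1 | 2 | 2 | 5 |
| 50 | Indonesia | 1 | 2 | 1 | 4 |
| 51 | Lithuania | 1 | 1 | 3 | 5 |
| 52 | Chinese Taipei | 1 | 0 | 2 | 3 |
| 53 | Tunisia | 1 | 0 | 2 | 3 |
| 54 | Bahamas | 1 | 0 | 1 | 2 |
| 55 | Bahrain | 1 | 0 | 0 | 1 |
| 56 | Armenia | 0 | 2 | 1 | 3 |
| 57 | Malaysia | 0 | 2 | 1 | 3 |
| 58 | India | 0 | 1 | 2 | 3 |
| 59 | Egypt | 0 | 1 | 2 | 3 |
| 60 | Austria | 0 | 1 | 1 | 2 |
| 61 | Ireland | 0 | 1 | 1 | 2 |
| 62 | Estonia | 0 | 1 | 1 | 2 |
| 63 | Venezuela | 0 | 1 | 1 | 2 |
| 64 | Mongolia | 0 | 1 | 1 | 2 |
| 65 | Portugal | 0 | 1 | 1 | 2 |
| 66 | Nigeria | 0 | 1 | 1 | 2 |
| 67 | Vietnam | 0 | 1 | 0 | 1 |
| 68 | Latvia | 0 | 1 | 0 | 1 |
| 69 | Algeria | 0 | 1 | 0 | 1 |
| 70 | Philippines | 0 | 1 | 0 | 1 |
| 71 | Trinidad and Tobago | 0 | 0 | 1 | 1 |
| 72 | Morocco | 0 | 0 | 1 | 1 |
| 73 | Israel | 0 | 0 | 1 | 1 |
| 74 | Qatar | 0 | 0 | 1 | 1 |
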
Michi Wegmüller

Co-Founder – Empowering Agile Analytics at Scale

Michi Wegmüller is co-founder of Artifact SA and responsible for Artifact’s Analytics Garage offering. He has more than 15 years of experience in Data and Analytics consulting and has supported a diverse set of Swiss and international clients across industries. He has helped to realize analytics initiatives that grow sustainably and continuously deliver value to the business and functional units. He is passionate about agile analytics at scale.

Artifact SA

Accelerating Impact with AI & Data Science

Spearheading in AI & Data Science to accelerate impact for your business in Switzerland. Pragmatic analytics services leader for consulting & implementation.
