About 3 weeks ago, we released our newest prediction using simple but effective Data Science methods. Based on historical data, we modelled predictions for the Olympic Summer Games in Tokyo (#Tokyo2020). You can find more details in this Artifact Insights blog post: https://insights.artifact.swiss/2021/07/how-easy-can-you-predict-the-final-olympic-medal-table-with-a-simple-but-data-driven-approach/
Now that the XXXII Olympic Games have closed, it is time to review our predictions against the actual medal table and compare them to other estimates published, e.g., by @CNNSport and @GracenoteGold.
Recap of what we’ve done
In summary, the predictions ranged from reasonable to good (using Lewis's classification from 1982) – certainly much better than expected given the simple & pragmatic Data Science approach we chose. We used two "top-down" iterations to predict the medal distribution: first a historical trend analysis, and second various corrections for, e.g., COVID-19, new Olympic disciplines, and estimated doping cases. We did not estimate each individual discipline based on available data such as past winners and winners of non-Olympic competitions (i.e., a "bottom-up" approach).
For the assessment of forecast accuracy, we decided to use the standard methods in the field of forecasting: the Mean Absolute Deviation (MAD) and the Mean Absolute Percentage Error (MAPE). As widely discussed in the literature, we chose the arithmetic-average MAPE (AMAPE) across all individual predictions (ranks, number of gold / silver / bronze medals, and total number of medals) over the weighted MAPE (WMAPE), as the latter would have led to misleading interpretations, especially for the rank prediction.
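The three accuracy metrics can be sketched in a few lines. The numbers below are hypothetical medal counts for illustration only, not our actual prediction data:

```python
def mad(actual, predicted):
    """Mean Absolute Deviation: average absolute error per prediction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def amape(actual, predicted):
    """Arithmetic-average MAPE: every country's percentage error weighs equally."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / a for a, p in pairs) / len(pairs)

def wmape(actual, predicted):
    """Weighted MAPE: total absolute error divided by total actuals,
    so countries with many medals dominate the score."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual)

# Hypothetical total-medal counts for three countries
actual = [113, 88, 58]
predicted = [100, 90, 65]
print(round(mad(actual, predicted), 2))
print(round(amape(actual, predicted), 3))
print(round(wmape(actual, predicted), 3))
```

Note how AMAPE and WMAPE can diverge: a small country with a 2-medal miss contributes a large percentage error to AMAPE, while WMAPE barely notices it – which is why the choice of metric matters for rank predictions.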
In accordance with the classification from C.D. Lewis in his 1982 book "Industrial and Business Forecasting Methods", we then assessed the forecasts for their respective power as outlined in the table below.

|MAPE||Forecast quality|
|< 10%||highly accurate|
|10–20%||good|
|20–50%||reasonable|
|> 50%||inaccurate|
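The Lewis assessment can be encoded as a simple lookup; the thresholds below are the commonly cited MAPE bands from Lewis (1982):

```python
def lewis_class(mape):
    """Classify forecast quality by MAPE, per C.D. Lewis (1982).

    mape is given as a fraction, e.g. 0.32 for 32%.
    """
    if mape < 0.10:
        return "highly accurate"
    if mape < 0.20:
        return "good"
    if mape < 0.50:
        return "reasonable"
    return "inaccurate"

# Our headline results land in the "reasonable" band:
print(lewis_class(0.32))  # rank AMAPE across all countries
print(lewis_class(0.21))  # total-medal AMAPE across the top 50
```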
Overall results & key findings
Overall, our predictions ranged from reasonable to good (as per Lewis's classification): the rank forecast (ranks count gold medals before total medals) achieved an AMAPE of 32% across all countries – equal to a MAD of 12 ranks per country across all countries, but only a MAD of 5 across the top 50 countries – while gold medal wins achieved an AMAPE of 35% and total medal wins an AMAPE of 21% across the top 50 countries (equal to a MAD of 2.6 gold medals and 4.7 total medals per country, respectively).
Clearly, these results could be further improved by adding bottom-up analysis and predictions to the Data Science approach. Comparing this overall to the predictions shared by @CNNSport and @GracenoteGold (see https://www.gracenote.com/virtual-medal-table), our prediction algorithms have yielded at least comparable and, in some cases, even slightly better accuracies. For example, while we predicted 17 of the top 20 countries (9 out of the top 10), GracenoteGold/CNN Sport predicted only 15 countries (9 out of the top 10).
Comparing the number of overall medals collected within the top 50 countries, GracenoteGold/CNN Sport had an AMAPE of 24.7% (a MAD of 6 medals) compared to our algorithms with an AMAPE of 21.6% (an average absolute error of 4.7 medals).
On the other hand, GracenoteGold/CNN achieved an AMAPE of 29% for the top 50 country ranking, which outperformed our predictions at an AMAPE of 33%. Likewise, GracenoteGold achieved an AMAPE of 31.4% for gold medal wins (a MAD of 2.3 gold medals per country), while our algorithm achieved 35% (a MAD of 2.6 gold medals per country).
In general, our forecast performed better for the prediction of total, silver, and bronze medal wins, while the GracenoteGold/CNN prediction performed better for the absolute rank of a nation and the number of gold medal wins.
To further validate the predictions, we also ran a quick analysis by continent as well as by the development status of a given nation (as per the UN classification in the WESP 2020 Annex publication).
Looking at continents and the average rank across all countries, Oceania performed best, followed by Europe and North America – Africa performed worst. Interestingly, our predictions outperformed the GracenoteGold/CNN predictions quite substantially here, with an AMAPE of 51% (versus an AMAPE of 74%).
Likewise, the developed countries performed on average 20 ranks better than the countries in transition and the developing countries. Again, our forecasts outperformed the GracenoteGold/CNN predictions – we achieved an AMAPE of 49% while GracenoteGold achieved 74%. These comparisons by continent and development status might, however, be biased due to the incompleteness of the GracenoteGold/CNN prediction data.
If you are interested in more details – here are our highlights
- We predicted 9 out of the 10 nations which ended up in the top 10 ranking. We failed to predict the Netherlands into the top 10 (we put them 12th, but they finished 7th) but had South Korea included (as 10th; they finished 16th). We also predicted 17 out of the top 20 ranked nations, which outperformed the GracenoteGold/CNN prediction, which only got 15 nations right;
- Looking at some of the predictions, we were surprised by the rank Great Britain took – we forecasted 3rd place, and they ended up 4th. Notably, this is significantly better than other attempts such as GracenoteGold/CNN, which predicted them only 7th;
- Looking at the top 20 ranked nations, our approach performed best with an average error of only 3.7 ranks (MAD) – equivalent to an AMAPE of 35.6%;
- The overall gold medal prediction was reasonable, with an AMAPE of 34.7% for the top 20 nations and 35% across the top 50 participating countries (equal to a MAD of 4 and 2.6 gold medals per country, respectively);
- As such, 39 out of the top 50 countries were off by 2 or fewer gold medals – in other words, only 11 nations had a prediction error (MAD) larger than 2 gold medals;
- The gold medal prediction for Switzerland was 100% accurate – 3 gold medals predicted & realized;
- Likewise, the total medal predictions for China, Great Britain, Russia, Australia, Italy, Canada, and Hungary (MAD of 4 medals) outperformed the GracenoteGold/CNN predictions (MAD of 8 medals);
- This is supported by the fact that, overall, 45 out of the top 50 nations were predicted with a maximum error of 10 medals (of which 37 had an error smaller than 5 medals), which is equal to an AMAPE of 21.6%;
- The 1,080 medals distributed during Tokyo 2020 were predicted with an AMAPE of 48.9% and a WMAPE of 25.8% across all countries;
- As with the gold medal predictions, the silver & bronze medal predictions for the top 50 nations showed a reasonable to good forecast performance, with an AMAPE of 45% for silver medal wins and 36.1% for bronze medal wins. I.e., the prediction algorithms were on average only 2-3 medals off (MAD) compared to the actual medals won. Especially the silver & bronze medal win predictions outperformed the GracenoteGold/CNN predictions.
Points for further evaluation / improvements
- We underestimated the Japanese performance by far. In other words, there is a strong "home advantage" effect in the data, and hence our predictions were off by quite a bit: we predicted only 12 gold medals and an overall rank of 7 for Japan, while in reality Japan won a stunning 27 gold medals and finished 4th. It seems that GracenoteGold/CNN predicted the home advantage better. Knowing about this effect, we could easily model it into the prediction algorithms, e.g., by assessing the abnormal performances of all host countries in past years. The effect we observed in the Tokyo data was as big as a 30% improvement over "normal" conditions;
- This could also be seen in the slight underestimation of Brazil, which hosted the previous Olympic Summer Games back in 2016. In fact, Brazil performed 20% better than expected, e.g., winning 7 gold medals instead of the predicted 5 and 21 medals overall instead of the predicted 17;
- In the top 50, only 6 predictions were spot on for gold medals and 5 for the overall number of medals won. In future iterations, more details should be added to better predict individual country performances;
- The top-down algorithm captured only very few unexpected results. E.g., the predictions failed to anticipate the excellent results of Uganda (4 medals, of which 2 gold), Ecuador (3 medals, of which 2 gold) or Kosovo (2 medals, of which 2 gold), nor did they predict the gold medals for Puerto Rico, the Dominican Republic, or Bermuda. Generally, predicting unexpected performances (i.e., outliers) would require additional bottom-up details in the data. Here the chosen algorithm has some performance challenges;
- The algorithm predicted some notable results for North Korea, Vietnam, Algeria, and Trinidad & Tobago – however, none of these countries won any medal during the Olympic Games. While North Korea did not participate (which could have been known during the data preparation phase), the algorithm again did not take unexpected results, aka outliers, into consideration for nations that simply perform below expectations. There are ways to augment prediction algorithms with a "luck" or "destiny" factor – but this has not been done for the given case study;
- Finally, the GracenoteGold/CNN predictions outperformed our forecast for the overall rank of a country (AMAPE of 28.9% / MAD of 4 ranks compared to our AMAPE of 32.8% / MAD of 4.9 ranks) and the number of gold medals won (AMAPE of 31.4% / MAD of 2.2 medals compared to our AMAPE of 35% / MAD of 2.6 medals). Here, further analysis is needed to better understand individual outperformers and the national conditions that might support victory at the Olympic Games.
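The host-country correction discussed above could be sketched roughly as follows. The boost factor and the input numbers are illustrative assumptions based on the ~30% effect we observed for Tokyo, not our production algorithm:

```python
# Hypothetical host-country boost, assuming the ~30% home-advantage
# effect observed in the Tokyo 2020 data.
HOST_BOOST = 1.30

def adjust_for_host(predictions, host):
    """Scale the host nation's predicted medal count upward.

    predictions: dict mapping country name -> predicted medal count
    host: name of the host country for the edition being predicted
    """
    adjusted = dict(predictions)
    if host in adjusted:
        adjusted[host] = round(adjusted[host] * HOST_BOOST)
    return adjusted

# Illustrative only: our raw top-down forecast gave Japan 12 gold medals.
raw = {"Japan": 12, "USA": 40}
print(adjust_for_host(raw, "Japan"))  # Japan boosted to 16 - closer to, but still short of, the actual 27
```

A more careful version would estimate the boost factor from the historical over-performance of all past host countries rather than hard-coding a single number.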
In summary, despite the simple & heuristic Data Science approach, we were able to make good and, in some cases, excellent predictions. We were especially surprised by the reasonable to good forecast quality of the individual medal predictions – an average error (MAD) of 2-3 medals given the total of 1,080 medals awarded is remarkable.
One could also argue that there is quite some value in heuristic predictions – and hence that they are good starting points for professional prediction work. Clearly, the key effort went into the preparation of the data, which, as is commonly the case, accounted for about 75% of the time spent. Given the results achieved with only 4 hours of total effort put into the prediction, we have clearly beaten the Pareto rule – 80% of the results with 20% of the effort. While the prediction algorithm has some significant flaws, many of these could be fixed in future iterations by adding additional data elements.
Finally, it remains notable that our top-down approach has been surprisingly comparable with (if not outperforming) more detailed, bottom-up prediction approaches (see the comparison with GracenoteGold/CNN Sport). We should, however, not make the mistake of generalizing this finding. In fact, it makes sense to look at the data from multiple angles to get a differentiated view and more accurate predictions. The winning strategy would most likely be to combine top-down & bottom-up Data Science approaches. This remains to be confirmed in future case studies.
The secret sauce for good predictions is:
- to look at the data with different Data Science approaches;
- to enrich the data with further external & internal data sources;
- to eventually combine the best of them.
We are convinced that our top-down approach could significantly improve its prediction accuracy by integrating additional bottom-up data factors and predictions. We will further validate these learnings in our next case study! Watch out for future predictions.
Engage with us
What is your assessment of the results from our predictions?
Have you already tried heuristic Data Science approaches and compared their performance with detailed bottom-up assessments?
We are eager to learn about and discuss your findings and conclusions!
Finally, if you have a good idea for what we could predict in our 3rd issue of "Man vs Machine" – e.g., the UEFA Champions League 2021/22 results – please share it with us.
Rank predictions table:
|Rank Actual||Country||Artifact Rank Prediction||Artifact Rank Difference (MAD)||Artifact Rank MAPE||CNN Rank Prediction||CNN Rank Difference (MAD)||CNN Rank MAPE|
Medals prediction table:
|Rank Actual||Country||🥇 Gold Medals Actual||Artifact 🥇 Gold Medals Prediction||Artifact 🥇 Gold Medals Difference (MAD)||Artifact 🥇 Gold Medals MAPE||CNN 🥇 Gold Medals Prediction||CNN 🥇 Gold Medals Difference (MAD)||CNN 🥇 Gold Medals MAPE|
Total medals prediction table:
|Rank Actual||Country||Total Medals Actual||Artifact Total Medals Prediction||Artifact Total Medals Difference (MAD)||Artifact Total Medals MAPE||CNN Total Medals Prediction||CNN Total Medals Difference (MAD)||CNN Total Medals MAPE|
Co-Founder – Empowering Agile Analytics at Scale
Michi Wegmüller is co-founder of Artifact SA and responsible for Artifact's Analytics Garage offering. He has more than 15 years of experience in Data and Analytics consulting and has supported a diverse set of Swiss and international clients across industries. He has helped realize analytics initiatives that grow sustainably and continuously deliver value to business and functional units. He is passionate about agile analytics at scale.