🎮 Forecasting Victory: 2024 League of Legends Worlds Matches Predictions

This data science project explores 2024 League of Legends match data from Oracle’s Elixir, focusing on how in-game resources influence victory and how side selection (🔵 Blue vs. 🔴 Red) impacts team performance. Through a combination of statistical analysis and machine learning, the project ultimately builds a predictive model to forecast match outcomes.


Introduction

The raw data from Oracle’s Elixir contains 117,576 records (rows) and 161 features (columns).

Every 12 consecutive records correspond to one match:

Therefore, the dataset covers a total of 9,798 matches.
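The row-to-match arithmetic can be sketched as follows (the 10-players-plus-2-team-rows breakdown is the usual Oracle's Elixir layout, assumed here):

```python
# Sketch: deriving the match count from the raw Oracle's Elixir export.
# Each match contributes 12 rows (10 player rows + 2 team-summary rows),
# so the number of matches is the row count divided by 12.
n_rows = 117_576          # rows in the 2024 export
rows_per_match = 12       # 10 player rows + 2 team rows
n_matches = n_rows // rows_per_match
assert n_rows % rows_per_match == 0   # sanity check: rows divide evenly
print(n_matches)  # 9798
```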

The 161 features can be categorized into three main groups:

The Blue🔵 side refers to the team located on the bottom left of the map and always gets first pick in the draft. The Red🔴 side is positioned on the top right corner. In competitive play, the team with Side Selection Privilege chooses the side for Game 1, and then the losing team picks the side for the next game. (Source: LOL Worlds 2024 Fantasy - E-Go App)

This makes side selection a strategic tool: a subtle but important factor that can influence match outcomes. Surprisingly, this contradicts the common belief that the two sides should be equally balanced in terms of gameplay.

In practice, Blue Side teams consistently perform better. One contributing factor is the camera perspective advantage: although both sides appear symmetrical, the Blue side benefits from a slight downward tilt in the in-game camera. This offers a clearer view of flanks, jungle movements, and overall map activity — making it easier to react and make informed decisions. (Source: Is red stronger than blue in League of Legends? - Eloking)

To better understand side selection privilege, this project analyzes match data to explore the question: How does side selection (🔵 Blue vs. 🔴 Red) impact team performance?

The table below lists the features used and their descriptions:

| Feature | Description |
|---|---|
| result | 1 (Win), 0 (Lose) |
| side | red, blue |
| firstblood | Whether the team took the first kill: 1 (Yes), 0 (No) |
| firstdragon | Whether the team took the first dragon: 1 (Yes), 0 (No) |
| firstbaron | Whether the team took the first baron: 1 (Yes), 0 (No) |
| firsttower | Whether the team took the first tower: 1 (Yes), 0 (No) |
| firstmidtower | Whether the team took the first mid-lane tower: 1 (Yes), 0 (No) |
| firsttothreetowers | Whether the team was first to take three towers: 1 (Yes), 0 (No) |
| gamelength | Match duration in seconds |
| golddiffat(10/15/20) | Gold difference between the two teams at 10/15/20 minutes |
| xpdiffat(10/15/20) | XP difference between the two teams at 10/15/20 minutes |

Data Cleaning and Exploratory Data Analysis

Data Cleaning

Extract team data and target columns

After filtering and selection, the dataset contains:
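The extraction step can be sketched like this on a toy frame (Oracle's Elixir marks team-summary rows with `position == "team"`, which is assumed here; `cols` stands in for the full feature list above):

```python
import pandas as pd

# Toy frame mimicking the raw export: player rows plus team-summary rows.
df = pd.DataFrame({
    "position": ["top", "jng", "team", "team", "mid"],
    "side": ["Blue", "Blue", "Blue", "Red", "Red"],
    "result": [1, 1, 1, 0, 0],
})

# Keep only the team-level rows and the target/feature columns.
cols = ["side", "result"]  # plus the other features listed above
teams = df.loc[df["position"] == "team", cols].reset_index(drop=True)
print(teams)
```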

Check and modify NaN

The null-value check below revealed that at least 2,822 team records contain incomplete data. Since filling in simulated values wouldn't make sense in a competitive esports context, and the missing data accounts for only ~15% of the dataset, dropping the rows with NaN values is a reasonable and efficient solution. After dropping the NaN rows, the dataset contains:
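A minimal sketch of the check-and-drop step on toy data (column names follow the project's feature list; values are illustrative):

```python
import pandas as pd
import numpy as np

# Toy team-level frame with two incomplete rows.
df = pd.DataFrame({
    "side": ["Blue", "Red", "Blue", "Red"],
    "golddiffat10": [1364, -1364, np.nan, 88],
    "xpdiffat10": [557, -557, 625, np.nan],
})

# Count missing values per column, then count rows with any NaN.
null_counts = df.isna().sum()
rows_with_nan = df.isna().any(axis=1).sum()

# Dropping is acceptable here: incomplete rows are a small fraction of
# the dataset, and imputation is hard to justify for esports stats.
df_clean = df.dropna().reset_index(drop=True)
print(rows_with_nan, len(df_clean))
```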

Losing only ~15% of the data is acceptable.

NaN Checking

Categorize Gamelength

The gamelength ranges from 1143 to 3482 seconds. Below shows the distribution of gamelength:

Instead of focusing on specific game lengths in seconds, our analysis is more concerned with the relationship between general time periods (in minutes) and other features. Therefore, the gamelength column is categorized into time periods, and the original gamelength column is dropped.
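The binning can be sketched with `pd.cut` (bin edges in seconds; labels match the time periods used below, toy values for illustration):

```python
import pandas as pd

# Toy game lengths in seconds.
lengths = pd.Series([1200, 1700, 1900, 2200, 2600])

# Bin edges in seconds: 25 min = 1500 s, 30 min = 1800 s, etc.
bins = [0, 25 * 60, 30 * 60, 35 * 60, 40 * 60, float("inf")]
labels = ["<=25(mins)", "25-30(mins)", "30-35(mins)", "35-40(mins)", ">=40(mins)"]

time_label = pd.cut(lengths, bins=bins, labels=labels)
print(time_label.tolist())
```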

Below are the results after categorizing:

| Time Period | Count |
|---|---|
| 30-35 | 5522 |
| 25-30 | 5348 |
| 35-40 | 2714 |
| <=25 | 1786 |
| >=40 | 1404 |

Dataset Overview

Below is a preview of the dataset after cleaning

| | side | firstblood | firstdragon | firstbaron | firsttower | firstmidtower | firsttothreetowers | golddiffat10 | golddiffat15 | golddiffat20 | xpdiffat10 | xpdiffat15 | xpdiffat20 | time_label | win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | Blue | 0 | 1 | 1 | 1 | 1 | 1 | 1364 | 2293 | 4248 | 557 | 949 | 2138 | <=25(mins) | True |
| 31 | Red | 1 | 0 | 0 | 0 | 0 | 0 | -1364 | -2293 | -4248 | -557 | -949 | -2138 | <=25(mins) | False |
| 32 | Blue | 0 | 0 | 0 | 0 | 0 | 0 | -88 | -75 | 777 | 625 | 1092 | 2722 | 35-40(mins) | True |
| 33 | Red | 1 | 1 | 1 | 1 | 1 | 1 | 88 | 75 | -777 | -625 | -1092 | -2722 | 35-40(mins) | False |
| 34 | Blue | 0 | 1 | 1 | 0 | 0 | 0 | -2583 | -561 | -1528 | -1718 | 410 | -722 | 30-35(mins) | True |

Univariate Analysis

🔴 For red side teams, 95% of XP differences range from -2129 to 1903, with a median of -63.
🔵 For blue side teams, 95% of XP differences range from -1903 to 2129, with a median of 63.

These results suggest that the blue side has a slight advantage in XP gain during the early game, likely contributing to better early-game momentum.
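The 95% interval and median above can be computed from empirical percentiles; a sketch on synthetic red-side data (the distribution parameters are illustrative, not fitted):

```python
import numpy as np

# Toy red-side XP differences at 10 minutes, roughly centered below zero.
rng = np.random.default_rng(0)
xp_diff_red = rng.normal(loc=-63, scale=1000, size=10_000)

# Middle 95% of the distribution (2.5th to 97.5th percentile) and median.
low, high = np.percentile(xp_diff_red, [2.5, 97.5])
median = np.median(xp_diff_red)
print(low, median, high)
```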

The plot below shows the distribution of XP difference at 10 minutes for red side teams:

Bivariate Analysis

Win Rate for each side and firstblood

The plot below shows the win rates based on team side (🔵 blue vs 🔴 red) and whether the team secured first blood:

Teams that secured first blood had a win rate approximately 18.6% higher than those that did not.
🔵 Blue side teams showed an average 4.9% higher win rate compared to 🔴 red side teams.

These insights highlight the strategic importance of first blood and support the observed advantage of blue side teams.
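These win rates come from simple group averages of the binary `win` column; a sketch on toy rows (in the project this runs over the cleaned team-level dataset):

```python
import pandas as pd

# Toy team rows: side, whether they took first blood, and the outcome.
df = pd.DataFrame({
    "side": ["Blue", "Red", "Blue", "Red", "Blue", "Red"],
    "firstblood": [1, 0, 0, 1, 1, 0],
    "win": [1, 0, 0, 1, 1, 0],
})

# Mean of a 0/1 win column within each group is that group's win rate.
win_by_side = df.groupby("side")["win"].mean()
win_by_fb = df.groupby("firstblood")["win"].mean()
print(win_by_side)
print(win_by_fb)
```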

Win Rate by Side and First Objective Secured

From the analysis above, it’s clear that the first resource secured (such as first blood, tower, baron) has a significant impact on a team’s chance of winning. However, the strength of this impact varies by objective.

The plot below compares win rates for each side (🔵 blue and 🔴 red) based on whether they secured key objectives first. It ranks these objectives by their positive influence on win rate, in ascending order.

Key insights:

Difference in Gold and XP at 10 Minutes Across Game Lengths

The two plots below illustrate how gold and XP differences at 10 minutes vary across different game duration groups:

Interesting Aggregates

Table 1 shows the quantified differences in win rate, first objective secured rate, and gold/XP difference between the two sides (🔵 Blue vs 🔴 Red):

The results show that, except for first dragon rate, 🔵 Blue teams consistently outperform 🔴 Red teams across all key indicators.
Blue side teams not only have a higher win rate, but also secure early objectives more often and maintain a stronger lead in both gold and XP.

| side | firstblood | firstdragon | firstbaron | firsttower | firstmidtower | firsttothreetowers | golddiffat10 | golddiffat15 | golddiffat20 | xpdiffat10 | xpdiffat15 | xpdiffat20 | win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blue | 0.516275 | 0.384643 | 0.501967 | 0.548706 | 0.572314 | 0.571837 | 144.923 | 331.158 | 523.683 | 66.8972 | 94.4559 | 95.871 | 0.527483 |
| Red | 0.483725 | 0.61488 | 0.456421 | 0.451294 | 0.427686 | 0.428163 | -144.923 | -331.158 | -523.683 | -66.8972 | -94.4559 | -95.871 | 0.472517 |
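Table 1 is a per-side average of every numeric column: the mean of a 0/1 indicator column is that side's rate. A sketch on toy rows:

```python
import pandas as pd

# Toy team rows; in the project this is the full cleaned dataset.
df = pd.DataFrame({
    "side": ["Blue", "Red", "Blue", "Red"],
    "firstblood": [1, 0, 0, 1],
    "golddiffat10": [1364, -1364, -88, 88],
    "win": [1, 0, 1, 0],
})

# Group by side and average every numeric column: means of 0/1 columns
# become rates, means of diff columns become average leads.
table1 = df.groupby("side").mean(numeric_only=True)
print(table1)
```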

Table 2 shows the quantified differences in win rate between 🔵 Blue and 🔴 Red sides across different game durations:

| side | <=25(mins) | 25-30(mins) | 30-35(mins) | 35-40(mins) | >=40(mins) |
|---|---|---|---|---|---|
| Blue | 0.601344 | 0.522438 | 0.516117 | 0.511422 | 0.52849 |
| Red | 0.398656 | 0.477562 | 0.483883 | 0.488578 | 0.47151 |

Imputation

Imputation is not required in this case, as the cleaned dataset contains no missing (NaN) values.

Framing a Prediction Problem

We aim to predict whether a team wins or loses a match based on their in-game performance features collected by the 20-minute mark, as analyzed in the sections above.

Baseline Model

The baseline model uses logistic regression to predict whether a team will win or lose a match, based on early-game features available by the 20-minute mark.

Based on insights from the exploratory data analysis (EDA), the features side and firstbaron showed strong influence on match outcomes. Therefore, the baseline model uses these two categorical features along with xpdiffat10 — a quantitative feature representing early XP advantage — to train and make predictions.

The table below describes the features:

| Feature | Type | Description | Method |
|---|---|---|---|
| side | Nominal | Team side: Blue or Red | One-Hot Encoding |
| firstbaron | Nominal | Whether the team took first Baron (0/1) | One-Hot Encoding |
| xpdiffat10 | Quantitative | XP difference between the two teams at 10 min | Standard Scaler |

The model holds out 30% of the data as a test set. One-hot encoding is applied to the nominal features using OneHotEncoder(drop='first') to avoid multicollinearity, and StandardScaler() is applied to ensure fair feature contribution in the logistic regression model.
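The baseline setup can be sketched as a scikit-learn pipeline on synthetic data (the toy target is generated from the features for illustration; only the preprocessing and model choices mirror the description above):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned team-level dataset.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "side": rng.choice(["Blue", "Red"], size=n),
    "firstbaron": rng.integers(0, 2, size=n),
    "xpdiffat10": rng.normal(0, 1000, size=n),
})
# Toy target loosely driven by the features.
y = ((X["firstbaron"] * 1500 + X["xpdiffat10"] + rng.normal(0, 500, n)) > 0).astype(int)

# One-hot encode nominal features (drop='first' avoids multicollinearity),
# scale the quantitative feature, then fit logistic regression.
pre = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), ["side", "firstbaron"]),
    ("num", StandardScaler(), ["xpdiffat10"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression())])

# 70/30 train/test split, as in the baseline.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(acc, auc)
```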

The baseline model achieves an accuracy of 0.8281 and an AUC of 0.88.

The performance of the baseline model isn’t perfect, but it is strong given its simplicity.

However, there is still room for improvement:

Final Model

Feature Engineering

firstdragon and firstblood are included in the model because they capture early-game advantages that strongly correlate with match outcomes, as shown in the EDA section above. Moreover, the following new features are created:

| Feature | Input Columns | What It Captures | Why It Matters |
|---|---|---|---|
| xp_per_min | xpdiffat10, xpdiffat15, xpdiffat20 | XP difference per minute | Considers XP difference at all time snapshots to reflect leveling (dis)advantage |
| gold_per_min | golddiffat10, golddiffat15, golddiffat20 | Gold difference per minute | Considers gold difference at all time snapshots to reflect economic (dis)advantage |
| tower_score | firsttower, firstmidtower, firsttothreetowers | Number of first-tower objectives secured (0-3) | Measures overall map pressure and early tower control |
| gold_drop_1015 | golddiffat10, golddiffat15 | Gold lead change (10–15 mins) | Indicates gold economy shift from 10 to 15 mins |
| gold_drop_1520 | golddiffat15, golddiffat20 | Gold lead change (15–20 mins) | Indicates gold economy shift from 15 to 20 mins |
| xp_drop_1015 | xpdiffat10, xpdiffat15 | XP lead change (10–15 mins) | Indicates XP advantage shift from 10 to 15 mins |
| xp_drop_1520 | xpdiffat15, xpdiffat20 | XP lead change (15–20 mins) | Indicates XP advantage shift from 15 to 20 mins |
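The engineered features can be sketched as below. The per-minute features are taken here as the average of the per-minute diffs over the three snapshots, and the drop features as simple differences between snapshots; the exact formulas aren't given in the text, so these are assumptions:

```python
import pandas as pd

# Toy rows with the raw snapshot columns.
df = pd.DataFrame({
    "golddiffat10": [1364, -1364], "golddiffat15": [2293, -2293], "golddiffat20": [4248, -4248],
    "xpdiffat10": [557, -557], "xpdiffat15": [949, -949], "xpdiffat20": [2138, -2138],
    "firsttower": [1, 0], "firstmidtower": [1, 0], "firsttothreetowers": [1, 0],
})

# Average diff-per-minute over the 10/15/20-minute snapshots (assumed form).
df["gold_per_min"] = (df["golddiffat10"] / 10 + df["golddiffat15"] / 15 + df["golddiffat20"] / 20) / 3
df["xp_per_min"] = (df["xpdiffat10"] / 10 + df["xpdiffat15"] / 15 + df["xpdiffat20"] / 20) / 3

# Count of first-tower objectives secured (0-3).
df["tower_score"] = df[["firsttower", "firstmidtower", "firsttothreetowers"]].sum(axis=1)

# Lead changes between snapshots.
df["gold_drop_1015"] = df["golddiffat15"] - df["golddiffat10"]
df["gold_drop_1520"] = df["golddiffat20"] - df["golddiffat15"]
df["xp_drop_1015"] = df["xpdiffat15"] - df["xpdiffat10"]
df["xp_drop_1520"] = df["xpdiffat20"] - df["xpdiffat15"]
print(df[["tower_score", "gold_drop_1015"]])
```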

In addition to logistic regression, we also trained models using Random Forest and Decision Tree classifiers to explore the impact of non-linear relationships and feature interactions on prediction performance.

Tuning Hyperparameters

We use GridSearchCV to find the optimal tree depth for the Random Forest and Decision Tree models. Tuning max_depth controls model complexity and reduces the risk of overfitting by limiting how deeply the trees can grow. The search finds an optimal depth of 6 for the Random Forest and 5 for the Decision Tree.
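The depth search can be sketched on synthetic data (the optimal depths reported above come from the real dataset, not this toy run):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered feature matrix.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Search max_depth with 5-fold cross-validation for each model.
param_grid = {"max_depth": range(2, 11)}
rf_search = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=0), param_grid, cv=5)
rf_search.fit(X, y)
dt_search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
dt_search.fit(X, y)
print(rf_search.best_params_, dt_search.best_params_)
```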

Models Performance

The Logistic Regression model achieves 85.08% accuracy and 0.93 AUC.

The Random Forest model achieves 84.98% accuracy and 0.92 AUC.

The Decision Tree model achieves 83.75% accuracy and 0.91 AUC.

Models Comparison

Below is the comparison of the three models' accuracy. The final Logistic Regression model performs best on accuracy.

Below is the comparison of the three models' AUC score. The final Logistic Regression model performs best on AUC score.

The Logistic Regression model outperforming the two tree-based models suggests that:

As a result, the final Logistic Regression model is selected as the final model since it has the highest accuracy and AUC score while it’s also simple and easy to interpret.

Compared to the base logistic regression model, the final model demonstrates a notable improvement in predictive performance:

Overall, 85.08% accuracy is not perfect for a prediction model, but an AUC of 0.93 indicates excellent performance: the model has a high ability to distinguish between wins and losses, and is now more confident and accurate in ranking match outcomes.


Thanks for reading!
⬆️ Back to Top