🎮 Forecasting Victory: 2024 League of Legends Worlds Match Predictions
This data science project explores 2024 League of Legends match data from Oracle’s Elixir, focusing on how in-game resources influence victory and how side selection (🔵 Blue vs. 🔴 Red) impacts team performance. Through a combination of statistical analysis and machine learning, the project ultimately builds a predictive model to forecast match outcomes.
Table of Contents
- Introduction
- Data Cleaning and Exploratory Data Analysis
- Framing a Prediction Problem
- Baseline Model
- Final Model
Introduction
The raw data from Oracle’s Elixir contains 117,576 records (rows) and 161 features (columns).
Each group of 12 consecutive records corresponds to one match:
- The first 5 records contain player data for the blue side.
- The next 5 records contain player data for the red side.
- The final 2 records provide team-level overviews for both sides.
Therefore, the dataset covers a total of 9,798 matches.
The 161 features can be categorized into three main groups:
- Team/Player Information: Includes identifiers such as player names, team names, league affiliations, and match timestamps.
- Match Overview: Contains high-level game details like match outcome (victory/loss), side selection (🔵 blue / 🔴 red), champion picks and bans, and total game duration.
- In-Game Performance Metrics: Captures gameplay stats such as kills, deaths, assists, objective control (dragons🐲, barons😈, towers🗼), and team differences in XP and gold across different time intervals.
The 🔵 Blue side refers to the team located on the bottom left of the map and always gets first pick in the draft. The 🔴 Red side is positioned in the top right corner. In competitive play, the team with Side Selection Privilege chooses the side for Game 1, and then the losing team picks the side for the next game. (Source: LOL Worlds 2024 Fantasy - E-Go App)
This makes side selection a strategic tool: a subtle but important factor that can influence match outcomes. Surprisingly, this runs counter to the common belief that both sides should be equally fair in terms of gameplay.
In practice, Blue Side teams consistently perform better. One contributing factor is the camera perspective advantage: although both sides appear symmetrical, the Blue side benefits from a slight downward tilt in the in-game camera. This offers a clearer view of flanks, jungle movements, and overall map activity — making it easier to react and make informed decisions. (Source: Is red stronger than blue in League of Legends? - Eloking)
To better understand side selection privilege, this project analyzes match data to explore the question: How does side selection (🔵 Blue vs. 🔴 Red) impact team performance?
The table below lists the features used and their descriptions:
Feature | Description |
---|---|
result | 1 (Win), 0 (Lose) |
side | red, blue |
firstblood | Whether the team took the first kill, 1 (Yes), 0 (No) |
firstdragon | Whether the team took the first dragon, 1 (Yes), 0 (No) |
firstbaron | Whether the team took the first baron, 1 (Yes), 0 (No) |
firsttower | Whether the team took the first tower, 1 (Yes), 0 (No) |
firstmidtower | Whether the team took the first mid-tower, 1 (Yes), 0 (No) |
firsttothreetowers | Whether the team was first to take three towers, 1 (Yes), 0 (No) |
gamelength | How long the match lasted in seconds |
golddiffat(10/15/20) | Gold difference between two teams at 10/15/20 minutes |
xpdiffat(10/15/20) | XP difference between two teams at 10/15/20 minutes |
Data Cleaning and Exploratory Data Analysis
Data Cleaning
Extract team data and target columns
- Selected only the team-level records, excluding individual player-level rows
- Extracted the key features listed in the previous section
After filtering and selection, the dataset contains:
- 19,596 rows (2 teams × 9,798 matches)
- 15 columns (side, result, objectives, performance features, etc.)
Check and Handle NaN Values
Below shows the result of the null-value check. It revealed that at least 2,822 team records contain incomplete data. Since filling in simulated values wouldn't make sense in a competitive esports context, and the missing data accounts for only ~15% of the entire dataset, dropping the rows with NaN values is a reasonable and efficient solution. After dropping the NaN rows, the dataset contains:
- 16,774 rows (2 teams × 8,387 matches)
- 15 columns
Losing only ~15% of the data is an acceptable trade-off.
Categorize Gamelength
The `gamelength` column ranges from 1,143 to 3,482 seconds. Below shows the distribution of `gamelength`:
Instead of focusing on exact game lengths in seconds, our analysis is more concerned with the relationship between broader time periods (in minutes) and other features. Therefore, the `gamelength` column is categorized into time periods, and the original `gamelength` column is dropped.
Below are the results after categorization:
Time Period | Count |
---|---|
30-35 | 5522 |
25-30 | 5348 |
35-40 | 2714 |
<=25 | 1786 |
>=40 | 1404 |
Dataset Overview
Below is a preview of the dataset after cleaning:
 | side | firstblood | firstdragon | firstbaron | firsttower | firstmidtower | firsttothreetowers | golddiffat10 | golddiffat15 | golddiffat20 | xpdiffat10 | xpdiffat15 | xpdiffat20 | time_label | win
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
30 | Blue | 0 | 1 | 1 | 1 | 1 | 1 | 1364 | 2293 | 4248 | 557 | 949 | 2138 | <=25(mins) | True |
31 | Red | 1 | 0 | 0 | 0 | 0 | 0 | -1364 | -2293 | -4248 | -557 | -949 | -2138 | <=25(mins) | False |
32 | Blue | 0 | 0 | 0 | 0 | 0 | 0 | -88 | -75 | 777 | 625 | 1092 | 2722 | 35-40(mins) | True |
33 | Red | 1 | 1 | 1 | 1 | 1 | 1 | 88 | 75 | -777 | -625 | -1092 | -2722 | 35-40(mins) | False |
34 | Blue | 0 | 1 | 1 | 0 | 0 | 0 | -2583 | -561 | -1528 | -1718 | 410 | -722 | 30-35(mins) | True |
Univariate Analysis
🔴 For red side teams, 95% of XP differences range from -2129 to 1903, with a median of -63.
🔵 For blue side teams, 95% of XP differences range from -1903 to 2129, with a median of 63.
These results suggest that the blue side has a slight advantage in XP gain during the early game, likely contributing to better early-game momentum.
The plot below shows the distribution of the XP difference at 10 minutes for red-side teams:
Bivariate Analysis
Win Rate by Side and First Blood
The plot below shows the win rates based on team side (🔵 blue vs 🔴 red) and whether the team secured first blood:
Teams that secured first blood had a win rate approximately 18.6% higher than those that did not.
🔵 Blue side teams showed an average 4.9% higher win rate compared to 🔴 red side teams.
These insights highlight the strategic importance of first blood and support the observed advantage of blue side teams.
Win Rate by Side and First Objective Secured
From the analysis above, it’s clear that the first resource secured (such as first blood, tower, baron) has a significant impact on a team’s chance of winning. However, the strength of this impact varies by objective.
The plot below compares win rates for each side (🔵 blue and 🔴 red) based on whether they secured key objectives first. It ranks these objectives by their positive influence on win rate, in ascending order.
Key insights:
- Securing First Baron or First to Three Towers shows the strongest correlation with winning, for both sides.
- 🔵 Blue side consistently gains slightly higher win rates from each objective compared to 🔴 red side.
Difference in Gold and XP at 10 Minutes Across Game Lengths
The two plots below illustrate how gold and XP differences at 10 minutes vary across different game duration groups:
- The violin plot shows that the spread of XP differences narrows as game length increases. This makes sense — longer matches tend to be more competitive, so the performance gap between the two teams is usually smaller early on.
Interesting Aggregates
Table 1 shows the quantified differences in win rate, first objective secured rate, and gold/XP difference between the two sides (🔵 Blue vs 🔴 Red):
The results illustrate that, except for the first dragon rate, 🔵 Blue teams consistently outperform 🔴 Red teams across all key indicators.
Blue side teams not only have a higher win rate, but also secure early objectives more often and maintain a stronger lead in both gold and XP.
side | firstblood | firstdragon | firstbaron | firsttower | firstmidtower | firsttothreetowers | golddiffat10 | golddiffat15 | golddiffat20 | xpdiffat10 | xpdiffat15 | xpdiffat20 | win |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Blue | 0.516275 | 0.384643 | 0.501967 | 0.548706 | 0.572314 | 0.571837 | 144.923 | 331.158 | 523.683 | 66.8972 | 94.4559 | 95.871 | 0.527483 |
Red | 0.483725 | 0.61488 | 0.456421 | 0.451294 | 0.427686 | 0.428163 | -144.923 | -331.158 | -523.683 | -66.8972 | -94.4559 | -95.871 | 0.472517 |
Table 2 shows the quantified differences in win rate between 🔵 Blue and 🔴 Red sides across different game durations:
- The results indicate that 🔵 Blue teams consistently outperform 🔴 Red teams at all game lengths.
- Notably, in shorter matches (≤ 25 minutes), Blue teams win over 60% of the time — a significant advantage.
- However, in longer matches (> 25 minutes), the win rate difference between the two sides narrows to within 6%, suggesting the side advantage becomes less impactful as the game progresses.
side | <=25(mins) | 25-30(mins) | 30-35(mins) | 35-40(mins) | >=40(mins) |
---|---|---|---|---|---|
Blue | 0.601344 | 0.522438 | 0.516117 | 0.511422 | 0.52849 |
Red | 0.398656 | 0.477562 | 0.483883 | 0.488578 | 0.47151 |
Imputation
Imputation is not required in this case, as the cleaned dataset contains no missing (NaN) values.
Framing a Prediction Problem
We aim to predict whether a team wins or loses a match based on their in-game performance features collected by the 20-minute mark, as analyzed in the sections above.
- Prediction Type: Binary Classification
- Response Variable: `win` (True = Win, False = Lose), the only variable that directly represents the match outcome, and easy to interpret
- Evaluation Metrics: confusion matrix, accuracy, ROC curve, AUC score (a small illustration follows this list). Unlike accuracy and precision, which depend on a specific classification threshold (typically 0.5), ROC AUC evaluates model performance across all possible thresholds. This gives a more complete view of the classifier's ability to separate the two classes. Also, the dataset may have a slight imbalance in match outcomes (the blue side wins more often). ROC AUC is robust to class imbalance, whereas accuracy may be misleading in such cases.
- Except for `time_label`, all the features used are known at the time of prediction (before the game ends).
Baseline Model
The baseline model uses logistic regression to predict whether a team will win or lose a match, based on early-game features available by the 20-minute mark.
Based on insights from the exploratory data analysis (EDA), the features `side` and `firstbaron` showed strong influence on match outcomes. Therefore, the baseline model uses these two categorical features along with `xpdiffat10`, a quantitative feature representing early XP advantage, to train and make predictions.
The table below describes these features:
Feature | Type | Description | Method |
---|---|---|---|
side | Nominal | Team side: Blue or Red | One-Hot Encoding |
firstbaron | Nominal | Whether the team took first Baron (0/1) | One-Hot Encoding |
xpdiffat10 | Quantitative | XP difference between two teams at 10 min | Standard Scaler |
The model holds out 30% of the data as a test set. One-hot encoding is applied to the nominal features using `OneHotEncoder(drop='first')` to avoid multicollinearity, and `StandardScaler()` is applied to the quantitative feature to ensure fair contribution in the logistic regression model.
The baseline model achieves 82.81% accuracy and a 0.88 AUC score.
The performance of the baseline model isn’t perfect, but it is strong given its simplicity.
However, there is still room for improvement:
- While 82.81% accuracy and a 0.88 AUC score are promising, incorporating more in-game features could likely boost performance further.
- Logistic regression on its own might be too simple for this task.
Final Model
Feature Engineering
`firstdragon` and `firstblood` are included in the model because they capture early-game advantages that strongly correlate with match outcomes, as shown in the EDA section. In addition, the following new features are created (see the sketch after the table):
Feature | Input Columns | What It Captures | Why It Matters |
---|---|---|---|
xp_per_min | xpdiffat10, xpdiffat15, xpdiffat20 | XP difference per minute | Considers the XP difference at all time checkpoints to reflect a leveling (dis)advantage |
gold_per_min | golddiffat10, golddiffat15, golddiffat20 | Gold difference per minute | Considers the gold difference at all time checkpoints to reflect an economic (dis)advantage |
tower_score | firsttower, firstmidtower, firsttothreetowers | How many first-tower objectives the team secured (0-3) | Measures overall map pressure and early tower control |
gold_drop_1015 | golddiffat10, golddiffat15 | Gold lead change (10-15 mins) | Indicates the gold economy shift from 10 to 15 mins |
gold_drop_1520 | golddiffat15, golddiffat20 | Gold lead change (15-20 mins) | Indicates the gold economy shift from 15 to 20 mins |
xp_drop_1015 | xpdiffat10, xpdiffat15 | XP lead change (10-15 mins) | Indicates the XP advantage shift from 10 to 15 mins |
xp_drop_1520 | xpdiffat15, xpdiffat20 | XP lead change (15-20 mins) | Indicates the XP advantage shift from 15 to 20 mins |
In addition to logistic regression, we also trained models using Random Forest and Decision Tree classifiers to explore the impact of non-linear relationships and feature interactions on prediction performance.
Tuning Hyperparameters
We use GridSearchCV to find the optimal tree depth for the Random Forest and Decision Tree models. Tuning `max_depth` helps control model complexity and reduces the risk of overfitting by limiting how deeply the trees can grow. The search finds an optimal depth of 6 for the Random Forest and 5 for the Decision Tree.
Model Performance
The Logistic Regression model achieves 85.08% accuracy and a 0.93 AUC.
The Random Forest model achieves 84.98% accuracy and a 0.92 AUC.
The Decision Tree model achieves 83.75% accuracy and a 0.91 AUC.
Model Comparison
Below is a comparison of the three models' accuracy. The final Logistic Regression model performs best on accuracy.
Below is a comparison of the three models' AUC scores. The final Logistic Regression model performs best on AUC as well.
The Logistic Regression model outperforming the two tree-based models suggests that:
- The relationship between features and target is mostly linear
- Feature engineering captured key patterns well
- Tree-based models may have overfit
As a result, the Logistic Regression model is selected as the final model: it has the highest accuracy and AUC score while remaining simple and easy to interpret.
Compared to the baseline logistic regression model, the final model demonstrates a notable improvement in predictive performance:
- Accuracy increased from 82.81% to 85.08%
- AUC score improved from 0.88 to 0.93
Overall, 85.08% accuracy is not perfect for a prediction model, but a 0.93 AUC indicates excellent discriminative performance: the model ranks wins above losses with high reliability. The final model is noticeably more confident and accurate in ranking match outcomes than the baseline.
Thanks for reading!
⬆️ Back to Top