
FIFA 20 DATA ANALYSIS USING R

  • Writer: Khanyile Dlamini
  • Jul 25, 2020




The coronavirus pandemic disrupted the world in ways that only a select few predicted and most deemed impossible, and sport was no exception!

Major sporting events were suspended across the world, from the English Premier League to the NBA, MLS and La Liga. In some extreme cases, leagues such as the Eredivisie and the Scottish Premiership ended their 2019-20 seasons early.

With the extra time the pandemic gave me, why not put it to good use? I decided to combine my passion for football (soccer) and data science to conduct exploratory analysis on a FIFA 20 dataset using R. Here is a snapshot of some of the player attributes in this dataset:


Name, Age, Overall, Value, Club, Wage, Player Position, Salary

The dataset was downloaded from Kaggle and contains 18,000+ players featured in the FIFA 20 video game, each with 100+ attributes, including attacking, skill, defending and mentality statistics. The data was scraped from sofifa.com, a public website. Since the purpose of my project was exploratory analysis, here are a few interesting (and not so interesting) finds I came across:
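For readers who want to follow along, here is a minimal base-R sketch of loading the data. It is not the code used for this post: it assumes the Kaggle download is a CSV named `players_20.csv` with columns such as `short_name`, `age`, `nationality`, `club`, `overall`, `value_eur` and `wage_eur`, and those names are assumptions you should check against your own copy. To keep the snippet runnable anywhere, it first writes a three-row stand-in file with made-up values to a temporary path.

```r
# Hypothetical stand-in for the Kaggle CSV (made-up rows, assumed column names)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "short_name,age,nationality,club,overall,value_eur,wage_eur",
  "Player A,23,England,Club X,78,15000000,40000",
  "Player B,31,Spain,Club Y,88,60000000,250000",
  "Player C,19,Germany,Club X,70,4000000,8000"
), csv_path)

# With the real file this would be read.csv("players_20.csv", stringsAsFactors = FALSE)
players <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(players)  # number of players (rows) and attributes (columns)
str(players)  # column types: numeric for age/overall/value/wage, character otherwise
```

With the full dataset, `dim(players)` is a quick way to confirm the "18,000+ players, 100+ attributes" size before any plotting.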


Distribution based on Age.

First, I visualized the distribution of players by age. A large number of players are clustered around 23 years of age.
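The counting behind an age histogram can be sketched in base R as follows. This is an illustration rather than the author's code: `age` is a hypothetical vector of made-up ages standing in for the dataset's age column, and the one-year bins are an assumed choice.

```r
# Made-up ages standing in for the dataset's age column (hypothetical values)
age <- c(19, 21, 23, 23, 23, 24, 25, 27, 30, 33, 36)

# Frequency of each age, and the most common (modal) age
age_counts <- table(age)
modal_age <- as.integer(names(which.max(age_counts)))
print(modal_age)  # 23 for this toy vector

# One-year bins centred on whole ages; call plot(age_hist) to draw the histogram
age_hist <- hist(age, breaks = seq(min(age) - 0.5, max(age) + 0.5, by = 1),
                 plot = FALSE)
```

Centring the bins on whole ages keeps each integer age in its own bar, so the peak reads off directly as a single age.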



Distribution based on Age and Position.

Next, I plotted the relationship between the age of the players and their general playing position. The plot shows a higher density of goalkeepers and forwards aged over 35 than of defenders and midfielders. This is consistent with the claim that goalkeepers and forwards tend to have longer careers.
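One way to put a number on this comparison is sketched below. It is a hypothetical illustration with made-up players, and the mapping from detailed positions (using the first one listed) to four general groups is an assumption, not the grouping used for the plot in this post.

```r
# Made-up players; position strings mimic a comma-separated list such as "RW, ST"
players <- data.frame(
  player_positions = c("GK", "GK", "GK", "CB", "LB, LWB",
                       "CM, CDM", "CAM", "ST", "RW, ST", "LW"),
  age = c(36, 38, 24, 35, 29, 22, 31, 37, 26, 21),
  stringsAsFactors = FALSE
)

# Take the first listed position and map it to a general group (assumed mapping)
first_pos <- trimws(sub(",.*$", "", players$player_positions))
players$group <- ifelse(first_pos == "GK", "GK",
                 ifelse(first_pos %in% c("CB", "LB", "RB", "LWB", "RWB"), "DEF",
                 ifelse(first_pos %in% c("CDM", "CM", "CAM", "LM", "RM"), "MID",
                        "FWD")))

# Share of each group aged over 35 (mean of a logical vector = proportion TRUE)
over_35_share <- tapply(players$age > 35, players$group, mean)
print(round(over_35_share, 2))

# Kernel density estimate of age per group, the curve a density plot draws
age_density <- lapply(split(players$age, players$group), density)
```

A share is easier to compare across groups than raw counts, because the groups contain very different numbers of players.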



Distribution based on Player Nationality (Top 10).

England contributes the most players to the game, followed by Germany and Spain.
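A top-10 ranking like this one comes down to a count, a sort and a cut-off. The base-R sketch below uses a hypothetical `nationality` vector with made-up entries; it is an illustration, not the author's code.

```r
# Made-up nationalities standing in for the dataset's nationality column
nationality <- c("England", "England", "Germany", "Spain", "England",
                 "Spain", "Germany", "France", "Brazil")

# Count players per nation, sort in descending order, keep at most the top 10
top_nations <- head(sort(table(nationality), decreasing = TRUE), 10)
print(top_nations)

# barplot(top_nations) would draw a bar chart of these counts
```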



Distribution of Wages between 100K – 500K

The vast majority of players earn wages between 0 and 100K. Only Lionel Messi earns 500K+, followed by Cristiano Ronaldo, who earns between 400K and 500K.
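Grouping wages into brackets like these is a job for `cut()`. In this sketch, `wage` is a hypothetical vector of made-up weekly wage figures, and the 100K-wide breaks are assumed to match the brackets described here.

```r
# Made-up weekly wage figures standing in for the dataset's wage column
wage <- c(5000, 40000, 95000, 120000, 180000, 260000, 310000, 450000, 520000)

# Left-closed brackets: [0, 100K), [100K, 200K), ..., [500K, Inf)
wage_bracket <- cut(wage,
                    breaks = c(0, 1e5, 2e5, 3e5, 4e5, 5e5, Inf),
                    labels = c("0-100K", "100K-200K", "200K-300K",
                               "300K-400K", "400K-500K", "500K+"),
                    right = FALSE)
print(table(wage_bracket))
```

Setting `right = FALSE` makes each bracket include its lower bound, so a wage of exactly 100K falls into "100K-200K" rather than "0-100K".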


Distribution of Value between 50M – 100M+

Similarly, most players are valued between 0 and 50M, and Neymar Jnr has the highest valuation, at 100M+.
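The same bracketing works for valuations, and `which.max()` picks out the most valuable player. The data frame below is hypothetical, with placeholder names and made-up values.

```r
# Made-up player valuations (hypothetical names and values)
players <- data.frame(
  name  = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  value = c(2.5e6, 18e6, 47e6, 72e6, 105e6),
  stringsAsFactors = FALSE
)

# Left-closed valuation brackets: [0, 50M), [50M, 100M), [100M, Inf)
players$value_bracket <- cut(players$value,
                             breaks = c(0, 50e6, 100e6, Inf),
                             labels = c("0-50M", "50M-100M", "100M+"),
                             right = FALSE)
print(table(players$value_bracket))

# Name of the most valuable player in the table
most_valuable <- players$name[which.max(players$value)]
print(most_valuable)
```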



Age and Overall Rating of Players by Wage Bracket.


Next, I created a scatter plot of age against overall rating, colored by wage bracket. It shows that the highest wages go to players with an overall rating above 85, who are typically around 30 years old. Lionel Messi is the purple dot on the graph, and Cristiano Ronaldo the blue one.
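A base-R version of this kind of scatter plot, plus a quick numeric check of the "rated above 85" group, might look like the sketch below. The players, ratings and wages are all made up. The colours are R's default palette indices rather than the ones in the plot above, and the plot only draws in an interactive session.

```r
# Made-up players: age, overall rating and weekly wage (hypothetical values)
players <- data.frame(
  age     = c(22, 25, 27, 29, 30, 31, 32, 33, 35),
  overall = c(70, 76, 80, 84, 87, 91, 93, 89, 78),
  wage    = c(15e3, 40e3, 90e3, 150e3, 280e3, 420e3, 520e3, 300e3, 60e3)
)
players$wage_bracket <- cut(players$wage,
                            breaks = c(0, 1e5, 2e5, 3e5, 4e5, 5e5, Inf),
                            labels = c("0-100K", "100K-200K", "200K-300K",
                                       "300K-400K", "400K-500K", "500K+"),
                            right = FALSE)

# Players rated above 85: typical age and wage brackets
elite <- players[players$overall > 85, ]
print(median(elite$age))
print(table(droplevels(elite$wage_bracket)))

# Scatter plot of age vs overall rating, one colour per wage bracket
if (interactive()) {
  cols <- seq_along(levels(players$wage_bracket))
  plot(players$age, players$overall, pch = 19,
       col = cols[as.integer(players$wage_bracket)],
       xlab = "Age", ylab = "Overall rating")
  legend("topleft", legend = levels(players$wage_bracket),
         col = cols, pch = 19)
}
```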




Position based on Wage (100K+)


I identified an interesting trend in the wage structure of players earning 100K+. No defender or goalkeeper earns over 300K per week, and most of them fall into the 100K-200K bracket. At the other end of the spectrum, some forwards earn 400K+, and one midfielder earns 500K+.
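The breakdown above can be sketched as a filter followed by a cross-tabulation. This is a hypothetical example: the position groups are assumed to be precomputed, and every wage figure is made up.

```r
# Made-up players with a general position group and a weekly wage
players <- data.frame(
  group = c("GK", "GK", "DEF", "DEF", "DEF", "MID", "MID", "FWD", "FWD", "FWD"),
  wage  = c(60e3, 150e3, 120e3, 190e3, 260e3, 110e3, 520e3, 140e3, 430e3, 75e3),
  stringsAsFactors = FALSE
)

# Keep only players earning 100K+ per week
high <- players[players$wage >= 1e5, ]
high$bracket <- cut(high$wage,
                    breaks = c(1e5, 2e5, 3e5, 4e5, 5e5, Inf),
                    labels = c("100K-200K", "200K-300K", "300K-400K",
                               "400K-500K", "500K+"),
                    right = FALSE)

# Number of players per position group in each wage bracket
print(table(high$group, high$bracket))

# Highest wage within each position group
max_wage <- tapply(high$wage, high$group, max)
print(max_wage)
```

The cross-table shows where each group's players sit, and the per-group maximum confirms the ceiling (for example, under 300K for goalkeepers and defenders in this toy data).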



Top 10 Most Valuable Clubs


To conclude this section of my analysis, I found the top 10 most valuable clubs by summing the player valuations for each club. According to the FIFA 20 dataset, the three most valuable teams are Real Madrid, FC Barcelona and Manchester City. Real Madrid leads the pack, closely followed by the other two.
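Summing valuations per club and ranking the totals can be sketched with `aggregate()` and `order()`. The clubs and values below are hypothetical placeholders, not figures from the dataset.

```r
# Made-up player valuations by club (hypothetical clubs and values)
players <- data.frame(
  club  = c("Club X", "Club X", "Club Y", "Club Y", "Club Y", "Club Z"),
  value = c(90e6, 60e6, 40e6, 30e6, 20e6, 100e6),
  stringsAsFactors = FALSE
)

# Sum player valuations per club, then rank clubs from most to least valuable
club_value <- aggregate(value ~ club, data = players, FUN = sum)
club_value <- club_value[order(club_value$value, decreasing = TRUE), ]
top_clubs <- head(club_value, 10)  # top 10 (only 3 clubs in this toy table)
print(top_clubs)
```

Note that a club's total depends on how many of its players appear in the data, so a deep squad can outrank a club with a few very expensive stars.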




Thanks for reading! If you're interested in more analysis, check out my Nike x Yeezy sales dashboard, built on a StockX dataset. You can also find me on LinkedIn.




©2024 by Khanyile Dlamini.
