Background
I decided to see if I could use machine learning (ML) to accurately predict the current ability (CA) of players based upon their attribute values and position ratings. I'd previously attempted to calculate CA using the attribute weights found in the pre-game editor. This was relatively successful but only worked well for players with a single, natural position. When a player was able to play multiple positions, I wasn't able to work out analytically how the different position weights combined to create the overall CA. However, this type of task is what machine learning algorithms excel at.
Machine Learning
The task is to supply a dataset of input values (or 'features' in ML parlance) as well as a corresponding output value (or 'target'). The ML algorithm will then learn how to map the features to the target. Once the model is trained, you can then provide the model with a set of features and it will predict the target value.
In terms of the Football Manager current ability task, we need to supply a sample of players to the ML algorithm (their attribute values and position ratings) along with their current ability. The model will then be trained on this sample of players and will learn how the attribute values and position ratings map to current ability. Once the model is trained, we can then feed it the attribute and position values of a player and the model will predict the current ability.
This particular task is a regression problem, because the target (CA) is a number rather than a category. If you are interested in more of the background and implementation details then you can search for 'support vector regression' (SVR). This is a type of 'supervised machine learning', and all this means is that we provide the model with the training data from which it learns rather than the algorithm 'teaching itself'.
I wrote the code in Python using the 'scikit-learn' machine learning library.
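As a rough illustration of the features-to-target idea described above, here is a minimal sketch of fitting an SVR model with scikit-learn. It is not the code used for this article: the data is synthetic, the 1 to 20 value range, the number of features, the made-up weights and the kernel, C and epsilon settings are all assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real export: 500 "players" with 10 features each.
# The 1-20 value range is an assumption; none of this is real game data.
rng = np.random.default_rng(0)
X = rng.integers(1, 21, size=(500, 10)).astype(float)

# Made-up weights that turn the features into a fake "CA" target.
weights = rng.uniform(0.2, 1.0, size=10)
y = X @ weights

# SVR is sensitive to feature scale, so standardise before fitting.
# kernel, C and epsilon are illustrative guesses, not the article's settings.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.5))
model.fit(X, y)

# Predict the target for one new set of feature values.
new_player = rng.integers(1, 21, size=(1, 10)).astype(float)
predicted_ca = model.predict(new_player)
print(predicted_ca)
```

Wrapping the scaler and the regressor in one pipeline means the same scaling is applied automatically when predicting for a new player.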
Training Data
By using a modified version of a 3rd party scouting tool, I was able to export the players along with their attribute values, position ratings and current ability from a save game. This amounted to around 28,000 players. In the first instance I have focussed on outfield players, so after filtering out the goalkeepers I was left with around 25,000 players. This sample is then randomly split into two groups: 75% of the players act as the training data (the players from which the SVR algorithm learns) and the remaining 25% act as 'unseen' test data. These are the players that are used to test the accuracy of the model.
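The filtering and splitting step could look something like the sketch below. The player rows, their layout and the random seed are hypothetical; only the 75/25 split and the removal of goalkeepers come from the description above. scikit-learn's train_test_split is one way to do the random split, though the article does not say which function was actually used.

```python
from sklearn.model_selection import train_test_split

# Hypothetical export rows: (name, is_goalkeeper, feature values, CA).
players = [
    ("Player A", False, [12, 14, 9], 120),
    ("Player B", False, [15, 11, 13], 135),
    ("Player C", True, [8, 6, 17], 128),
    ("Player D", False, [10, 10, 10], 105),
    ("Player E", False, [17, 16, 14], 160),
    ("Player F", False, [9, 13, 11], 112),
    ("Player G", True, [7, 5, 15], 110),
    ("Player H", False, [13, 12, 12], 125),
    ("Player I", False, [16, 15, 10], 148),
    ("Player J", False, [11, 9, 8], 98),
]

# Keep outfield players only.
outfield = [p for p in players if not p[1]]
features = [p[2] for p in outfield]
targets = [p[3] for p in outfield]

# Random 75/25 split; random_state is fixed only to make the sketch repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    features, targets, test_size=0.25, random_state=42
)

print(len(X_train), len(X_test))  # 6 2
```

The test players are held back during training, so their predicted CA gives an honest picture of how the model handles players it has never seen.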
A histogram of the CA distribution of the players is shown below. Note the very few players with high values of CA. This has implications later for predicting the CA of top players; they are essentially outliers to the model, so there is not much data for the model to be trained on.
Model Accuracy
Surprisingly, the model only took around 5 minutes to train on my fairly standard laptop. After playing with the model parameters, I was able to obtain a model accuracy of 98%. A plot showing the target CA against the predicted CA is shown below.
Each blue circle represents a player and the red line represents the situation in which the predicted CA is exactly equal to the target CA. In an ideal world, all the blue circles would lie on that red line. Note the small group of players at 175+. Despite the fact there are only a few of them, the model still accurately predicts their CA.
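The article does not say exactly how the 98% figure was measured. One common choice for a regression model is the R² score, which is what the score method of scikit-learn regressors returns. The sketch below works it out by hand, alongside the mean absolute error, on four made-up pairs of target and predicted CA. The numbers are purely illustrative.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in CA points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up target and predicted CA values, not taken from the real model.
actual_ca = [100, 120, 140, 160]
predicted_ca = [102, 118, 141, 158]

r2 = r_squared(actual_ca, predicted_ca)
mae = mean_absolute_error(actual_ca, predicted_ca)
print(round(r2, 4), mae)  # 0.9935 1.75
```

An R² of 1.0 corresponds to every blue circle sitting exactly on the red line, while the mean absolute error says how many CA points the predictions are off by on average.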
Examples
I used the model to predict the CA of a few specific players. The first player I tested was Ridle Baku.
He is a player with multiple positions, which has a strong impact on his CA. His recommended CA in the test save is 152. The ML model predicted a CA of 151. Very promising!
The next player to test was Kevin De Bruyne.
He too can play multiple positions, has a strong weaker foot and is one of the 'outliers' at the top-end of the current ability range. His recommended CA in the test save is 186. The ML model predicted a CA of 180. Not bad for such an outlier!
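To put the two examples side by side, the short snippet below works out the absolute and percentage error for each prediction, using only the CA values quoted above.

```python
# (recommended CA, predicted CA) as quoted in the examples above.
examples = {
    "Ridle Baku": (152, 151),
    "Kevin De Bruyne": (186, 180),
}

errors = {}
for name, (actual, predicted) in examples.items():
    abs_error = abs(actual - predicted)
    pct_error = 100.0 * abs_error / actual
    errors[name] = (abs_error, pct_error)
    print(f"{name}: off by {abs_error} CA ({pct_error:.1f}%)")

# Ridle Baku: off by 1 CA (0.7%)
# Kevin De Bruyne: off by 6 CA (3.2%)
```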
I hope you find this article interesting and maybe it will help you to better understand how machine learning can be used.
CAE