Clustering soccer players: investigating unsupervised learning on player positions

Publication date

DOI

Document Type

Bachelor Thesis

Collections

Open Access logo

License

CC-BY-NC-ND

Abstract

In this study, we investigate the clustering capability of two unsupervised learning clustering methods: K-means and Expectation Maximization (EM). We train the methods on soccer match data of the Spanish competition La Liga, which contains matches from 2004 to 2019. We classify both clustering methods with soccer player positions to visualize a correlation between player positions using Principal Component Analysis (PCA). In these visualizations, we use 4 and 11 clusters that correspond to player positions in the field. To interpret K-means and EM, we use purity and the silhouette score. Results show that K-means classifies the data better than EM. With the use of feature selection methods Laplacian score and correlation mean, we increase the performance of K-means by 37%. We see that a cluster size of 8 clusters has the best separability, which suggests that there are 8 different types of soccer players on the field during a match.

Keywords

clustering, unsupervised learning, soccer analysis, machine learning, k-means, expectation maximalization, purity, principal component analysis, PCA, silhouette analysis, silhouette score, graph laplacian, laplacian score, soccer analysis

Citation