Clustering soccer players: investigating unsupervised learning on player positions

In this study, we investigate the clustering capability of two unsupervised learning methods: K-means and Expectation Maximization (EM). We train both methods on match data from the Spanish league La Liga, covering matches from 2004 to 2019. We compare the clusters produced by each method with the players' positions and use Principal Component Analysis (PCA) to visualize how the clusters relate to those positions. In these visualizations, we use 4 and 11 clusters, which correspond to player positions on the field. To evaluate K-means and EM, we use purity and the silhouette score. Results show that K-means groups the data better than EM. Using the feature selection methods Laplacian score and correlation mean, we increase the performance of K-means by 37%. We find that 8 clusters give the best separability, which suggests that there are 8 different types of soccer players on the field during a match.
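
As an illustration of the purity measure mentioned above, the following is a minimal sketch using the standard definition: for each cluster, count the members that share its most common true label, sum those counts, and divide by the total number of samples. It is not the thesis's own code. The function name `purity`, the toy cluster assignments, and the position labels (`GK`, `DF`, `MF`, `FW`) are hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Return the purity of a clustering against known labels.

    Purity = (1 / N) * sum over clusters of the size of the
    largest true-label group inside that cluster.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # For every cluster, keep only the count of its majority label.
    majority_total = sum(max(c.values()) for c in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 players, 3 clusters, 4 position labels.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
positions = ["GK", "GK", "DF", "DF", "DF", "MF", "FW", "FW"]
score = purity(clusters, positions)
print(f"purity = {score:.2f}")
```

A purity of 1.0 means every cluster contains players of a single position. Because purity tends to rise as the number of clusters grows, it is usually read alongside a label-free measure such as the silhouette score.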

Keywords

clustering, unsupervised learning, soccer analysis, machine learning, k-means, expectation maximization, purity, principal component analysis, PCA, silhouette analysis, silhouette score, graph Laplacian, Laplacian score

URI

https://studenttheses.uu.nl/handle/20.500.12932/35795
