Predicting diversity in subtitles of NPO shows

Diversity, in the sense of the representation of people, has recently become a priority for public broadcasters, including the Dutch NPO. The NPO measures diversity through a questionnaire that asks viewers to what extent they see or hear people from different population groups in an episode. This thesis aims to predict this ‘diversity score’ from subtitles using term frequency (TF), term frequency-inverse document frequency (TF-IDF) and latent Dirichlet allocation (LDA), to gain insight into how well words and topics predict diversity in media content. Both words and topics are found to predict this measure of diversity: the diversity score can be predicted with an explained variance between 8% and 49.7%, depending on the dataset.
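To make the quantities named in the abstract concrete, the following is a minimal sketch of TF and TF-IDF weighting and of an explained-variance measure. It is not the thesis's implementation: the tokenised subtitle fragments are hypothetical toy data, the function names are illustrative, the IDF variant ln(N / df) is one common choice among several, and explained variance is assumed here to mean the coefficient of determination R². LDA topic extraction and the regression model itself are left out.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency of each word within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def tf_idf(documents):
    """TF-IDF weights per document, assuming idf = ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        tf = term_frequencies(doc)
        weights.append(
            {w: f * math.log(n_docs / doc_freq[w]) for w, f in tf.items()}
        )
    return weights


def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred (R^2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical, already tokenised subtitle fragments (toy data only).
documents = [
    ["we", "talk", "about", "football"],
    ["we", "talk", "about", "music"],
    ["we", "watch", "football", "football"],
]

weights = tf_idf(documents)
```

In this toy data, a word that occurs in every fragment ("we") receives a TF-IDF weight of zero, while a word confined to one fragment ("music") is weighted up. Under the R² assumption, an explained variance of 49.7% would correspond to `r_squared` returning about 0.497 for predicted versus questionnaire-based diversity scores.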

Keywords

machine learning; data mining; natural language processing; diversity; public broadcasting

URI

https://studenttheses.uu.nl/handle/20.500.12932/42557
