Identifying and improving quality issues in Google Semantic Location History DDPs for public transport activities

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

As more of human life takes place online, digital privacy plays an increasingly important role in society, and new laws have been created to protect it. In response to these laws, companies now give their users the opportunity to download their personal data as Data Download Packages (DDPs). A recent study used Google Semantic Location History DDPs to investigate how the COVID-19 pandemic changed travel behaviour. However, these DDPs may suffer from quality issues that affect both the data and the inferences drawn from them. The aim of this project is to identify these potential quality issues, correct for them with data imputation where possible, and assess whether doing so makes a difference. This thesis will focus on errors in public transport activity types found in Google Semantic Location History. A Python script will check whether different parts of the data meet set requirements in order to locate the quality issues. The script will count the number of errors and apply data imputation where possible, resulting in a more accurate data extraction, which in turn leads to a better understanding of travel behaviours. While multiple further steps are still needed to make the extraction as true to reality as possible, this is a first step towards improving the accuracy of inferences made with Google Semantic Location History data.
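
The check-count-impute approach described in the abstract could look roughly like the sketch below. It is illustrative only and is not the thesis script: the flat record layout (`start`, `end`, `lat`, `lon`), the set of public transport labels, the two quality rules, and the straight-line distance imputation are all assumptions made for this example, and real Semantic Location History files are structured differently.

```python
import math
from datetime import datetime

# Assumed public transport labels; the thesis may use a different set.
PUBLIC_TRANSPORT = {"IN_BUS", "IN_TRAIN", "IN_TRAM", "IN_SUBWAY"}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def check_segments(segments):
    """Count (hypothetical) quality issues and impute missing distances in place."""
    counts = {"bad_duration": 0, "missing_distance": 0, "imputed_distance": 0}
    for seg in segments:
        # Only public transport segments are inspected.
        if seg.get("activityType") not in PUBLIC_TRANSPORT:
            continue

        # Hypothetical rule 1: a segment must end after it starts.
        start = datetime.fromisoformat(seg["start"])
        end = datetime.fromisoformat(seg["end"])
        if end <= start:
            counts["bad_duration"] += 1

        # Hypothetical rule 2: a distance must be present and non-zero.
        if not seg.get("distance"):
            counts["missing_distance"] += 1
            s, e = seg.get("startLocation"), seg.get("endLocation")
            if s and e:
                # Impute with the straight-line distance, a lower bound on the
                # distance actually travelled.
                seg["distance"] = round(
                    haversine_m(s["lat"], s["lon"], e["lat"], e["lon"])
                )
                counts["imputed_distance"] += 1
    return counts


# Made-up example records in the simplified layout assumed above.
segments = [
    {"activityType": "IN_BUS",
     "start": "2020-03-01T08:00:00", "end": "2020-03-01T08:40:00",
     "startLocation": {"lat": 52.09, "lon": 5.11},
     "endLocation": {"lat": 52.37, "lon": 4.90}},
    {"activityType": "IN_TRAIN", "distance": 12000,
     "start": "2020-03-01T10:00:00", "end": "2020-03-01T09:30:00"},
    {"activityType": "WALKING",
     "start": "2020-03-01T11:00:00", "end": "2020-03-01T11:10:00"},
]
counts = check_segments(segments)
```

Keeping the error counts separate from the imputation count makes it possible to report how many issues were found and how many of them could actually be repaired.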
