Grouped Lasso Regression in Multiple System Estimation

Publication date

DOI

Document Type

Master Thesis
Open Access

License

CC-BY-NC-ND

Abstract

Multiple System Estimation (MSE) is a statistical method for estimating population size in fields such as human rights, ecology, and epidemiology, particularly when a complete population count is infeasible. Traditional approaches, such as log-linear analysis, often face computational challenges because the number of candidate models grows exponentially as registers and covariates are added. To address this, grouped lasso has been explored as a regularization technique that streamlines model selection by penalizing less impactful coefficients. This study evaluated various grouped lasso models through simulations, varying parameters such as the number of registers, the number of covariates, and the population samples. Three contrast methods (treatment, sum, and polynomial) were evaluated against traditional log-linear regression with model selection based on the AIC and BIC criteria. The results indicated that, at the optimal lambda values, the grouped lasso methods consistently produced medians that were low relative to the true population size, with narrow interquartile ranges and few outliers, demonstrating high precision but low accuracy. AIC/BIC-based model selection showed high variation and many outliers, but significantly higher precision once the outliers were removed. The results suggest further possibilities for exploring grouped lasso on more simulated datasets with different parameter combinations, as well as for using other regularization-based methods such as ridge and elastic net. This thesis project was completed in collaboration with group member Yanwen Zhang (Student Number 9087605), who investigated the effect of a higher sample population percentage and a higher number of registers; a common baseline dataset was used across the two theses.
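
As a rough illustration of how a grouped penalty removes less impactful terms as whole blocks rather than one coefficient at a time, the minimal sketch below implements the block soft-thresholding step commonly used when fitting group lasso models. It is not taken from the thesis: the function name `group_soft_threshold`, the example coefficients, the grouping, and the penalty value are all hypothetical, and the square-root-of-group-size weighting is the conventional choice rather than one confirmed by the abstract.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Shrink each coefficient group toward zero as a block.

    beta   : list of floats, current coefficient values
    groups : list of index lists; each inner list is one group
             (hypothetically, all contrast columns of one model term)
    lam    : penalty strength (lambda), chosen here purely for illustration
    """
    out = list(beta)
    for idx in groups:
        # Euclidean norm of the coefficients in this group.
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Conventional weighting: larger groups get a larger threshold.
        threshold = lam * math.sqrt(len(idx))
        # A group whose norm is below the threshold is zeroed entirely;
        # otherwise every member is shrunk by the same factor.
        scale = 0.0 if norm <= threshold else 1.0 - threshold / norm
        for i in idx:
            out[i] = scale * beta[i]
    return out


# Hypothetical example: one strong group and one weak group.
beta = [3.0, 4.0, 0.1, -0.1]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
print(shrunk)
```

In this toy example the weak group is set to zero as a unit, while the strong group is shrunk but keeps the ratio between its coefficients. In an MSE setting, a group would plausibly collect all columns that encode a single term under the chosen contrast method, so that the term enters or leaves the model as a whole.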

Keywords

Lasso regression; Grouped lasso regression; Multiple system estimation

Citation