Feature Selection techniques for Evolutionary Feature Synthesis

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

We develop a novel feature scoring method for regression problems, termed Boresco (Bootstrapped regularization path scoring). Boresco is derived from FeaLect, an algorithm that combines a scoring scheme with the bootstrap to score features and that is claimed to find useful features. We extend an existing algorithm, Evolutionary Feature Synthesis (EFS), with Boresco. EFS is a multiple linear regression tool that generates linear models with non-linear transformations of the predictor variables. We hypothesize that extending EFS with Boresco improves predictive power and yields useful, interpretable models. We compare Boresco with the scoring algorithms already present in EFS. To test our hypothesis, we perform numerous experiments on three types of datasets. We find no clear difference in predictive performance between Boresco and the original feature scoring methods, and Boresco generally does not improve predictive performance. In terms of model interpretability, Boresco occasionally creates useful models, but it does not do so more consistently than the original scoring methods, and it still adds redundant and irrelevant features to its models.
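The abstract names Boresco's ingredients (a regularization path, the bootstrap, and a FeaLect-style scoring scheme) without giving its exact scoring rule. As a rough, hypothetical illustration of bootstrapped regularization path scoring in general, the Python sketch below assumes a FeaLect-like rule: on each bootstrap resample, a Lasso path is traced over a grid of penalties λ, every feature that is active at a path point with k active features receives 1/k, and scores are averaged over resamples. The helper names `lasso_cd` and `path_scores`, the plain coordinate-descent Lasso solver (used here rather than, for example, LARS), the λ grid, and all default parameters are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

# Hypothetical sketch of bootstrapped regularization path scoring.
# The 1/k scoring rule and all defaults are illustrative assumptions.


def lasso_cd(X, y, lam, w0=None, n_iter=200, tol=1e-8):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1.
    Assumes X is standardized and y is centered (no intercept term).
    """
    n, p = X.shape
    w = np.zeros(p) if w0 is None else w0.copy()
    col_sq = (X ** 2).sum(axis=0) / n  # 1 for non-constant standardized columns
    resid = y - X @ w
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column in this resample: weight keeps its current value
            old = w[j]
            resid += X[:, j] * old  # partial residual without feature j
            rho = X[:, j] @ resid / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * w[j]
            max_delta = max(max_delta, abs(w[j] - old))
        if max_delta < tol:
            break
    return w


def path_scores(X, y, n_boot=20, n_lambdas=10, seed=0):
    """Average over bootstrap resamples of sum_path 1/k for each feature
    that is active at a path point with k active features (assumed rule)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap resample, with replacement
        Xb, yb = X[idx], y[idx]
        sd = Xb.std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant columns
        Xb = (Xb - Xb.mean(axis=0)) / sd
        yb = yb - yb.mean()
        # Smallest penalty at which every coefficient is zero.
        lam_max = np.max(np.abs(Xb.T @ yb)) / n
        w = None
        # Walk the path from strong to weak regularization, warm-starting each fit.
        for lam in np.geomspace(lam_max, lam_max / 100, n_lambdas):
            w = lasso_cd(Xb, yb, lam, w0=w)
            active = np.flatnonzero(w)
            if active.size > 0:
                scores[active] += 1.0 / active.size
    return scores / n_boot
```

Under this assumed rule, a feature scores highly when it enters the path early (while the model is still small) and stays active across most resamples; features that only appear at weak regularization, alongside many others, receive small scores.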
