Binary Classification on a Highly Imbalanced Dataset
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
Credit card fraud is a growing field of crime. Data-driven detection of fraudulent transactions can be viewed as a binary classification problem in which the two outcome classes are highly imbalanced: fraudulent transactions are far rarer than legitimate ones.
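One difficulty this imbalance causes can be shown with a minimal Python sketch on made-up labels, not the thesis data. The 0.2% fraud rate and the helper names `accuracy` and `recall` are illustrative assumptions. A trivial classifier that never flags fraud still reaches very high accuracy while catching no fraud at all, so accuracy alone is a misleading metric here.

```python
# Synthetic labels: 1 = fraud, 0 = legitimate. The 0.2% fraud rate is an
# illustrative assumption, not a figure from the thesis dataset.
n_total = 100_000
n_fraud = 200
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "classifier" that labels every transaction as legitimate.
y_pred = [0] * n_total


def accuracy(y_true, y_pred):
    # Fraction of all transactions whose label was predicted correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives (frauds) that were flagged.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.3f}, fraud recall = {rec:.3f}")
# prints "accuracy = 0.998, fraud recall = 0.000"
```

Metrics that focus on the minority class, such as recall or precision on the fraud label, expose the problem that accuracy hides.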
To overcome the difficulties that arise from this imbalance, multiple solutions are described and explored. Furthermore, a novel method based on subgroup discovery is introduced, accompanied by statistical arguments. Finally, all methods are empirically tested on a real-world credit card transaction dataset.
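The abstract does not describe how its subgroup-discovery method works. For reference only, the sketch below shows greedy top-down peeling from PRIM (Patient Rule Induction Method, Friedman and Fisher, 1999), a well-known bump-hunting algorithm. It is not the thesis's novel method. The function name `prim_peel`, the defaults for `alpha` (peel fraction) and `min_support`, and the synthetic data are assumptions. PRIM's pasting step and cross-validated box selection are omitted.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=0.05):
    """Greedy top-down peeling in the spirit of PRIM (Friedman & Fisher, 1999).

    X: (n, d) feature matrix; y: length-n array of 0/1 labels (1 = fraud).
    Returns per-feature lower/upper box bounds, the mean of y inside the box,
    and the box support (fraction of all points inside it).
    Simplified sketch: no pasting step and no cross-validated choice of box.
    """
    n, d = X.shape
    lower = np.full(d, -np.inf)
    upper = np.full(d, np.inf)
    inside = np.ones(n, dtype=bool)

    while inside.sum() / n > min_support:
        best = None  # (box mean after peel, feature, side, cut value, new mask)
        for j in range(d):
            xj = X[inside, j]
            # Candidate peels: drop roughly an alpha fraction of the points
            # currently in the box from the low or the high end of feature j.
            candidates = (("low", np.quantile(xj, alpha)),
                          ("high", np.quantile(xj, 1 - alpha)))
            for side, cut in candidates:
                if side == "low":
                    keep = inside & (X[:, j] >= cut)
                else:
                    keep = inside & (X[:, j] <= cut)
                # Skip peels that remove nothing (ties) or shrink the box
                # below the minimum support.
                if keep.sum() == inside.sum() or keep.sum() / n < min_support:
                    continue
                mean = y[keep].mean()
                if best is None or mean > best[0]:
                    best = (mean, j, side, cut, keep)
        if best is None:
            break  # no admissible peel left
        _, j, side, cut, keep = best
        if side == "low":
            lower[j] = cut
        else:
            upper[j] = cut
        inside = keep

    return lower, upper, y[inside].mean(), inside.mean()


# Synthetic demo (hypothetical data): the positive class occupies the corner
# x0 > 0.7 and x1 > 0.7, i.e. about 9% of the uniform unit square.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] > 0.7)).astype(float)

lower, upper, box_mean, support = prim_peel(X, y)
print("lower:", lower.round(2), "upper:", upper.round(2))
print(f"positive rate in box = {box_mean:.2f}, support = {support:.3f}")
```

In a fraud setting, `y` would be the 0/1 fraud label, so the box mean is the fraud rate inside the discovered region. The `min_support` bound prevents the search from settling on tiny boxes whose high fraud rate would not be statistically meaningful.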
Keywords
Classification; Imbalanced Data; Fraud; Bump Hunting