Abstract:
In the modern banking industry, customers have many options for where to
invest their money. As a result, customer retention and churn have become
significant challenges for most banks. To address customer churn, this research
employs several machine learning algorithms, including Logistic Regression,
Support Vector Machine, Random Forest, Gradient Boosting, eXtreme Gradient
Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM).
The study applies a feature selection technique to remove irrelevant features and
retain the most relevant ones. The resulting dataset is then balanced using the
Synthetic Minority Over-sampling Technique (SMOTE). The classifiers are compared on
the imbalanced and balanced datasets in terms of accuracy, recall, precision, and
overall performance. The results show that no classifier outperformed the others on
the imbalanced data (before SMOTE was applied). On the balanced data (after SMOTE
was applied), however, the Random Forest classifier outperformed the other
classifiers by a significant margin.
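
The balancing step can be illustrated with a minimal sketch of the idea behind
SMOTE: each synthetic minority sample is an interpolation between an existing
minority sample and one of its nearest minority-class neighbours. The abstract
does not say which implementation the study used, so this is not the authors'
code. The function names `smote` and `balance`, the parameter defaults, and the
toy data are all assumptions made for illustration only.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified illustration of the SMOTE idea, not a library implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours are searched within the minority class only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Hypothetical toy data: label 1 marks the minority class (churned customers).
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0),
     (5.0, 5.0), (5.5, 4.8), (4.9, 5.3)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
print(y.count(0), y.count(1))          # class counts before balancing
print(y_bal.count(0), y_bal.count(1))  # class counts after balancing
```

Because every synthetic point lies on a segment between two real minority
samples, the oversampled class stays inside the region the original minority
samples occupy. The classifiers named in the abstract would then be trained on
the balanced output and on the original data, so the two settings can be
compared.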