Vol. 13, No. 3, August 2024.
ISSN: 2217-8309
eISSN: 2217-8333

 

TEM Journal

 

TECHNOLOGY, EDUCATION, MANAGEMENT, INFORMATICS

Association for Information Communication Technology Education and Science


Enhancing Customer Churn Prediction With Resampling: A Comparative Study

 

Jia-Xuan Ong, Gee-Kok Tong, Kok-Chin Khor, Su-Cheng Haw

 

© 2024 Gee-Kok Tong, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. (CC BY-NC-ND 4.0)

 

Citation Information: TEM Journal. Volume 13, Issue 3, Pages 1927-1936, ISSN 2217-8309, DOI: 10.18421/TEM133-20, August 2024.

 

Received: 17 January 2024.
Revised: 09 May 2024.
Accepted: 05 July 2024.
Published: 27 August 2024.

 

Abstract:

 

In a competitive business world, accurately predicting customer churn is crucial to retaining customers and preventing revenue loss. However, because customer churn data are imbalanced, traditional machine learning algorithms often fail to identify churned customers accurately. This has prompted the exploration of resampling techniques, which have proven effective in addressing the issue. Current studies in customer churn prediction, however, rarely investigate and compare resampling techniques comprehensively; many rely on a single resampling method, such as SMOTE. Hence, this paper compares and evaluates the effectiveness of several resampling methods, including oversampling, undersampling, and hybrid techniques. We used the IBM Watson telecommunication customer churn benchmark dataset, in which approximately 26.5% of the customers have churned, indicating that the data are imbalanced. Our results show that combining random forest with the hybrid sampling method SMOTE-ENN obtained the best result. The combination yields an F1 score of 95.3% and an accuracy of 96.0%, surpassing studies that used the same dataset. This highlights the benefits of comparing resampling techniques when predicting customer churn, specifically on imbalanced datasets.
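The effect of class imbalance on evaluation can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses synthetic labels that only mirror the roughly 26.5% churn rate stated above, and it swaps in plain random oversampling (one simple member of the oversampling family) in place of SMOTE-ENN. The function names `accuracy`, `f1_score`, and `random_oversample` are hypothetical helpers written for this illustration.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive (churn) class; 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: this is plain random oversampling, not SMOTE or SMOTE-ENN.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Synthetic labels (assumption): 265 churners out of 1000 customers.
y = [1] * 265 + [0] * 735
X = [[i] for i in range(len(y))]  # placeholder features, one per customer

# A trivial model that never predicts churn looks accurate but finds no churners.
always_stay = [0] * len(y)
baseline_acc = accuracy(y, always_stay)
baseline_f1 = f1_score(y, always_stay)

# After oversampling, both classes have the same number of rows.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

print(f"baseline accuracy={baseline_acc:.3f}, F1={baseline_f1:.3f}")
print(f"class counts after oversampling: {dict(balanced_counts)}")
```

In this toy setting the never-churn baseline reaches high accuracy while its F1 score for the churn class is zero, which is why F1 is reported alongside accuracy. In practice, resampling is usually applied to the training split only, so that the test set keeps the original class distribution.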

 

Keywords – Customer churn prediction, imbalanced datasets, resampling, oversampling, undersampling.

 


 


Copyright © 2024 UIKTEN
Copyright licence: All articles are licensed under the Creative Commons CC BY-NC-ND 4.0 licence.