Vol.13, No.3, August 2024. ISSN: 2217-8309 eISSN: 2217-8333
TEM Journal
TECHNOLOGY, EDUCATION, MANAGEMENT, INFORMATICS Association for Information Communication Technology Education and Science
Enhancing Customer Churn Prediction With Resampling: A Comparative Study
Jia-Xuan Ong, Gee-Kok Tong, Kok-Chin Khor, Su-Cheng Haw
© 2024 Gee-Kok Tong, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. (CC BY-NC-ND 4.0)
Citation Information: TEM Journal. Volume 13, Issue 3, Pages 1927-1936, ISSN 2217-8309, DOI: 10.18421/TEM133-20, August 2024.
Received: 17 January 2024. Revised: 09 May 2024.
Abstract:
In a competitive business world, accurately predicting customer churn is crucial to retaining customers and preventing revenue loss. However, because customer churn data is imbalanced, traditional machine learning algorithms often fail to identify churned customers accurately. This has led researchers to explore resampling techniques, which have proven effective in addressing the issue. Even so, current studies in customer churn prediction rarely investigate and compare resampling techniques comprehensively; many rely on a single resampling method, such as SMOTE. Hence, this paper compares and evaluates the effectiveness of several resampling methods, including oversampling, undersampling, and hybrid techniques. We used the benchmark telecommunication customer churn dataset from IBM Watson, in which approximately 26.5% of the customers have churned, indicating that the data is imbalanced. Our results demonstrate that combining random forest with the hybrid sampling method SMOTE-ENN produced the best result. The combination yields an F1 score of 95.3% and an accuracy of 96.0%, surpassing the studies that used the same dataset. This highlights the benefits of comparing resampling techniques when predicting customer churn, specifically on imbalanced datasets.
Keywords – Customer churn prediction, imbalanced datasets, resampling, oversampling, undersampling.
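To illustrate why imbalance matters for evaluation, the following minimal Python sketch (not the authors' code) builds a hypothetical set of 1,000 labels with the 26.5% churn rate reported in the abstract and scores a trivial classifier that never predicts churn. It also shows plain random oversampling, the simplest resampling idea, as a stand-in; it is not SMOTE or SMOTE-ENN. All function names and the toy data are assumptions made for illustration.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the churn class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def random_oversample(y, minority=1, seed=0):
    """Duplicate random minority labels until both classes are the same size.

    Hypothetical helper: plain random oversampling, not SMOTE.
    Only labels are resampled here; real data would duplicate feature rows too.
    """
    rng = random.Random(seed)
    minority_rows = [label for label in y if label == minority]
    majority_count = len(y) - len(minority_rows)
    needed = max(0, majority_count - len(minority_rows))
    extra = [rng.choice(minority_rows) for _ in range(needed)] if minority_rows else []
    return list(y) + extra

# Hypothetical toy labels: 265 churners (1) out of 1,000 customers (26.5%).
y_true = [1] * 265 + [0] * 735
never_churn = [0] * 1000  # trivial model that always predicts "no churn"

acc = accuracy(y_true, never_churn)   # high despite finding no churners
f1 = f1_score(y_true, never_churn)    # zero: no churned customer is identified
balanced = random_oversample(y_true)  # both classes now have 735 labels
```

The trivial model scores well on accuracy while identifying no churned customers, which is why the F1 score is reported alongside accuracy and why resampling the training data is worth comparing.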