Customer churn is a major economic concern for many companies, including banks. Banks have therefore focused their attention on customer retention, because attracting a new customer costs much more than keeping an existing one.
Customer churn prediction and profiling are two tasks of major economic importance for many companies.
Different learning approaches have been proposed; however, the a priori choice of the most suitable model to perform both tasks remains non-trivial, as it depends heavily on the intrinsic characteristics of the churn data.
Our study compares several machine learning methods combined with several resampling approaches for balancing a public bank data set.
Our evaluations, reported in terms of area under the curve (AUC) and sensitivity, explore the influence of rebalancing strategies and of different machine learning methods.
This work identifies the most appropriate methods in an attrition context and an effective pipeline based on an ensemble approach and clustering. Our strategy can inform marketing or human resources departments about the behavioral patterns of customers and their attrition probability.
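
To make the evaluation setting concrete, the following minimal sketch illustrates one rebalancing strategy and the two reported metrics. It is an illustration under stated assumptions, not the pipeline used in the study: random oversampling is chosen here only as a simple stand-in for the resampling approaches compared, and the function names (`random_oversample`, `sensitivity`, `auc`) and the toy data are hypothetical.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size.

    Assumption: binary labels, 1 = churner, 0 = non-churner.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw (with replacement) as many extra minority rows as needed to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def sensitivity(y_true, y_pred):
    """Sensitivity (true positive rate): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four non-churners and two churners.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
```

In such a setup, resampling would normally be applied to the training split only, so that AUC and sensitivity are measured on data that keeps the original class imbalance.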