This study presents a machine learning approach to bank customer churn prediction that combines a leakage-controlled preprocessing pipeline with engineered behavioral features. The dataset comprises 10,000 customer records, from which identifier variables and leakage-prone variables were removed to improve methodological validity. Five classification models were evaluated: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, and Gradient Boosting. Feature engineering introduced composite variables, including BalanceSalaryRatio, ProductsPerTenure, CreditScoreAgeIndex, ActivityAdjustedBalance, and PointsPerProduct, to capture patterns in customer behavior. The data were split into 80% training and 20% testing sets using stratified sampling with a fixed random state. Model performance was assessed using Accuracy, Precision, Recall, F1 Score, and ROC AUC. Gradient Boosting achieved the best overall balance across these metrics, with Accuracy = 0.8690, F1 Score = 0.6124, and ROC AUC = 0.8753. These findings indicate that leakage-controlled preprocessing combined with behavior-based feature engineering offers a practical and interpretable approach to customer churn prediction and retention analytics.
Keywords: Customer Churn Prediction, Machine Learning, Feature Engineering, Predictive Analytics, Customer Retention
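The abstract names the engineered features but does not give their formulas. The Python sketch below shows one plausible way to implement the leakage-controlled preprocessing, the composite features, and the stratified 80/20 split using only the standard library. All column names (for example `RowNumber`, `CustomerId`, `Surname`, `Complain`, `PointEarned`) and every feature formula are hypothetical assumptions modeled on common public bank-churn datasets, not the study's actual definitions. The helper names `engineer_features` and `stratified_split` are likewise illustrative.

```python
import random
from collections import defaultdict

# HYPOTHETICAL schema: column names are assumptions modeled on public
# bank-churn datasets; the study's actual columns are not stated.
ID_COLUMNS = {"RowNumber", "CustomerId", "Surname"}  # assumed identifiers
LEAKAGE_COLUMNS = {"Complain"}                       # assumed leakage-prone field
TARGET = "Exited"                                    # assumed churn label


def engineer_features(row):
    """Drop identifier/leakage fields and add five composite features.

    Every formula below is a HYPOTHETICAL illustration, not the study's definition.
    """
    dropped = ID_COLUMNS | LEAKAGE_COLUMNS
    out = {k: v for k, v in row.items() if k not in dropped}
    salary = row["EstimatedSalary"]
    products = row["NumOfProducts"]
    # Balance relative to income; guard against a zero salary.
    out["BalanceSalaryRatio"] = row["Balance"] / salary if salary else 0.0
    # Products held per year of tenure; +1 avoids dividing by zero tenure.
    out["ProductsPerTenure"] = products / (row["Tenure"] + 1)
    # Credit score scaled by age.
    out["CreditScoreAgeIndex"] = row["CreditScore"] / row["Age"]
    # Balance counted only when the customer is flagged as active (0/1 flag).
    out["ActivityAdjustedBalance"] = row["Balance"] * row["IsActiveMember"]
    # Loyalty points per product held; guard against zero products.
    out["PointsPerProduct"] = row["PointEarned"] / products if products else 0.0
    return out


def stratified_split(rows, target=TARGET, test_size=0.2, seed=42):
    """Split rows so each class keeps the same proportion in train and test.

    A fixed seed makes the split reproducible across runs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, test = [], []
    for label in sorted(by_class):      # deterministic class order
        group = list(by_class[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_size)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Because each composite feature here is computed from a single row without any fitted statistics, it can be derived before the split without leaking test information. Any step that learns from the data, such as scaling or imputation, should instead be fit on the training set only. In practice, scikit-learn's `train_test_split(..., stratify=y, random_state=...)` performs the same kind of stratified, reproducible split.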