Customer Churn Prediction Analysis

Project Overview

A machine learning project that predicts customer churn with three classification models: Logistic Regression, Decision Tree, and Random Forest. After hyperparameter tuning, the Random Forest reaches about 84% accuracy. Class imbalance in the training data is handled with SMOTE (Synthetic Minority Over-sampling Technique).

📊 Key Achievements

🛠️ Technologies Used

📁 Project Structure

customer-churn-prediction/
│
├── customer_churn_prediction.py     # Main analysis script
├── visualizations.py                # Visualization generation
├── requirements.txt                 # Python dependencies
├── README.md                        # Project documentation
│
├── Output Files:
│   ├── model_comparison.csv         # Model performance metrics
│   ├── feature_importance.csv       # Feature importance rankings
│   ├── predictions.csv              # Test set predictions
│   ├── churn_analysis_visualizations.png
│   └── detailed_analysis_plots.png

🚀 Getting Started

Prerequisites

pip install pandas numpy scikit-learn matplotlib seaborn imbalanced-learn

Or use the requirements file:

pip install -r requirements.txt

Running the Analysis

  1. Run the main analysis:
    python customer_churn_prediction.py
    
  2. Generate visualizations:
    python visualizations.py
    

📋 Dataset Features (20+ Variables)

Demographics

Account Information

Services

Financial

Engagement Metrics

Engineered Features

🤖 Machine Learning Pipeline

1. Data Preprocessing

2. Train-Test Split

3. Class Imbalance Handling

4. Model Training

Baseline Models:

  1. Logistic Regression
  2. Decision Tree Classifier
  3. Random Forest Classifier (100 estimators)

5. Hyperparameter Tuning

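GridSearchCV was run on the Random Forest over the following grid:

  n_estimators: 100, 200, 300
  max_depth: 10, 20, 30, None
  min_samples_split: 2, 5, 10
  min_samples_leaf: 1, 2, 4
  max_features: sqrt, log2

Cross-validation: 5-fold
Scoring metric: Accuracy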

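📈 Model Performance

| Model                 | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|-----------------------|----------|-----------|--------|----------|---------|
| Logistic Regression   | ~78%     | ~0.76     | ~0.72  | ~0.74    | ~0.85   |
| Decision Tree         | ~80%     | ~0.78     | ~0.75  | ~0.76    | ~0.83   |
| Random Forest         | ~82%     | ~0.81     | ~0.78  | ~0.79    | ~0.88   |
| Random Forest (Tuned) | ~84%     | ~0.83     | ~0.81  | ~0.82    | ~0.90   |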
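
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so the F1-Score column can be recomputed from the other two. The sketch below uses only the rounded values reported in the table; the helper name `f1_score_from` is made up for this example and is not part of the project code.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the performance table
reported = {
    "Logistic Regression": (0.76, 0.72),
    "Decision Tree": (0.78, 0.75),
    "Random Forest": (0.81, 0.78),
    "Random Forest (Tuned)": (0.83, 0.81),
}

# Recompute F1 for each model, rounded to two decimals like the table
f1_by_model = {
    name: round(f1_score_from(p, r), 2) for name, (p, r) in reported.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 ~ {f1:.2f}")
```

The recomputed values agree with the reported F1-Score column to two decimals.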

🎯 Key Findings

Top Risk Factors for Churn:

  1. Contract Type (Month-to-Month highest risk)
  2. Tenure (< 12 months)
  3. Customer Service Calls (> 3)
  4. Payment Method (Electronic Check)
  5. Monthly Charges (> $80)
  6. Lack of Tech Support
  7. Late Payments
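
The thresholds in this list can be turned into simple per-customer flags, which may help when reading model output next to the raw data. This is only an illustrative sketch: it is not the trained model, and the dictionary keys (`payment_method`, `monthly_charges`, `tech_support`, `late_payments`) and category spellings are assumed names that may differ from the actual dataset columns.

```python
def risk_flags(customer):
    """Return the names of the listed risk factors that apply to one customer."""
    checks = {
        "month_to_month_contract": customer["contract_type"] == "Month-to-Month",
        "short_tenure": customer["tenure_months"] < 12,
        "many_service_calls": customer["customer_service_calls"] > 3,
        "electronic_check": customer["payment_method"] == "Electronic Check",
        "high_monthly_charges": customer["monthly_charges"] > 80,
        "no_tech_support": not customer["tech_support"],
        "late_payments": customer["late_payments"] > 0,
    }
    # Keep only the factors whose condition is true, in list order
    return [name for name, hit in checks.items() if hit]


# Hypothetical customer record used only for illustration
example_customer = {
    "contract_type": "Month-to-Month",
    "tenure_months": 5,
    "customer_service_calls": 4,
    "payment_method": "Credit Card",
    "monthly_charges": 65.0,
    "tech_support": True,
    "late_payments": 0,
}

flags = risk_flags(example_customer)
print(flags)
```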

Model Insights:

📊 Visualizations

The project generates the following visualizations:

  1. Model Comparison Charts - Accuracy across all models
  2. Feature Importance - Top 10 predictive features
  3. Confusion Matrix - Classification performance breakdown
  4. ROC Curve - Model discrimination capability
  5. Prediction Distribution - Probability distributions by class
  6. Precision-Recall Curve - Trade-off analysis
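
To make "discrimination capability" concrete: ROC-AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The sketch below computes it directly from that definition on a tiny made-up sample. It is a plain-Python illustration with invented labels and scores, not the project's plotting code.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = churned) and predicted churn probabilities
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]

auc = roc_auc(y_true, y_score)
print(f"ROC-AUC: {auc:.3f}")
```

Here 7 of the 9 positive/negative pairs are ordered correctly, so the value is 7/9. The pairwise loop is quadratic in the sample size, so it suits small checks rather than large test sets.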

💼 Business Applications

Proactive Retention Strategy

Risk Segmentation

Feature Monitoring

🔄 Model Deployment Recommendations

  1. Real-time Scoring: Deploy model as API endpoint
  2. Batch Processing: Weekly churn risk assessments
  3. Monitoring: Track model performance metrics
  4. Retraining: Quarterly model updates with new data
  5. A/B Testing: Compare intervention strategies
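
For the batch-processing recommendation, one possible shape of a weekly job is to map each customer's predicted churn probability to a risk tier that retention teams can act on. The sketch below assumes the probabilities have already been produced by the model. The cutoffs (0.4 and 0.7), the tier names, and the customer IDs are hypothetical and would need to be chosen from business constraints.

```python
def risk_tier(churn_probability, medium_cutoff=0.4, high_cutoff=0.7):
    """Map a churn probability in [0, 1] to a 'low', 'medium' or 'high' tier."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= high_cutoff:
        return "high"
    if churn_probability >= medium_cutoff:
        return "medium"
    return "low"


# Hypothetical weekly scores keyed by customer ID
weekly_scores = {"C001": 0.82, "C002": 0.45, "C003": 0.10}

tiers = {customer_id: risk_tier(p) for customer_id, p in weekly_scores.items()}
print(tiers)
```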

📝 Code Highlights

SMOTE Implementation

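from imblearn.over_sampling import SMOTE

# Oversample the minority class on the training split only,
# so the test set keeps its original class distribution
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train_scaled, y_train)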
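
For intuition about what the resampling call does, SMOTE creates each synthetic minority sample by picking a minority point, choosing one of its nearest minority neighbours, and interpolating somewhere on the line segment between them. The following is a simplified plain-Python sketch of that idea, not the imbalanced-learn implementation. The function names, the toy points, and `k=2` are assumptions made for this example.

```python
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sorted(others, key=lambda other: sq_dist(point, other))[:k]


def smote_like_oversample(minority, n_new, k=2, seed=42):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority-class samples with two features each
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]

new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is a convex combination of two real minority points, it always falls inside the region already covered by the minority class.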

GridSearchCV for Hyperparameter Tuning

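from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values for each Random Forest hyperparameter
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}

# Exhaustive search over the grid
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring='accuracy',  # pick the best candidate by mean accuracy
    n_jobs=-1            # use all available CPU cores
)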
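
It can help to know how much work this search implies before running it. The sketch below counts the candidate combinations in the grid and the resulting number of cross-validation fits, using only the values shown above. The variable names are chosen for this example.

```python
from itertools import product

# Same grid values as in the GridSearchCV snippet
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}
cv_folds = 5

# Every combination of one value per hyperparameter is one candidate
combinations = list(product(*param_grid.values()))
n_candidates = len(combinations)

# Each candidate is trained once per fold
n_cv_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates x {cv_folds} folds = {n_cv_fits} fits")
```

That is 216 candidates and 1,080 cross-validation fits, plus one final refit on the full training data under GridSearchCV's default `refit=True`. If the search is fitted on the SMOTE-resampled training set, synthetic points can end up in the validation folds, which tends to make the cross-validated scores optimistic. One common alternative is to put SMOTE and the classifier in an imbalanced-learn `Pipeline` so that resampling happens inside each fold.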

Feature Engineering Example

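# Loyalty rises with tenure and falls with each customer service call
df['loyalty_score'] = (df['tenure_months'] * 0.5) - (df['customer_service_calls'] * 2)
# Flag month-to-month customers with less than 12 months of tenure
df['high_risk'] = ((df['contract_type'] == 'Month-to-Month') &
                   (df['tenure_months'] < 12)).astype(int)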
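
A small worked example shows what these two engineered features produce. The sketch below applies the same formulas to plain dictionaries rather than a DataFrame. The three customer records and the `'Two Year'` contract label are hypothetical values made up for illustration.

```python
def engineer_features(customer):
    """Apply the loyalty_score and high_risk formulas to one customer record."""
    loyalty_score = (customer["tenure_months"] * 0.5) - (
        customer["customer_service_calls"] * 2
    )
    high_risk = int(
        customer["contract_type"] == "Month-to-Month"
        and customer["tenure_months"] < 12
    )
    return {"loyalty_score": loyalty_score, "high_risk": high_risk}


# Hypothetical customers used only to illustrate the formulas
customers = [
    {"tenure_months": 6, "customer_service_calls": 4, "contract_type": "Month-to-Month"},
    {"tenure_months": 36, "customer_service_calls": 1, "contract_type": "Two Year"},
    {"tenure_months": 24, "customer_service_calls": 0, "contract_type": "Month-to-Month"},
]

engineered = [engineer_features(c) for c in customers]
for row in engineered:
    print(row)
```

The first customer gets a loyalty score of 6 * 0.5 - 4 * 2 = -5.0 and is flagged high risk. The third is on a month-to-month contract but has 24 months of tenure, so the flag stays 0.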

📚 Model Evaluation Metrics

Results

[Figure: churn_analysis_visualizations.png]
[Figure: detailed_analysis_plots.png]

🎓 Learning Outcomes

This project demonstrates:

🤝 Contributing

Suggestions for improvements:

📞 Contact

For questions or feedback about this project, please reach out through GitHub issues.

📄 License

This project is open source and available for educational purposes.


Note: This analysis uses synthetic data generated to resemble real-world churn patterns. For production use, replace it with actual customer data and ensure data privacy and regulatory compliance.