
Customer Churn Analysis for Three Banks

Jason Ismail

Updated: May 26, 2021

Customer churn is a common concern for businesses. The problem I set out to solve is identifying customers who are considering leaving the bank. I tackled the issue from many different directions, and there was a bit of a learning curve along the way.



The dataset that I used can be found here:


My Jupyter Notebook can be downloaded here:



My first step in solving this problem was to rule out some of the algorithms that we had recently been studying.


I started with a linear regression. As a former math teacher, I have to admit I was very skeptical about using a linear regression model. I was surprised at first to see high accuracy, but I was convinced that the model was not a good match for my data. This is where I learned first hand how misleading accuracy can be. I was working with an unbalanced dataset: approximately 8,000 customers who had stayed with the banks and 2,000 who had left. I remember thinking, "Wow, 80% is not a bad start for my project." But it did not sit well with me, so I dug deeper and discovered that the algorithm was, for the most part, just voting with the majority. So there was no predictive power.
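To make the majority-vote problem concrete, here is a minimal sketch in plain Python. The labels are made up to match the rough 8,000 / 2,000 split described above; they are not the real dataset.

```python
# Hypothetical labels matching the rough split described above:
# 0 = stayed (about 8,000 customers), 1 = left (about 2,000 customers).
y_true = [0] * 8000 + [1] * 2000

# A "model" that always votes with the majority class.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the churners: how many of the customers who left were caught?
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_churn = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}")          # accuracy = 0.80
print(f"churn recall = {recall_churn:.2f}")  # churn recall = 0.00
```

An 80% accuracy score with zero recall on the customers who left is exactly the "no predictive power" situation: the number looks fine, but not a single churner is identified.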


I noticed that some of the data was quite skewed, so I applied log transforms to even things out.



But I eventually decided to abandon this strategy because I felt that the data was no longer representative after those changes.
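For readers who have not used a log transform on skewed data, the idea can be sketched as follows. The `balance` feature here is randomly generated for illustration, and the `skewness` helper is my own small function, not something from the notebook.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the mean cubed z-score (about 0 for symmetric data)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)

# Hypothetical right-skewed feature (for example, an account balance).
balance = rng.lognormal(mean=10.0, sigma=1.0, size=5000)

# log1p(x) = log(1 + x), which also copes with zero values.
log_balance = np.log1p(balance)

print(f"skewness before: {skewness(balance):.2f}")
print(f"skewness after:  {skewness(log_balance):.2f}")
```

The transformed column is much closer to symmetric, but its values are no longer in the original units, which is one reason a transformed dataset can feel less representative.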


PCA dimensionality reduction at 95% retained variance removed only one feature from my dataset and made the results harder to interpret.
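Here is a rough NumPy sketch of how a 95% variance threshold decides how many components to keep, assuming "95%" refers to retained variance. The data is synthetic: ten standardized features, one of which is nearly a copy of another. In scikit-learn, `PCA(n_components=0.95)` makes the same kind of selection.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each column
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained / explained.sum())
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

rng = np.random.default_rng(0)

# Hypothetical data: nine independent features plus one near-duplicate.
independent = rng.normal(size=(2000, 9))
near_copy = independent[:, [0]] + 0.05 * rng.normal(size=(2000, 1))
X = np.hstack([independent, near_copy])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize before PCA

k = n_components_for_variance(X, threshold=0.95)
print(f"{X.shape[1]} features -> {k} components")
```

When most features carry independent information, the threshold only drops the redundant direction, so the reduction is small and the new components are harder to interpret than the original columns.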




I briefly rebalanced the dataset so that there was a more even distribution between customers staying and leaving the bank, but this did not solve the problem. No matter how I sliced it, regression models were not the solution.
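One simple way to rebalance is random undersampling of the majority class. This is a sketch of that technique, not necessarily the method used in the notebook; the function name and the toy data are assumptions.

```python
import numpy as np

def undersample_majority(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest one."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid leaving the rows grouped by class
    return X[keep], y[keep]

# Toy labels with the same 8,000 / 2,000 imbalance as the churn data.
y = np.array([0] * 8000 + [1] * 2000)
X = np.arange(len(y)).reshape(-1, 1)  # stand-in feature column

X_bal, y_bal = undersample_majority(X, y)
print(np.bincount(y_bal))  # [2000 2000]
```

If you try this, rebalance only the training split so that the test set still reflects the real class proportions.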



“I tried many different algorithms and solved the churn analysis problem in many different ways.”

Random Forest


My first successful algorithm was the random forest. I obtained about 85% accuracy, and my model began predicting customers from both populations. Keep in mind this is an unbalanced dataset.
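A minimal scikit-learn sketch of this step is below. It uses synthetic data from `make_classification` with a similar 80/20 imbalance, so the numbers it prints will not match my results. The confusion matrix is the quickest way to check whether a model predicts both classes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: about 80% stay (0), 20% leave (1).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Stratify so the train and test splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows = actual, columns = predicted

print(f"accuracy: {acc:.3f}")
print(cm)
```

A second column of all zeros in the confusion matrix would mean the model is voting with the majority again.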



The ROC curve also looked reasonable.
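As a reminder of what an ROC curve measures, here is a tiny example with four made-up customers. The scores are hypothetical churn probabilities, not output from my model.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# 1 = customer left, 0 = customer stayed (made-up labels).
y_true = [0, 0, 1, 1]
# Hypothetical predicted churn scores for the same four customers.
churn_scores = [0.1, 0.4, 0.35, 0.8]

# Each threshold gives one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_true, churn_scores)
auc = roc_auc_score(y_true, churn_scores)

print(f"AUC = {auc:.2f}")  # AUC = 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. Because it is computed from the ranking of scores rather than from a single threshold, it is less easily fooled by class imbalance than accuracy.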

I built a pipeline and ran a grid search with GridSearchCV.


This was a very large search space. I later learned how to choose better candidate values for my grid searches.
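A pipeline combined with `GridSearchCV` can be sketched like this. The step names, the parameter values, and the synthetic data are all assumptions for illustration. The grid is deliberately small (2 x 3 = 6 combinations) to show how quickly a search space grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Keys are "<step name>__<parameter name>".
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [4, 8, None],
}

# 6 combinations x 3 folds = 18 fits, plus one final refit on all the data.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated AUC: {search.best_score_:.3f}")
```

Every extra value in any list multiplies the number of fits, so it pays to start with a coarse grid and narrow it down around the best region.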

I was left with the best parameters for my random forest.


But I did not see a huge difference in the results.



Decision Tree



I built a decision tree but really only used it for the nice plot. I was running low on time for the semester and I had my heart set on seeing the results from a neural network.
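If you want to inspect a decision tree yourself, here is a small sketch. It prints the tree as text with `export_text` so that it runs without a plotting library; `sklearn.tree.plot_tree` draws the graphical version. The feature names and data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature names; the data itself is synthetic.
feature_names = ["age", "balance", "tenure", "num_products"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

# A shallow tree keeps the printed rules readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

rules = export_text(tree, feature_names=feature_names)
print(rules)
```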


“I built my first neural network right before deciding to get my master's in data science.”

Neural Network


This model was my favorite. It was a Sequential deep neural network with three layers.
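To show what a three-layer sequential network computes, here is a NumPy sketch of the forward pass only. The layer widths, the activations, and the random (untrained) weights are assumptions for illustration; this is not the architecture or code from my notebook.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

n_features = 10                       # hypothetical number of input columns
layer_sizes = [n_features, 16, 8, 1]  # three layers of weights

# Random, untrained weights: one matrix and one bias vector per layer.
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def predict_proba(X):
    """Forward pass: two ReLU hidden layers, then a sigmoid output."""
    h = np.asarray(X, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return sigmoid(h @ weights[-1] + biases[-1]).ravel()

batch = rng.normal(size=(5, n_features))  # five made-up customers
probs = predict_proba(batch)
print(probs)
```

"Sequential" means exactly this: each layer's output is fed straight into the next. Training then adjusts the weights so that the final sigmoid output approximates the probability of churn.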



I built a pipeline and used it to make predictions on whether or not a person would leave the bank.

This allowed me to make up new customers and test whether or not they would leave the bank.
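Scoring a made-up customer through a pipeline can be sketched as follows. I am swapping in scikit-learn's `MLPClassifier` as a stand-in network so the example stays short; the columns, the synthetic churn rule, and the new customer's values are all invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Hypothetical columns: age, balance, number of products.
age = rng.integers(18, 80, size=n)
balance = rng.uniform(0, 200_000, size=n)
products = rng.integers(1, 5, size=n)
X = np.column_stack([age, balance, products]).astype(float)

# Synthetic rule, for illustration only: older single-product customers leave.
y = ((age > 50) & (products == 1)).astype(int)

model = Pipeline([
    ("scale", StandardScaler()),
    ("net", MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500,
                          random_state=0)),
])
model.fit(X, y)

# A made-up customer: 62 years old, 85,000 balance, one product.
new_customer = np.array([[62.0, 85_000.0, 1.0]])
churn_probability = model.predict_proba(new_customer)[0, 1]
print(f"estimated churn probability: {churn_probability:.2f}")
```

Keeping the scaler inside the pipeline matters here: the new customer is scaled with the statistics learned from the training data before it reaches the network.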


I even built a report showing which customers the bank should contact to try to retain them.
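A report like this boils down to filtering and sorting predicted probabilities. Here is a minimal sketch; the function name, the 0.5 threshold, and the customer IDs are hypothetical.

```python
def contact_report(customer_ids, churn_probabilities, threshold=0.5):
    """Return (customer_id, probability) pairs at or above `threshold`,
    highest risk first."""
    flagged = [(cid, p)
               for cid, p in zip(customer_ids, churn_probabilities)
               if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Made-up customers and made-up model outputs.
ids = ["C001", "C002", "C003", "C004"]
probs = [0.12, 0.91, 0.55, 0.40]

for cid, p in contact_report(ids, probs):
    print(f"{cid}: {p:.0%} estimated churn risk")
# C002: 91% estimated churn risk
# C003: 55% estimated churn risk
```

The threshold is a business decision: lowering it means more phone calls but fewer missed churners.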



