Exploratory Data Analysis in Lake Lure, NC.
Updated: May 12, 2021
This project featured quite a bit of cleanup. I was working with an Airbnb dataset for home rentals near Lake Lure, NC. Much of the information in it did not apply to what I was interested in for this project.
My Notebook can be found here:
Photo Courtesy of Romanticasheville.com
I started out with 74 features in my dataset. I looked over every column and made some tough choices about what to keep: the dataset had many null values, and many of the columns were unrelated to my topic.
“I think it is important when possible for a Data Scientist to do their own EDA. That way they can build a deeper understanding of the dataset.”
For this project I used my background as a mathematician to help me decide which values to treat as outliers.
Here I calculated the quartiles and the interquartile range (IQR). The rule of thumb is that values more than 1.5 times the IQR below the first quartile or above the third quartile are a good place to start looking for outliers in the dataset. In this case I only considered the upper bound for prices, because a low price could mean a hidden deal.
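To make the rule concrete, here is a minimal sketch of the upper-bound check using only the Python standard library. The function names and the sample prices are made up for illustration and are not taken from my notebook; the quartiles use the "inclusive" method, which is one of several common conventions.

```python
from statistics import quantiles


def upper_price_fence(prices, k=1.5):
    """Return Q3 + k * IQR, the upper fence for flagging high prices."""
    # quantiles() sorts the data itself; n=4 yields Q1, the median, and Q3.
    q1, _median, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    return q3 + k * iqr


def split_by_upper_fence(prices, k=1.5):
    """Split prices into (kept, flagged) using only the upper fence.

    Low prices are deliberately never flagged, since a low price
    could be a hidden deal rather than an error.
    """
    fence = upper_price_fence(prices, k)
    kept = [p for p in prices if p <= fence]
    flagged = [p for p in prices if p > fence]
    return kept, flagged


# Hypothetical nightly prices, for illustration only.
sample_prices = [50, 80, 95, 100, 110, 120, 130, 150, 900]
kept, flagged = split_by_upper_fence(sample_prices)
```

A flagged price is only a candidate outlier; it is still worth looking at the listing before deciding to drop it.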
There were some real hidden gems in this dataset.
I found some awesome reviews as well as descriptions of the properties from the owners.
So I cleaned the text up using some tricks I had picked up while studying Natural Language Processing.
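The exact steps live in the notebook, but a typical cleanup pass for free-text fields like these might look something like the sketch below. It assumes the descriptions can contain HTML line breaks and entities; the function name `clean_text` and the sample string are hypothetical.

```python
import html
import re


def clean_text(text):
    """Normalize a free-text description for simple keyword matching."""
    # Replace HTML tags such as <br /> with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; into plain characters.
    text = html.unescape(text)
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical description, for illustration only.
cleaned = clean_text("Lake-front cabin!<br />Private DOCK &amp; kayaks.")
```

Keeping the cleanup this simple is a design choice: it throws away punctuation and casing, which is fine for keyword matching but would not suit something like sentiment analysis.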
Once the data was cleaned up, I used fuzzy matching to see which of the descriptions featured keywords indicating that the property was on the waterfront. After all, owners would typically brag about the lake if they were in fact situated on it. This helped me zero in on prospective homes to visit.
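As a rough illustration of the idea, the sketch below does fuzzy keyword matching with `difflib` from the Python standard library, which is a stand-in and not necessarily the library used in the notebook. The keyword list, the 0.85 threshold, and the function names are all assumptions chosen for the example, and the input is assumed to be already cleaned (lowercase, no punctuation).

```python
from difflib import SequenceMatcher

# Hypothetical keywords that suggest a property sits on the water.
WATERFRONT_KEYWORDS = ("waterfront", "lakefront", "lake front", "on the lake")


def best_fuzzy_score(description, keyword):
    """Best similarity (0 to 1) between the keyword and any word window."""
    words = description.lower().split()
    width = len(keyword.split())
    best = 0.0
    # Slide a window with as many words as the keyword across the text.
    for i in range(len(words) - width + 1):
        window = " ".join(words[i:i + width])
        score = SequenceMatcher(None, window, keyword).ratio()
        best = max(best, score)
    return best


def looks_waterfront(description, threshold=0.85):
    """True if any keyword fuzzily matches some part of the description."""
    return any(
        best_fuzzy_score(description, keyword) >= threshold
        for keyword in WATERFRONT_KEYWORDS
    )


# Hypothetical descriptions; the first contains a typo on purpose.
typo_match = looks_waterfront("cozy cabin with waterfrnt views")
no_match = looks_waterfront("mountain cabin near hiking trails")
```

The threshold is a trade-off: a lower value catches more misspellings but also lets in more false positives, so it is worth spot-checking the matches by hand.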