Instacart Customer Profiling
I know, I'm throwing you straight into the deep end of the project with this image, but for me, this image represents the answer to my major quandary:
How can I compare numerous variables without bias?
The focal point of this project was to find distinct customer profile types to empower Instacart to better serve its diverse customer base. Given the multitude of variables at hand, I grappled with the challenge of ensuring an unbiased and comprehensive analysis. While you'll later see a common-sense approach that uses marital status for profiling, that only works because of its intuitive nature.
For me, data analysis is discovering the unknown, which means I needed a way to go beyond my own limited knowledge. And this picture? It was the answer.
This project was one of my most extensive endeavors to date, consuming over 120 hours of meticulous work across slightly more than two months. A significant portion of this time was dedicated to diving headfirst into machine learning, an exciting but demanding aspect of the analysis. Consequently, I did not create an extensive dashboard as I have in some of my previous projects. Rest assured, though, that I have a collection of captivating visuals below to illustrate the rich insights unearthed during this journey.
Instacart Variable Correlation via a Heat Map
The Steps I Took:
Basic exploratory analysis
Visualized busy times by hour of day and day of week
Visualized departments by order count and total sales
Checked order region differences
Made customer profile tags for marital status and loyalty
Made product pricing tags
Visualized the correlation between all variables as a heat map
Made an elbow chart
Ran K-means clustering
Assigned a third tag to each customer based on segment
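For anyone who wants to try the last three steps, here is a minimal sketch of an elbow calculation followed by K-means segment tagging. It is not the project's actual code: the data is synthetic, the column names (age, income, mean_price) are hypothetical stand-ins, and scaling the features first is my assumption rather than something the write-up confirms.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the real dataset's.
rng = np.random.default_rng(42)
customers = pd.DataFrame({
    "age": rng.integers(18, 82, size=500),
    "income": rng.normal(90_000, 30_000, size=500).round(),
    "mean_price": rng.normal(8.0, 2.0, size=500).round(2),
})

# Assumption: scale the features so income does not dominate the distance
# calculation just because its numbers are larger.
scaled = StandardScaler().fit_transform(customers)

# Elbow chart input: inertia (within-cluster sum of squares) for each candidate k.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = model.inertia_

# k is normally picked by eye from the elbow; 4 is used here only because
# the write-up reports four segments.
final = KMeans(n_clusters=4, n_init=10, random_state=42).fit(scaled)
customers["segment"] = final.labels_ + 1  # tag customers as segment 1 to 4
```

Plotting the values in inertias against k gives the elbow chart, and the segment column is the kind of extra tag that can then be joined back onto the customer table.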
Tools I Used:
Python
Scikit-learn
Pandas
Seaborn
Matplotlib
Excel
Why I took these steps:
Gained a feel for the data behavior
Determined when Instacart would have the most order traffic
Checked if the most-bought departments were also the most profitable
Determined if Instacart was more profitable in specific U.S. regions
Defined customer types to market to them more effectively
Determined if price was a factor in what customers bought
Let me see unbiased relationships between variables
Needed to determine the appropriate number of clusters
Useful to further profile customers
Saved the results of the K-means analysis
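To show what "unbiased relationships between variables" can look like in practice, here is a small sketch of a correlation matrix drawn as a heat map. The data is made up and the column names are hypothetical; the plotting function is one reasonable way to draw it with Seaborn, not necessarily how the image in this project was produced.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are hypothetical examples.
rng = np.random.default_rng(0)
n = 300
income = rng.normal(90_000, 30_000, size=n)
df = pd.DataFrame({
    "income": income,
    # Built to rise with income so the heat map has something to show.
    "mean_price": 5 + income / 30_000 + rng.normal(0, 1, size=n),
    "order_hour": rng.integers(0, 24, size=n),
})

# Pairwise Pearson correlation between every pair of columns.
corr = df.corr()


def plot_heatmap(corr_matrix):
    """Draw the correlation matrix as an annotated heat map."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Variable correlation")
    plt.tight_layout()
    plt.show()
```

Calling plot_heatmap(corr) draws the chart. Every variable pair gets the same treatment, which is the appeal: nothing is left out just because it did not occur to me to chart it.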
Challenges:
1. When I saw how many variables I had to build profiles from, I knew I needed a better way to compare them than making a hundred line charts. That’s how I learned about K-means clustering and about using a heat map to compare variables.
2. I also ran into long run times (every K-means run took about 16 minutes to complete) and ran out of memory. I fixed the memory problem by converting float64 columns into the appropriate int type. I also learned to zip my files, which helped a lot. As for the long run times, I plan on getting a better GPU the next time I buy a computer.
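The float64-to-int conversion mentioned above might look something like the sketch below. The table and its column names are invented for illustration, and the rule used (convert a float column only when every value is a whole number) is my assumption about a safe way to do it.

```python
import numpy as np
import pandas as pd

# Invented example table; whole-number columns stored as float64 on purpose.
rng = np.random.default_rng(1)
orders = pd.DataFrame({
    "order_id": np.arange(1, 100_001, dtype="float64"),
    "order_hour": rng.integers(0, 24, size=100_000).astype("float64"),
    "price": rng.uniform(1, 25, size=100_000),
})

before = orders.memory_usage(deep=True).sum()

for col in orders.select_dtypes(include="float64").columns:
    if (orders[col] % 1 == 0).all():
        # Only whole numbers: cast to int, then shrink to the smallest int type that fits.
        orders[col] = pd.to_numeric(orders[col].astype("int64"), downcast="integer")
    else:
        # Real decimals: float32 halves the memory at the cost of some precision.
        orders[col] = pd.to_numeric(orders[col], downcast="float")

after = orders.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

Columns containing missing values fail the whole-number check and stay as floats, so nothing is silently dropped. Whether float32 precision is acceptable for prices depends on the analysis.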
K-means segmentation as a scatter plot
Here are some of my wonderful visuals from this project.
Customer Segments by segment and income with mean age and income
Order Counts by Region
Most Profitable Time of Day
Check out Single Parent and Segment 1
VS
The Most Ordered and Most Profitable are not the same Departments
Full Gallery
Results:
Customer 1: The single parent with the lowest income. They spend half as much as the other three groups.
Customer 2: The partnered parent that has the highest mean prices and spends the most on Instacart.
Customer 3 & 4: The single adult and retired adult who spend about the same but have different buying needs.
Recommendations:
With 21% of Instacart’s customer base being new customers, it’s vital that Instacart converts them to loyal customers by offering targeted promotions.
Cluster 3 is a customer type that plans for large purchases. Sales on large items should be advertised in advance so these customers can plan accordingly.
Region three has a much higher volume of orders than the other regions. Instacart should look into why it has been more successful there.
The Produce department has the highest volume of sales, but Dairy and Eggs brings in the highest sales revenue. Instacart should investigate how its profit margins align with this.
Data Sources:
Instacart has since taken the dataset down; the full original citation is below.
“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on 1 April 2023
Retrospective:
This was my first major project with Python, and the challenge and satisfaction of it made it one of the best learning experiences of my life. The infinite scope that Python lets you take on can be daunting but also immensely satisfying when you figure something out.
The results of this analysis showed that it’s the lowest earners who really limit their spending at markets. The max price of $15,000 for segment 1 combined with the single parent max price of $15 per item shows the difficulty single-income households go through.
If I were to continue working on this project in the future, I’d start by comparing the different customer profile types to the four segment types. I also want to use this data set in the future to learn more basic machine learning.