Instacart Customer Profiling

I know, I'm throwing you straight into the deep end of the project with this image, but for me, this image represents the answer to my major quandary:

How can I compare numerous variables without bias?

The focal point of this project was to find distinct customer profile types to empower Instacart in better serving its diverse customer base. Given the multitude of variables at hand, I grappled with the challenge of ensuring an unbiased and comprehensive analysis. While you'll later see a common-sense approach that uses marital status for profiling, that only works because of its intuitive nature.

For me, data analysis is discovering the unknown, which means I needed a way to go beyond my own limited knowledge. And this picture? It was the answer.

This project was one of my most extensive endeavors to date, consuming more than 120 hours of meticulous work over slightly more than two months. A significant portion of this time was dedicated to diving headfirst into machine learning, an exciting but demanding aspect of the analysis. Consequently, I did not create an extensive dashboard as I have in some of my previous projects. Rest assured, though, that I have a collection of captivating visuals below to illustrate the rich insights unearthed during this journey.

Instacart Variable Correlation via a Heat Map

The Steps I Took:

  1. Basic exploratory analysis

  2. Visualized busy times by hour of day and day of week

  3. Visualized departments by order count and total sales

  4. Checked order region differences

  5. Made customer profile tags for marital status and loyalty

  6. Made product pricing tags

  7. Visualized the correlation between all variables as a heat map

  8. Made an elbow chart

  9. Ran K-means clustering on my computer

  10. Assigned a third tag to each customer based on segment
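
Step 7 can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names (age, income, avg_price, order_count) and the helper names correlation_matrix and plot_heatmap are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of numeric columns."""
    return df.select_dtypes(include=np.number).corr()


def plot_heatmap(corr: pd.DataFrame, path: str = "heatmap.png") -> None:
    """Draw the correlation matrix as an annotated heat map and save it."""
    # Imported here so the correlation step works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
income = rng.normal(60_000, 15_000, 500)
demo = pd.DataFrame({
    "age": rng.integers(18, 80, 500),
    "income": income,
    # Built to track income so one strong correlation shows up.
    "avg_price": income / 10_000 + rng.normal(0, 0.5, 500),
    "order_count": rng.integers(1, 100, 500),
})

corr = correlation_matrix(demo)
```

Calling plot_heatmap(corr) would then write the figure to disk. The point of the heat map is that every pair of variables is shown at once, so a strong relationship stands out without anyone having to guess in advance which pairs to chart.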

Tools I Used:

  • Python

  • scikit-learn

  • Pandas

  • Seaborn

  • Matplotlib

  • Excel

Why I took these steps:

  1. Gained a feel for the data behavior

  2. Determined when the store would have the most foot traffic.

  3. Checked if the most bought was also the most profitable.

  4. Determined if Instacart was more profitable in specific U.S. regions.

  5. Defined customer types so marketing could target them better

  6. Used to determine if price was a factor in what customers bought

  7. Able to see unbiased relationships between variables

  8. Needed to determine the appropriate number of clusters

  9. Useful to further profile customers

  10. Saved the results of the K-means analysis
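
Steps 8 through 10 might look something like the sketch below. It runs on synthetic data with two hypothetical columns (age and income) and is not the project's actual code. Scaling the features before clustering is my assumption rather than something the write-up states, as is the helper name elbow_inertias.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_inertias(X, k_values, seed=0):
    """Fit K-means once per k and return the inertia of each fit."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


# Synthetic customers drawn around four made-up group centers.
rng = np.random.default_rng(42)
centers = np.array([[25, 30_000], [40, 90_000], [35, 60_000], [70, 55_000]])
points = np.vstack(
    [c + rng.normal(0, [3, 4_000], size=(100, 2)) for c in centers]
)
customers = pd.DataFrame(points, columns=["age", "income"])

# Assumption: scale first so income does not dominate the distance measure.
X = StandardScaler().fit_transform(customers)

# Step 8: inertia for each candidate k; plotting these gives the elbow chart.
k_values = list(range(1, 9))
inertias = elbow_inertias(X, k_values)

# Step 9: fit the final model with the k chosen from the elbow chart.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 10: save the cluster label as a segment tag (1-based for readability).
customers["segment"] = model.labels_ + 1
```

Plotting inertias against k_values (for example with matplotlib's plt.plot) gives the elbow chart: the k where the curve stops dropping sharply is a reasonable number of clusters to try.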

Challenges:

1. When I saw how many variables I had to build profiles from, I knew I needed a better way to compare them than making a hundred line charts. That’s how I learned about K-means clustering and about using heat maps to compare variables.

2. I also ran into long run times (every K-means run took about 16 minutes to complete) and ran out of memory. I fixed the memory problem by converting float64 columns to the correct integer types. I also learned to zip my files, which helped a lot. As for the long run times, I plan on getting a better GPU next time I buy a computer.
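
The memory fix from the second challenge can be sketched like this. It is an illustration under assumptions, not the code I actually ran: the helper name floats_to_ints and the sample columns are hypothetical, and the helper only converts float columns that hold whole numbers with no missing values.

```python
import numpy as np
import pandas as pd


def floats_to_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where whole-number float columns use small int dtypes."""
    out = df.copy()
    for col in out.select_dtypes(include=np.floating).columns:
        values = out[col]
        # Only convert when nothing is lost: no NaN and no fractional part.
        if values.notna().all() and (values % 1 == 0).all():
            out[col] = pd.to_numeric(values.astype("int64"), downcast="integer")
    return out


# Hypothetical columns that pandas loaded as float64.
orders = pd.DataFrame({
    "order_number": np.arange(1, 1001, dtype="float64"),
    "order_hour_of_day": np.tile(np.arange(24.0), 42)[:1000],
    "prices": np.linspace(1.0, 25.0, 1000),  # real decimals, left alone
})

small = floats_to_ints(orders)
before = orders.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
```

Comparing before and after shows the saving in bytes, and small.dtypes shows which integer type each column ended up with. A float64 value takes 8 bytes, so a column that fits in int8 shrinks to one eighth of its size.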

K-means segmentation as a scatter plot

Here are some of my wonderful visuals from this project.

Customer Segments by segment and income with mean age and income
Order Counts by Region
Most Profitable Time of Day
Check out Single Parent and Segment 1
VS
The Most Ordered and Most Profitable are not the same Departments

Results:

Customer 1: The single parent with the lowest income. They spend half as much as the other three groups.

Customer 2: The partnered parent who has the highest mean prices and spends the most on Instacart.

Customer 3 & 4: The single adult and the retired adult, who spend about the same but have different buying needs.

Recommendations:

With 21% of Instacart’s customer base being new customers, it’s vital that Instacart converts them to loyal customers by offering targeted promotions.

Cluster 3 is a customer type that plans for large purchases. Sales on large items should be advertised in advance so these customers can plan accordingly.

Region three has a much higher volume of orders compared to the other regions. Instacart should look into why they have been more successful.

The Produce department has the highest sales volume, but Dairy and Eggs has the highest sales revenue. Instacart should investigate how its profit margins align with this.
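
A comparison like this comes down to one groupby that counts items and sums prices per department. The sketch below uses a tiny hand-made table whose numbers are invented purely for illustration, and the column names (department, prices) are assumptions rather than the project's actual schema.

```python
import pandas as pd

# Invented sample rows; one row per item sold.
lines = pd.DataFrame({
    "department": ["produce", "produce", "produce",
                   "dairy eggs", "dairy eggs", "snacks"],
    "prices": [1.5, 2.0, 1.0, 6.0, 7.5, 3.0],
})

# Count items and total their prices for each department.
by_dept = lines.groupby("department")["prices"].agg(
    items_sold="count", revenue="sum"
)

top_by_volume = by_dept["items_sold"].idxmax()
top_by_revenue = by_dept["revenue"].idxmax()
```

When top_by_volume and top_by_revenue name different departments, as they do in this made-up sample, volume alone is a poor guide to where the money comes from.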

Data Sources:

  1. Instacart

Retrospective:

This was my first major project with Python, and the challenge and satisfaction of it made it one of the best learning experiences of my life. The infinite scope that Python lets you take on can be daunting but also immensely satisfying when you figure something out.

The results of this analysis showed that it’s the bottom earners who really limit their spending at markets. The max price of $15,000 for segment 1 combined with the single parent max price of $15 per item shows the difficulty single-income households go through.

If I were to continue working on this project in the future, I’d start by comparing the different customer profile types to the four segment types. I also want to use this data set in the future to learn more basic machine learning.