How to Leverage LLMs for automated Hyper Personalized Offer Generation

In this blog, you will learn how to get important information about customer’s activity using techniques like RFM analysis and then how to categorize these customers into different groups using a clustering technique.

On top of that, We will discuss how you can leverage the power of LLMs to analyze your clusters and provide personalization to your customers. We will also discuss different use cases where the combination of clustering and LLMs can help you.

We will also build a small tool to generate personalized offers for our e-commerce platform customers. We will first categorize them using RFM analysis and clustering techniques and then we will use an LLM model to analyze the cluster data and generate offers (tool demo added at the end).

So let’s get started! 🚀

Introduction

If you are a business owner or an entrepreneur, then you already know how challenging and hard is to retain your customers so that they can stick to your product for a long time. It is hard to acquire a totally new customer than to convert an existing customer to a loyal customer for your product.

As a good product owner, you don’t want to lose such customers who are loyal to your business or potential loyal customers. As they are very important aspect of your product, you need to keep them stick to your product.

But how do you find the customers who are loyal to your product and spending money frequently on your product? 🤔

This is where customer segmentation comes into picture. To find out the customers who are satisfied with your product and buying it from your platform frequently, you need to categorize all of your customers into different customer groups and then you can plan your strategies to target different customer groups.

Why Customer Segmentation is Needed

In a world where customers are bombarded with countless messages, offers, and ads, a one-size-fits-all approach no longer works. You need to consider preferences, needs and behaviors of different customer groups to increase the chances of conversion from a normal customer to a loyal customer.

‍

Here are the key reasons why customer segmentation is needed in today’s businesses 👇

🎯 Personalized Marketing

We all know that every person have a different preference or interest so segmentation allows you to group customers based on the interest or preference which increases the chances of a conversion. For example, on an e-commerce platform you only want to send electronic product offers to the customers who are interested in electronics.

There is no sense to offer 50% off on beauty products to a boy who only buys shoes from your platform 💀.

📈 Improved Customer Retention

If a customer is frequently buying products from your platform then there are high chances of getting an order from that customer if we give them a good offer on their favorite products than a customer who is barely active on the platform. This way they will stay loyal to your platform or product.

You might need to offer different discount to the customers who are barely active so that they can be active on your platform again.

For example, you found a group of customers who purchase high value items frequently from your platform then these customers might respond well to loyalty exclusive rewards and offers than a customer who purchases less products.

💡 Better Decision Making

Having information and insights about different customer groups and their needs allow you to target each of them differently. It also helps you to make your business decision as it gives you feedback and insights about your product as well.

💸 Optimal Resource Allocation

By identifying high value customer groups, you can allocate more marketing budget or offers to them because they are more likely to generate more revenue than normal customers. This ensures better ROI and avoids wasted effort on less impactful campaigns.

Customer Segmentation Techniques

Now you know that why customer segmentation is needed but the question is “How can i segment my customers or on what basis I need to categorize them? 🤔”.

So the most common customer segmentation techniques involve categorizing a customer by the following

Demographic Segmentation: Categorizing customers based on their age, income or gender. For example, a luxury car company might target people with high income where a budget car making company might target entry-mid level professionals.
Geographic Segmentation: Categorizing customers based on their geolocation. For example, you might want to target different customers in different countries.
Psychographic Segmentation: Categorizing customers based on their interests and preferences.

The above 3 methods are good to quickly categorize your customers in different groups but as your customers and data grows, you need to use techniques like RFM analysis to get more detailed insights on user’s spending, activity and loyalty towards your business.

RFM Analysis

RFM stands for Recency, Frequency and Monetary. These three values are critical values to find high value customers. This method segments customers based on the following 👇

Recency: How recently a customer made any purchase on your platform
Frequency: How frequently a customer is buying products from your platform
Monetary: How much money a customer spends on your platform.

We generally need a customer and invoice data to calculate the above three values. Each customer is scored based on these 3 values and then we can find out high value customers.

‍

Why RFM analysis is useful for customer segmentation?

As discussed above, we generally get the RFM values from the customer data and assign a RFM score to each customer. These scores are very useful to get information and insights about each customer.

For example,

A customer with high frequency and monetary score is likely a loyal and high value customer.
A customer with high recency and monetary score is likely an active and high value customer.
A customer with low recency but with high monetary score shows that the customer used to be a high value customer but can be converted to an active customer with targeted offers and campaigns.

To find the RFM values, we need to consider how much orders they have made, how frequently they purchase from our platform and how much money they have spent on the platform. We can calculate all these values using libraries like pandas and numpy which are most popular data analysis libraries in python.

Let’s discuss how to do it!

How to Perform RFM Analysis using Python

Let’s suppose we have an e-commerce platform with around 10k users who purchase different products in different categories from our platform. Now we have some coupon codes or offers to provide some discount in each category but we want to ensure maximum utilization of these codes.

The optimal way of doing this will be giving category specific coupon codes to the customers who purchase items from that same category. For example, we will give more offers on electronic items to the customers who are actively purchasing electronic items.

That way we can utilize all these offers and also increase the revenue because if a person likes any category then there are high chances of that person spending more money on that category with a discount offer.

Suppose we have user details and their invoices in CSV files. The user CSV file contains columns like User ID, Name, Email and Creation date whereas invoice CSV file contains columns like Invoice ID, User ID, Timestamp, Category, Total amount, Discount amount etc.

To find the RFM values, we will need to consider the following fields 👇

Timestamp: To find the most recent purchase from the user (recency)
Invoice ID: To find the count of total invoices made by the user (frequency)
Total Amount: To find the total money spent on the platform (monetary)

We also need to consider other fields like category, total coupons used and total discount to get more information about each customer and it will be helpful when we will apply clustering algorithm on these customers to categorize them in different categories.

Here is how it will look like in code


    # Suppose you have user csv and invoice csv in pandas dataframe	
    rfm = data.groupby('User ID').agg({
        'Timestamp': lambda x: (reference_date - x.max()).days,  # Recency
        'Invoice ID': 'count',                                   # Frequency
        'Total Amount': 'sum',                                   # Monetary
        'Discount Amount' : 'sum', # Total Discount
        'Coupon Used': 'sum', # Total coupon used
        'RecentPurchase': 'sum',  # Total purchases in last 3 months
        'Total Items Purchased': 'sum', # Total items purchased
        'Category': lambda x: x.mode()[0] if len(x.mode()) > 0 else 'None' # Favorite category
    }).reset_index()

    # Rename columns
    rfm.columns = ['UserId', 'Recency', 'Frequency', 'Monetary', 'TotalDiscount','TotalCoupons','RecentPurchases3Months', 'TotalItems','FavoriteCategory']

    rfm['R_Score'] = pd.qcut(rfm['Recency'], q=5, labels=[5, 4, 3, 2, 1],duplicates='drop')  # Lower recency is better
    rfm['F_Score'] = pd.qcut(rfm['Frequency'], q=5, labels=[1, 2, 3, 4, 5], duplicates='drop')  # Higher frequency is better
    rfm['M_Score'] = pd.qcut(rfm['Monetary'], q=5, labels=[1, 2, 3, 4, 5],duplicates='drop')  # Higher monetary is better

As you can see, I have given R,F and M score to each user out of 5 which will look like this 👇

‍

After RFM analysis is completed for all users then you can easily categorize these users in different customer groups using any clustering algorithm.

Now let’s discuss how we can further categorize them using clustering algorithms!

Clustering the Customers

Clustering is an unsupervised machine learning technique which is used to group data points into clusters. That means data points which are similar to each other stays in same cluster and different data points stays into different clusters. The accuracy of these algorithms depends on the fields you are providing into clustering algorithm.

‍

Once we get the RFM scores for each customer, we then need to categorize them in different customer groups using a clustering algorithm. There are many clustering algorithms which we can use to categorize our customers like 👇

K-means Clustering: K-Means groups customers into a predefined number of clusters by minimizing the variance within each cluster. It iteratively adjusts the cluster centroids to improve clustering accuracy.
Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters by either merging smaller clusters (agglomerative) or splitting larger ones (divisive). The result is a tree-like structure called a dendrogram.
DBSCAN: DBSCAN groups customers based on the density of data points in a region. It can identify clusters of varying shapes and sizes while marking outliers as noise.

For our use case, we will use “K-means clustering” but don’t worry I will compare other 2 techniques as well in a separate section so just stay with me.

Preparing Data for Clustering

In our case, we need to pass the RFM score along with the favorite category of each user to clustering algorithm because we have offers for different category so user’s favorite category plays a big role while categorizing customers.

Before passing this data to K-Means algorithm, we need to convert this category field to integer because it doesn’t allows string fields and we also need to normalize every value so that we don’t get too much noise in our clusters.

You can use “One Hot Encoding” Method to convert categories into numbers and also normalize each data field.

Your data will look like this after one hot encoding 👇

‍

Finding The Optimal Number of Clusters

K-means algorithm requires a predefined number of clusters and based on that value it will categorize the customer data. Now to find the value of k (number of clusters), there are many methods like 👇

Elbow Method
Silhouette Score Method

In this blog, we are going to use elbow method but feel free to use other methods as well. The main goal of this is to find the optimal number of clusters.

In elbow method, we find the sum of squared distances between each point and its cluster’s centroid. This distance is called Within-Cluster Sum of Squares (WCSS). WCSS indicates how compact the clusters are and we want to make our clusters as compact as possible so that each point in cluster stay close to the center of cluster.

In this method, we find the WCSS for the range of K values (for example 2 to 11) and then plot these values on a graph against k. For initial k values, the graph will change in higher amount but as the value of k increases, the reduction in WCSS becomes minimal.

The point where the change of WCSS becomes minimal is called “Elbow Point” and that’s the optimal value of k.

After running this method on our customer data, I got the following graph 👇

‍

As you can see, the inertia (WCSS) is changing significantly for the initial k values but after 8 the reduction is minimal. So we can consider 8 as an optimal value for K.

Applying K-means Algorithm

K-means is the most popular clustering algorithm which is used to group data points into distinct clusters. It first takes a fixed amount of clusters and then groups the data into given clusters by calculating the Euclidean distance.

We already have found the optimal number of clusters for our case using elbow method which is 8 so let’s use it as k and apply k-means algorithm on it.


from sklearn.cluster import KMeans
optimal_k = 8
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_transformed)

If you print the rfm dataframe then you will see the associated cluster(customer group) for each customer.

‍

Let’s plot this on a 2D graph using TSNE and see the actual clustering

As we can see, all of the customers are perfectly grouped in total 8 customer groups.

After this, we can then find some aggregated data for each customer group so that we can get more insights about each group and also we can pass it to our LLM model to generate offers for each group.

But before that let’s discuss other clustering algorithms as well and see if we can get better clusters using other algorithms 🤔

Comparing different Clustering Algorithms

We have already tried k-means algorithm to create our clusters but we also discussed other algorithms like hierarchical clustering and DBSCAN so let’s see how we can implement these algorithms and what are their results.

Hierarchical clustering is a method that builds a hierarchy of clusters either by merging smaller clusters into larger ones (agglomerative) or by splitting larger clusters into smaller ones (divisive). The most common approach is agglomerative clustering, which we’ll focus on here.

Here also we need to find the predefined number of clusters which we can find it using either with elbow method or with other methods.

So let’s apply Hierarchical clustering on our customer data points


from sklearn.cluster import AgglomerativeClustering
agg_clustering = AgglomerativeClustering(n_clusters=optimal_k)
rfm['Cluster'] = agg_clustering.fit_predict(rfm_transformed)

And here are the results 👇

‍

As we can see, it is very similar to K-means results but not as good as k-means.

On the other hand, DBSCAN is a density-based clustering algorithm that groups together points that are closely packed and marks points in sparse regions as outliers (noise). It’s particularly effective for detecting clusters of arbitrary shapes and handling noisy data.

Unlike other clustering methods, DBSCAN don’t require predefined number of clusters but we do need to find out the optimal number of eps which is the radius of the neighborhood around each data point.

You can find the optimal value for eps using k-nearest neighbors algorithm.

Let’s apply DBSCAN on our customer data


from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.75, min_samples=4) 
rfm['Cluster'] = dbscan.fit_predict(rfm_transformed)

Let’s see the results 👇

‍

As we can see, The clusters are not formed well because this algorithm is not that good for our dataset and use case.

So we will stick with k-means algorithm for clustering our customers.

Generating Personalized Offers using LLMs

Now we have the clusters associated with each customer so its time to get some insights about individual clusters and we can do that in 2 steps

Get the aggregated data using the pandas library (like average spending, favorite category etc) which is useful for basic analysis
Get more detailed insights and information about each cluster using LLMs

We will first get all the aggregated fields that can be enough to differentiate each cluster and then we will pass this information to our LLM to generate detailed insights and perform action according to the given requirements, In our case to generate offers.

Generating Cluster Insights

We will find the following information for each cluster 👇

Average RFM score: Average Individual R,F and M scores
Average Discount: Average discount used by customers in each cluster
Average Coupons: Average coupons used by customers in each cluster
Average Items: Average Items purchased per invoice for each cluster
Favorite Category: Favorite Category of each customer in each cluster
Purchases in last 3 months: Average purchases made by customers in last 3 months for each cluster
Monetary: Average money spent by customers in each cluster

The above information will be enough to generate insights and offers for each customer group. If you have any other use case or requirement then you can generate more or less information for each cluster.

Here are the insights for our clusters 👇

‍

You can also make some basic insights just by looking at these results.

For example, we can see that for each category we have 2 customer groups, one customer group which is very active and spends more amount on that category and other which spends very less. So you can craft personalized offers or emails to target these customer groups.

So let’s generate offer data for each cluster using LLM.

Generating Offers for Each Cluster

We already have 8 customer groups with information about each group and now we want to target these individual customer groups with personalized discount offers to each customer group to increase the conversion ratio and retain loyal customers.

One way to address this will be providing discount on customer’s favorite category to retain the active and loyal customers and providing higher discounts on inactive customers to increase the chances of conversion.

In our case, we will let LLM decide the better way to generate offers for each cluster based on the given cluster information.

Here is the prompt I am going to use 👇


You are an data analysis expert, you will be given an analysis data which was gathered by performing RFM analysis on the customer invoice data and then performing clustering algorithm to categorize the customers in different groups.
                                               
Now your job is to generate offers and insights for all of these customer groups based on the given analysis data.
                                               
Notes:
- Make sure that you are not giving more than {offer_limit} percent off to any customer

Here is the analysis data:
---
{analysis_data}
---

We will get the maximum offer limit from user and we will get the analysis data from our clustering algorithms. Feel free to use your custom prompt according to your use case.

We will use OpenAI structured models to generate offer data for each cluster or customer group. We will generate the following fields for each cluster 👇

Category: The category of the offer
Offer text: The offer text which we can send to the customers directly as an email or in app message.
Insights: Some insights about the customer group and why the given offer is more effective for these customers.

The output schema of our LLM model will look like this 👇


class Offer(BaseModel):
    category: str = Field(..., description="category of the offer")
    offer_text: str =  Field(..., description="offer headline to send to the user")
    insights: str = Field(..., description="insights about this customer groups")

class Offers(BaseModel):
    offers: list[Offer]

After running the LLM, we will get the list of offers where each offer is an object of “Offer” class and we can easily get all the properties from it

Here is how this whole process look like 👇

‍

Lastly, let’s create a streamlit application to combine this whole process and see the final results!

Streamlit Application Demo

We will take the following information from users 👇

Users: It will contain the information about each user of your application
Invoices: It will contain the invoices made by each users in CSV format
Max Offer: The maximum discount percentage they want to offer.

Once user clicks submit, we will first perform RFM analysis to calculate R,F and M scores and then we will use our clustering algorithm to assign clusters for each customer.

At last, we will pass this cluster information to our LLM and get the offer information for each customer group.

Let’s see it in action 🚀!

‍

As we can see, we are able to generate effective and personalized offers for individual customer groups using our LLM and the analysis data 🎉!

Business Use Case

Whatever you saw in this blog was just the surface of this and there is so much to dive deeper into this domain. As you have seen it is very easy to segment your customers and perform any action on that individual segment data to provide more personalization to your customers using LLMs.

Let’s discuss some of the business use cases where you can utilize LLMs and customer segmentation techniques.

💡 Customer Churn Prediction and Retention

You can leverage LLMs and your customer data to predict which customers are at risk of leaving based on their customer support interactions and usage patterns.

You can create or let LLM create personalized retention strategies for each customer segment.

For example, For a subscription-based service, you can identify high-risk customers based on their activity and usage and offer customized discount offers or recommendations for these customers.

📢 Generating Personalized Recommendations and Marketing Campaigns

As we already discussed in the blog, you can segment customers based on their spending, activity and interests to generate personalized marketing campaigns using LLMs.

You can also provide personalized product recommendations to each customer based on their purchase history and favorite category.

For example, If you have an e-commerce platform then you can get all this information from user’s purchase history and activity and then use this data to provide personalized product recommendations in user’s favorite category which can increase your revenue.

Also, you can create personalized marketing campaigns to target your loyal customers and active customers.

💬 Analyzing Customer Feedback and Reviews

Businesses receive thousands of customer reviews and feedback across various channels like social media, your platform, emails or review sites and it becomes harder to analyze and address each review.

LLMs can process unstructured text data, extract key points, and categorize customers based on sentiment and feedback patterns which makes this task easier.

You can use LLMs to segment these users based on their reviews and channels and then target these customer segments instead of targeting each user individually which saves a lot of time

You can also use this information to make any decision for your product or platform based on each segment’s information.

For example, you can segment your email reviews in 2 categories called satisfied email reviews and unsatisfied email reviews and you can do same with other channels as well. The best thing is you don’t need a separate solution for each channel but you can do this single-handedly using LLMs.

Conclusion

As we saw in the blog, It is very easy to segment your customers using customer segmentation techniques like RFM and clustering but it is more effective when used with LLMs to achieve your end goal and to provide more personalization to your customer segments.

So if you are a business owner then it is necessary for you to segment your customers so that you can identify how customers are interacting with your platform. Ultimately, It can help you to retain your active and loyal customers and increase the conversion ratio for less active customers.

Have an Idea for Your Business?

If you want to implement something like this in your business or you have an idea to automate any workflow using LLMs then feel free to book a call with us and we will be more than happy to help you.

Thanks for reading 😄