When you want to focus marketing on the customer segments most likely to respond, RFM is one of the most useful methods. Above all, RFM is intuitive and quick to produce results, because it is a heuristic technique rather than a statistical model such as regression. RFM stands for “Recency”, “Frequency”, and “Monetary”, which according to Wikipedia correspond to “how recently did the customer purchase?”, “how often do they purchase?”, and “how much do they spend on average?”, respectively.
The idea of RFM is simply to use empirical data to find the groups with higher response rates and to spend more money and effort on those groups, which raises the profitability of sales promotions. The analysis resembles other segmentation methods but is more streamlined. Each RFM variable contributes one digit of a code that distinguishes one group from another. For example, customers who share the code “231” are assumed to have the same purchasing pattern, and to differ from customers with the code “233”.
How are customers coded? It is simple. RFM analysis divides customers into 5 (quintile) or 10 (decile) groups on each variable. The grouping follows the order of each variable, so in terms of frequency a customer who purchased 10 times ranks ahead of one who purchased fewer than 10 times. If we use deciles, the total number of groups is 1,000 (= 10 × 10 × 10). If we use quintiles, it is 125 (= 5 × 5 × 5).
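To make the coding scheme concrete, here is a small sketch that enumerates every possible code under quintile scoring and checks the group count quoted above. The object names (`n_bins`, `codes`) are illustrative only and do not appear in the script that follows.

```r
# Enumerate all RFM codes under quintile scoring (scores 1 to 5 per variable).
n_bins <- 5
codes <- expand.grid(R = 1:n_bins, F = 1:n_bins, M = 1:n_bins)

# Build the three-digit code: R = 2, F = 3, M = 1 becomes 231.
codes$Code <- 100 * codes$R + 10 * codes$F + codes$M

nrow(codes)    # number of distinct groups with quintiles
head(codes)
```

With deciles the same enumeration would use `1:10` for each variable, giving 10^3 groups, although a score of 10 no longer fits in a single digit of the code.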
Now let's see how RFM works in R.
I will use sample data containing several variables that represent recency, frequency, and monetary value. After reading and arranging the data, I classify customers by quintile. There are three approaches to classifying customers: sequential, independent, and intuitive. I will use the independent approach, which scores each variable on its own without considering the relation between variables or their relative importance.
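As an aside, the difference between the independent and sequential approaches can be sketched on toy data. This assumes the common meaning of the terms: independent scoring ranks each variable across all customers, while sequential scoring ranks frequency separately inside each recency group. The data are random, and `score5` is a hypothetical rank-based helper, which is not the quantile-breakpoint method used in the script below.

```r
set.seed(1)                                    # toy data, not sample.csv
n <- 100
recency   <- sample(1:365, n, replace = TRUE)  # days since last purchase
frequency <- sample(1:20,  n, replace = TRUE)  # number of purchases

# Hypothetical helper: rank-based quintile score, 1 (lowest) to 5 (highest).
score5 <- function(x) ceiling(5 * rank(x, ties.method = "first") / length(x))

# Recency: fewer days since the last purchase earns a higher score.
R <- score5(-recency)

# Independent: frequency is ranked across all customers at once.
F_ind <- score5(frequency)

# Sequential: frequency is ranked separately inside each recency group.
F_seq <- ave(frequency, R, FUN = score5)

table(R, F_ind)   # cell sizes generally vary
table(R, F_seq)   # every cell holds the same number of customers
```

The sequential version yields evenly sized groups, while the independent version keeps each score comparable across the whole customer base.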
# set a working directory
setwd("D:/Rstudio/R files/Project/RFM")
# read a data file from the working directory
d1 <- read.csv("sample.csv")
head(d1)
# keep the relevant columns, reordered (positions refer to sample.csv)
d2 <- d1[, c(1, 4, 2, 3, 5, 6)]
# change the column names (assigned directly to avoid masking base::names)
names(d2) <- c("ID", "Recency", "Frequency", "Monetary", "Buy", "avgExpense")
head(d2)
dim(d2)
# quintile breakpoints for recency (a larger Recency value earns a lower score)
Rq <- quantile(d2$Recency, probs = seq(0, 1, 0.2), na.rm = FALSE, names = TRUE)
d2$R_Score[d2$Recency >= Rq[5]] <- "1"
d2$R_Score[d2$Recency < Rq[5] & d2$Recency >= Rq[4]] <- "2"
d2$R_Score[d2$Recency < Rq[4] & d2$Recency >= Rq[3]] <- "3"
d2$R_Score[d2$Recency < Rq[3] & d2$Recency >= Rq[2]] <- "4"
d2$R_Score[d2$Recency < Rq[2]] <- "5"
# quintile breakpoints for frequency (a larger Frequency value earns a higher score)
Fq <- quantile(d2$Frequency, probs = seq(0, 1, 0.2), na.rm = FALSE, names = TRUE)
d2$F_Score[d2$Frequency >= Fq[5]] <- "5"
d2$F_Score[d2$Frequency < Fq[5] & d2$Frequency >= Fq[4]] <- "4"
d2$F_Score[d2$Frequency < Fq[4] & d2$Frequency >= Fq[3]] <- "3"
d2$F_Score[d2$Frequency < Fq[3] & d2$Frequency >= Fq[2]] <- "2"
d2$F_Score[d2$Frequency < Fq[2]] <- "1"
# quintile breakpoints for monetary (a larger Monetary value earns a higher score)
Mq <- quantile(d2$Monetary, probs = seq(0, 1, 0.2), na.rm = FALSE, names = TRUE)
d2$M_Score[d2$Monetary >= Mq[5]] <- "5"
d2$M_Score[d2$Monetary < Mq[5] & d2$Monetary >= Mq[4]] <- "4"
d2$M_Score[d2$Monetary < Mq[4] & d2$Monetary >= Mq[3]] <- "3"
d2$M_Score[d2$Monetary < Mq[3] & d2$Monetary >= Mq[2]] <- "2"
d2$M_Score[d2$Monetary < Mq[2]] <- "1"
# convert character to numeric
d2$R_Score <- as.numeric(d2$R_Score)
d2$F_Score <- as.numeric(d2$F_Score)
d2$M_Score <- as.numeric(d2$M_Score)
# combine the three scores into one three-digit code (R = 5, F = 4, M = 3 gives 543)
Total_Score <- 100 * d2$R_Score + 10 * d2$F_Score + d2$M_Score
d3 <- cbind(d2, Total_Score)
head(d3)
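The three scoring blocks above repeat the same pattern, so they could be folded into one function. The following is a minimal sketch, not the original author's code: `quintile_score` and the `toy` data frame are hypothetical names, and the toy values are made up. It uses `findInterval`, which counts how many breakpoints are less than or equal to each value, and so follows the same rule as the step-by-step assignments.

```r
# Hypothetical helper: quintile score from 1 to 5.
# reverse = TRUE gives the highest score to the smallest values (as for recency).
quintile_score <- function(x, reverse = FALSE) {
  breaks <- quantile(x, probs = seq(0.2, 0.8, 0.2))  # the four inner cut points
  score <- findInterval(x, breaks) + 1               # 1 below the 20% point, 5 at or above the 80% point
  if (reverse) 6 - score else score
}

# Toy data standing in for d2 (values are invented for illustration).
toy <- data.frame(
  Recency   = c(5, 40, 90, 200, 300, 12, 75, 150, 250, 360),
  Frequency = c(9, 7, 4, 2, 1, 10, 5, 3, 2, 1),
  Monetary  = c(500, 320, 150, 80, 20, 640, 210, 110, 60, 15)
)
toy$R_Score <- quintile_score(toy$Recency, reverse = TRUE)
toy$F_Score <- quintile_score(toy$Frequency)
toy$M_Score <- quintile_score(toy$Monetary)
toy$Total_Score <- 100 * toy$R_Score + 10 * toy$F_Score + toy$M_Score
toy
```

Because the scores come back as numbers, the character-to-numeric conversion step is not needed in this variant.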
After coding the segments, you need to identify the highly responsive groups that will be profitable to target. To do so, find the breakeven point, which divides the groups into two segments: a target group and a non-target group. The breakeven point is the response rate at which the profit from a marketing activity equals its cost.
Number of target customers × Breakeven point × Profit per response - Number of target customers × Marketing cost per customer = 0
Solving this formula, the breakeven point is the marketing cost per customer divided by the profit per responding customer. We then apply this response rate in R to pick out the groups whose response rate is above it. Before doing so, it is worth checking how the response rate varies across groups. In the bar chart, groups with higher codes show a higher response rate to marketing activities; for example, group 555 is more likely to buy than group 111.
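As a quick worked example of the breakeven arithmetic, the sketch below uses made-up figures. The cost and profit values and the variable names are assumptions chosen only so that the result matches the 2% rate used later.

```r
# Hypothetical figures, chosen only to illustrate the arithmetic.
cost_per_contact    <- 1    # marketing cost per targeted customer
profit_per_response <- 50   # profit from each customer who responds

# Breakeven response rate: the rate at which profit equals cost.
breakeven <- cost_per_contact / profit_per_response
breakeven                   # 0.02, i.e. 2%

# Check against the formula: net profit at the breakeven rate.
n_target <- 1000
net <- n_target * breakeven * profit_per_response - n_target * cost_per_contact
net
```

Any group whose response rate is above `breakeven` is expected to earn more than it costs to contact.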
# draw a bar chart to check the relation between group and response rate
y1 <- aggregate(d3$Buy, by=list(d3$Total_Score), FUN=mean, na.rm=TRUE)
head(y1)
barplot(y1$x, names.arg=y1$Group.1, ylim=c(1,1.10), col=rainbow(25), ylab="Average response rate", xlab="Groups", xpd=FALSE)
# to rescale x to lie between 0 and 1, subtract 1 from every group mean;
# plain vectorized subtraction does this without the memory cost of sapply
y1$x - 1
# find the highly responsive groups above the break-even point
# break-even is costs divided by profits; let's say 2%, which is 1.02 on the scale of y1$x
y2 <- y1$x > 1.02
head(y2)
y2 <- cbind(y1, y2)
names(y2) <- c("Group", "R_Rate", "Target")
head(y2)
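The table above flags groups, not individual customers. One way to get the customer list is to join the flags back onto the customer table, as in the sketch below. The `customers` and `groups` data frames are made-up stand-ins for `d3` and `y2`, and `R_Rate` stays on the same 1-to-2 scale as above.

```r
# Toy stand-ins for d3 (customers) and y2 (group table); values are invented.
customers <- data.frame(ID = 1:6,
                        Total_Score = c(555, 555, 343, 343, 111, 111))
groups <- data.frame(Group  = c(111, 343, 555),
                     R_Rate = c(1.00, 1.03, 1.08),
                     Target = c(FALSE, TRUE, TRUE))

# Attach each customer's group-level flag, then keep the targeted customers.
merged  <- merge(customers, groups, by.x = "Total_Score", by.y = "Group")
targets <- merged[merged$Target, c("ID", "Total_Score", "R_Rate")]
targets[order(targets$ID), ]
```

With the real data, the same `merge` call would take `d3` and `y2` in place of the toy tables.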
Now you know who the target customers are, and by contacting only them you can save money on marketing activities such as mailing or distributing coupons.

