03. Market Segmentation (W3)
3/23/24
Descriptive Analytics - Market Segmentation
Market segmentation
- Segmentation
- Males, females and children of different shapes and sizes – e.g., clothing, shoes
- Climate or cultural differences
- Targeting or Positioning
- Target some distinctive segment
- Position your company as belonging or appealing to some segment
- Examples
- Peroni Beer – Italian beer
- Italian pizza
How might a researcher choose to segment?
Useful segments are: Measurable, Durable, Accessible, Different, Substantial
| Segmentation base/variable | Examples |
|---|---|
| Demographics | Age, gender, income, occupation, education, family size, geography |
| Psychographics | Attitudes, opinions, activities, personality, lifestyle, interests, values |
| Behavioural | Usage rate, main brand, media used |
| Other | Occasion/situation, benefits sought, media habits |
Database-driven targeting
- Recency, Frequency, and Monetary value (RFM)
- Existing customer base vs. new customers
- Heavy vs. light customers
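As a sketch, RFM scoring with business-rule thresholds might look like this (the data, the tercile scoring, and the heavy/light cut-off are all illustrative assumptions, not a standard):

```r
# Hypothetical per-customer transaction summary (illustrative data)
customers <- data.frame(
  id        = 1:6,
  recency   = c(5, 40, 12, 90, 3, 60),      # days since last purchase
  frequency = c(12, 2, 7, 1, 20, 3),        # purchases in the period
  monetary  = c(500, 40, 230, 15, 900, 60)  # total spend
)

# Score each dimension 1-3 by tercile membership (a business rule);
# recency is reversed, since more recent means a better customer
score3 <- function(x) as.integer(cut(rank(x, ties.method = "first"),
                                     breaks = 3, labels = FALSE))
customers$R <- 4 - score3(customers$recency)
customers$F <- score3(customers$frequency)
customers$M <- score3(customers$monetary)
customers$RFM <- customers$R * 100 + customers$F * 10 + customers$M

# Heavy vs. light customers via an arbitrary threshold on the combined score
customers$segment <- ifelse(customers$RFM >= 222, "heavy", "light")
```

The resulting three-digit RFM code (e.g., 333 for the best customers) is a typical database-driven targeting variable; the threshold defining "heavy" is a judgment call, not a statistical result.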
Methods for segmentation
- Business rules
- Arbitrary criteria from demographics, psychographics and/or behavior
- Arbitrary quantile/threshold membership based on the above characteristics
- Clustering
- Unsupervised methods of data analysis
Review Clustering
Unsupervised classification
- Choose meaningful variables
- Select a measure of distance or similarity/dissimilarity
- Maximize between group distance and minimize within group distance
- Interpret the resulting clusters
Selection of variables
Your variables:
- are meaningful for the analysis objective
- are (relatively) independent
- are limited in number
- are numerical (one hot encoding for categorical)
- have low kurtosis and skewness statistics (at least in the training set)
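A quick way to screen candidate variables is to compute moment-based skewness and excess kurtosis by hand (packages such as e1071 provide equivalent functions; the income data here is simulated for illustration):

```r
# Moment-based sample skewness and excess kurtosis
skewness <- function(x) { z <- (x - mean(x)) / sd(x); mean(z^3) }
kurtosis <- function(x) { z <- (x - mean(x)) / sd(x); mean(z^4) - 3 }

set.seed(1)
income <- rexp(1000, rate = 1 / 50000)  # right-skewed, like raw income data

skewness(income)         # strongly positive -> transform before clustering
kurtosis(income)         # well above 0 -> heavy-tailed
skewness(sqrt(income))   # a square-root transform reduces the skew
```

Variables failing this screen can distort distance-based clustering, since a few extreme values dominate the distance matrix.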
One-hot encoding

```r
library(data.table)
library(mltools)

# one_hot() expands factor columns into 0/1 indicator columns,
# so the character columns must be created as factors
customers <- data.frame(
  id     = c(1, 2, 3, 4),
  gender = c('M', 'M', 'M', 'F'),
  mood   = c('happy', 'sad', 'happy', 'sad'),
  stringsAsFactors = TRUE
)
customers <- one_hot(as.data.table(customers))
customers
```

Measure of distance
- Manhattan, Euclidean, Minkowski distance: $d(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$ with $p = 1$, $p = 2$, and general $p$, respectively
- Cosine distance: $d(x, y) = 1 - \dfrac{x \cdot y}{\|x\| \, \|y\|}$
- Jaccard distance: $d(A, B) = 1 - \dfrac{|A \cap B|}{|A \cup B|}$
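These distances are easy to compute in R: `dist()` covers the Minkowski family, while cosine and Jaccard are one-liners (the vectors here are arbitrary examples):

```r
x <- c(1, 0, 2, 3)
y <- c(2, 1, 0, 3)

# Minkowski family via stats::dist (p = 1 Manhattan, p = 2 Euclidean)
dist(rbind(x, y), method = "manhattan")           # sum |x_i - y_i| = 4
dist(rbind(x, y), method = "euclidean")           # sqrt(sum (x_i - y_i)^2)
dist(rbind(x, y), method = "minkowski", p = 3)

# Cosine distance: 1 minus the cosine of the angle between the vectors
cosine_dist <- 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Jaccard distance on binary presence/absence vectors
a <- c(1, 0, 1, 1)
b <- c(1, 1, 0, 1)
jaccard_dist <- 1 - sum(a & b) / sum(a | b)       # 1 - 2/4 = 0.5
```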
Clustering types and algorithms
- Hierarchical clustering (AGNES, DIANA)
- Partition-based clustering (k-Means, k-Medoids)
- Mean-shift clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- Expectation–Maximization (EM) Clustering
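The worked example below covers hierarchical, DBSCAN and EM clustering; for the partition-based family, a minimal k-Means sketch looks like this (synthetic two-group data, since `kmeans` ships with base R's stats package):

```r
set.seed(42)
# Two well-separated synthetic groups of 50 points in two dimensions
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))

# Partition-based clustering: k-Means with k = 2 and several random restarts
km <- kmeans(pts, centers = 2, nstart = 10)
table(km$cluster)   # cluster sizes
km$centers          # estimated cluster means
```

`nstart` reruns the algorithm from multiple random initialisations and keeps the best solution, which guards against k-Means' sensitivity to starting centers.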
Example: Tourist Risk Taking
The data set contains 563 respondents who state how often they take risks from the following six categories:
- recreational risks: e.g., rock-climbing, scuba diving
- health risks: e.g., smoking, poor diet, high alcohol consumption
- career risks: e.g., quitting a job without another to go to
- financial risks: e.g., gambling, risky investments
- safety risks: e.g., speeding
- social risks: e.g., standing for election, publicly challenging a rule or decision
Respondents are presented with an ordinal scale consisting of five answer options [NEVER(1), RARELY(2), QUITE OFTEN(3), OFTEN(4), VERY OFTEN(5)].
Hierarchical clustering

```r
library(readr)

risk <- read_csv("risk.csv")
dim(risk)
head(risk)
colMeans(risk)

# Manhattan distance matrix, complete-linkage agglomerative clustering
risk.dist <- dist(risk, method = "manhattan")
risk.hcl <- hclust(risk.dist, method = "complete")
risk.hcl
plot(risk.hcl, main = "", labels = FALSE)

# Cut the dendrogram at height 20, then into exactly 6 clusters
c2 <- cutree(risk.hcl, h = 20)
table(c2)
c6 <- cutree(risk.hcl, k = 6)
table(c6)

# Cluster profiles: mean risk-taking score per cluster
c2.means <- aggregate(risk, list(Cluster = c2), mean)
round(c2.means[, -1], 1)
c6.means <- aggregate(risk, list(Cluster = c6), mean)
round(c6.means[, -1], 1)
```

DBSCAN
```r
library(dbscan)
library(readr)
library(factoextra)

risk <- read_csv("risk.csv")

# Density-based clustering: eps = neighbourhood radius,
# minPts = minimum neighbours for a core point
res.db <- dbscan(risk, eps = 1.5, minPts = 5)
res.db
fviz_cluster(res.db, risk, geom = "point")
```

EMCluster
```r
library(EMCluster)

# Initialise 6 classes, then fit a mixture model by expectation-maximization
emobj <- simple.init(risk, nclass = 6)
risk.em <- emcluster(risk, emobj, assign.class = TRUE)
par(mfrow = c(1, 1))
plotem(risk.em, risk)
summary(risk.em)
```

Cluster Plot
```r
library(cluster)

# Project the clusters onto the first two principal components
clusplot(risk, risk.em$class, color = TRUE,
         shade = TRUE, labels = 2, lines = 0)

# Mean risk-taking score per EM cluster
em.means <- aggregate(risk,
                      list(Cluster = risk.em$class),
                      mean)
round(em.means[, -1], 1)
```

Summary
- Segmentation – to do or not to do?
- Methods for segmentation: business rules vs clustering
- Variables to consider
- Distance measures
- One-hot encoding
- Clustering algorithms
- Interpretation of clusters
