03. Market Segmentation (W3)
3/23/24
Descriptive Analytics - Market Segmentation
Market segmentation
- Segmentation
- Males, females and children of different shapes and sizes – e.g., clothing, shoes
- Climate or cultural differences
- Targeting or Positioning
- Target some distinctive segment
- Position your company as belonging or appealing to some segment
- Examples
- Peroni Beer – Italian beer
- Italian pizza
How might a researcher choose to segment?
Useful segments are: Measurable, Durable, Accessible, Different, Substantial
| Segmentation base/variable | Examples |
|---|---|
| Demographics | Age, gender, income, occupation, education, family size, geography |
| Psychographics | Attitudes, opinions, activities, personality, lifestyle, interests, values |
| Behavioural | Usage rate, main brand, media used |
| Other | Occasion/situation, benefits sought, media habits |
Database-driven targeting
- Recency, Frequency, and Monetary value (RFM)
- Existing customer base vs. new customers
- Heavy vs. light customers
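As a sketch, RFM scoring with business-rule thresholds might look like this (the data, the tercile scoring, and the heavy/light cut-off are all illustrative assumptions, not a standard):

```r
# Hypothetical per-customer transaction summary (illustrative data)
customers <- data.frame(
  id        = 1:6,
  recency   = c(5, 40, 12, 90, 3, 60),      # days since last purchase
  frequency = c(12, 2, 7, 1, 20, 3),        # purchases in the period
  monetary  = c(500, 40, 230, 15, 900, 60)  # total spend
)

# Score each dimension 1-3 by tercile membership (a business rule);
# recency is reversed, since more recent means a better customer
score3 <- function(x) as.integer(cut(rank(x, ties.method = "first"),
                                     breaks = 3, labels = FALSE))
customers$R <- 4 - score3(customers$recency)
customers$F <- score3(customers$frequency)
customers$M <- score3(customers$monetary)
customers$RFM <- customers$R * 100 + customers$F * 10 + customers$M

# Heavy vs. light customers via an arbitrary threshold on the combined score
customers$segment <- ifelse(customers$RFM >= 222, "heavy", "light")
```

The resulting three-digit RFM code (e.g., 333 for the best customers) is a typical database-driven targeting variable; the threshold defining "heavy" is a judgment call, not a statistical result.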
Methods for segmentation
- Business rules
- Arbitrary criteria from demographics, psychographics and/or behavior
- Arbitrary quantile/threshold membership based on the above characteristics
- Clustering
- Unsupervised methods of data analysis
Review Clustering
Unsupervised classification
- Choose meaningful variables
- Select a measure of distance or similarity/dissimilarity
- Maximize between group distance and minimize within group distance
- Interpret the resulting clusters
Selection of variables
Your variables:
- are meaningful for the analysis objective
- are (relatively) independent
- are limited in number
- are numerical (one hot encoding for categorical)
- have low kurtosis and skewness statistics (at least in the training set)
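A quick way to screen candidate variables is to compute moment-based skewness and excess kurtosis by hand (packages such as e1071 provide equivalent functions; the income data here is simulated for illustration):

```r
# Moment-based sample skewness and excess kurtosis
skewness <- function(x) { z <- (x - mean(x)) / sd(x); mean(z^3) }
kurtosis <- function(x) { z <- (x - mean(x)) / sd(x); mean(z^4) - 3 }

set.seed(1)
income <- rexp(1000, rate = 1 / 50000)  # right-skewed, like raw income data

skewness(income)         # strongly positive -> transform before clustering
kurtosis(income)         # well above 0 -> heavy-tailed
skewness(sqrt(income))   # a square-root transform reduces the skew
```

Variables failing this screen can distort distance-based clustering, since a few extreme values dominate the distance matrix.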
One-hot encoding

```r
library(data.table)
library(mltools)

# one_hot() expands factor columns into 0/1 indicator columns,
# so the character columns must be created as factors
customers <- data.frame(
  id     = c(1, 2, 3, 4),
  gender = c('M', 'M', 'M', 'F'),
  mood   = c('happy', 'sad', 'happy', 'sad'),
  stringsAsFactors = TRUE
)
customers <- one_hot(as.data.table(customers))
customers
```

Measure of distance
- Manhattan, Euclidean, Minkowski distance: $d(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$ with $p = 1$, $p = 2$, and general $p$, respectively
- Cosine distance: $d(x, y) = 1 - \dfrac{x \cdot y}{\|x\| \, \|y\|}$
- Jaccard distance: $d(A, B) = 1 - \dfrac{|A \cap B|}{|A \cup B|}$
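These distances are easy to compute in R: `dist()` covers the Minkowski family, while cosine and Jaccard are one-liners (the vectors here are arbitrary examples):

```r
x <- c(1, 0, 2, 3)
y <- c(2, 1, 0, 3)

# Minkowski family via stats::dist (p = 1 Manhattan, p = 2 Euclidean)
dist(rbind(x, y), method = "manhattan")           # sum |x_i - y_i| = 4
dist(rbind(x, y), method = "euclidean")           # sqrt(sum (x_i - y_i)^2)
dist(rbind(x, y), method = "minkowski", p = 3)

# Cosine distance: 1 minus the cosine of the angle between the vectors
cosine_dist <- 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Jaccard distance on binary presence/absence vectors
a <- c(1, 0, 1, 1)
b <- c(1, 1, 0, 1)
jaccard_dist <- 1 - sum(a & b) / sum(a | b)       # 1 - 2/4 = 0.5
```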
Clustering types and algorithms
- Hierarchical clustering (AGNES, DIANA)
- Partition-based clustering (k-Means, k-Medoids)
- Mean-shift clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- Expectation–Maximization (EM) Clustering
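The worked example below covers hierarchical, DBSCAN and EM clustering; for the partition-based family, a minimal k-Means sketch looks like this (synthetic two-group data, since `kmeans` ships with base R's stats package):

```r
set.seed(42)
# Two well-separated synthetic groups of 50 points in two dimensions
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))

# Partition-based clustering: k-Means with k = 2 and several random restarts
km <- kmeans(pts, centers = 2, nstart = 10)
table(km$cluster)   # cluster sizes
km$centers          # estimated cluster means
```

`nstart` reruns the algorithm from multiple random initialisations and keeps the best solution, which guards against k-Means' sensitivity to starting centers.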
Example: Tourist Risk Taking
The data set contains 563 respondents who state how often they take risks from the following six categories:
- recreational risks: e.g., rock-climbing, scuba diving
- health risks: e.g., smoking, poor diet, high alcohol consumption
- career risks: e.g., quitting a job without another to go to
- financial risks: e.g., gambling, risky investments
- safety risks: e.g., speeding
- social risks: e.g., standing for election, publicly challenging a rule or decision
Respondents are presented with an ordinal scale consisting of five answer options [NEVER(1), RARELY(2), QUITE OFTEN(3), OFTEN(4), VERY OFTEN(5)].
Hierarchical clustering

```r
library(readr)

risk <- read_csv("risk.csv")
dim(risk)
head(risk)
colMeans(risk)

# Manhattan distance matrix, complete-linkage agglomerative clustering
risk.dist <- dist(risk, method = "manhattan")
risk.hcl <- hclust(risk.dist, method = "complete")
risk.hcl
plot(risk.hcl, main = "", labels = FALSE)

# Cut the dendrogram at height 20, then into exactly 6 clusters
c2 <- cutree(risk.hcl, h = 20)
table(c2)
c6 <- cutree(risk.hcl, k = 6)
table(c6)

# Cluster profiles: mean risk-taking score per cluster
c2.means <- aggregate(risk, list(Cluster = c2), mean)
round(c2.means[, -1], 1)
c6.means <- aggregate(risk, list(Cluster = c6), mean)
round(c6.means[, -1], 1)
```

DBSCAN
```r
library(dbscan)
library(readr)
library(factoextra)

risk <- read_csv("risk.csv")

# Density-based clustering: eps = neighbourhood radius,
# minPts = minimum neighbours for a core point
res.db <- dbscan(risk, eps = 1.5, minPts = 5)
res.db
fviz_cluster(res.db, risk, geom = "point")
```

EMCluster
```r
library(EMCluster)

# Initialise 6 classes, then fit a mixture model by expectation-maximization
emobj <- simple.init(risk, nclass = 6)
risk.em <- emcluster(risk, emobj, assign.class = TRUE)
par(mfrow = c(1, 1))
plotem(risk.em, risk)
summary(risk.em)
```

Cluster Plot
```r
library(cluster)

# Project the clusters onto the first two principal components
clusplot(risk, risk.em$class, color = TRUE,
         shade = TRUE, labels = 2, lines = 0)

# Mean risk-taking score per EM cluster
em.means <- aggregate(risk,
                      list(Cluster = risk.em$class),
                      mean)
round(em.means[, -1], 1)
```

Summary
- Segmentation – to do or not to do?
- Methods for segmentation: business rules vs clustering
- Variables to consider
- Distance measures
- One-hot encoding
- Clustering algorithms
- Interpretation of clusters
