Marketing Segmentation + Machine Learning

Objective
 

Use machine learning techniques, including hierarchical clustering, K-means clustering, Naive Bayes classification, and random forest classification, to predict segment membership and to predict who is likely to subscribe to the cable service.

 
Dataset Summary

The customer segmentation data are simulated for a cable company: 300 customers fall into 4 segments ("Suburb Mix", "Urban hip", "Travelers", "Moving-Up"). The segments are defined by the following features:

  • Age

  • Gender

  • Income

  • Kids Count

  • Home Owner or Renter

  • Subscription Status (subscribed to the cable service or not)
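
One way such a dataset could be simulated is sketched below in base R. The segment sizes, means, standard deviations, and probabilities are assumed values chosen only for illustration, and the names `seg_pars`, `simulate_segment`, and `seg_df` are hypothetical; none of them are taken from the original project.

```r
# Illustrative simulation only: every number below is an assumed value,
# not a parameter from the original project.
set.seed(2017)

seg_names <- c("Suburb Mix", "Urban hip", "Travelers", "Moving-Up")
seg_sizes <- c(100, 50, 80, 70)               # assumed sizes, summing to 300

# Assumed per-segment parameters (one row per segment, same order as seg_names).
seg_pars <- data.frame(
  age_mean    = c(40, 24, 58, 36),
  age_sd      = c(5, 2, 8, 4),
  income_mean = c(55000, 21000, 64000, 52000),
  income_sd   = c(12000, 5000, 21000, 10000),
  kids_lambda = c(2, 1, 0, 2),                # Poisson mean for kids count
  p_male      = c(0.5, 0.7, 0.7, 0.3),
  p_own       = c(0.5, 0.2, 0.7, 0.3),
  p_subscribe = c(0.1, 0.2, 0.05, 0.2)
)

# Draw all features for segment i from its assumed distributions.
simulate_segment <- function(i) {
  n <- seg_sizes[i]
  p <- seg_pars[i, ]
  data.frame(
    age       = rnorm(n, p$age_mean, p$age_sd),
    gender    = factor(ifelse(rbinom(n, 1, p$p_male) == 1, "Male", "Female"),
                       levels = c("Female", "Male")),
    income    = rnorm(n, p$income_mean, p$income_sd),
    kids      = rpois(n, p$kids_lambda),
    ownHome   = factor(ifelse(rbinom(n, 1, p$p_own) == 1, "ownYes", "ownNo"),
                       levels = c("ownNo", "ownYes")),
    subscribe = factor(ifelse(rbinom(n, 1, p$p_subscribe) == 1, "subYes", "subNo"),
                       levels = c("subNo", "subYes")),
    segment   = factor(rep(seg_names[i], n), levels = seg_names)
  )
}

# Stack the four segments into one data frame of 300 customers.
seg_df <- do.call(rbind, lapply(seq_along(seg_names), simulate_segment))

# Quick sanity checks: segment counts and mean income per segment.
print(xtabs(~ segment, data = seg_df))
print(aggregate(income ~ segment, data = seg_df, FUN = mean))
```

Because income is drawn from a normal distribution, a wide standard deviation can occasionally produce implausibly low values; a real simulation might truncate or transform them.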

Content

  1. Descriptive Data Analysis

    • Use commands such as summary(), aggregate(), and xtabs() to arrange and reshape the dataset into a form suited to straightforward description.

  2. Discrete Data Visualization

    • Histograms and bar charts

  3. Continuous Data Visualization

    • Bar charts and box-and-whisker plots (boxplots)

  4. Statistical Test

    • Chi-square test

    • T-test: Testing Group Means

    • ANOVA: Testing Multiple Group Means

    • Bayesian Statistics

  5. Predict Membership and Segment Using:

    • Hierarchical Clustering

    • K-Means Clustering

    • Naive Bayes Classification

    • Random Forest Classification
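
Steps 4 and 5 of the outline above can be sketched roughly as follows. This is a minimal illustration on stand-in data, not the project's actual analysis: the simulated values, the 65/35 train/test split, and names such as `seg_df`, `num_vars`, and `accuracy` are assumptions. The classifier calls use `e1071::naiveBayes` and `randomForest::randomForest`; the randomForest package is not in the library list below, so both are guarded and skipped if the package is not installed.

```r
set.seed(42)

# Stand-in data (assumed values, only to make the sketch runnable).
n_per     <- 75
n         <- 4 * n_per
seg_names <- c("Suburb Mix", "Urban hip", "Travelers", "Moving-Up")
seg_df <- data.frame(
  segment   = factor(rep(seg_names, each = n_per), levels = seg_names),
  age       = rnorm(n, rep(c(40, 24, 58, 36), each = n_per), 5),
  income    = rnorm(n, rep(c(55, 21, 64, 52), each = n_per), 10),  # in $1000s
  kids      = rpois(n, rep(c(2, 1, 0, 2), each = n_per)),
  ownHome   = factor(ifelse(rbinom(n, 1, rep(c(0.5, 0.2, 0.7, 0.3), each = n_per)) == 1,
                            "ownYes", "ownNo")),
  subscribe = factor(ifelse(rbinom(n, 1, rep(c(0.1, 0.2, 0.05, 0.2), each = n_per)) == 1,
                            "subYes", "subNo"))
)

# --- Step 4: statistical tests ---------------------------------------------
chi <- chisq.test(table(seg_df$subscribe, seg_df$ownHome))  # independence of two factors
tt  <- t.test(income ~ ownHome, data = seg_df)              # compare two group means
fit <- aov(income ~ segment, data = seg_df)                 # compare several group means
print(chi$p.value)
print(tt$p.value)
print(summary(fit))

# --- Step 5a: unsupervised clustering on scaled numeric features -----------
num_vars  <- scale(seg_df[, c("age", "income", "kids")])
hc        <- hclust(dist(num_vars), method = "complete")
hc_groups <- cutree(hc, k = 4)                              # cut the tree into 4 groups
km        <- kmeans(num_vars, centers = 4, nstart = 25)

# Cross-tabulate known segments against the discovered clusters.
print(table(seg_df$segment, hc_groups))
print(table(seg_df$segment, km$cluster))

# --- Step 5b: supervised classification of segment membership --------------
train_idx <- sample(nrow(seg_df), size = round(0.65 * nrow(seg_df)))
train     <- seg_df[train_idx, ]
test      <- seg_df[-train_idx, ]

accuracy <- numeric(0)
if (requireNamespace("e1071", quietly = TRUE)) {
  nb <- e1071::naiveBayes(segment ~ ., data = train)
  accuracy["naive_bayes"] <- mean(predict(nb, test) == test$segment)
}
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(segment ~ ., data = train, ntree = 500)
  accuracy["random_forest"] <- mean(predict(rf, test) == test$segment)
}
print(accuracy)  # hold-out accuracy for whichever classifiers were available
```

To predict subscription instead of segment membership, the same classifier calls could take `subscribe ~ .` as the formula; since subscribers are a small minority in data like this, raw accuracy would then be a weak measure on its own.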

Tools

R + RStudio

Library

 

library(cluster)

library(ggplot2)

library(factoextra)

library(mclust)

library(scatterplot3d)

library(MASS)

library(poLCA)

library(gplots)

library(e1071)

library(RColorBrewer)

library(psych)

Descriptive Data Analysis
Statistical Test
Hierarchical Clustering & K-Means Clustering
Naive Bayes
Random Forest

Crystal Wang @ 2017
