08. GLOBEM Analysis
INS-W 1
Step 1: Merging the Data (Steps, Sleep, and Engagement)
Load the Steps, Sleep, and Engagement data. Use pid and date as the key to merge the Steps and Sleep datasets (column names have been shortened). The engagement values are then repeated and added to the merged data to form the final dataset (details in the picture below).
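The merge described above can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the toy values, the column names sumsteps and sumdurationasleep, the outer join, and the assumption that engagement has one row per pid are all hypothetical.

```python
import pandas as pd

# Hypothetical toy frames; the real column names and values may differ.
steps = pd.DataFrame({
    "pid": ["p1", "p1", "p2"],
    "date": ["2018-04-01", "2018-04-02", "2018-04-01"],
    "sumsteps": [4200, 6100, 8300],
})
sleep = pd.DataFrame({
    "pid": ["p1", "p1", "p2"],
    "date": ["2018-04-01", "2018-04-02", "2018-04-02"],
    "sumdurationasleep": [410, 380, 450],
})
# Assumption: engagement is recorded once per participant.
engagement = pd.DataFrame({"pid": ["p1", "p2"], "engagement": [3.5, 4.0]})

# Merge the two daily tables on (pid, date). An outer join (an assumption)
# keeps days present in only one table, which produces missing values.
daily = steps.merge(sleep, on=["pid", "date"], how="outer")

# Repeat each participant's engagement value on every one of their rows.
merged = daily.merge(engagement, on="pid", how="left", validate="many_to_one")
print(merged)
```

With validate="many_to_one", pandas raises an error if a pid appears more than once in the engagement table, which guards against accidentally duplicating daily rows.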

Info
Variable selection is based on the dataset from Ken.
Step 2: Handling Missing Values
The merged data from Step 1 contains missing values. Two approaches are used to handle them. Because the stdsumsteps column is entirely empty, it is deleted in the following steps.
1. Imputing the missing values

2. Dropping the missing values
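The two approaches listed above can be sketched as follows. The source does not say which imputation strategy was used, so the per-column median here is an assumption, and the toy data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps and one entirely empty column.
df = pd.DataFrame({
    "sumsteps": [4200.0, np.nan, 8300.0, 5000.0],
    "stdsumsteps": [np.nan] * 4,
    "sumdurationasleep": [410.0, 380.0, np.nan, 400.0],
    "engagement": [3.5, 3.5, 4.0, 4.0],
})

# Remove columns that contain no observed values (here: stdsumsteps).
df = df.dropna(axis=1, how="all")

# Approach 1: impute each missing value with its column median (assumed strategy).
imputed = df.fillna(df.median(numeric_only=True))

# Approach 2: drop every row that still has at least one missing value.
dropped = df.dropna(axis=0, how="any")

print(imputed)
print(dropped)
```

Imputation keeps all rows at the cost of introducing estimated values, while dropping keeps only fully observed rows and can shrink the dataset considerably.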

Step 3: Clustering
Scores for Multiple Models
Three scores are used to evaluate the performance of each clustering model.


The three scores serve as metrics of clustering performance: higher values are better for the Silhouette Score, and lower values are better for the other two scores. The scores change only minimally across the spectral clustering, hierarchical, and k-means models, indicating that these models do not separate the data well. The Silhouette Score drops significantly for the Fuzzy C-Means model, suggesting that it cannot partition the dataset effectively. Consequently, the Gaussian Mixture Model (GMM) is chosen as the final clustering model.
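A comparison of this kind could be set up roughly as below. The notes name only the Silhouette Score, so the Davies-Bouldin index (lower is better) is used here as an assumed stand-in for the other scores. Fuzzy C-Means is omitted because scikit-learn does not provide it, and the data is synthetic rather than the GLOBEM features.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the merged feature matrix.
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)
X = StandardScaler().fit_transform(X)  # put all features on one scale

k = 3  # number of clusters, fixed here for illustration
models = {
    "k-means": KMeans(n_clusters=k, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=k),
    "spectral": SpectralClustering(n_clusters=k, random_state=0),
    "gmm": GaussianMixture(n_components=k, random_state=0),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = {
        "silhouette": silhouette_score(X, labels),          # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
    }

for name, s in scores.items():
    print(f"{name:12s} silhouette={s['silhouette']:.3f} "
          f"davies_bouldin={s['davies_bouldin']:.3f}")
```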
Scores for GMM


Based on the two scores, clustering models with 5, 6, and 7 clusters emerge as potential candidates.
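The notes do not say which two scores were used for the GMM. A common choice for picking the number of components is AIC and BIC, so the sketch below assumes those; it runs on synthetic data and is not the original analysis code.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the merged feature matrix.
X, _ = make_blobs(n_samples=400, centers=4, n_features=3, random_state=0)

results = {}
for k in range(2, 9):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # Both criteria penalize model complexity; lower values are better.
    results[k] = {"aic": gmm.aic(X), "bic": gmm.bic(X)}

best_by_bic = min(results, key=lambda k: results[k]["bic"])

for k, r in results.items():
    print(f"k={k} AIC={r['aic']:.1f} BIC={r['bic']:.1f}")
print("lowest BIC at k =", best_by_bic)
```

When several neighbouring values of k score similarly, as with the 5, 6, and 7 cluster candidates here, the final choice usually depends on how interpretable the resulting clusters are.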
Clustering Selection






The per-cluster means indicate that only steps and engagement are pivotal factors in distinguishing the clusters.
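One way to check which features separate the clusters is to compare standardized per-cluster means. The sketch below uses hypothetical column names and toy values, constructed so that sleep is the same in both clusters.

```python
import pandas as pd

# Hypothetical toy data with cluster labels already assigned.
df = pd.DataFrame({
    "sumsteps": [3000, 3200, 9000, 9200],
    "sumdurationasleep": [400, 420, 400, 420],
    "engagement": [2.0, 2.2, 4.0, 4.2],
    "cluster": [0, 0, 1, 1],
})

features = df.drop(columns="cluster")
# Standardize so features on different scales are comparable.
z = (features - features.mean()) / features.std()

# Mean of each standardized feature within each cluster.
means = z.groupby(df["cluster"]).mean()

# Range of the cluster means per feature: a large range means the feature
# differs strongly between clusters, a range near zero means it does not.
spread = (means.max() - means.min()).sort_values(ascending=False)
print(means)
print(spread)
```

In this toy example the spread for sumdurationasleep is zero, while sumsteps and engagement show large spreads, which is the pattern the notes describe.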
