
1️⃣ Part 1: Supervised Learning & Neural Networks
2️⃣ Part 2: Randomized Optimization
3️⃣ Part 3: Unsupervised Learning & Dimensionality Reduction
4️⃣ (Coming soon)
Introduction
Unlike supervised learning—where models learn from labeled examples—unsupervised learning aims to uncover patterns in data without any ground-truth labels. The goal is to find natural groupings, meaningful structure, or lower-dimensional representations hidden inside high-dimensional datasets.
In this part of the Machine Learning series, I focused on two major unsupervised learning themes:
🧩 1. Clustering Algorithms
Clustering groups similar data points based on their position in feature space. The project evaluates two core methods:
- K-Means — partitions points into k compact clusters by minimizing each point's distance to its assigned centroid.
- Gaussian Mixture Models (GMM) — a probabilistic approach that models the data as a mixture of Gaussians, allowing soft membership and more flexible cluster shapes (both methods are sketched in code below).
These methods help answer:
- Do natural subgroups exist in this dataset?
- How well separated—or overlapping—are the clusters?
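To make the comparison concrete, here is a minimal sketch of fitting both methods with scikit-learn. The synthetic data, feature count, and cluster count are placeholders, not the project's exact configuration.

```python
# Minimal sketch: fit K-Means (hard assignments) and a GMM (soft assignments)
# on a standardized feature matrix. The synthetic data and k=4 are placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = StandardScaler().fit_transform(rng.normal(size=(500, 11)))  # scaling matters for distance-based methods

km_labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=42)
gmm_labels = gmm.fit_predict(X)
gmm_probs = gmm.predict_proba(X)  # soft membership: one probability per component per point

print("K-Means silhouette:", silhouette_score(X, km_labels))
print("GMM silhouette:    ", silhouette_score(X, gmm_labels))
```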
🔻 2. Dimensionality Reduction
Dimensionality reduction (DR) transforms the data into a smaller set of informative features, often revealing structure that clustering alone can’t capture. The techniques explored include:
- PCA — captures directions of highest variance
- ICA — separates independent latent signals
- Random Projection — compresses features while preserving distances
- Random Forest feature selection — keeps only the most informative features
These DR methods are then combined with clustering—and later, a neural network—to evaluate how reduced representations affect performance.
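As a rough sketch of how the four reduction strategies can be applied with scikit-learn (component counts, estimator settings, and the data itself are assumptions for illustration):

```python
# Sketch of the four reduction strategies on a feature matrix X.
# Only the Random Forest filter needs labels y; component counts are placeholders.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 11))
y = rng.integers(0, 3, size=400)

X_pca = PCA(n_components=5, random_state=0).fit_transform(X)                       # max-variance rotation
X_ica = FastICA(n_components=5, random_state=0, max_iter=1000).fit_transform(X)    # independent components
X_rp  = GaussianRandomProjection(n_components=5, random_state=0).fit_transform(X)  # random linear map

# Feature selection: keep the top-k features by impurity (Gini) importance
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_k = np.argsort(rf.feature_importances_)[::-1][:5]
X_rf = X[:, top_k]
```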
Two datasets were used:
- 🍷 Wine Quality — chemical properties predicting quality
- 🐚 Abalone — physical measurements predicting age
Part 1: Baseline Clustering Performance
Before applying dimensionality reduction, I evaluated how K-Means and GMM perform on the raw datasets.
A consistent pattern emerged: as the number of clusters increases, the silhouette score (which measures how well each point fits its own cluster relative to the nearest neighboring cluster) tends to drop, because finer partitions place clusters closer together and leave them less clearly separated.
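The baseline curves come from a sweep like the one sketched below; the k range and bookkeeping are assumptions about the setup, not the exact experiment code.

```python
# Sweep the number of clusters and record the silhouette score for each method.
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def silhouette_curve(X, k_values):
    """Return silhouette scores for K-Means and GMM over a range of cluster counts."""
    scores = {"kmeans": [], "gmm": []}
    for k in k_values:
        km_labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
        gm_labels = GaussianMixture(n_components=k, random_state=42).fit_predict(X)
        scores["kmeans"].append(silhouette_score(X, km_labels))
        scores["gmm"].append(silhouette_score(X, gm_labels))
    return scores

# Example: scores = silhouette_curve(X_scaled, range(2, 11))
```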
Abalone – Baseline Silhouette Scores

Part 2: Dimensionality Reduction
Each dimensionality reduction method reshapes the feature space differently:
- PCA rotates the data to maximize variance retention
- ICA attempts to separate independent signals
- Random Projection (RP) compresses dimensionality with Johnson-Lindenstrauss (JL) distance-preservation guarantees
- Random Forest (RF) selects features based on importance (Gini)
The goal was to see whether these transforms help clustering uncover more meaningful patterns, or reduce noise that might blur cluster boundaries.
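On the Random Projection side, scikit-learn exposes the Johnson-Lindenstrauss bound directly; the snippet below (with illustrative sample counts) shows how many random dimensions the lemma asks for at a given distortion level.

```python
# Johnson-Lindenstrauss bound: minimum number of random-projection dimensions
# needed to keep pairwise distances within a distortion eps. Sample counts here
# are illustrative, not the actual dataset sizes.
from sklearn.random_projection import johnson_lindenstrauss_min_dim

for n_samples in (1_000, 5_000, 50_000):
    dims = johnson_lindenstrauss_min_dim(n_samples=n_samples, eps=0.2)
    print(f"{n_samples:>6} samples -> at least {dims} dimensions at eps = 0.2")
```

For small tabular datasets like these, the strict bound is typically far larger than the original feature count, so RP here preserves distances only approximately rather than with the formal guarantee.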
📉 t-SNE Visualization (Exploratory)
Although not used directly for clustering or neural networks, t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear technique for visualizing high-dimensional structure in 2D. It preserves local relationships, making it useful for checking whether a dataset has visually separable clusters before performing formal analysis.
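A minimal version of how such a plot can be produced (the perplexity value and styling are assumptions; the project's actual plotting code may differ):

```python
# Minimal t-SNE sketch: embed features into 2D and colour points by class label.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(X, y, perplexity=30, random_state=42):
    """Project X to 2D with t-SNE and scatter-plot it coloured by y."""
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=random_state).fit_transform(X)
    plt.scatter(emb[:, 0], emb[:, 1], c=y, s=8, cmap="viridis")
    plt.title("t-SNE 2D embedding")
    plt.xlabel("t-SNE 1")
    plt.ylabel("t-SNE 2")
    plt.show()
```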
Example – Abalone t-SNE 2D Plot

t-SNE reveals why Abalone is difficult to cluster: the classes heavily overlap, even under a nonlinear embedding. This visual intuition aligns with the modest improvements seen from PCA, RP, and RF later in the analysis.
Part 3: Clustering on Reduced Dimensions
Wine – Silhouette Score Comparison
Random Projection surprisingly produces the most distinct clusters at small cluster counts, likely because it breaks up noisy feature correlations.

Abalone – Silhouette Score Comparison
Abalone is noisier and lower-dimensional, so the improvements are more modest; the clearest signal is that ICA consistently underperforms.

Wine – K-Means Cluster Label Accuracy
Random Forest feature selection slightly outperforms other DR methods, suggesting that filtering noisy features helps more than rotating or projecting them.
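A common way to compute a "cluster label accuracy" of this kind is to map each cluster to its most frequent true class and then measure agreement; the sketch below illustrates that idea rather than the project's exact scoring code.

```python
# Score cluster assignments against true labels by mapping each cluster to its
# majority class (labels are assumed to be non-negative integers).
import numpy as np

def cluster_label_accuracy(true_labels, cluster_labels):
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    mapped = np.empty_like(true_labels)
    for c in np.unique(cluster_labels):
        mask = cluster_labels == c
        mapped[mask] = np.bincount(true_labels[mask]).argmax()  # majority class in cluster c
    return float(np.mean(mapped == true_labels))
```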

Part 4: GMM Performance on Reduced Data
Gaussian Mixture Models capture cluster shape better than K-Means, especially where clusters overlap — like in the Wine dataset.
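A quick synthetic illustration of that difference: for a point sitting between two overlapping Gaussians, K-Means has to commit to one hard label, while the GMM spreads probability across both components (the data and settings below are made up for the example).

```python
# Two overlapping blobs: compare a hard K-Means label with GMM responsibilities
# for a point near the boundary. Synthetic data; settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(200, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(200, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
gmm = GaussianMixture(n_components=2, random_state=1).fit(X)

boundary_point = np.array([[0.0, 0.0]])
print("K-Means label:       ", km.predict(boundary_point))         # hard assignment
print("GMM responsibilities:", gmm.predict_proba(boundary_point))  # roughly 50/50
```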
Wine – GMM Label Accuracy
RF again yields the strongest or near-strongest accuracy across cluster sizes.

Part 5: Abalone – DR Performance Highlights
While Wine benefits noticeably from DR, Abalone is lower-dimensional and noisier, meaning differences across DR methods were smaller.
The one clear takeaway: ICA consistently performs the worst, both in silhouette and label accuracy.
Abalone – Silhouette Comparison With DR
ICA visibly struggles compared to PCA, RP, and RF.

Part 6: Neural Network Performance on DR + Cluster Features
I trained a small neural network using:
- Original features
- PCA-reduced features
- ICA-reduced features
- RP-reduced features
- RF-selected features
- Cluster assignments as features (K-Means labels)
Two key observations:
- RF-selected features almost always gave the best NN test accuracy. Removing noisy or irrelevant inputs proved more effective than transforming all features.
- Cluster labels as features yielded the fastest runtime but weaker accuracy, since cluster IDs alone discard too much information.
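A compact sketch of those two extremes, using an MLPClassifier as a stand-in for the network and toy data (architecture, sizes, and the feature-importance cutoff are all placeholders):

```python
# Train the same small MLP on (a) RF-selected features and (b) one-hot K-Means
# cluster IDs used as the only inputs. All sizes and data here are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(600, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(int)     # toy target driven by two features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

# (a) RF-selected features: keep the top 4 by impurity importance
rf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:4]
mlp_rf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=7)
mlp_rf.fit(X_tr[:, top], y_tr)
print("RF-selected features:", mlp_rf.score(X_te[:, top], y_te))

# (b) Cluster IDs as the only input: fit K-Means on the training split,
# then one-hot the cluster assignments for both splits
n_clusters = 6
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=7).fit(X_tr)
Z_tr = np.eye(n_clusters)[km.labels_]
Z_te = np.eye(n_clusters)[km.predict(X_te)]
mlp_cl = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=7)
mlp_cl.fit(Z_tr, y_tr)
print("Cluster-ID features: ", mlp_cl.score(Z_te, y_te))
```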
Abalone – NN Performance on DR Feature Sets

Abalone – NN Mean Fit Time

Conclusion
Across both datasets, dimensionality reduction had mixed but insightful effects:
🔹 Wine Dataset 🍷
- RF feature selection provided the strongest results for both K-Means and GMM.
- Random Projection produced the highest silhouette scores at small cluster counts.
- Dimensionality reduction clearly improved cluster stability and separability.
🔹 Abalone Dataset 🐚
- Much less improvement across techniques; the dataset is noisy and already fairly low-dimensional.
- ICA performed the worst across all metrics.
- RF and PCA showed small but consistent gains.
🔹 Neural Networks on Reduced Data
- RF-selected features delivered the best test accuracy.
- Cluster labels were extremely fast but cost accuracy.
🧠 Final Takeaway
This part of the project reinforced the No Free Lunch Theorem: No single dimensionality reduction method is universally best.
Performance depends heavily on dataset structure:
- 🍷 Wine’s correlated chemical signals → DR reveals clearer structure
- 🐚 Abalone’s noisy, compact feature space → limited benefit from DR