K-means Clustering: Sorting Data Like Magic!

YASH GUPTA
Jul 22, 2023

K-means clustering is an awesome way to group similar data points together automatically. Imagine you have a bunch of data points, and you want to find cool patterns and put them into “clusters.” That’s where K-means steps in!

K-means: The Game Plan 🏀

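1. Setting Up: First, we decide how many clusters we want (that's the K in K-means) and pick K starting cluster centers, called centroids. A common trick is to use K randomly chosen data points. These centroids will guide our data sorting adventure.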

2. Guess & Refine: Now, we play a guessing game! We assign each data point to its nearest centroid, which groups the points into clusters. After that, we move each centroid to the mean of the data points in its cluster.

3. Loop ’til Perfect: We keep playing the guessing game and recalculating centroids until the centroids barely move (or we hit a maximum number of rounds). This is the magic moment when K-means is done sorting!
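
To make step 1 concrete, here is a minimal sketch in Python with NumPy. It assumes the common trick of using K randomly chosen data points as the starting centroids; the function name init_centroids and its seed parameter are made up for this illustration.

```python
import numpy as np

def init_centroids(X, K, seed=None):
    """Pick K distinct rows of X at random to serve as starting centroids."""
    rng = np.random.default_rng(seed)
    # Sampling without replacement guarantees K different data points
    chosen = rng.choice(X.shape[0], size=K, replace=False)
    return X[chosen].astype(float)

# Tiny made-up dataset: two points near (1, 1) and two near (8.5, 9)
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = init_centroids(X, K=2, seed=0)
print(centroids.shape)  # (2, 2)
```

Passing a seed makes the random pick repeatable, which is handy when comparing runs.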

Let’s Get Coding! 🖥️

We’ll use Python to write some easy-peasy functions for K-means. First, we’ll find the closest centroid for each data point:

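import numpy as np

def find_closest_centroid(X, centroids):
    # X has shape (m, n): m data points with n features each
    m = X.shape[0]
    idx = np.zeros(m, dtype=int)

    for i in range(m):
        # Euclidean distance from point i to every centroid
        distances = np.linalg.norm(X[i] - centroids, axis=1)
        # Store the index of the nearest centroid
        idx[i] = np.argmin(distances)

    return idx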

Next, we’ll compute new centroids based on the mean of data points in each cluster:

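def compute_centroids(X, idx, K):
    n = X.shape[1]
    centroids = np.zeros((K, n))

    for k in range(K):
        # All points currently assigned to cluster k
        points = X[idx == k]
        # Skip empty clusters (their row stays at zero): the mean of
        # zero points would be NaN
        if len(points) > 0:
            centroids[k] = np.mean(points, axis=0)

    return centroids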
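
Putting the pieces together, the whole game plan might look something like the sketch below. To keep it self-contained, assign and update are compact stand-ins for the two functions above, and run_kmeans, max_iters, and tol are illustrative names and defaults, not part of the original code.

```python
import numpy as np

def assign(X, centroids):
    # Distance from every point to every centroid, shape (m, K), via broadcasting
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def update(X, idx, centroids):
    new = centroids.copy()
    for k in range(centroids.shape[0]):
        members = X[idx == k]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new[k] = members.mean(axis=0)
    return new

def run_kmeans(X, initial_centroids, max_iters=100, tol=1e-6):
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(max_iters):
        idx = assign(X, centroids)
        new_centroids = update(X, idx, centroids)
        # How far the centroids moved in this round
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids
    return centroids, assign(X, centroids)

# Tiny made-up dataset, starting from the first and third points
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids, idx = run_kmeans(X, X[[0, 2]])
print(idx)                 # [0 0 1 1]
print(centroids.tolist())  # [[1.25, 1.5], [8.5, 8.75]]
```

The loop stops as soon as the centroids move less than tol in a round, with max_iters as a safety net in case they never fully settle.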

Fine-Tuning the Magic

Sometimes, the initial centroids can play tricks on us, leading to not-so-great results. So, we can run K-means multiple times with different starting points and keep the run with the lowest total squared distance between the data points and their assigned centroids.
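
One way to score each run is to add up the squared distances between the data points and their assigned centroids, then keep the run with the smallest total. The sketch below assumes that scoring rule; clustering_cost and pick_best are hypothetical helper names, and the two candidate results are hand-made stand-ins for the output of two separate runs.

```python
import numpy as np

def clustering_cost(X, centroids, idx):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[idx]) ** 2))

def pick_best(X, runs):
    """runs is a list of (centroids, idx) pairs, one per K-means run."""
    return min(runs, key=lambda run: clustering_cost(X, run[0], run[1]))

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])

# Candidate 1: the two left points together, the two right points together
good = (np.array([[1.25, 1.5], [8.5, 8.75]]), np.array([0, 0, 1, 1]))
# Candidate 2: one point alone, the other three lumped into one cluster
poor = (np.vstack([X[0], X[1:].mean(axis=0)]), np.array([0, 1, 1, 1]))

best = pick_best(X, [poor, good])
print(clustering_cost(X, *good))  # 2.25
print(best is good)               # True
```

A lower total means the points sit closer to their centroids, so the tighter grouping wins.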

In a Nutshell: K-means Clustering

K-means is your secret weapon to reveal amazing patterns in data. By understanding the steps and playing around with your code, you’ll become a sorting genius!

Time to Get Sorted! ✨

By: Yash Gupta

References:
[1] K-means Clustering — Wikipedia.
[2] Andrew Ng’s Machine Learning Course — Coursera.
