- Partition-based clustering that produces sphere-like clusters
- Divides data into non-overlapping subsets using a distance metric
- Examples in the same cluster share a similar pattern
(Can be found by minimizing intra-cluster distances)
- Examples in different clusters are dissimilar
(Can be found by maximizing inter-cluster distances)
- see : https:
Process:
1. Randomly select k points/centroids (e.g. k = 3) for clustering
2. Calculate the distance of each datapoint from every centroid; assign each
datapoint to its closest centroid
3. Calculate the SSE (sum of squared errors). The goal is to reduce the SSE
4. Update each centroid to the mean of the datapoints in its cluster
5. Repeat from step 2 until the centroids converge (centroids no longer move
much)
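The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation (function and variable names are my own); an empty cluster simply keeps its old centroid here:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch. X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k random datapoints as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each datapoint to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Step 3: SSE = sum of squared distances to the assigned centroids
    sse = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, sse
```

Usage is e.g. `centroids, labels, sse = kmeans(X, k=3)`.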
Issue: The result may or may not be the best outcome (k-means may converge to a
local optimum)
Resolve: Run the algorithm multiple times with different random starting points
and keep the run with the lowest SSE
Evaluation metrics:
1. Compare with ground truth (if available; normally no ground truth exists)
2. Cluster error: (how tightly the data form a cluster)
a. Average pairwise distance between datapoints in a cluster
b. Average distance between the centroid and its datapoints
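Both cluster-error metrics are straightforward to compute; a small sketch (function names are my own, lower values mean tighter clusters):

```python
import numpy as np

def avg_centroid_distance(X, labels, centroids):
    # metric (b): mean distance from each datapoint to its assigned centroid
    return float(np.linalg.norm(X - centroids[labels], axis=1).mean())

def avg_pairwise_distance(X, labels):
    # metric (a): mean pairwise distance between datapoints in the same
    # cluster, averaged over clusters
    per_cluster = []
    for j in np.unique(labels):
        pts = X[labels == j]
        if len(pts) < 2:
            continue  # a singleton cluster has no pairwise distances
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        # average over the distinct ordered pairs (the diagonal is zero)
        per_cluster.append(d.sum() / (len(pts) * (len(pts) - 1)))
    return float(np.mean(per_cluster))
```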
Choosing the best K: (the elbow method)
- Run the clustering algorithm for a range of k values and plot the SSE
against k. Choose the k at the "elbow" of the curve, where increasing k
further yields only a small decrease in SSE.
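A sketch of computing the elbow curve, reusing the same minimal k-means as above (names are my own). The key point is that SSE always shrinks as k grows, so you look for the bend, not the minimum:

```python
import numpy as np

def sse_for_k(X, k, n_iter=100, seed=0):
    # one k-means run; returns the final SSE (the quantity the elbow plot shows)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.linalg.norm(
            X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            if (labels == j).any():
                new[j] = X[labels == j].mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return float(((X - centroids[labels]) ** 2).sum())

# Plot sse_for_k(X, k) for k = 1..8 and pick the k where the curve bends:
# sse_curve = {k: sse_for_k(X, k) for k in range(1, 9)}
```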