I need to cluster data using Fuzzy C-Means, so I use fcm from pyclustering.cluster.fcm. Is there a way to get the cluster label of each point?
import numpy as np
import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random
coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
sample = dfcluster.to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
clusters = fcm_instance.get_clusters()
print(clusters)
I have tried it this way, and it works, but I do not think it is an ideal answer:
import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random
coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
sample = dfcluster.to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
clusters = fcm_instance.get_clusters()
# collect one row per point, labelled with the index of its cluster
rows = []
for cluster, members in enumerate(clusters):
    for j in members:
        rows.append({'cluster': cluster, 'x': dfcluster['x'].iloc[j], 'y': dfcluster['y'].iloc[j]})
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
dfcluster = pd.DataFrame(rows, columns=['cluster', 'x', 'y'])
print(dfcluster)
However, I found another way to do this, and it is faster. get_clusters() returns, for every cluster, the list of row indices that belong to it, so I assign the labels with loc[df.index[results[i]]]:
import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random
coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
dfcluster['cluster'] = 0
# cluster on the coordinates only, not on the placeholder 'cluster' column
sample = dfcluster[['x', 'y']].to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
results = fcm_instance.get_clusters()
# results[i] holds the row positions of the points assigned to cluster i
for i in range(len(results)):
    dfcluster.loc[dfcluster.index[results[i]], 'cluster'] = i
print(dfcluster)
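If you only need a flat label array (one label per row, like scikit-learn's labels_), the same idea can be sketched with NumPy alone. This is a minimal sketch, not part of pyclustering: clusters_to_labels is a hypothetical helper name, and the example input is made-up data that assumes the list-of-index-lists shape that get_clusters() returns above.

```python
import numpy as np

def clusters_to_labels(clusters, n_points):
    """Convert a list of index lists (one list per cluster) into a flat label array."""
    # -1 marks any point that does not appear in a cluster
    labels = np.full(n_points, -1, dtype=int)
    for label, members in enumerate(clusters):
        # fancy indexing assigns the label to every member in one step
        labels[members] = label
    return labels

# Made-up cluster output for six points, in the same shape as get_clusters()
example_clusters = [[0, 3], [1, 4, 5], [2]]
labels = clusters_to_labels(example_clusters, 6)
print(labels)  # [0 1 2 0 1 1]
```

With the data from the question, this would presumably be used as dfcluster['cluster'] = clusters_to_labels(fcm_instance.get_clusters(), len(dfcluster)), which avoids the per-cluster loc assignment.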