High quality clustering of big data and solving empty-clustering problem with an evolutionary hybrid algorithm
Özbayoğlu, Ahmet Murat
Achieving high-quality clustering is one of the most well-known problems in data mining. k-means is by far the most commonly used clustering algorithm. It converges fairly quickly, but a good solution is not guaranteed: the clustering quality depends heavily on the selection of the initial centroids. Moreover, as the number of clusters increases, k-means starts to suffer from "empty clustering", in which one or more clusters end up with no points assigned to them. The motivation of this study is two-fold. We aim not only to improve the clustering quality of k-means, but also to avoid being affected by the empty cluster issue. To this end, we developed a hybrid model, H(EC)S-2, Hybrid Evolutionary Clustering with Empty Clustering Solution. First, it selects representative points to eliminate the empty clustering problem. Then, the hybrid algorithm uses only these points during centroid selection. The proposed model combines a Fireworks and Cuckoo-search based evolutionary algorithm with some centroid-calculation heuristics. The model is implemented as a Hadoop MapReduce algorithm to achieve scalability when faced with a Big Data clustering problem. The advantages of the developed model are particularly attractive as the amount of data, its dimensionality, and the number of clusters increase. The results indicate that the proposed model achieves a considerable improvement in clustering quality.
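To make the empty clustering problem concrete, the following is a minimal Python sketch of plain k-means in which a poorly placed initial centroid attracts no points. It is not the H(EC)S-2 model described above: the function names (`assign`, `kmeans_reseed`) are hypothetical, and the repair step shown here (reseeding an empty cluster with the point farthest from its currently assigned centroid) is a simple, commonly used workaround substituted for illustration, not the representative-point selection or the evolutionary search of the paper.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans_reseed(points, centroids, iterations=10):
    """Plain k-means with a simple empty-cluster repair (illustrative only).

    When a cluster receives no points, its centroid is moved onto the point
    that lies farthest from the centroid it is currently assigned to.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Standard update: centroid becomes the mean of its members.
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
            else:
                # Empty cluster: reseed with the worst-fitting point.
                far = max(
                    range(len(points)),
                    key=lambda i: math.dist(points[i], centroids[labels[i]]),
                )
                centroids[j] = tuple(points[far])
                labels[far] = j
    return centroids, assign(points, centroids)


# Two tight groups of points and three initial centroids, one of them far away.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
initial = [(0, 0), (0.5, 0), (100, 100)]

# With the initial centroids, no point is closest to the third one.
first_labels = assign(points, initial)
print("cluster 2 empty at start:", 2 not in first_labels)

# After k-means with reseeding, every cluster owns at least one point.
final_centroids, final_labels = kmeans_reseed(points, initial)
print("labels after reseeding:", final_labels)
```

In this toy example the third initial centroid is so far from the data that standard k-means would leave its cluster empty in every iteration; the reseeding step is one of several possible ways to keep all clusters populated.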