Which clustering algorithm starts with each data example in its own cluster?

Get ready for the CertNexus Certified Data Science Practitioner Test. Practice with flashcards and multiple-choice questions; each question has hints and explanations. Excel in your exam!

The correct answer is hierarchical agglomerative clustering (HAC). This algorithm begins by treating each data point as its own cluster. At each step it merges the two closest clusters, where "closest" is defined by a linkage criterion such as single, complete, average, or Ward linkage. Merging continues until the desired number of clusters remains or until all points belong to a single cluster.
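The merge loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized or library implementation: the function names `single_linkage` and `agglomerative`, the sample points, and the choice of single linkage are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of points (assumed linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    # Start state: every point is its own singleton cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i (i < j, so removing j keeps i valid).
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, n_clusters=3)
print(result)
```

Calling `agglomerative(points, n_clusters=len(points))` returns the starting state unchanged, with one cluster per point, which is exactly the property the question asks about.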

This approach is particularly useful when the data has a hierarchical structure, because it shows how clusters form at different levels of granularity. Since HAC starts with each data point in its own cluster and records every merge, the full sequence of merges can be drawn as a dendrogram, a tree diagram that shows which clusters were joined and at what distance, and that can help reveal the data's intrinsic structure.
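To make the dendrogram idea concrete, the sketch below records each merge and the distance ("height") at which it happens, which is the information a dendrogram plots. It is an illustrative assumption rather than a reference implementation: the function name `merge_history`, the 1-D sample values, and the use of complete linkage are chosen only for this example.

```python
def merge_history(values):
    """Return the merges made by complete-linkage HAC on 1-D values."""
    clusters = [[v] for v in values]  # every value starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Record which two clusters were joined and at what height.
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Hypothetical sample data.
history = merge_history([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, height in history:
    print(f"merge {left} + {right} at height {height}")
```

With n starting points there are always n - 1 merges. Reading the recorded heights from first to last corresponds to moving from the leaves of the dendrogram up to its root, and stopping at a chosen height yields a flat clustering at that level of granularity.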

K-means clustering, by contrast, starts with a chosen number of initial centroids (often picked at random), assigns each data point to the nearest centroid, and then updates the centroids, which is a fundamentally different approach. DBSCAN forms clusters from dense regions of points and does not require a predetermined number of clusters, while mean shift finds modes in the data distribution rather than starting with individual points as clusters. Each of these algorithms has a distinct methodology and objective that sets it apart from hierarchical agglomerative clustering.
