All About Programming: Clustering - MLlib - Spark 1.1.0 Documentation

MLlib supports k-means clustering, one of the most commonly used clustering algorithms, which groups the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called k-means||. The implementation in MLlib has the following parameters:

- k is the number of desired clusters.
- maxIterations is the maximum number of iterations to run.
- initializationMode specifies either random initialization or initialization via k-means||.
- runs is the number of times to run the k-means algorithm. k-means is not guaranteed to find a globally optimal solution, so when it is run multiple times on a given dataset, the algorithm returns the best clustering result.
- initializationSteps determines the number of steps in the k-means|| algorithm.
- epsilon determines the distance threshold within which we consider k-means to have converged.

Examples (spark-shell): In the following example, after loading and parsing the data, we use the …

import org.apache. …
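To make the parameters above concrete, here is a minimal, single-machine sketch of k-means in plain Python. It is not MLlib's implementation and does not use Spark. The function name `kmeans` and its keyword arguments (`max_iterations`, `runs`, `initialization_mode`, `epsilon`, `seed`) are hypothetical Python-style stand-ins for MLlib's k, maxIterations, runs, initializationMode and epsilon. As an assumption, the "k-means++" mode uses ordinary serial k-means++ seeding in place of the parallel k-means|| variant, so there is no equivalent of initializationSteps. Convergence is declared when no center moves farther than `epsilon` in one iteration, and the best of `runs` attempts is the one with the lowest within-set sum of squared errors.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans_pp_init(points, k, rng):
    """Serial k-means++ seeding (a stand-in for MLlib's parallel k-means||)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        d2 = [min(_sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(p)
                break
        else:  # guard against floating-point rounding in the running sum
            centers.append(points[-1])
    return centers


def _lloyd(points, centers, max_iterations, epsilon):
    """Lloyd iterations: assign points, recompute means, stop on small moves."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: _sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append(tuple(sum(m[d] for m in members) / n
                                         for d in range(len(c))))
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        moved = max(math.sqrt(_sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved <= epsilon:  # every center moved less than epsilon: converged
            break
    return centers


def cost(points, centers):
    """Within-set sum of squared errors (distance to the nearest center)."""
    return sum(min(_sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, k, max_iterations=20, runs=1,
           initialization_mode="k-means++", epsilon=1e-4, seed=0):
    """Run k-means `runs` times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(runs):
        if initialization_mode == "random":
            init = rng.sample(points, k)
        else:
            init = _kmeans_pp_init(points, k, rng)
        centers = _lloyd(points, init, max_iterations, epsilon)
        c = cost(points, centers)
        if c < best_cost:
            best, best_cost = centers, c
    return best, best_cost


data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (9.0, 9.0), (9.1, 9.1), (9.2, 9.2)]
centers, wssse = kmeans(data, k=2, max_iterations=20, runs=3)
print(sorted(centers), wssse)
```

Keeping the best of several runs, rather than the last one, is what makes the runs parameter useful: each run may settle in a different local optimum, and the cost function gives a single number for comparing them.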

Read full article from Clustering - MLlib - Spark 1.1.0 Documentation