parallel-kmeans

A Parallel Implementation of the K-Means Clustering Algorithm

Introduction

This package is a set of C programs that perform K-Means clustering in parallel. It supports shared-memory parallel systems through OpenMP and distributed-memory parallel systems through MPI.

Using parallel-kmeans on RCC Resources

The parallel-kmeans program comes in three varieties: a multicore version parallelized with OpenMP, a distributed version parallelized with MPI (Open MPI on this system), and a sequential version. Once the module is loaded, they can be run on the HPC system with the following commands:

module load gnu-openmpi

# For OpenMP Parallel Version
omp_main OPTIONS -i INFILE -n N_CLUSTERS

# For MPI Parallel Version
mpi_main OPTIONS -i INFILE -n N_CLUSTERS

# For Sequential Version
seq_main OPTIONS -i INFILE -n N_CLUSTERS

A complete list of the available options can be found on the main website, or by running any of the above commands with only the -h option (no INFILE or N_CLUSTERS).
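After module load gnu-openmpi, a quick check that the three programs are actually on the PATH can save a failed run. The loop below is only an illustrative sketch: the program names come from the commands above, command -v is the standard shell lookup, and the missing counter is a hypothetical helper variable.

```shell
#!/bin/bash
# Count how many of the three parallel-kmeans programs cannot be found.
# "missing" is an illustrative variable, not part of the package.
missing=0
for prog in omp_main mpi_main seq_main; do
    if command -v "$prog" >/dev/null 2>&1; then
        # Print the resolved path so it is clear which build will run.
        echo "$prog: $(command -v "$prog")"
    else
        echo "$prog: not found on PATH"
        missing=$((missing + 1))
    fi
done
echo "$missing of 3 programs missing"
```

If any program is reported missing, the module was probably not loaded in the current shell.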

Running Parallel K-Means through SLURM

A parallel-kmeans job can be submitted through the SLURM scheduler using a submission script such as the following:

#!/bin/bash

#SBATCH --job-name="ParallelKmeans"
#SBATCH --mail-type=ALL
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 14-00:00:00
#SBATCH --mem-per-cpu=3900M

module load gnu-openmpi

# For MPI Parallel Version (-np matches the 4 tasks requested with -n above)
mpirun -np 4 mpi_main OPTIONS -i INFILE -n N_CLUSTERS

# For OpenMP Parallel Version (use in place of the mpirun line, not in addition to it)
omp_main OPTIONS -i INFILE -n N_CLUSTERS
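
The script above requests four tasks with -n 4, which suits mpirun. The OpenMP version is a single process whose threads share memory, so a job for it would normally ask for one task with several cores on one node instead. The following is a minimal dry-run sketch under those assumptions: -N, -c and the SLURM_CPUS_PER_TASK variable are standard SLURM, OMP_NUM_THREADS is the standard OpenMP thread-count variable, and the sketch assumes omp_main does not override the thread count on its own. The job name and the KMEANS_CMD variable are illustrative, and the launch line is only echoed because OPTIONS, INFILE and N_CLUSTERS are still placeholders.

```shell
#!/bin/bash

# Illustrative job name.
#SBATCH --job-name="ParallelKmeansOMP"
# OpenMP threads share memory, so keep the job on a single node.
#SBATCH -N 1
# One task (process) ...
#SBATCH -n 1
# ... with four cores available to its threads.
#SBATCH -c 4
#SBATCH -p genacc_q

# Use the cores SLURM granted to this task; default to 1 outside a job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Placeholders exactly as in the text above; substitute real values first.
KMEANS_CMD="omp_main OPTIONS -i INFILE -n N_CLUSTERS"

# Dry run: record the thread count and launch line in the job log.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} ${KMEANS_CMD}"
```

Once the placeholders are filled in, the echo would be replaced by module load gnu-openmpi followed by the omp_main command itself.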