ELPH

Motif Finding by Gibbs Sampling in DNA and Protein Sequences

Introduction

ELPH is one of the bioinformatics programs available at RCC. It uses Gibbs sampling to find patterns and motifs in DNA and protein sequence data, and it can handle thousands of sequences at a time.

Using ELPH on RCC Resources

Running ELPH in Serial on RCC Resources

ELPH does not require a module to be loaded in order to run on HPC login nodes or Spear. To run ELPH from the command line, type elph [FILES] -[OPTIONS]. A list of options is available either by typing elph with no arguments or in the ELPH user manual. As a quick example, a simple run on a test file in FASTA format looks like this:

elph TEST.fa LEN=10 -o OUTFILE.txt
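
To try the command above without real data, a small FASTA file can be created first. The following is a minimal sketch: the three sequences are made-up placeholders for illustration only, and the check for elph on the PATH is an added convenience, not part of ELPH itself.

```shell
# Write a tiny FASTA file. These sequences are illustrative placeholders,
# not real data; replace them with your own sequences.
cat > TEST.fa <<'EOF'
>seq1
ACGTTGACCTGAAGTCGATCGGATTACAGCTA
>seq2
TTGACCTGAAGGCTAGCTAGGATCCATGCAAT
>seq3
GGCATTGACCTGAAGTTACGATCGTACGATCG
EOF

# Count the FASTA records (each header line starts with '>').
NUM_SEQS=$(grep -c '^>' TEST.fa)
echo "TEST.fa contains $NUM_SEQS sequences"

# Run ELPH only if it is available; the command is the one shown above.
if command -v elph >/dev/null 2>&1; then
    elph TEST.fa LEN=10 -o OUTFILE.txt
else
    echo "elph not found on PATH; run this on an HPC login node or Spear"
fi
```

If the run succeeds, the results should be written to OUTFILE.txt, as specified by the -o option in the example command.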

Running ELPH in Parallel on RCC Resources

ELPH can also be run in parallel by loading the GNU OpenMPI module with the module load gnu-openmpi command. The job can then be submitted to HPC via a Slurm script. For example:

#!/bin/bash

#SBATCH -J ELPH_TEST
#SBATCH -n 4
#SBATCH -p genacc_q
#SBATCH -t 00:10:00
#SBATCH --mail-type=ALL

module load gnu-openmpi
elph TEST.fa LEN=10 -o OUTFILE.txt
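
Once the script above is saved to a file, it can be submitted with the standard Slurm commands sbatch and squeue. The sketch below assumes the script was saved as elph_test.sh, which is a hypothetical file name chosen for illustration; the existence checks are added so the snippet fails gracefully when run outside of an HPC login node.

```shell
# Hypothetical file name for the Slurm script shown above.
JOB_SCRIPT="elph_test.sh"

if [ ! -f "$JOB_SCRIPT" ]; then
    # Nothing to submit if the script has not been saved yet.
    echo "Job script not found: $JOB_SCRIPT" >&2
    STATUS=1
elif ! command -v sbatch >/dev/null 2>&1; then
    # sbatch is only available where Slurm is installed.
    echo "sbatch not found; run this from an HPC login node" >&2
    STATUS=1
else
    # Submit the job, then list your queued and running jobs.
    sbatch "$JOB_SCRIPT"
    STATUS=$?
    squeue -u "$USER"
fi
```

Because the script sets --mail-type=ALL, Slurm should also send email notifications about the job, provided a mail address is configured for the account.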