MAFFT

A Program for Aligning Multiple Sets of Genetic Sequence Data

Introduction

MAFFT is a bioinformatics tool that takes multiple genetic sequences as input and aligns them. The program provides several alignment algorithms: some favor accuracy and are better suited to smaller datasets (such as L-INS-i), while others favor speed and are better suited to larger datasets (such as FFT-NS-2).
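As an illustration of the trade-off above, the following sketch picks MAFFT options from the number of sequences in a FASTA file. The helper name choose_mafft_opts and the 200-sequence cutoff are assumptions made for this example, not part of MAFFT. The option sets themselves are the ones MAFFT documents for L-INS-i (--localpair --maxiterate 1000) and FFT-NS-2 (--retree 2).

```shell
#!/bin/bash
# Hypothetical helper (not part of MAFFT): choose alignment options
# based on how many sequences a FASTA file contains.
# The default cutoff of 200 sequences is an assumption for illustration.
choose_mafft_opts() {
    local fasta="$1"
    local threshold="${2:-200}"
    local nseq
    # Each FASTA record begins with a '>' header line, so count those.
    nseq=$(grep -c '^>' "$fasta" || true)
    if [ "$nseq" -lt "$threshold" ]; then
        # L-INS-i: slower, accuracy-oriented iterative refinement
        echo "--localpair --maxiterate 1000"
    else
        # FFT-NS-2: fast progressive method
        echo "--retree 2"
    fi
}

# Example use (requires mafft on PATH); the unquoted substitution is
# intentional so that the options are split into separate arguments:
#   mafft $(choose_mafft_opts TEST.fa) TEST.fa > OUTPUT
```

MAFFT also offers an --auto option that selects a strategy from the data size, which may be simpler than choosing by hand.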

Using MAFFT on RCC Resources

Running MAFFT in Serial on HPC Login Nodes and Spear

MAFFT does not require a module to be loaded in order to run on HPC login nodes and Spear. To run MAFFT, type mafft -[OPTS] INPUT > OUTPUT, where -[OPTS] is the list of command line options for the run, INPUT is the input sequence file, and OUTPUT is the file that receives the alignment, which MAFFT writes to standard output. For detailed usage documentation, see the main website. MAFFT also includes a number of related programs, including linsi, ginsi and mafft-profile; detailed information on these can be found at the manual page. As a short example, if you have a FASTA formatted file of genetic sequence data, you could align it and save the result using:

mafft TEST.fa > OUTPUT
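If no sequence file is at hand, a quick smoke test along these lines may help. The three sequences below are made up for illustration and carry no biological meaning. The script only runs MAFFT if it is found on the PATH.

```shell
#!/bin/bash
# Write a tiny, made-up FASTA file with three short nucleotide sequences.
cat > TEST.fa <<'EOF'
>seq1
ATGGCGTACGTTAGC
>seq2
ATGGCGTCGTTAGC
>seq3
ATGCGTACGTTTAGC
EOF

# Run MAFFT only if it is available; --auto lets MAFFT pick a strategy.
if command -v mafft >/dev/null 2>&1; then
    mafft --auto TEST.fa > OUTPUT
    # The alignment is also FASTA, so it should hold the same number of headers.
    grep -c '^>' OUTPUT
else
    echo "mafft not found on PATH; skipping alignment" >&2
fi
```

Comparing the header count of OUTPUT with that of TEST.fa is a cheap check that every input sequence made it into the alignment.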

Running MAFFT in Parallel on RCC Resources

To run MAFFT in parallel on the RCC machines, load the GNU OpenMPI module using the command: module load gnu-openmpi. This gives you access to the mpirun command. The following example SLURM script aligns the FASTA data file TEST.fa and writes the result to OUTPUT.

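#!/bin/bash

# Job name, partition, task count, wall-clock limit, and e-mail notifications
#SBATCH -J MAFFT_Test
#SBATCH -p genacc_q
#SBATCH -n 4
#SBATCH -t 00:10:00
#SBATCH --mail-type=ALL

# Load OpenMPI to make mpirun available, then launch MAFFT on 4 tasks
module load gnu-openmpi
mpirun -np 4 mafft TEST.fa > OUTPUT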
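MAFFT also has built-in multithreading, enabled with its --thread option, and this may be worth trying as an alternative to an mpirun launch. The sketch below assumes that the installed MAFFT build supports threads and that requesting four cores on a single node of genacc_q is acceptable. The job name MAFFT_Threads is a placeholder chosen for this example.

```shell
#!/bin/bash

# One task with four CPU cores on a single node (assumed resource request)
#SBATCH -J MAFFT_Threads
#SBATCH -p genacc_q
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH -t 00:10:00

# Use the core count granted by SLURM; fall back to 1 outside a job.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Run the alignment only if MAFFT and the input file are both present.
if command -v mafft >/dev/null 2>&1 && [ -f TEST.fa ]; then
    mafft --thread "$THREADS" TEST.fa > OUTPUT
else
    echo "mafft or TEST.fa not found; nothing was aligned" >&2
fi
```

Reading the thread count from SLURM_CPUS_PER_TASK keeps the mafft command in step with the -c request, so only the #SBATCH line needs changing to scale the job.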

For additional information on usage and for example files to test MAFFT with, please refer to the main website.