MAFFT
Introduction
MAFFT is a bioinformatics tool for multiple sequence alignment: it takes a set of nucleotide or amino acid sequences and aligns them. The program provides several alignment strategies, some of which favor accuracy and are better suited to smaller datasets (such as L-INS-i) and some of which favor speed and are better suited to larger datasets (such as FFT-NS-2).
Using MAFFT on RCC Resources
Running MAFFT in Serial on HPC Login Nodes and Spear
MAFFT does not require a module to be loaded in order to run on HPC login nodes and Spear. To run MAFFT, type:
mafft -[OPTS] INPUT > OUTPUT
where -[OPTS] is a list of command line options you wish to run your job with, INPUT is the required input file, and OUTPUT is the file that the alignment is redirected to. For detailed usage documentation, see the main website. MAFFT also contains a number of other related programs, including linsi, ginsi and mafft-profile. Detailed information on these can be found at the manual page. As a short example, if you have a FASTA-formatted file of sequence data, you could align it and write the result to OUTPUT using:
mafft TEST.fa > OUTPUT
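The command above uses MAFFT's default settings. As a rough sketch of how the strategies named in the introduction can be selected, the block below writes a tiny FASTA file containing three made-up nucleotide sequences (illustrative only, not real data) and aligns it three ways. The output file names are placeholders chosen for this example, and the block assumes that mafft is on your PATH; if it is not, the alignment step is skipped.

```shell
#!/bin/bash
# Write a small FASTA file with three made-up sequences (illustrative only).
cat > TEST.fa <<'EOF'
>seq1
ACGTACGTACGTTAGC
>seq2
ACGTACGAACGTTAGC
>seq3
ACGTACGTACGTAGC
EOF

if command -v mafft >/dev/null 2>&1; then
    # Let MAFFT pick a strategy based on the size of the input.
    mafft --auto TEST.fa > OUTPUT_auto.fa
    # FFT-NS-2: fast progressive method, aimed at larger datasets.
    mafft --retree 2 --maxiterate 0 TEST.fa > OUTPUT_fftns2.fa
    # L-INS-i: slower iterative refinement, aimed at smaller datasets.
    mafft --localpair --maxiterate 1000 TEST.fa > OUTPUT_linsi.fa
else
    echo "mafft not found on PATH; skipping alignment" >&2
fi
```

MAFFT prints progress messages to standard error, so redirecting standard output with > captures only the alignment itself.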
Running MAFFT in Parallel on RCC Resources
If you wish to run MAFFT in parallel on the RCC machines, you will need to load the GNU OpenMPI module using the command:
module load gnu-openmpi
This will give you access to the mpirun command. An example of a run using SLURM, with the FASTA data file TEST.fa as input and OUTPUT as the output file, could be as follows:
#!/bin/bash
#SBATCH -J MAFFT_Test
#SBATCH -p genacc_q
#SBATCH -n 4
#SBATCH -t 00:10:00
#SBATCH --mail-type=ALL
module load gnu-openmpi
mpirun -np 4 mafft TEST.fa > OUTPUT
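Note that mpirun starts the requested number of copies of whatever command it is given. MAFFT also has its own multithreading option, --thread, which runs a single MAFFT process across several cores on one node and does not depend on MPI. The script below is a minimal sketch of that alternative, not an RCC-verified recipe: it assumes the installed MAFFT build has multithreading enabled, and the job name and the four-core request are placeholders chosen for this example.

```shell
#!/bin/bash
#SBATCH -J MAFFT_Threads
#SBATCH -p genacc_q
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH -t 00:10:00
#SBATCH --mail-type=ALL

# Use the core count SLURM allocated to this task; fall back to 1
# when the script is run outside of a SLURM job.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

if command -v mafft >/dev/null 2>&1 && [ -f TEST.fa ]; then
    # One MAFFT process, THREADS worker threads, alignment written to OUTPUT.
    mafft --thread "$THREADS" TEST.fa > OUTPUT
else
    echo "mafft or TEST.fa not found; nothing to align" >&2
fi
```

Requesting one task with several CPUs (-n 1 with -c 4) keeps all of the threads on the same node, which a threaded program needs.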
For additional information on usage and for example files to test MAFFT with, please refer to the main website.