FASTA

Searching Biological Sequence Data for Regions of Similarity

Introduction

FASTA is a suite of bioinformatics programs available on the RCC systems at FSU. The programs take biological sequence data, either DNA or protein sequences, and search it for regions of similarity. They can find both locally and globally similar regions. RCC also provides a parallel version that uses MPI.

Using FASTA on RCC Resources

The FASTA software package includes a number of programs. No module needs to be loaded to run them serially; to run them in parallel, first load one of the available MPI implementations, such as GNU OpenMPI. Refer to the official documentation for a complete list of the programs in the package. One of them is fasta36, which performs sequence comparison. It can be run on the HPC as follows:

fasta36 -OPTIONS QUERY.fa LIBRARY.fa
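
As a rough illustration of the command above, the following sketch writes two small input files in FASTA format and runs a search only if fasta36 is found on the PATH. The file names and the toy sequences are made up for this example, and the options shown (-q to suppress prompts, -E to set the E-value cutoff) and their values are assumptions chosen for illustration; check the official documentation for the options that suit your data.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own data.
QUERY=query.fa
LIBRARY=library.fa

# One-record query file: a header line starting with '>' then the sequence.
# The sequences below are made-up toy data, not real proteins.
cat > "$QUERY" <<'EOF'
>query1 toy protein fragment
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# Two-record library file to search against.
cat > "$LIBRARY" <<'EOF'
>lib1 toy sequence similar to the query
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ
>lib2 toy unrelated sequence
GSSGSSGPLVPRGSHMASMTGGQQMGRGS
EOF

# Count library records by counting header lines.
N_LIB=$(grep -c '^>' "$LIBRARY")
echo "library records: $N_LIB"

# Run the search only when fasta36 is available.
# -q: do not prompt; -E 10: E-value cutoff (assumed value for illustration).
if command -v fasta36 >/dev/null 2>&1; then
    fasta36 -q -E 10 "$QUERY" "$LIBRARY" > results.txt
else
    echo "fasta36 not found on PATH" >&2
fi
```

Redirecting the output to a file, as done here with results.txt, keeps the alignment report for later inspection instead of printing it to the terminal.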

To run this in parallel on RCC systems, you can either call mpirun directly or submit a job script. A sample job script for the fasta36 program looks like the following:

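#!/bin/bash
#SBATCH --job-name=FASTA_Test
# Send email notifications for all job events
#SBATCH --mail-type=ALL
# Number of tasks; should match the -np value passed to mpirun below
#SBATCH -n 4
# Partition (queue) to submit to
#SBATCH -p genacc_q
# Wall-clock limit: 0 days, 4 hours
#SBATCH -t 00-04:00:00
#SBATCH --mem-per-cpu=3900M

# Load the MPI implementation needed for the parallel run
module load gnu-openmpi

# Launch fasta36 across the 4 requested tasks
mpirun -np 4 fasta36 -OPTIONS QUERY.fa LIBRARY.fa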
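
The sample job script states the task count twice, once for Slurm (-n 4) and once for mpirun (-np 4). One way to keep the two in step, sketched here as a suggestion rather than an RCC requirement, is to read the count from the SLURM_NTASKS environment variable that Slurm sets inside a job when -n is given. The fallback value of 4 is an assumption for running the snippet outside a job, and -OPTIONS, QUERY.fa, and LIBRARY.fa are the same placeholders used above.

```shell
#!/bin/sh
# Take the task count from Slurm when running inside a job;
# fall back to 4 (an assumed default) when SLURM_NTASKS is not set.
NP="${SLURM_NTASKS:-4}"

# Assemble the launch command with the placeholders from the sample script.
CMD="mpirun -np $NP fasta36 -OPTIONS QUERY.fa LIBRARY.fa"

# Print the command for inspection; in a real job script, run the
# mpirun line directly in place of this echo.
echo "$CMD"
```

With this approach, changing the -n value in the #SBATCH header is enough to change the number of processes mpirun starts.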

Note that the above examples can be applied to any of the other FASTA programs. For detailed usage information on each program and a fuller description of what each is designed to do, refer to the official website.