ABySS

Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler

Introduction

ABySS is a bioinformatics program designed to assemble genomes de novo from short paired-end sequence reads. It can be run either serially or in parallel; the parallel (MPI) version can efficiently assemble larger genomes than the serial version can.

Running ABySS on RCC Resources

ABySS can be run on the HPC cluster either serially or in parallel. Load the abyss module before running ABySS.

Serially running ABySS on Spear

Download and assemble a small synthetic data set.

module load abyss
abyss-pe k=25 name=test se=https://raw.github.com/dzerbino/velvet/master/data/test_reads.fa

Calculate assembly contiguity statistics.

abyss-fac test-unitigs.fa
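Depending on the input, an assembly may stop at unitigs or continue to contigs and scaffolds. The following is a minimal sketch, not part of ABySS itself: the helper name assembly_files is hypothetical, and it assumes the usual output naming of name-unitigs.fa, name-contigs.fa and name-scaffolds.fa. It collects whichever of those files exist so they can be passed to abyss-fac in one call. The command is only printed here; drop the echo to run it once the abyss module is loaded.

```shell
# Hypothetical helper: list the non-empty assembly files that exist
# for a given assembly name, one per line.
assembly_files() {
    af_name=$1
    for af_stage in unitigs contigs scaffolds; do
        af_file="$af_name-$af_stage.fa"
        [ -s "$af_file" ] && echo "$af_file"
    done
    return 0
}

files=$(assembly_files test)
if [ -n "$files" ]; then
    # Printed rather than executed; remove "echo" to run abyss-fac.
    echo abyss-fac $files
else
    echo "no assembly output found for name=test" >&2
fi
```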

To assemble paired reads in two files named test-1.fa and test-3.fa into contigs in a file named test-contigs.fa, run the command:

abyss-pe name=test k=64 in='test-1.fa test-3.fa'
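The best value of k depends on the data, so it is common to try several. The sketch below is one way to do that under stated assumptions: the function name sweep_k and the RUN variable are hypothetical, and it relies on the -C option of abyss-pe, which changes into the given directory before assembling (as make does). Because of -C, read paths must be absolute or relative to the per-k directory. By default the commands are only printed; set RUN=1 to execute them.

```shell
# Hypothetical helper: one abyss-pe run per k value, each in its own directory.
# Usage: sweep_k NAME 'READ_FILES' K [K ...]
sweep_k() {
    sk_name=$1
    sk_reads=$2
    shift 2
    for sk_k in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Create the per-k directory, then assemble inside it.
            mkdir -p "k$sk_k" &&
                abyss-pe -C "k$sk_k" name="$sk_name" k="$sk_k" in="$sk_reads"
        else
            # Dry run: show the command that would be executed.
            echo "abyss-pe -C k$sk_k name=$sk_name k=$sk_k in='$sk_reads'"
        fi
    done
}

# Read paths are written relative to the k32/, k48/ and k64/ directories.
sweep_k test '../test-1.fa ../test-3.fa' 32 48 64
```

The resulting assemblies can then be compared side by side with abyss-fac, for example on the k*/test-contigs.fa files.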

Further details about the commands can be found in the ABySS documentation.

Running ABySS in Parallel on HPC

The following Slurm submit script can be used as a template for submitting a parallel ABySS job on the HPC cluster.

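#!/bin/bash
#
# Name your job
#SBATCH -J abyss
#
# Change the queue (partition)
#SBATCH -p genacc_q
#
# Change the number of nodes and processes per node as necessary
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#
# Change the wall time
#SBATCH -t 00:30:00
#
module load abyss
#
# Run your ABySS commands.
# np is the number of MPI processes; it should equal nodes x tasks per node (2 x 4 = 8).
# (In abyss-pe, n is a different parameter: the minimum number of pairs needed to join contigs.)
abyss-pe name=test k=48 np=8 in='test-1.fa test-3.fa'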
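The number of MPI processes given to abyss-pe should match what Slurm actually allocates. As a hedged sketch, the process count can be derived inside the job script instead of being hard-coded. This assumes the np parameter of abyss-pe (number of MPI processes) and the Slurm environment variables SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE, the latter being set only when --ntasks-per-node is requested. The function name abyss_np is hypothetical, and the abyss-pe command is printed rather than executed.

```shell
# Hypothetical helper: compute the MPI process count from the Slurm allocation.
# Falls back to 1 for either value when run outside a Slurm job.
abyss_np() {
    np_nodes=${SLURM_JOB_NUM_NODES:-1}
    np_tasks=${SLURM_NTASKS_PER_NODE:-1}
    echo $(( np_nodes * np_tasks ))
}

np=$(abyss_np)

# Printed rather than executed; remove "echo" inside a real job script.
echo abyss-pe name=test k=48 np="$np" in="'test-1.fa test-3.fa'"
```

With this approach, changing -N or --ntasks-per-node in the script header does not require editing the abyss-pe line as well.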