CUSHAW - fast short read alignment for CUDA-enabled GPUs

Introduction

CUSHAW is a well-established software package for next-generation sequencing read alignment based on multi-core and many-core computing. CUSHAW 1.0, the first release of the package, is a CUDA-compatible short read aligner for multiple GPUs sharing a single host. The aligner is based on the Burrows-Wheeler transform (BWT) and is written in CUDA C++. Performance evaluation on simulated and real short read datasets shows that it achieves significant speedups in execution time while yielding comparable or even better alignment quality for paired-end alignments, compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. The aligner supports ungapped alignment only and has been incorporated into the NVIDIA Tesla Bio Workbench. The algorithm is presented in the paper "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform".

Downloads

Latest source code (release 1.0.40)

More details about the changes in this version are available in the changelog.
Simulated reads from the human genome
Ready-made BWT indexes of genomes

Citation

Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform". Bioinformatics, 2012, 28(14): 1830-1837

Other related papers

Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "A fast CUDA compatible short read aligner to large genomes". GPU Technology Conference 2012 (GTC 2012), San Jose, USA, 2012
Yongchao Liu and Bertil Schmidt: "Long read alignment based on maximal exact match seeds". Bioinformatics, 2012, 28(18): i318-324
Yongchao Liu and Bertil Schmidt: "CUSHAW2-GPU: empowering faster gapped short-read alignment using GPU computing". IEEE Design & Test, 2014, 31(1): 31-39
Yongchao Liu, Bernt Popp, and Bertil Schmidt: "CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding". PLoS One, 2014, 9(1): e86869
Yongchao Liu and Bertil Schmidt: "CUSHAW Software Package: harnessing CUDA-enabled GPUs for next generation sequencing read alignment". GPU Technology Conference 2014 (GTC 2014), San Jose, USA, 2014
Jorge González-Domínguez, Yongchao Liu, Bertil Schmidt: "Parallel and scalable short-read alignment on multi-core clusters using UPC++". PLoS One, 2016, 11(1): e0145490.
Yongchao Liu, Thomas Hankeln, and Bertil Schmidt: "Parallel and space-efficient construction of Burrows-Wheeler transform and suffix array for big genome data". IEEE Transactions on Computational Biology and Bioinformatics, 2016, 13(3): 592-598.
Yongchao Liu and Bertil Schmidt: "CUSHAW Suite: parallel and efficient algorithms for NGS read alignment". Algorithms for Next-Generation Sequencing Data: Techniques, Approaches and Applications, edited by Mourad Elloumi, Springer, 2017.
Yuandong Chan, Kai Xu, Haidong Lan, Weiguo Liu, Yongchao Liu and Bertil Schmidt: "PUNAS: a parallel ungapped-alignment-featured seed verification for next-generation sequencing read alignment", 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2017), 2017, pp. 52-61.

Parameters

Input:

-fasta file1 [file2] (input sequences in FASTA format)
-fastaPaired file1 file2 (paired-end input sequences in FASTA format)
-fastq file1 [file2] (input sequences in FASTQ format)
-fastqPaired file1 file2 (paired-end input sequences in FASTQ format)
-trim3 / -trim5 <int> (number of bases to trim from the 3' / 5' end)

Output:

-all_in_sam (output both aligned and unaligned reads to file "Aligns.sam" in SAM format)
-aln / -unaln file <string> (output file for aligned / unaligned reads)

Align:

-s <int> (seed size, default 32; 0 disables seeding)
-i <int> (maximum insert size for paired-end mapping, default 300)
-mms <int> (maximal number of mismatches in the seed, default 2)
-mmr <int> (maximal number of mismatches in the full length, default 7)
-qss <int> (quality score sum limit at mismatched positions in the seed, default 70)
-qsr <int> (quality score sum limit at mismatched positions in the full length, default 210)
-e <float> (uniform base error rate, default 0.04)
-b <int> (report the top #int alignments, default 1)
-disable_sw (disable the use of Smith-Waterman for paired-end alignment)
-nofw / -norc (do not align forward / reverse strand)
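
To make the mismatch and quality score sum limits more concrete, the following Python sketch checks one ungapped alignment against a mismatch limit and a quality score sum limit, using the "-mmr" and "-qsr" defaults listed above. It is an illustration only and not CUSHAW's implementation: the function name passes_limits, the Phred+33 quality encoding, and the use of "<=" for both comparisons are assumptions.

```python
def passes_limits(read, ref, quals, max_mismatches=7, max_qual_sum=210,
                  phred_offset=33):
    """Return True if an ungapped alignment stays within both limits.

    read, ref -- equal-length base strings (read and reference window)
    quals     -- FASTQ quality string of the read (Phred+33 is an assumption)
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    mismatches = 0
    qual_sum = 0
    for base, ref_base, q in zip(read, ref, quals):
        if base != ref_base:
            # Only mismatched positions contribute to the quality score sum.
            mismatches += 1
            qual_sum += ord(q) - phred_offset
    return mismatches <= max_mismatches and qual_sum <= max_qual_sum
```

The same kind of check, restricted to the seed portion of a read, would use the "-mms" and "-qss" limits instead.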

Compute:

-g <int> (the number of GPUs used, default 1)
-gi <int> (index of the selected single GPU, default 0; only effective when -g is 1)
-t <int> (number of threads for post-alignment process, default 4)

Others:

-v (print out the software version)
-h | -? (print out the software usage)

Installation and Usage

Preparation

Install the CUDA toolkit and SDK, version 3.x or higher.
Use NVIDIA CUDA-enabled GPUs based on the Fermi architecture or newer.
Turn off ECC using the command "nvidia-smi -e 0".

Download

The sub-directory "bwt_index" contains the source code for the BWT construction program. This program is taken from the open-source software BWA, since BWT construction has been extensively studied in the literature to reduce working space and increase execution speed.
The sub-directory "cushaw" contains the source code for the short read aligner.

Compilation

Go to each directory (i.e. either bwt_index or cushaw) and simply type "make" to compile them.
In the sub-directory "bwt_index", an executable binary bwt_index will be created.
In the sub-directory "cushaw", two executable binaries, "cushaw" and "cushaw-long", will be created. The "cushaw" binary gives better alignment quality at fast speed for short reads of length < 70 bp, while the "cushaw-long" binary runs faster for short reads of length >= 70 bp with a slight loss of alignment quality.

Construct BWT indices of genomes

To create the BWT index for a large genome, run the command "./bwt_index -a bwtsw gdir/genome.fasta".
To create the BWT index for a small genome, run the command "./bwt_index -a is gdir/genome.fasta".
Note: all the BWT index files are written to the gdir directory.
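
As a small illustration of the two index-construction commands above, the Python sketch below assembles the argument list for bwt_index. The function name bwt_index_command and the large_genome flag are hypothetical; this page does not state a size threshold between "large" and "small" genomes, so that choice is left to the caller.

```python
import shlex


def bwt_index_command(genome_fasta, large_genome=True, binary="./bwt_index"):
    """Build the bwt_index argument list for one genome FASTA file."""
    # "bwtsw" is used for large genomes and "is" for small ones,
    # mirroring the two commands shown above.
    algorithm = "bwtsw" if large_genome else "is"
    return [binary, "-a", algorithm, genome_fasta]


print(shlex.join(bwt_index_command("gdir/genome.fasta")))
print(shlex.join(bwt_index_command("gdir/genome.fasta", large_genome=False)))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the "bwt_index" sub-directory.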

Typical Commands

./cushaw directory gdir/genome.fasta -fasta reads.fa
./cushaw directory gdir/genome.fasta -fastq reads.fq
./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32
./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2
./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2 -t 2
Note: When the host computer has more than one GPU installed and only one should be used, a specific GPU can be selected with the option "-gi". This option specifies the GPU index in the host (starting from 0) and only takes effect when a single GPU is used.
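
The typical commands above can also be assembled programmatically. The following Python sketch builds the argument list from the options documented on this page and picks the binary by read length. The helper name cushaw_command and its keyword arguments are hypothetical, and the cut-off used here (cushaw-long for reads longer than 70 bp, cushaw otherwise) is an assumption taken from the recommendation under "Important notes".

```python
import shlex


def cushaw_command(out_dir, genome_fasta, reads, read_length, fmt="fasta",
                   seed_size=None, gpus=None, gpu_index=None, threads=None):
    """Build an argument list for cushaw or cushaw-long (single-end input)."""
    if fmt not in ("fasta", "fastq"):
        raise ValueError("fmt must be 'fasta' or 'fastq'")
    # Assumed cut-off: cushaw-long for reads > 70 bp, cushaw otherwise.
    binary = "./cushaw-long" if read_length > 70 else "./cushaw"
    cmd = [binary, out_dir, genome_fasta, "-" + fmt, reads]
    if seed_size is not None:
        cmd += ["-s", str(seed_size)]
    if gpus is not None:
        cmd += ["-g", str(gpus)]
    if gpu_index is not None:
        # -gi only has an effect when a single GPU is used.
        if gpus not in (None, 1):
            raise ValueError("-gi only takes effect when one GPU is used")
        cmd += ["-gi", str(gpu_index)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = cushaw_command("directory", "gdir/genome.fasta", "reads.fa",
                     read_length=36, seed_size=32, gpus=2, threads=2)
print(shlex.join(cmd))
# prints: ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2 -t 2
```

Returning a list rather than a single string keeps the command safe to hand to subprocess.run without shell quoting issues.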

Important notes

In general, the default parameters give good performance for most reads produced by current mainstream Illumina sequencers. When tuning parameters for single-end alignment, it is generally sufficient to adjust only "-mms" and "-mmr". For paired-end alignment, the maximal insert size ("-i") should also be set properly; it is usually calculated as ins_mean + 4*ins_std, where ins_mean is the mean insert size and ins_std is the insert-size standard deviation.
The alignment results are stored in the "Aligns.sam" file in the "directory" directory. Downstream analysis can be done using SAMtools.
The "cushaw" binary gives better alignment quality at fast speed for short reads of length < 70 bp, while the "cushaw-long" binary runs faster for short reads of length >= 70 bp with a slight loss of alignment quality. We recommend cushaw-long for reads longer than 70 bp and cushaw otherwise.
CUSHAW frequently writes data to disk at runtime. This is because CUSHAW does not load the entire set of input reads into memory; it loads one batch of reads at a time, so the intermediate results of processed reads must be stored on disk for later use. We therefore strongly recommend using a working directory on a local hard disk for speed. I have to admit that the framework of CUSHAW was not well built, and I would like to rewrite this program in the future.
I found that the SAM output of CUSHAW is incompatible with the requirements of the DWGSIM short read simulator. Hence, evaluation results obtained with the program from the DWGSIM software package are generally incorrect. I would recommend using WGSIM from the SAMtools package to simulate reads.
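
The insert-size rule from the first note above (ins_mean + 4*ins_std) can be sketched in Python as follows. The function names are hypothetical, rounding up to an integer is an assumption made because "-i" takes an integer, and the use of the sample standard deviation when estimating from observed insert sizes is likewise an assumption.

```python
import math
import statistics


def max_insert_size(ins_mean, ins_std):
    """Return ins_mean + 4*ins_std, rounded up to an integer for -i."""
    return math.ceil(ins_mean + 4 * ins_std)


def max_insert_size_from_samples(insert_sizes):
    """Estimate the -i value from a list of observed insert sizes."""
    ins_mean = statistics.mean(insert_sizes)
    ins_std = statistics.stdev(insert_sizes)  # sample standard deviation
    return max_insert_size(ins_mean, ins_std)


print(max_insert_size(200, 20))  # prints: 280
```

For a library with a mean insert size of 200 and a standard deviation of 20, this gives 280, which would then be passed to the aligner as "-i 280".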

Change Log

June 2, 2012
1. Provides a new option "-all_in_sam" to allow the output of unaligned read information in the final SAM file
2. Provides a new option "-disable_sw" to disable the use of Smith-Waterman in the paired-end alignment

Contact

If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.
