CUSHAW (the first release of the CUSHAW software package for next-generation sequencing read alignment) is a CUDA-compatible short read alignment algorithm for multiple GPUs sharing a single host. The aligner is based on the Burrows-Wheeler transform (BWT) and is written in the CUDA C++ parallel programming language. Performance evaluation on simulated and real short read datasets shows that our aligner achieves significant speedups in execution time while yielding comparable or even better alignment quality for paired-end alignments, compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. This aligner supports only ungapped alignment and has been incorporated into the NVIDIA Tesla Bio Workbench. The algorithm is presented in the paper "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform".
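For readers unfamiliar with BWT-based alignment, the following is a minimal CPU sketch of the general idea: build a BWT index of a reference and find exact occurrences of a read by backward search. It is a toy illustration only, not CUSHAW's CUDA implementation; the function names `bwt_index` and `exact_match` are hypothetical, and the naive suffix sorting used here would not scale to real genomes.

```python
def bwt_index(text):
    """Build a toy BWT index (suffix array, C table, Occ table) for `text`."""
    text += "$"  # unique sentinel, lexicographically smaller than A/C/G/T
    # Naive suffix-array construction; acceptable for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def exact_match(read, sa, C, occ):
    """Return the sorted 0-based positions where `read` occurs exactly."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(read):  # backward search: last base first
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, C, occ = bwt_index("ACGTACGTTACG")
    print(exact_match("ACG", sa, C, occ))
```

Each backward-search step needs only two table lookups per base, which is what makes BWT indices attractive for aligning many short reads against a large genome.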
More details about the changes in this version are available in the changelog.
- Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform". Bioinformatics, 2012, 28(14): 1830-1837 [PDF]
Other related papers
- Yongchao Liu and Bertil Schmidt: "Long read alignment based on maximal exact match seeds". Bioinformatics, 2012, 28(18): i318-324 [PDF]
- Yongchao Liu and Bertil Schmidt: "CUSHAW2-GPU: empowering faster gapped short-read alignment using GPU computing". IEEE Design & Test of Computers, 2014, 31(1): 31-39
- Yongchao Liu, Bernt Popp, and Bertil Schmidt: "CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding". PLOS ONE, 2014, 9(1): e86869
- -fasta file1 [file2] (input sequences in FASTA format)
- -fastaPaired file1 file2 (input sequences in FASTA format)
- -fastq file1 [file2] (input sequences in FASTQ format)
- -fastqPaired file1 file2 (input sequences in FASTQ format)
- -trim3 / -trim5 <int> (the number of bases to trim from the 3'/5' end)
- -all_in_sam (output both aligned and unaligned reads to file "Aligns.sam" in SAM format)
- -aln / -unaln file <string> (output file for aligned / unaligned reads)
- -s <int> (seed size, default 32; 0 disables seeding)
- -i <int> (maximum insert size for paired-end mapping, default 300)
- -mms <int> (maximal number of mismatches in the seed, default 2)
- -mmr <int> (maximal number of mismatches in the full length, default 7)
- -qss <int> (quality score sum limit at mismatched positions in the seed, default 70)
- -qsr <int> (quality score sum limit at mismatched positions in the full length, default 210)
- -e <float> (uniform base error rate, default 0.04)
- -b <int> (give the top #int alignments, default 1)
- -disable_sw (disable the use of the Smith-Waterman algorithm for paired-end alignment)
- -nofw / -norc (do not align forward / reverse strand)
- -g <int> (the number of GPUs used, default 1)
- -gi <int> (the index of the selected single GPU, default 0; only effective when -g is 1)
- -t <int> (number of threads for post-alignment process, default 4)
- -v (print out the software version)
- -h | -? (print out the software usage)
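To make the interplay of "-s", "-mms", "-mmr", "-qss" and "-qsr" more concrete, here is a small sketch of how such limits could be applied to one ungapped candidate alignment. It is an interpretation of the option descriptions above, not CUSHAW's actual code: the function name `passes_limits` is hypothetical, and two points are assumptions, namely that the seed is the first `seed_size` bases of the read and that all limits are inclusive.

```python
def passes_limits(read, ref, quals, seed_size=32,
                  mms=2, mmr=7, qss=70, qsr=210):
    """Check one ungapped candidate alignment against the four limits.

    read, ref : equal-length strings (read and the reference segment)
    quals     : per-base quality scores of the read (integers)
    Defaults mirror the documented defaults of -s, -mms, -mmr, -qss, -qsr.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have equal length")

    # Positions where the read disagrees with the reference.
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]

    # Full-length limits (-mmr and -qsr).
    if len(mismatches) > mmr:
        return False
    if sum(quals[i] for i in mismatches) > qsr:
        return False

    # A seed size of 0 disables seeding, so no seed limits apply.
    if seed_size == 0:
        return True

    # Seed limits (-mms and -qss). Assumption: seed = first seed_size bases.
    in_seed = [i for i in mismatches if i < seed_size]
    if len(in_seed) > mms:
        return False
    return sum(quals[i] for i in in_seed) <= qss
```

For example, with the defaults a 40-base read with three mismatches inside its first 32 bases would be rejected under this interpretation, while the same three mismatches spread so that only two fall in the seed could still pass.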
- Install the CUDA toolkit and SDK, version 3.x or higher.
- Use NVIDIA CUDA-enabled GPUs based on the Fermi architecture or newer.
- Turn OFF ECC using the command "nvidia-smi -e 0".
- The sub-directory "bwt_index" contains the source code for the BWT construction program. This program is taken from the open-source software BWA, because BWT construction has already been extensively researched in the literature to reduce working space and increase execution speed.
- The sub-directory "cushaw" contains the source code for the short read aligner.
- Go to each directory (i.e. either bwt_index or cushaw) and simply type "make" to compile them.
- In the sub-directory "bwt_index", an executable binary bwt_index will be created.
- In the sub-directory "cushaw", two executable binaries, "cushaw" and "cushaw-long", will be created. The "cushaw" binary gives better alignment quality at high speed for short reads shorter than 70 bps, while the "cushaw-long" binary runs faster for short reads of 70 bps or longer, with a slight loss of alignment quality.
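The choice between the two binaries can be scripted. The sketch below reads lengths from FASTQ lines and picks a binary using the 70 bps boundary stated above. The helper names are hypothetical, the FASTQ input is assumed to use exactly four lines per record, and using the median read length as the "typical" length is an assumption made for this example.

```python
def fastq_read_lengths(lines):
    """Return read lengths, assuming 4-line FASTQ records (no line wrapping)."""
    # The sequence is the second line (index 1) of every 4-line record.
    return [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]


def choose_binary(read_lengths, threshold=70):
    """Pick 'cushaw' or 'cushaw-long' from the typical read length."""
    lengths = sorted(read_lengths)
    if not lengths:
        raise ValueError("no reads given")
    # Upper median of the lengths; an assumption, not a CUSHAW rule.
    median = lengths[len(lengths) // 2]
    return "cushaw-long" if median >= threshold else "cushaw"


if __name__ == "__main__":
    records = ["@r1", "ACGTACGT", "+", "IIIIIIII"]
    print(choose_binary(fastq_read_lengths(records)))
```

Datasets with mixed read lengths near the boundary may be worth running through both binaries to compare the results.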
Construct BWT indices of genomes
- To create the BWT for large genomes, run the command "./bwt_index -a bwtsw gdir/genome.fasta".
- To create the BWT for small genomes, run the command "./bwt_index -a is gdir/genomes.fasta".
Note: all the BWT index files are stored in the gdir directory.
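The two indexing commands above differ only in the "-a" argument, so the choice can be automated from the genome size. The sketch below counts bases in FASTA lines and assembles the command as an argument list without running it. The helper names are hypothetical, and the 2-gigabase cutoff is an assumption borrowed from the limit commonly documented for BWA's "is" algorithm; this page itself only distinguishes "large" from "small" genomes.

```python
def count_bases(fasta_lines):
    """Count sequence characters in FASTA lines, skipping '>' header lines."""
    return sum(len(line.strip())
               for line in fasta_lines
               if not line.startswith(">"))


def index_command(genome_fasta, genome_bases, is_limit=2_000_000_000):
    """Build the bwt_index command line as an argv list (nothing is run).

    is_limit is an assumed cutoff between "small" and "large" genomes.
    """
    algorithm = "bwtsw" if genome_bases >= is_limit else "is"
    return ["./bwt_index", "-a", algorithm, genome_fasta]


if __name__ == "__main__":
    n = count_bases([">chr1\n", "ACGT\n", "AC\n"])
    print(" ".join(index_command("gdir/genome.fasta", n)))
```

The resulting list can be passed to `subprocess.run` from within the "bwt_index" sub-directory once the binary has been compiled.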
- ./cushaw directory gdir/genome.fasta -fasta reads.fa
- ./cushaw directory gdir/genome.fasta -fastq reads.fq
- ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32
- ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2
- ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2 -t 2
Note: When more than one GPU is installed in the host computer but only one is to be used, a specific GPU can be selected with the option "-gi". This option specifies the GPU index (starting from 0) in the host and only takes effect when a single GPU is used.
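The example commands above can also be assembled programmatically, which helps when launching many runs. The sketch below builds the argument list only and does not execute anything. The function name `cushaw_command` and its parameters are hypothetical; the flags themselves ("-fasta", "-fastq", "-s", "-g", "-gi", "-t") are taken from the option list, and the check on "-gi" mirrors the note that it only has an effect when a single GPU is used.

```python
def cushaw_command(workdir, genome, reads, fmt="fasta", seed=None,
                   gpus=None, gpu_index=None, threads=None,
                   binary="./cushaw"):
    """Return an argv list for one CUSHAW run (nothing is executed)."""
    if fmt not in ("fasta", "fastq"):
        raise ValueError("fmt must be 'fasta' or 'fastq'")
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one or two read files")
    # -gi selects one GPU by index, so it conflicts with a multi-GPU run.
    if gpu_index is not None and gpus not in (None, 1):
        raise ValueError("-gi only has an effect when a single GPU is used")

    cmd = [binary, workdir, genome, "-" + fmt, *reads]
    # Append only the options that were explicitly requested.
    for flag, value in (("-s", seed), ("-g", gpus),
                        ("-gi", gpu_index), ("-t", threads)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(" ".join(cushaw_command("directory", "gdir/genome.fasta",
                                  ["reads.fa"], seed=32, gpus=2, threads=2)))
```

Leaving an option as `None` omits the flag so that the aligner falls back to its documented default.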
- In general, the default parameters give good performance for most reads produced by current mainstream Illumina sequencers. When tuning is needed, it is generally sufficient to specify only the two parameters "-mms" and "-mmr" for single-end alignment. For paired-end alignment, the maximal insert size should also be set properly; it is usually calculated as ins_mean + 4*ins_std (ins_mean is the mean insert size and ins_std is the standard deviation of the insert size).
- The alignment results are stored in the "Aligns.sam" file in the "directory" directory. Downstream analysis can be done using SAMtools.
- The "cushaw" binary gives better alignment quality at high speed for short reads shorter than 70 bps, while the "cushaw-long" binary runs faster for short reads of 70 bps or longer, with a slight loss of alignment quality. We therefore recommend cushaw-long for reads longer than 70 bps and cushaw otherwise.
- CUSHAW frequently writes data to hard disk at runtime. This is because CUSHAW does not load all input short reads into memory, but only one batch of reads at a time, so the intermediate results of processed reads must be stored on disk for subsequent use. We therefore strongly recommend using a working directory on a local hard disk for speed. I have to admit that the framework of CUSHAW was not well built, and I would like to rewrite this program in the future.
- I found that the SAM output of CUSHAW is incompatible with the requirements of the DWGSIM short read simulator. Hence, evaluation results obtained using the program from the DWGSIM software package are generally incorrect. I recommend using WGSIM from the SAMtools package to simulate reads.
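The insert-size rule from the tuning tip (ins_mean + 4*ins_std) is easy to compute once the library statistics are known. The sketch below is one way to derive a value for "-i"; the function names are hypothetical, rounding up to an integer is an assumption made because "-i" takes an integer, and the use of the sample standard deviation in the second helper is likewise an assumption.

```python
import math
import statistics


def max_insert_size(ins_mean, ins_std, k=4):
    """Return ins_mean + k * ins_std, rounded up to an integer for -i."""
    return math.ceil(ins_mean + k * ins_std)


def max_insert_size_from_samples(insert_sizes, k=4):
    """Estimate the mean and standard deviation from observed insert sizes."""
    ins_mean = statistics.mean(insert_sizes)
    ins_std = statistics.stdev(insert_sizes)  # sample standard deviation
    return max_insert_size(ins_mean, ins_std, k)


if __name__ == "__main__":
    # A library with mean insert size 200 and standard deviation 10.
    print(max_insert_size(200, 10))
```

With a mean of 200 and a standard deviation of 10, the rule gives 240, which would be passed to the aligner as "-i 240".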
- June 2, 2012
- Provides a new option "-all_in_sam" to allow the output of unaligned read information in the final SAM file
- Provides a new option "-disable_sw" to disable the use of Smith-Waterman in the paired-end alignment
If you have any questions or suggestions for improvement, please contact Liu Yongchao (Email: yliu860 (at) gatech (dot) edu).