CUSHAW is a next-generation sequencing read alignment software package based on multi-core and many-core computing. CUSHAW 1.0, the first release of the package, is a CUDA-compatible short read alignment algorithm for multiple GPUs sharing a single host. The aligner is built on the Burrows-Wheeler transform (BWT) and written in the CUDA C++ parallel programming language. Performance evaluation on simulated and real short read datasets shows that our aligner achieves significant speedups in execution time while yielding comparable or even better alignment quality for paired-end alignments, compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. This aligner supports only ungapped alignment and has been incorporated into the NVIDIA Tesla Bio Workbench. The algorithm is presented in the paper "CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform".


  1. Latest source code (release 1.0.40)

    More details about the changes in this version are available in the changelog.

  2. Simulated reads from the human genome

  3. Ready-made BWT indexes of genomes


Other related papers







Installation and Usage


  1. Install the CUDA toolkit and SDK, version 3.x or higher.
  2. Use NVIDIA CUDA-enabled GPUs based on the Fermi architecture or newer.
  3. Turn off ECC using the command "nvidia-smi -e 0".


  1. The sub-directory "bwt_index" contains the source code for the BWT construction program. This program is taken from the open-source software BWA, since BWT construction has already been extensively researched in the literature to reduce working space and increase execution speed.
  2. The sub-directory "cushaw" contains the source code for the short read aligner.


  1. Go to each directory (i.e. either bwt_index or cushaw) and simply type "make" to compile them.
  2. In the sub-directory "bwt_index", an executable binary bwt_index will be created.
  3. In the sub-directory "cushaw", two executable binaries, "cushaw" and "cushaw-long", will be created. The "cushaw" binary gives better alignment quality at high speed for short reads of lengths < 70 bp, while the "cushaw-long" binary runs faster for short reads of lengths >= 70 bp, with a slight loss of alignment quality.
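
The rule of thumb in step 3 can be written down in a few lines of Python. This is only a sketch: the function name choose_binary is hypothetical and not part of CUSHAW, and the 70 bp cutoff is the one quoted in step 3 (reads of exactly 70 bp go to "cushaw-long").

```python
def choose_binary(read_length: int, cutoff: int = 70) -> str:
    """Return the binary suggested for a given read length.

    Hypothetical helper, not shipped with CUSHAW. The default cutoff of
    70 bp is the threshold quoted in the build notes above.
    """
    if read_length <= 0:
        raise ValueError("read_length must be a positive number of bases")
    # Reads shorter than the cutoff go to "cushaw", all others to "cushaw-long".
    return "cushaw" if read_length < cutoff else "cushaw-long"


if __name__ == "__main__":
    for length in (36, 69, 70, 100):
        print(length, choose_binary(length))
```

A wrapper script could call such a helper once per dataset, using the read length reported by the sequencer, before launching the aligner.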

Construct BWT indices of genomes

  1. To create the BWT index for large genomes, run the command "./bwt_index -a bwtsw gdir/genome.fasta".
  2. To create the BWT index for small genomes, run the command "./bwt_index -a is gdir/genomes.fasta".
    Note: All the BWT index files are written to the gdir directory.
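
The choice between the two commands above can be sketched as a small Python helper that only builds the command line and does not run it. The helper name and the size cutoff are assumptions: this page says only "large" and "small", so the 2,000,000,000-base default is borrowed from the BWA documentation, which notes that the "is" algorithm does not work for databases larger than 2 GB. Adjust it if your build behaves differently.

```python
import shlex

# ASSUMPTION: this page gives no numeric boundary between "small" and
# "large" genomes. The value below follows the BWA documentation's note
# that the "is" algorithm does not handle databases over 2 GB.
ASSUMED_CUTOFF_BASES = 2_000_000_000


def bwt_index_command(fasta_path, total_bases, cutoff=ASSUMED_CUTOFF_BASES):
    """Build (but do not run) the bwt_index command for a genome.

    Hypothetical helper; only the "-a bwtsw" and "-a is" options shown
    in the commands above are used.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    algorithm = "bwtsw" if total_bases >= cutoff else "is"
    return ["./bwt_index", "-a", algorithm, fasta_path]


if __name__ == "__main__":
    # A human-sized genome (about 3.1 billion bases) versus a bacterial one.
    print(shlex.join(bwt_index_command("gdir/genome.fasta", 3_100_000_000)))
    print(shlex.join(bwt_index_command("gdir/genome.fasta", 4_600_000)))
```

The returned list can be passed directly to subprocess.run from the "bwt_index" sub-directory once the binary has been built.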

Typical Commands

  1. ./cushaw directory gdir/genome.fasta -fasta reads.fa
  2. ./cushaw directory gdir/genome.fasta -fastq reads.fq
  3. ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32
  4. ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2
  5. ./cushaw directory gdir/genome.fasta -fasta reads.fa -s 32 -g 2 -t 2
    Note: When more than one GPU is installed in the host computer and the user wants to use only one of them, a specific GPU can be selected with the option "-gi". This option specifies the GPU index (starting from 0) in the host and only takes effect when a single GPU is used.

Important notes

  1. In general, the default parameters give good performance for most reads produced by current mainstream Illumina sequencers. However, when users intend to tune the parameters, it is generally sufficient to specify only two parameters, "-mms" and "-mmr", for single-end alignment. In addition, for paired-end alignment, the maximal insert size should also be properly specified; it is usually calculated as ins_mean + 4*ins_std (ins_mean is the mean insert size and ins_std is the insert-size standard deviation).
  2. The alignment results are stored in the "Aligns.sam" file in the "directory" directory. The downstream analysis can be done using SAMtools.
  3. The "cushaw" binary executable gives better alignment quality at high speed for short reads of lengths < 70 bp, while the "cushaw-long" binary executable runs faster for short reads of lengths >= 70 bp, with a slight loss of alignment quality. We recommend using cushaw-long for reads longer than 70 bp and cushaw otherwise.
  4. CUSHAW frequently writes data to disk at runtime. This is because CUSHAW does not load all input short reads into memory, only a batch of reads at a time, so the intermediate results of processed reads must be stored on disk for subsequent use. We therefore strongly recommend using a working directory on a local hard disk for speed. I have to admit that the framework of CUSHAW was not well built, and I would like to rewrite this program in the future.
  5. I found that the SAM output of CUSHAW is incompatible with the requirements of the DWGSIM short read simulator. Hence, evaluation results obtained with the program from the DWGSIM software package are generally incorrect. I recommend using WGSIM in the SAMtools package to simulate reads.
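
As a worked example of the insert-size rule in note 1 (ins_mean + 4*ins_std), the following Python sketch computes the maximal insert size either from known library statistics or from a sample of observed insert sizes. The function names are hypothetical, and rounding up to a whole number of bases is an assumption. The CUSHAW option that takes this value is not named on this page, so the sketch stops at computing the number.

```python
import math
import statistics


def max_insert_size(ins_mean, ins_std, k=4.0):
    """Return ins_mean + k * ins_std, rounded up to a whole base count.

    k = 4 matches the rule quoted in the notes above; rounding up is an
    assumption made here so the result is an integer.
    """
    if ins_std < 0:
        raise ValueError("ins_std must be non-negative")
    return math.ceil(ins_mean + k * ins_std)


def max_insert_size_from_sample(insert_sizes):
    """Estimate the mean and sample standard deviation, then apply the rule."""
    mean = statistics.mean(insert_sizes)
    std = statistics.stdev(insert_sizes)  # needs at least two observations
    return max_insert_size(mean, std)


if __name__ == "__main__":
    # Library with mean insert size 500 bp and standard deviation 20 bp:
    # 500 + 4 * 20 = 580
    print(max_insert_size(500, 20))
    print(max_insert_size_from_sample([480, 500, 520]))
```

In practice the sample would come from a pilot alignment of a subset of read pairs; the three values above are made up purely for illustration.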

Change Log


If you have any questions or suggested improvements, please feel free to contact Liu, Yongchao.