Cloud-scale BWAMEM (CS-BWAMEM) is an ultrafast and highly scalable aligner built on top of cloud infrastructure, including Spark and the Hadoop Distributed File System (HDFS). It leverages the abundant computing resources of a public or private cloud to fully exploit the parallelism offered by the enormous number of reads. With CS-BWAMEM, paired-end whole-genome reads (30x coverage) can be aligned within 80 minutes on a 25-node cluster with 300 cores. Its features include: 1) support for both paired-end and single-end alignment; 2) alignment quality similar to BWA-MEM; 3) input as FASTQ files; and 4) output in SAM (single-node) or ADAM (cluster) format.
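The read-level parallelism described above can be illustrated with a minimal Python sketch. It is not CS-BWAMEM's code: the names parse_fastq, batch_reads, and process_batch are hypothetical, the input is assumed to be plain four-line FASTQ entries with no blank lines, and process_batch is only a placeholder for the real alignment step. The point is that each batch of reads is independent of the others, so batches can be handed to separate workers.

```python
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry (assumed layout)."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        entry = list(islice(it, 4))
        if not entry:
            return
        if len(entry) < 4 or not entry[0].startswith("@") or not entry[2].startswith("+"):
            raise ValueError(f"malformed FASTQ entry: {entry!r}")
        header, sequence, _, quality = entry
        yield FastqRecord(header[1:], sequence, quality)


def batch_reads(records: Iterable[FastqRecord], batch_size: int) -> Iterator[List[FastqRecord]]:
    """Group reads into fixed-size batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch: List[FastqRecord]) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for aligning one batch: returns (name, read length)."""
    return [(record.name, len(record.sequence)) for record in batch]


if __name__ == "__main__":
    sample = ["@r1", "ACGT", "+", "IIII", "@r2", "GG", "+", "II", "@r3", "TTT", "+", "III"]
    # The built-in map stands in for a distributed map over partitions.
    for result in map(process_batch, batch_reads(parse_fastq(sample), 2)):
        print(result)
```

On a cluster, the built-in map would be replaced by a distributed map over partitions of reads, which is the role Spark plays in the description above.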
Below is the overall design of CS-BWAMEM:
Postdoc and Students: Yu-Ting Chen, Sen Li, Myron Peto, Peng Wei, and Peipei Zhou
Faculty: Jason Cong, Jie Lei, Paul Spellman
CS-BWAMEM is open source and available for download on GitHub: https://github.com/ytchen0323/cloud-scale-bwamem