List of sequence alignment software

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 193.62.194.241 (talk) at 09:44, 17 December 2013 (Update URLs for services provided by EMBL-EBI, and switch to use the EMBL-EBI name instead of EBI.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.

Database search only

Name Description Sequence Type* Link Authors Year
BLAST local search with fast k-tuple heuristic (Basic Local Alignment Search Tool) Both NCBI EMBL-EBI DDBJ DDBJ (psi-blast) GenomeNet PIR (protein only) Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ[1] 1990
CS-BLAST sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. Position-specific iterative version CSI-BLAST more sensitive than PSI-BLAST Protein CS-BLAST server download Angermueller C, Biegert A, Soeding J[2] 2013
CUDASW++ GPU accelerated Smith Waterman algorithm for multiple shared-host GPUs Protein homepage publication Liu Y, Maskell DL and Schmidt B 2009/2010
FASTA local search with fast k-tuple heuristic, slower but more sensitive than BLAST Both EMBL-EBI DDBJ GenomeNet PIR (protein only)
GGSEARCH / GLSEARCH Global:Global (GG), Global:Local (GL) alignment with statistics Protein FASTA server
HMMER local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST Both download Durbin R, Eddy SR, Krogh A, Mitchison G[3] 1998
HHpred / HHsearch pairwise comparison of profile Hidden Markov models; very sensitive, but can only search alignment databases (Pfam, PDB, InterPro...) Protein server download Söding J[4] 2005
IDF Inverse Document Frequency Both download
Infernal profile SCFG search RNA download Eddy S
PSI-BLAST position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLAST Protein NCBI PSI-BLAST Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ[5] 1997
PSI-Search Combining the Smith-Waterman search algorithm with the PSI-BLAST profile construction strategy to find distantly related protein sequences, and preventing homologous over-extension errors. Protein EMBL-EBI PSI-Search Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR[6] 2012
ScalaBLAST Highly parallel Scalable BLAST Both ScalaBLAST Oehmen et al.[7] 2011
Sequilab Linking and profiling sequence alignment data from NCBI-BLAST results with major sequence analysis servers/services Nucleotide/peptide server 2010
SAM local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST Both SAM Karplus K, Krogh A[8] 1999
SSEARCH Smith-Waterman search, slower but more sensitive than FASTA Both EMBL-EBI DDBJ
*Sequence Type: Protein or nucleotide
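
The "fast k-tuple heuristic" named in the BLAST and FASTA entries can be illustrated with a minimal Python sketch. It assumes exact word matches only and stops after seed detection. Real BLAST and FASTA add neighbourhood words, seed extension and statistics, none of which is shown here. The function names `build_word_index` and `find_seeds` are hypothetical and are not part of any listed tool.

```python
from collections import defaultdict


def build_word_index(subject, k):
    """Map every length-k word in `subject` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-tuple matches exactly."""
    index = build_word_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        # Look up the query word instead of scanning the whole subject.
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


# Seeds that share the same diagonal (subject_pos - query_pos) hint at an
# alignable region worth extending with a slower, more sensitive method.
print(find_seeds("ACGT", "TTACGTA", k=3))
```

Indexing the words once is what makes the lookup fast: each query word costs a dictionary access rather than a scan of the subject sequence.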

Pairwise alignment

Name Description Sequence Type* Alignment Type** Link Author Year
ACANA fast heuristic anchor based pairwise alignment Both Both download Huang, Umbach, Li 2005
AlignMe Alignments for membrane protein sequences Protein Both download,server M. Stamm, K. Khafizov, R. Staritzbichler, L.R. Forrest 2013
Bioconductor Biostrings::pairwiseAlignment Dynamic programming Both Both + Ends-free site P. Aboyoun 2008
BioPerl dpAlign Dynamic programming Both Both + Ends-free site Y. M. Chan 2003
BLASTZ,LASTZ Seeded pattern-matching Nucleotide Local download,download Schwartz et al. 2004,2009
DNADot Web-based dot-plot tool Nucleotide Global server R. Bowen 1998
DOTLET Java-based dot-plot tool Both Global applet M. Pagni and T. Junier 1998
FEAST Posterior based local extension with descriptive evolution model Nucleotide Local site A. K. Hudek and D. G. Brown 2010
G-PAS GPU-based dynamic programming with backtracking Both Local, SemiGlobal, Global site+download W. Frohmberg, M. Kierzynka et al. 2011
GapMis GapMis is a tool for pairwise sequence alignment with a single gap Both SemiGlobal site K. Frousios, T. Flouri, C. S. Iliopoulos, K. Park, S. P. Pissis, G. Tischler 2012
GGSEARCH, GLSEARCH Global:Global (GG), Global:Local (GL) alignment with statistics Protein Global in query FASTA server W. Pearson 2007
JAligner Open source Java implementation of Smith-Waterman Both Local JWS A. Moustafa 2005
K*Sync Protein sequence to structure alignment that includes secondary structure, structural conservation, structure-derived sequence profiles, and consensus alignment scores Protein Both Robetta server D. Chivian & D. Baker 2003
LALIGN Multiple, non-overlapping, local similarity (same algorithm as SIM) Both Local non-overlapping server FASTA server W. Pearson 1991 (algorithm)
NW-align Standard Needleman-Wunsch dynamic programming algorithm Protein Global server and download Y Zhang 2012
mAlign modelling alignment; models the information content of the sequences Nucleotide Both doc code D. Powell, L. Allison and T. I. Dix 2004
matcher Waterman-Eggert local alignment (based on LALIGN) Both Local Pasteur I. Longden (modified from W. Pearson) 1999
MCALIGN2 explicit models of indel evolution DNA Global server J. Wang et al. 2006
MUMmer suffix tree based Nucleotide Global download S. Kurtz et al. 2004
needle Needleman-Wunsch dynamic programming Both SemiGlobal EMBL-EBI Pasteur A. Bleasby 1999
Ngila logarithmic and affine gap costs and explicit models of indel evolution Both Global download R. Cartwright 2007
Path Smith-Waterman on protein back-translation graph (detects frameshifts at protein level) Protein Local server download M. Gîrdea et al. 2009
PatternHunter Seeded pattern-matching Nucleotide Local download B. Ma et al. 2002–2004
ProbA (also propA) Stochastic partition function sampling via dynamic programming Both Global download U. Mückstein 2002
PyMOL "align" command aligns sequence & applies it to structure Protein Global (by selection) site W. L. DeLano 2007
REPuter suffix tree based Nucleotide Local download S. Kurtz et al. 2001
SABERTOOTH Alignment using predicted Connectivity Profiles Protein Global download on request F. Teichert, J. Minning, U. Bastolla, and M. Porto 2009
Satsuma Parallel whole-genome synteny alignments DNA Local download M.G. Grabherr et al. 2010
SEQALN Various dynamic programming Both Local or Global server M.S. Waterman and P. Hardy 1996
SIM, GAP, NAP, LAP Local similarity with varying gap treatments Both Local or global server X. Huang and W. Miller 1990-6
SIM Local similarity Both Local servers X. Huang and W. Miller 1991
SPA: Super pairwise alignment Fast pairwise global alignment Nucleotide Global available upon request Shen, Yang, Yao, Hwang 2002
SSEARCH Local (Smith-Waterman) alignment with statistics Protein Local EMBL-EBI FASTA server W. Pearson 1981 (Algorithm)
Sequences Studio Java applet demonstrating various algorithms from [9] Generic sequence Local and global code applet A.Meskauskas 1997 (reference book)
SWIFT suite Fast Local Alignment Searching DNA Local site K. Rasmussen, W. Gerlach 2005,2008
stretcher Memory-optimized Needleman-Wunsch dynamic programming Both Global Pasteur I. Longden (modified from G. Myers and W. Miller) 1999
tranalign Aligns nucleic acid sequences given a protein alignment Nucleotide NA Pasteur G. Williams (modified from B. Pearson) 2002
UGENE Opensource Smith-Waterman for SSE/CUDA, Suffix array based repeats finder & dotplot Both Both UGENE site UniPro 2010
water Smith-Waterman dynamic programming Both Local EMBL-EBI Pasteur A. Bleasby 1999
wordmatch k-tuple pairwise match Both NA Pasteur I. Longden 1998
YASS Seeded pattern-matching Nucleotide Local server download L. Noe and G. Kucherov 2004–2011
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global
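
Several entries above rely on Needleman-Wunsch (global) or Smith-Waterman (local) dynamic programming. The following is a minimal score-only sketch of both, assuming a linear gap cost and a simple match/mismatch scheme. The listed tools typically use substitution matrices and affine gap penalties, so this is an illustration rather than any tool's implementation. The function name `alignment_score` and its default parameters are hypothetical.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Score-only dynamic programming with a linear gap cost.

    local=False gives the Needleman-Wunsch (global) score;
    local=True gives the Smith-Waterman (local) score.
    """
    cols = len(b) + 1
    # Global alignment penalises leading gaps; local alignment does not.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr[j] = score
        prev = curr
    # Global: score of the bottom-right cell. Local: best cell anywhere.
    return best if local else prev[-1]


print(alignment_score("GATTACA", "GATTACA"))                 # global
print(alignment_score("XXGATTACAYY", "GATTACA", local=True))  # local
```

Only two rows of the matrix are kept, which is enough for the score. Recovering the alignment itself would need the full matrix or a traceback structure.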

Multiple sequence alignment

Name Description Sequence Type* Alignment Type** Link Author Year License
ABA A-Bruijn alignment Protein Global download B. Raphael et al. 2004 Proprietary, without charge for educational, research and non-profit use.
ALE manual alignment ; some software assistance Nucleotides Local download J. Blandy and K. Fogel 1994 (latest version 2007) GPL2
AMAP Sequence annealing Both Global server A. Schwartz and L. Pachter 2006
anon. fast, optimal alignment of three sequences using linear gap costs Nucleotides Global paper software D. Powell, L. Allison and T. I. Dix 2000
BAli-Phy Tree+Multi alignment ; Probabilistic/Bayesian ; Joint Estimation Both Global WWW+download BD Redelings and MA Suchard 2005 (latest version 2010)
Base-By-Base Java-based multiple sequence alignment editor with integrated analysis tools Both Local or Global download R. Brodie et al. 2004 Free, requires registration.
CHAOS/DIALIGN Iterative alignment Both Local (preferred) server M. Brudno and B. Morgenstern 2003
ClustalW Progressive alignment Both Local or Global download EMBL-EBI DDBJ PBIL EMBNet GenomeNet Thompson et al. 1994 GNU Lesser GPL
CodonCode Aligner Multi alignment; ClustalW & Phrap support Nucleotides Local or Global download P. Richterich et al. 2003 (latest version 2009)
Compass COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance Protein Global download and server R.I. Sadreyev, et al. 2009
DIALIGN-TX and DIALIGN-T Segment-based method Both Local (preferred) or Global download and server A.R.Subramanian 2005 (latest version 2008)
DNA Alignment Segment-based method for intraspecific alignments Both Local (preferred) or Global server A.Roehl 2005 (latest version 2008)
DNA Baser Sequence Assembler Multi alignment; Automatic batch alignment Nucleotides Local or Global www.DnaBaser.com Heracle BioSoft 2012
EDNA Energy Based Multiple Sequence Alignment for DNA Binding Sites Nucleotides Local or Global sourceforge.net/projects/msa-edna/ Salama, RA. et al. 2013
FSA Sequence annealing Both Global download and server R. K. Bradley et al. 2008
Geneious Progressive/Iterative alignment; ClustalW plugin Both Local or Global download A.J. Drummond et al. 2005 (latest version 2009)
Kalign Progressive alignment Both Global server EMBL-EBI MPItoolkit T. Lassmann 2005
MAFFT Progressive/iterative alignment Both Local or Global GenomeNet MAFFT K. Katoh et al. 2005
MARNA Multiple Alignment of RNAs RNA Local server download S. Siebert et al. 2005
MAVID Progressive alignment Both Global server N. Bray and L. Pachter 2004
MSA Dynamic programming Both Local or Global download D.J. Lipman et al. 1989 (modified 1995)
MSAProbs Dynamic programming Protein Global download Y. Liu, B. Schmidt, D. Maskell 2010
MULTALIN Dynamic programming/clustering Both Local or Global server download F. Corpet 1988
Multi-LAGAN Progressive dynamic programming alignment Both Global server M. Brudno et al. 2003
MUSCLE Progressive/iterative alignment Both Local or Global server R. Edgar 2004
Opal Progressive/iterative alignment Both Local or Global download T. Wheeler and J. Kececioglu 2007
Pecan Probabilistic/consistency DNA Global download B. Paten et al. 2008
Phylo A human computing framework for comparative genomics to solve multiple alignment Nucleotides Local or Global site McGill Bioinformatics 2010
Praline Progressive/iterative/consistency/homology-extended alignment with pre-profiling and secondary structure prediction Protein Global server J. Heringa 1999 (latest version 2009)
POA Partial order/hidden Markov model Protein Local or Global download C. Lee 2002
Probalign Probabilistic/consistency with partition function probabilities Protein Global server Roshan and Livesay 2006
ProbCons Probabilistic/consistency Protein Local or Global server C. Do et al. 2005
PROMALS3D Progressive alignment/hidden Markov model/Secondary structure/3D structure Protein Global server J. Pei et al. 2008
PRRN/PRRP Iterative alignment (especially refinement) Protein Local or Global PRRP PRRN Y. Totoki (based on O. Gotoh) 1991 and later
PSAlign Alignment preserving non-heuristic Both Local or Global download S.H. Sze, Y. Lu, Q. Yang. 2006
RevTrans Combines DNA and Protein alignment, by back translating the protein alignment to DNA. DNA/Protein (special) Local or Global server Wernersson and Pedersen 2003 (newest version 2005)
SAGA Sequence alignment by genetic algorithm Protein Local or Global download C. Notredame et al. 1996 (new version 1998)
SAM Hidden Markov model Protein Local or Global server A. Krogh et al. 1994 (most recent version 2002)
Se-Al Manual alignment Both Local download A. Rambaut 2002
StatAlign Bayesian co-estimation of alignment and phylogeny (MCMC) Both Global download A. Novak et al. 2008
Stemloc Multiple alignment and secondary structure prediction RNA Local or Global download I. Holmes 2005 GPLv3 (part of DART)
T-Coffee More sensitive progressive alignment Both Local or Global server download C. Notredame et al. 2000 (newest version 2008) GPL2
UGENE Supports multiple alignment with MUSCLE, KAlign, Clustal and MAFFT plugins Both Local or Global download UGENE team 2010 (newest version 2012) GPL2
VectorFriends VectorFriends Aligner, MUSCLE plugin, and ClustalW plugin Both Local or Global download BioFriends team 2013 Proprietary, but free for academic researchers
GLProbs Adaptive pair-Hidden Markov Model based approach Protein Global download Yongtao Ye et al. 2013
*Sequence Type: Protein or nucleotide. **Alignment Type: Local or global

Genomics analysis

Name Description Sequence Type* Link
ACT (Artemis Comparison Tool) Synteny and comparative genomics Nucleotide server
AVID Pairwise global alignment with whole genomes Nucleotide server
BLAT Alignment of cDNA sequences to a genome. Nucleotide
GMAP Alignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. Nucleotide http://research-pub.gene.com/gmap
Splign Alignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. Able to recognize and separate gene duplications. Nucleotide http://www.ncbi.nlm.nih.gov/sutils/splign
Mauve Multiple alignment of rearranged genomes (also available inside Geneious) Nucleotide download
MGA Multiple Genome Aligner Nucleotide download
Mulan Local multiple alignments of genome-length sequences Nucleotide server
Multiz Multiple alignment of genomes Nucleotide download
PLAST-ncRNA Search for ncRNAs in genomes by partition function local alignment Nucleotide server
Sequerome Profiling sequence alignment data with major servers/services Nucleotide/peptide server
Sequilab Profiling sequence alignment data from NCBI-BLAST results with major servers/services Nucleotide/peptide server
Shuffle-LAGAN Pairwise glocal alignment of completed genome regions Nucleotide server
SIBsim4 / Sim4 A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns Nucleotide download
SLAM Gene finding, alignment, annotation (human-mouse homology identification) Nucleotide server
*Sequence Type: Protein or nucleotide



Motif finding

Name Description Sequence Type* Link
PMS Motif search and discovery Both server server
FMM Motif search and discovery (can get also positive & negative sequences as input for enriched motif search) Nucleotide server
BLOCKS Ungapped motif identification from BLOCKS database Both server
eMOTIF Extraction and identification of shorter motifs Both servers
Gibbs motif sampler Stochastic motif extraction by statistical likelihood Both server server
HMMTOP Prediction of transmembrane helices and topology of proteins Protein homepage & download
I-sites Local structure motif library Protein server
JCoils Prediction of Coiled coil and Leucine Zipper Protein homepage & download
MEME/MAST Motif discovery and search Both server
CUDA-MEME GPU accelerated MEME (v4.4.0) algorithm for GPU clusters Both homepage
MERCI Discriminative motif discovery and search Both homepage & download
PHI-Blast Motif search and alignment tool Both Pasteur
Phyloscan Motif search tool Nucleotide server
PRATT Pattern generation for use with ScanProsite Protein server
ScanProsite Motif database search tool Protein server
TEIRESIAS Motif extraction and database search Both server
BASALT Multiple motif and regular expression search Both homepage
*Sequence Type: Protein or nucleotide



Benchmarking

Name Link Authors
BAliBASE download Thompson, Plewniak, Poch
HOMSTRAD download Mizuguchi
Oxbench download Raghava, Searle, Audley, Barber, Barton
PFAM download
PREFAB download Edgar
SABmark download Van Walle, Lasters, Wyns
SMART download Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork

Alignment Viewers/Editors

Please see the List of alignment visualization software.

Short-Read Sequence Alignment

Name Description paired-end option Use FASTQ quality Gapped Multi-threaded License Link
BarraCUDA A GPGPU accelerated Burrows-Wheeler transform (FM-index) short read alignment program based on BWA, supports alignment of indels with gap openings and extensions. Yes No Yes Yes (POSIX Threads and CUDA) GPL link
BFAST Explicit time and accuracy tradeoff with a prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith Waterman alignment. Yes (POSIX Threads) GPL link
BLASTN BLAST's nucleotide alignment program, slow and not accurate for short reads, and uses a sequence database (EST, sanger sequence) rather than a reference genome. link
BLAT Made by Jim Kent. Can handle one mismatch in initial alignment step. Yes (client/server). Free for academic and non-commercial use. link
Bowtie Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies Yes (POSIX Threads) Artistic License link
BWA Uses a Burrows-Wheeler transform to create an index of the genome. It is slightly slower than Bowtie but allows indels in the alignment. Yes No Yes Yes GPL link
CASHX Quantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together or as independent modules on their own. This algorithm is very accurate for perfect hits to a reference genome. No Free for academic and non-commercial use. link
Cloudburst Short-read mapping using Hadoop MapReduce Yes (Hadoop MapReduce) Artistic License link
CUDA-EC Short-read alignment error correction using GPUs. Yes (GPU enabled) CUDA-EC-
CUSHAW A CUDA compatible short read aligner to large genomes based on Burrows-Wheeler transform. Yes Yes No Yes (GPU enabled) GPL link
CUSHAW2 Gapped short-read and long-read alignment based on maximal exact match seeds. This aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments. Yes No Yes Yes GPL link
CUSHAW2-GPU GPU-accelerated CUSHAW2 short-read aligner. Yes No Yes Yes GPL link
drFAST Read mapping alignment software that implements cache obliviousness to minimize main/cache memory transfers like mrFAST and mrsFAST, however designed for the SOLiD sequencing platform (color space reads). It also returns all possible map locations for improved structural variation discovery. Yes Yes (for structural variation) Yes No BSD link
ELAND Implemented by Illumina. Includes ungapped alignment with a finite read length.
ERNE Extended Randomized Numerical alignEr for accurate alignment of NGS reads. It can map bisulfite-treated reads. Yes Low quality bases trimming Yes Multithreading and MPI-enabled GPL v3 link
GNUMAP Accurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling and Bisulfite sequence analysis. Yes (also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each base) Multithreading and MPI-enabled link
GEM High-quality alignment engine (exhaustive mapping with substitutions and indels). More accurate and several times faster than BWA or Bowtie 1/2. Many standalone biological applications (mapper, split mapper, mappability, and other) provided. Yes Yes Yes Yes Dual (free for non-commercial use); GEM source is currently unavailable link
GensearchNGS Complete framework with a user-friendly GUI for analysing NGS data. It integrates a proprietary high-quality alignment algorithm and a plug-in mechanism for public aligners, in a framework that imports short reads, aligns them, detects variants and generates reports. It is geared towards re-sequencing projects, particularly in a diagnostic setting. Yes No Yes Yes Commercial link
GMAP and GSNAP Robust, fast short-read alignment. GMAP: longer reads, with multiple indels and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with a single indel or up to two splices per read. Useful for digital gene expression, SNP and indel genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus. Yes Yes Yes Yes Free for academic and non-commercial use. link
Geneious Assembler Fast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, any pairing orientations, with any spacer size for the pairing, with or without a reference genome. Yes Commercial link
iSAAC iSAAC has been designed to take full advantage of all the computational power available on a single server node. As a result iSAAC scales well over a broad range of hardware architectures, and alignment performance improves with hardware capabilities. Yes Yes Yes Yes Free for academic and non-commercial use. github paper
LAST Yes Yes Yes GPL link
MAQ Ungapped alignment that takes into account quality scores for each base. GPL link
mrFAST and mrsFAST Gapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and they can return all possible map locations for improved structural variation discovery. Yes Yes (for structural variation) Yes No BSD mrFAST mrsFAST
MOM MOM or maximum oligonucleotide mapping is a query matching tool that captures a maximal length match within the short read. Yes link
MOSAIK Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long. Yes link
MPscan Fast aligner based on a filtration strategy (no indexing, use q-grams and Backward Nondeterministic DAWG Matching) link
Novoalign & NovoalignCS Gapped alignment of single end and paired end Illumina GA I & II, ABI Colour space & Ion Torrent reads. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and option to report multiple alignments per read. Yes Yes Yes Multi-threading and MPI versions available with paid license. Single threaded version free for academic and non-commercial use. Novocraft
NextGENe NextGENe® software has been developed specifically for use by biologists performing analysis of next generation sequencing data from Roche Genome Sequencer FLX, Illumina GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD™ System, PacBio and Ion Torrent platforms. Yes Yes Yes Yes Commercial Softgenetics
Omixon The Omixon Variant Toolkit includes highly sensitive and highly accurate tools for detecting SNPs and indels. It offers a solution to map NGS short reads with a moderate distance (up to 30% sequence divergence) from reference genomes. It poses no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Variant Toolkit well-suited for targeted sequencing projects and diagnostics. Yes Yes Yes Yes Commercial www.omixon.com
PALMapper PALMapper efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with a fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on a single CPU. It refines the originally proposed QPALMA approach. Yes GPL link
Partek Partek® Flow software has been developed specifically for use by biologists and bioinformaticians. It supports ungapped, gapped and splice-junction alignment from single and paired-end reads from Illumina, Life Technologies SOLiD™, Roche 454 and Ion Torrent raw data (with or without quality information). It integrates powerful quality control on FASTQ/Qual level and on aligned data. Additional functionality includes trimming and filtering of raw reads, SNP and indel detection, mRNA and microRNA quantification and fusion gene detection. Yes Yes Yes Multiprocessor/Core, Client-Server installation possible Commercial, free trial version link
PASS Indexes the genome, then extends seeds using pre-computed alignments of words. Works with base space as well as color space (SOLID) and can align genomic and spliced RNA-seq reads. Yes Yes Yes Yes Free for academic and non-commercial use. PASS_HOME
PerM Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths. Yes GPL link
PRIMEX Indexes the genome with a k-mer lookup table with full sensitivity up to an adjustable number of mismatches. It is best for mapping 15-60bp sequences to a genome. No link
QPalma Uses quality scores, intron lengths and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version). Yes (client/server) GPLv2 link
RazerS No read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping. LGPL link
REAL, cREAL REAL is an efficient, accurate, and sensitive tool for aligning short reads obtained from next-generation sequencing. The programme can handle an enormous amount of single-end reads generated by the next-generation Illumina/Solexa Genome Analyzer. cREAL is a simple extension of REAL for aligning short reads obtained from next-generation sequencing to a genome with circular structure. Yes Yes GPL link
RMAP Can map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated read mapping. There are no limitations on read length or number of mismatches. Yes Yes Yes GPL v3 link
rNA A randomized Numerical Aligner for Accurate alignment of NGS reads Yes Low quality bases trimming Yes Multithreading and MPI-enabled GPL v3 link
RTG Investigator Extremely fast, tolerant to high indel and substitution counts. Includes full read alignment. Product includes comprehensive pipelines for variant detection and metagenomic analysis with any combination of Illumina, Complete Genomics and Roche 454 data. Yes Yes, for variant calling Yes Yes Free for individual investigator use. link
Segemehl Can handle insertions, deletions and mismatches. Uses enhanced suffix arrays. Yes No Yes Yes Free for non-commercial use link
SeqMap Up to 5 mixed substitutions and insertions/deletions. Various tuning options and input/output formats. Free for academic and non-commercial use. link
Shrec Short read error correction with a Suffix trie data structure. Yes (Java) link
SHRiMP Indexes the reference genome as of version 2. Uses masks to generate possible keys. Can map ABI SOLiD color space reads. Yes Yes Yes Yes (OpenMP) BSD derivative link
SLIDER Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. link
SOAP, SOAP2, SOAP3 and SOAP3-dp SOAP: Robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT, uses a 12 letter hash table. SOAP2: using bidirectional BWT to build the index of reference, and it is much faster than the first version. SOAP3: GPU-accelerated version that could find all 4-mismatch alignments in tens of seconds per one million reads. SOAP3-dp, also GPU accelerated, supports arbitrary number of mismatches and gaps according to affine gap penalty scores. Yes No SOAP3-dp:Yes Yes (POSIX Threads), SOAP3, SOAP3-dp need GPU with CUDA support. GPL link
SOCS For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm. Yes GPL link
SSAHA and SSAHA2 Fast for a small number of variants. Free for academic and non-commercial use. link
Stampy For Illumina reads. High specificity, and sensitive for reads with indels, structural variants, or many SNPs. Slow, but speed is increased dramatically by using BWA for the first alignment pass. Yes Yes Yes No Free for academic and non-commercial use link
SToRM For Illumina or ABI SOLiD reads, with SAM native output. Highly sensitive for reads with many errors, indels (from 1 to 16). Uses spaced seeds and a SSE/SSE2/AVX2 banded alignment filter. Experimental ; Authors recommend SHRiMP2. No Yes Yes Yes (OpenMP) link
Subread and Subjunc Superfast and accurate read aligners. Subread can be used to map both gDNA-seq and RNA-seq reads. Subjunc detects exon-exon junctions and maps RNA-seq reads. They employ a novel mapping paradigm called "seed-and-vote". Yes Yes Yes Yes GPL3 link link
Taipan de-novo Assembler for Illumina reads Free for academic and non-commercial use. link
UGENE Visual interface both for Bowtie and BWA, as well as an embedded aligner Opensource, GPL link
VelociMapper FPGA-accelerated reference sequence alignment mapping tool from TimeLogic. Faster than Burrows-Wheeler transform-based algorithms like BWA and Bowtie. Supports up to 7 mismatches and/or indels with no performance penalty. Produces sensitive Smith-Waterman gapped alignments. Yes Yes Yes Yes Commercial TimeLogic
XpressAlign FPGA based sliding window short read aligner which exploits the embarrassingly parallel property of short read alignment. Performance scales linearly with number of transistors on a chip (i.e. performance guaranteed to double with each iteration of Moore's Law without modification to algorithm). Low power consumption is useful for datacentre equipment. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but not better than software BWT-based aligners currently. Can cope with large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version, needs work to develop it into a multi-FPGA production version. Free for academic and non-commercial use. link
ZOOM 100% sensitivity for reads between 15 and 240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina & SOLiD instruments, not 454. Yes (GUI) No (CLI). Commercial link
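
Several aligners in this table (for example Bowtie, BWA and SOAP2) are described as indexing the genome with a Burrows-Wheeler transform. The transform itself can be shown with a naive Python sketch that sorts all rotations of the text. This is an illustration only: it assumes a `$` sentinel that does not occur in the input, and real aligners build the transform from suffix arrays and query it through an FM-index rather than materialising rotations. The function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row ending with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


encoded = bwt("banana")
print(encoded)               # annb$aa
print(inverse_bwt(encoded))  # banana
```

The transform is reversible and tends to group identical characters together, which is why it supports both compact storage and fast exact-match lookup of short reads.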

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712.
  2. Angermüller, C.; Biegert, A.; Söding, J. (2012). "Discriminative modelling of context-specific amino acid substitution probabilities". Bioinformatics. 28 (24): 3240–7. doi:10.1093/bioinformatics/bts622. PMID 23080114.
  3. Durbin, Richard; Eddy, Sean R.; Krogh, Anders; Mitchison, Graeme, eds. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-62971-3.
  4. Söding J (2005). "Protein homology detection by HMM-HMM comparison". Bioinformatics. 21 (7): 951–60. doi:10.1093/bioinformatics/bti125. PMID 15531603.
  5. Altschul SF; Madden TL; Schäffer AA; et al. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMC 146917. PMID 9254694.
  6. Li W; McWilliam H; Goujon M; et al. (2012). "PSI-Search: iterative HOE-reduced profile SSEARCH searching". Bioinformatics. 28 (12): 1650–1651. doi:10.1093/bioinformatics/bts240. PMC 3371869. PMID 22539666.
  7. Oehmen, C.; Nieplocha, J. (2006). "ScalaBLAST: A scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis". IEEE Transactions on Parallel & Distributed Systems. 17 (8): 740–749. doi:10.1109/TPDS.2006.112.
  8. Hughey, R.; Karplus, K.; Krogh, A. (2003). SAM: sequence alignment and modeling software system. Technical report UCSC-CRL-99-11 (Report). University of California, Santa Cruz, CA.
  9. Gusfield, Dan (1997). Algorithms on strings, trees and sequences. Cambridge University Press. ISBN 0-521-58519-8.