
De novo sequence assemblers

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Meltonae (talk | contribs) at 22:39, 23 February 2017 (Added in some of the body of text for article draft 1. More information to be added, including links to the various softwares and citations.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

De novo sequence assemblers are programs that assemble short nucleotide or amino acid sequences into longer sequences without the use of a reference sequence. They are most commonly used in bioinformatic studies to assemble genomes or transcriptomes (see de novo transcriptome assembly).

Types of de novo assemblers

There are two types of algorithms commonly used by these assemblers: greedy algorithms, which aim for local optima, and de Bruijn graph algorithms, which aim for global optima. Different assemblers are tailored to particular needs, such as the assembly of small bacterial genomes, large eukaryotic genomes, or transcriptomes.

    Greedy algorithm assemblers find local optima in alignments of smaller reads: at each step they join the reads with the best available overlap, without revisiting earlier joins.
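As an illustration of the greedy idea, the following is a minimal sketch rather than the method of any particular assembler. It assumes error-free reads from a single strand, and the names `overlap`, `greedy_assemble`, and the `min_len` threshold are hypothetical. At each step it merges the pair of sequences with the longest suffix-prefix overlap, which is a locally optimal choice.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the best-overlapping pair until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            n = overlap(a, b, min_len)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # nothing left to merge; remaining contigs stay separate
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

Because each merge is final, a sketch like this can be misled by repeated regions, which is one reason the greedy approach is described as finding local rather than global optima.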

    De Bruijn graph assemblers build a de Bruijn graph to guide the assembly. During construction of the graph, reads are broken into smaller fragments of a specified size, k. These k-mers then become "nodes" that are connected by "edges."
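The graph construction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used by any specific assembler: the function names `kmers` and `build_graph` are hypothetical, reads are assumed error-free, and an edge is drawn between two k-mers when they are adjacent in a read (so they overlap by k-1 bases).

```python
def kmers(read: str, k: int) -> list[str]:
    """Break a read into its overlapping fragments of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer (node) to the set of k-mers that follow it (edges)."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        parts = kmers(read, k)
        for node in parts:
            graph.setdefault(node, set())  # register nodes with no outgoing edge
        for left, right in zip(parts, parts[1:]):
            graph[left].add(right)
    return graph


if __name__ == "__main__":
    for node, successors in build_graph(["ATGGC", "TGGCA"], 3).items():
        print(node, "->", sorted(successors))
```

In this toy input the two reads share the k-mers TGG and GGC, so they collapse onto the same nodes and the graph forms a single path from ATG to GCA.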

Common programs

    SPAdes

        SPAdes is a de Bruijn graph assembler designed to assemble small genomes, such as bacterial genomes. It uses a multisized de Bruijn graph to guide assembly.

    Ray

        Ray is a suite of tools that includes Ray (de novo assembly of single genomes), RayMeta (de novo assembly of metagenomes), RayCommunities (microbe abundance and taxonomic profiling), RayOntologies (gene ontology profiling), and RaySurveyor (comparison of genomic content between samples). Ray also has a web interface called Ray Cloud Browser.

    ABySS

        ABySS is a de novo, parallel, paired-end sequence assembler designed for the assembly of short reads. There are two versions: ABySS (genomic) and Trans-ABySS (transcriptomic).

    AllPaths-LG

    Trinity

The Assemblathon

    The Assemblathon is a periodic, collaborative effort to test and improve the numerous assemblers available. Thus far, two Assemblathons have been completed (2011 and 2013) and a third is in progress. Teams of researchers from across the world choose a program and assemble genomes of model organisms whose genomes have previously been assembled and annotated. The assemblies are then compared and evaluated using numerous metrics.

Assemblathon 1

        - Participants/software

            -

        - Results of selected metrics

            - N50 analysis

            - Fragment analysis

            - Gene length analysis

            - Bacterial contamination

            - PCA of metrics                
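For readers unfamiliar with the N50 metric listed above, a minimal sketch of the usual definition follows: N50 is the contig length at which contigs of that length or longer cover at least half of the total assembly length. The function name `n50` and the example lengths are hypothetical and are not Assemblathon data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why it is reported alongside other metrics.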

Assemblathon 2

        - Participants/software

            -

        - Results of selected metrics
