An assembly of reads, contigs and scaffolds

A blog on all things newbler and beyond

Archive for the ‘Using newbler’ Category

What is new in newbler 2.6

Posted by lexnederbragt on July 12, 2011

The latest version of newbler, version 2.6, brings some welcome additions for input and output. As I have so far only covered de novo assembly, I will skip the updates to the gsMapper (except to mention that it can now produce a BAM file using the -bam option).

Posted in Using newbler | 21 Comments »

What is new in newbler version 2.5.3

Posted by lexnederbragt on March 22, 2011

(Image source: Wikimedia Commons)

Recently, newbler version 2.5.3 became available. In this post, I’ll describe the changes between this version and the previous one (2.3). As I have not yet described the gsMapper function of newbler, I discuss here only the changes relevant to assembly (gsAssembler, runAssembly).

Posted in Using newbler | 14 Comments »

Running newbler: de novo transcriptome assembly I

Posted by lexnederbragt on August 31, 2010

(Image: RNA; source: wikimedia.org)

Since version 2.3, newbler has a -cdna option for de novo transcriptome assembly. In this post, I’ll explain the principles of transcriptome assembly and how to set one up. The next post will discuss the output of a transcriptome assembly.

1) Principles of transcriptome assembly

The first steps of a transcriptome assembly are identical to those of other assembly projects: newbler builds a contig graph, see this post. Ideally, the reads coming from the transcript of a given gene should result in a single contig. However, because of splice variants (and other sequence particularities), there may be several contigs for each transcript, which together form a small contig graph. A splice variant results in reads that, relative to other reads, contain an insert (representing an additional exon in the transcript), thereby breaking the contig graph; see the figure.

Figure: Relationship between exons, contigs and isotigs

So, for transcriptome projects, there will be numerous subgraphs, each potentially representing one gene. Each of these subgraphs is called an isogroup. Next, newbler traverses the contigs in the subgraph of each isogroup to generate transcript variants, which are called isotigs (again, see the figure). There are certain rules for this traversal, for example for where a path starts and where it ends. Another rule, for complex graphs, is a cutoff on the number of isotigs generated per isogroup (by default 100 isotigs). If fully traversing the graph would result in more isotigs than this maximum, the contigs of this isogroup are reported in the output instead of the isotigs.
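The grouping and traversal steps can be sketched in Python. This is a minimal illustration under simplifying assumptions, not newbler's actual implementation: the contig graph is given as directed edges between hypothetical contig names (c1 to c4), an isogroup is modelled as a connected component, an isotig is any path from a contig without incoming edges to a contig without outgoing edges, and the graph is assumed to be acyclic. newbler's real start and end rules are more involved; only the cap of 100 isotigs per isogroup comes from the description above.

```python
from collections import defaultdict


def isogroups(contigs, edges):
    """Split a contig graph into isogroups, modelled here as connected components.

    Edge direction is ignored for grouping; contigs without edges form
    single-contig isogroups.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in contigs:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


def isotigs(group, edges, max_isotigs=100):
    """Enumerate start-to-end paths (isotigs) through one acyclic isogroup.

    Returns None when more than max_isotigs paths exist, signalling that
    the contigs of the isogroup should be reported instead.
    """
    members = set(group)
    successors = defaultdict(list)
    has_predecessor = set()
    for a, b in edges:
        if a in members and b in members:
            successors[a].append(b)
            has_predecessor.add(b)
    paths = []

    def walk(node, path):
        if len(paths) > max_isotigs:  # cutoff exceeded: stop exploring
            return
        if not successors[node]:  # no outgoing edge: the path ends here
            paths.append(path)
            return
        for nxt in successors[node]:
            walk(nxt, path + [nxt])

    for start in group:
        if start not in has_predecessor:  # no incoming edge: a path may start here
            walk(start, [start])
    return None if len(paths) > max_isotigs else paths


# Exon skipping: contig c2 (the extra exon) is present in one variant only.
edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]
for group in isogroups(["c1", "c2", "c3", "c4"], edges):
    print(group, isotigs(group, edges))
```

For this toy graph, c1, c2 and c3 form one isogroup with two isotigs (c1-c2-c3 and c1-c3), while c4 forms an isogroup on its own. Lowering max_isotigs to 1 makes isotigs return None for the first isogroup, mimicking the fallback to reporting contigs.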

Posted in How it works, Using newbler | 4 Comments »

Running newbler: more de novo assembly parameters (and a hidden one)

Posted by lexnederbragt on July 16, 2010

Trimming (reads) by running newbler

There is a long list of options/flags/parameters for a newbler assembly, some of which were covered in the previous post. In this post I will describe some more of them. At the end, as a bonus, I will share a parameter that is not mentioned in the current documentation…

-ss -sl -sc -ais -ads
These parameters control read overlap detection (there are two more, -mi and -ml, which I described in the previous post). Seeds and overlap detection are explained in more detail in the post on how newbler works. I never change these parameters, as I assume 454 has done a good job optimizing them. But I would love to hear from people who have tested the effect of adjusting them…
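To make the seed parameters more concrete, here is a toy Python sketch of seed-based overlap detection. It rests on assumptions, not on newbler's documentation: -ss is read as the step between seed start positions, -sl as the seed length, -sc as the number of seed hits required on the same offset, and -ais/-ads as the score added per identical or differing base. The default values in the signatures are placeholders chosen for the short toy reads, not newbler's defaults, and the scoring is ungapped, whereas real 454 overlaps must tolerate indels (notably in homopolymers).

```python
from collections import Counter, defaultdict


def candidate_offsets(read_a, read_b, seed_step=4, seed_length=8, seed_count=2):
    """Offsets (start in read_a minus start in read_b) with >= seed_count seed hits.

    Seeds of length seed_length are taken from read_a every seed_step bases and
    looked up as exact matches in an index of all read_b substrings of that length.
    """
    index = defaultdict(list)
    for j in range(len(read_b) - seed_length + 1):
        index[read_b[j:j + seed_length]].append(j)
    support = Counter()
    for i in range(0, len(read_a) - seed_length + 1, seed_step):
        for j in index.get(read_a[i:i + seed_length], []):
            support[i - j] += 1  # hits on the same offset support the same overlap
    return sorted(off for off, hits in support.items() if hits >= seed_count)


def score_overlap(read_a, read_b, offset, identity_score=2, difference_score=-3):
    """Ungapped score, length and percent identity of the overlap at offset."""
    start_a, start_b = max(offset, 0), max(-offset, 0)
    length = min(len(read_a) - start_a, len(read_b) - start_b)
    if length <= 0:
        return 0, 0, 0.0
    matches = sum(read_a[start_a + k] == read_b[start_b + k] for k in range(length))
    score = matches * identity_score + (length - matches) * difference_score
    return score, length, 100.0 * matches / length


read_a = "GATTACACCGTAGGCTTAACGGATCC"
read_b = read_a[10:] + "AAAA"  # read_b overlaps the last 16 bases of read_a
for offset in candidate_offsets(read_a, read_b):
    print(offset, score_overlap(read_a, read_b, offset))
```

Here two seeds from read_a hit read_b on the same offset (10), so the pair passes a seed count of 2 and the 16-base overlap is then scored. Checks such as a minimum overlap identity and length, the role the text gives to -mi and -ml, would be applied to the returned length and identity.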

Posted in Using newbler | 8 Comments »

Running newbler: de novo assembly

Posted by lexnederbragt on June 10, 2010

This post is about how to start up newbler for de novo assembly projects. I will describe setting up newbler using the command line. Most of the options I will mention are also available through the GUI version, but I will not describe how to use them here.

For a description of the progress that newbler reports during assembly, please check this post. The different output files are described in a series of previous posts.

1) Default newbler on one or more files

runAssembly /data/sff/EYV886410.sff

This is the simplest way of running newbler: just provide it with one sff file. It will generate a folder with a name along the lines of P_yyyy_mm_dd_hh_min_sec_runAssembly and put all output in there. If you want to control the name of this folder, use

runAssembly -o project1 /data/sff/EYV886410.sff

-o sets the name of the folder in which newbler puts all output, in this case ‘project1’.
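If you launch many assemblies from a script, building the command line in one place can help. Below is a minimal Python sketch, assuming runAssembly is on your PATH; the helper names are hypothetical and not part of newbler. It relies only on what is described above: input files as positional arguments and -o for the output folder, with room for other options such as -cdna. It also parses the default folder name, assuming the P_yyyy_mm_dd_hh_min_sec_runAssembly pattern uses zero-padded fields, to locate the most recent default output folder.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def build_run_assembly_cmd(input_files, outdir=None, extra_options=()):
    """Return the runAssembly argument list; outdir, if given, is passed with -o."""
    if not input_files:
        raise ValueError("runAssembly needs at least one input file")
    cmd = ["runAssembly"]
    if outdir is not None:
        cmd += ["-o", str(outdir)]
    cmd += [str(opt) for opt in extra_options]
    cmd += [str(path) for path in input_files]
    return cmd


def run_assembly(input_files, outdir=None, extra_options=()):
    """Run runAssembly; raises CalledProcessError if it exits with an error."""
    if shutil.which("runAssembly") is None:
        raise FileNotFoundError("runAssembly was not found on the PATH")
    subprocess.run(build_run_assembly_cmd(input_files, outdir, extra_options), check=True)


def parse_default_outdir(name):
    """Timestamp of a default output folder name, or None if the name does not match.

    Assumes names like P_2010_06_10_14_35_02_runAssembly (zero-padded fields).
    """
    try:
        return datetime.strptime(name, "P_%Y_%m_%d_%H_%M_%S_runAssembly")
    except ValueError:
        return None


def newest_default_outdir(parent="."):
    """Most recent default-named output folder directly under parent, or None."""
    found = []
    for path in Path(parent).iterdir():
        stamp = parse_default_outdir(path.name)
        if path.is_dir() and stamp is not None:
            found.append((stamp, path))
    return max(found)[1] if found else None


print(build_run_assembly_cmd(["/data/sff/EYV886410.sff"], outdir="project1"))
```

To assemble several sff files into one project, pass them all in the list; they end up as positional arguments after the options, matching the command lines shown above.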

Posted in Using newbler | 65 Comments »