An assembly of reads, contigs and scaffolds

A blog on all things newbler and beyond

Archive for March, 2010

Newbler output II: contigs and scaffolds sequence files, and the 454Scaffolds.txt file

Posted by lexnederbragt on March 22, 2010

The files most people are after when they do an assembly must be these: the actual contig and scaffold sequences. The contigs are in the files 454AllContigs and 454LargeContigs. By default, ‘All’ means contigs of at least 100 bp, while ‘Large’ contigs are at least 500 bp. These lower limits can be set during assembly.
The ‘fna’ files contain the sequences (bases) in fasta format (I actually do not know why this extension was chosen over ‘fasta’ or ‘fa’, which are most often used). The ‘qual’ files contain phred-like quality scores (see previous post). The contigs are in the same order between fna and qual files, and the quality scores are in the same order as the bases:
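To make the ordering concrete, here is a minimal Python sketch that walks an fna file and a qual file side by side and checks that every contig has exactly one quality score per base. It assumes the qual file uses fasta-style `>` headers followed by whitespace-separated integer scores. The function names `read_records` and `paired_contigs` are made up for this illustration and are not part of newbler.

```python
from itertools import zip_longest


def read_records(lines):
    """Yield (name, body_lines) for each '>' record in a fasta-style file."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # Keep only the first word of the header as the record name.
            name, body = (line[1:].split() or [""])[0], []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield name, body


def paired_contigs(fna_lines, qual_lines):
    """Yield (name, sequence, scores), checking that both files agree."""
    pairs = zip_longest(read_records(fna_lines), read_records(qual_lines))
    for fna_rec, qual_rec in pairs:
        if fna_rec is None or qual_rec is None:
            raise ValueError("fna and qual files differ in number of records")
        (name, seq_lines), (qual_name, score_lines) = fna_rec, qual_rec
        if name != qual_name:
            raise ValueError(f"record order differs: {name} vs {qual_name}")
        sequence = "".join(seq_lines)
        # Assumption: scores are whitespace-separated integers.
        scores = [int(s) for line in score_lines for s in line.split()]
        if len(sequence) != len(scores):
            raise ValueError(f"{name}: {len(sequence)} bases, {len(scores)} scores")
        yield name, sequence, scores
```

Both arguments can be open file handles, for example `paired_contigs(open(fna_path), open(qual_path))`, where the two paths point at a matching fna/qual pair from your own assembly.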

Read the rest of this entry »

Posted in Newbler output | 19 Comments

Newbler output I: the 454NewblerMetrics.txt file

Posted by lexnederbragt on March 11, 2010

With this post, I’ll start going through the output files newbler generates. Some of these will be described in detail as they contain a lot of important information.

For today’s post, we’ll start with the 454NewblerMetrics.txt file. This file contains a lot of details on the reads used during the assembly, as well as the resulting contigs and, in the case of paired end reads, scaffolds.

The file starts with some metadata, such as the date of the assembly, where it is located, and what version of newbler was used. For this post, I used a file from an assembly generated with version 2.3 and both shotgun and paired end read files. Note that the output will be slightly different for a mapping project (to be described in a later post) than for an assembly project.

Section 1: runData

path = "/your/path/yourfile1.sff";

numberOfReads = ######, ######;
numberOfBases = #########, #########;

For each input file, each line gives two numbers: the count (reads or bases) in the file, followed by the count after trimming.
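As a rough sketch of how these pairs could be used, the Python below pulls the `path`, `numberOfReads` and `numberOfBases` lines out of the runData section and computes the fraction that survived trimming. It assumes the lines look exactly like the excerpt above (`key = value, value;`), and the real file may contain nesting or extra fields that this ignores. The function names `parse_run_data` and `fraction_kept` are hypothetical, not part of newbler.

```python
import re

# Matches: path = "/some/file.sff";
PATH_LINE = re.compile(r'^\s*path\s*=\s*"([^"]*)"\s*;')
# Matches: numberOfReads = 123, 120;   (count in file, count after trimming)
PAIR_LINE = re.compile(r"^\s*(numberOfReads|numberOfBases)\s*=\s*(\d+)\s*,\s*(\d+)\s*;")


def parse_run_data(lines):
    """Return one dict per input file with (in_file, after_trimming) tuples."""
    files = []
    for line in lines:
        path_match = PATH_LINE.match(line)
        if path_match:
            # A new path line starts the entry for the next input file.
            files.append({"path": path_match.group(1)})
            continue
        pair_match = PAIR_LINE.match(line)
        if pair_match and files:
            key, in_file, trimmed = pair_match.groups()
            files[-1][key] = (int(in_file), int(trimmed))
    return files


def fraction_kept(in_file, trimmed):
    """Fraction of reads (or bases) remaining after trimming."""
    return trimmed / in_file if in_file else 0.0
```

For each entry returned by `parse_run_data(open(metrics_path))`, calling `fraction_kept(*entry["numberOfReads"])` gives the share of reads that were kept, which is a quick way to spot an input file that lost an unusual amount to trimming.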

Read the rest of this entry »

Posted in Newbler output | 36 Comments