An assembly of reads, contigs and scaffolds

A blog on all things newbler and beyond

Archive for September, 2011

Newbler input III: a quick fix for the new Illumina fastq header

Posted by lexnederbragt on September 1, 2011

One unfortunate drawback of working with Illumina sequences is the many changes to the format of their fastq readfiles. The quality scoring has been changed several times since the first Solexa reads become available. It appears they have now settled on the Sanger style, see this wikipedia entry.

(Source: thepoolandspashoponline.com.au)

Regrettably, with their latest software upgrade (Casava 1.8), the headers (sequence identifiers) in the fastq files have changed. The change is described in the aforementioned wikipedia entry; basically, some elements have been added, some have changed order, and there are now two parts seperated by a space.

I wouldn’t have written this blogpost if this change had not been relevant for newbler: we were lucky enough to enjoy direct reading of Illumina fastq files (with newbler determining the quality scoring type) starting with newbler 2.6. newbler also matches mate-pairs (Illumina read 1 and read 2), so that these can be used as paired-ends by newbler (to build scaffolds). By the way, FASTQ files from the NCBI/EBI Sequence Read Archive are also correctly parsed for mate pairs, but here the filename is used for determining read 1 and read 2.

Read the rest of this entry »

Advertisements

Posted in Newbler input | Tagged: , , , | 5 Comments »