
Talk:FASTQ format

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Alexbateman (talk | contribs) at 10:55, 13 June 2011 (assess). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Technically, the FASTQ format allows multi-line records, but its use in short-read sequencing, where each read fits comfortably on one line, obviously disguises this issue.

Hence both the sequence and the quality string may be line-wrapped. Given that @ is a legal quality character and may occur just after a newline in a line-wrapped quality string, care must be taken when parsing. The ideal solution is simply to count the number of bases in the sequence lines and then parse the quality lines expecting exactly that many quality values. (If a new sequence header does not start immediately after the quality string, the file is malformed.)
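A minimal sketch of the counting approach described above, assuming plain-text input where sequence lines never start with '+'. The function name `parse_fastq` is illustrative, not taken from any particular library. Because quality lines are consumed until their total length matches the sequence length, a quality line beginning with '@' is never mistaken for a header:

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, sequence, quality) from possibly line-wrapped FASTQ (sketch)."""
    lines = (line.rstrip("\r\n") for line in handle)
    for line in lines:
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            # Anything other than a header here means the previous record was malformed.
            raise ValueError(f"expected '@' header, got {line!r}")
        title = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError(f"record {title!r} ended before the '+' line")
        seq = "".join(seq_parts)

        # Quality lines: keep reading until we have as many characters as bases,
        # so a wrapped quality line starting with '@' is treated as data.
        qual = ""
        while len(qual) < len(seq):
            try:
                qual += next(lines)
            except StopIteration:
                raise ValueError(f"record {title!r} has a truncated quality string") from None
        if len(qual) != len(seq):
            raise ValueError(
                f"record {title!r}: {len(qual)} quality values for {len(seq)} bases"
            )
        yield title, seq, qual


if __name__ == "__main__":
    # The second quality line of r1 starts with '@' and must not be read as a header.
    data = "@r1\nACGT\nAC\n+\nII\n@@II\n@r2\nGG\n+\n@!\n"
    for record in parse_fastq(io.StringIO(data)):
        print(record)
```

A line-oriented parser that treats every '@' at the start of a line as a new record would split r1 incorrectly here, which is the kind of breakage described below.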

Unfortunately, many people have implemented broken parsers, so you will sometimes see ghastly messes where the first quality value on each line has been changed to zero (ASCII '!'). This is just a bug!

193.62.203.214 (talk) 15:36, 16 April 2009 (UTC) jkb

The Celera Assembler implements yet another quality format based on this theme...

The input for the Celera Assembler is a 'frg' file.[1]

Apparently they take the (presumably Phred-style) quality score and add 48 before converting it to ASCII for storage in the frg file, i.e. "chr(ord('0')+$qual)".
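A small sketch of the offset-48 encoding described above, assuming (as the comment presumes) that the stored values are Phred scores. For comparison it also converts from Sanger-style FASTQ, which uses an offset of 33 (so '!' is quality 0). The function names are hypothetical, for illustration only:

```python
CELERA_OFFSET = ord("0")  # 48, per the description above
SANGER_OFFSET = ord("!")  # 33, Sanger-style FASTQ


def encode_celera_qual(scores):
    """Phred scores -> offset-48 ASCII string (assumed frg-style encoding)."""
    if any(q < 0 for q in scores):
        raise ValueError("Phred scores must be non-negative")
    return "".join(chr(CELERA_OFFSET + q) for q in scores)


def decode_celera_qual(text):
    """Offset-48 ASCII string -> list of Phred scores."""
    return [ord(c) - CELERA_OFFSET for c in text]


def sanger_to_celera(fastq_qual):
    """Re-encode a Sanger FASTQ quality string with the offset-48 scheme."""
    return encode_celera_qual([ord(c) - SANGER_OFFSET for c in fastq_qual])


if __name__ == "__main__":
    print(encode_celera_qual([0, 10, 40]))  # '0:X'
    print(sanger_to_celera("!+I"))          # same scores, same '0:X'
```

Since only the offset differs, conversion between the two schemes is a fixed per-character shift of 15.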

--Dan|(talk) 15:27, 30 July 2009 (UTC)

The AMOS .afg format uses the same encoding.