
Talk:FASTQ format

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 138.253.68.174 (talk) at 14:28, 18 October 2018 (Would adding color to the FASTQ versions test make it clearer?). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Template:WikiProject Computational Biology

Technically, the FASTQ format allows multi-line records, but its use in short-read sequencing, where each sequence and quality string usually fits on one line, hides this.

Hence both sequences and quality strings may be line-wrapped. Because @ is a legal quality character, it may appear at the start of a line in a wrapped quality string, so care must be taken when parsing. The ideal solution is simply to count the number of bases in the sequence lines and then read quality lines until the same number of quality characters has been collected. (If a new record header does not immediately follow the quality string, the file is malformed.)

Unfortunately, many people have implemented broken parsers, so you'll sometimes see ghastly messes where the first quality value on each line has been changed to zero (ASCII '!'). This is just a bug!

193.62.203.214 (talk) 15:36, 16 April 2009 (UTC) jkb
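The count-the-bases approach described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name parse_fastq is hypothetical, and it assumes the separator line starts with '+' and that sequence lines never start with '+'. Note that the first quality line of the example record starts with '@' and is still read correctly.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, sequence, quality) from a possibly line-wrapped FASTQ file."""
    lines = (line.rstrip("\r\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        # Sequence lines run until the '+' separator line.
        seq_parts: List[str] = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("truncated record: no '+' line")
        seq = "".join(seq_parts)
        # Read quality lines until they cover as many characters as the sequence.
        # A line starting with '@' here is quality data, not a new header.
        qual_parts: List[str] = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(lines, None)
            if line is None:
                raise ValueError("truncated record: quality shorter than sequence")
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError("quality length does not match sequence length")
        yield header[1:], seq, "".join(qual_parts)


example = "@r1\nACGT\nAC\n+\n@@II\nII\n@r2\nAC\n+r2\n!!\n"
for title, seq, qual in parse_fastq(io.StringIO(example)):
    print(title, seq, qual)
# r1 ACGTAC @@IIII
# r2 AC !!
```

Collecting quality lines by length, rather than stopping at the next line beginning with '@', is what avoids the "first quality value on each line" bug mentioned above.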

The Celera Assembler implements yet another quality format based on this theme...

The input for the Celera Assembler is a 'frg' file.[1]

Apparently they take the (presumably Phred-style) quality score and add 48 before converting it to ASCII for storage in the frg file, i.e. "chr(ord('0')+$qual)".

--Dan|(talk) 15:27, 30 July 2009 (UTC)

The AMOS .afg format uses the same encoding.
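The offset-48 scheme described above for the Celera frg file can be sketched as below. The function names are hypothetical, and the upper bound (keeping encoded characters within printable ASCII, '0' to '~') is an assumption of this sketch rather than something stated for the frg or afg formats.

```python
from typing import List

OFFSET = ord("0")  # 48, the offset described for the Celera frg file


def encode_offset48(quals: List[int]) -> str:
    """Encode integer (presumably Phred-style) scores as ASCII with offset 48."""
    out = []
    for q in quals:
        # Assumption: encoded characters must stay within printable ASCII ('0'..'~').
        if not 0 <= q <= ord("~") - OFFSET:
            raise ValueError(f"quality {q} cannot be encoded with offset 48")
        out.append(chr(OFFSET + q))
    return "".join(out)


def decode_offset48(text: str) -> List[int]:
    """Invert encode_offset48."""
    return [ord(c) - OFFSET for c in text]


print(encode_offset48([0, 9, 10, 40]))  # 09:X
```

A score of 0 becomes '0' and a score of 40 becomes 'X', so these characters overlap the Phred+33 and Phred+64 ranges and a file cannot be told apart from them by its characters alone.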

IonTorrent quality range

I've seen some IonTorrent quality values and they seem to have a different range from Sanger or Illumina. However, I don't have access to such a machine or its output, so I can't be sure. Can anyone with the machine confirm and put the range up? - Preceding unsigned comment added by Hena wp (talk | contribs) 18:25, 30 April 2013 (UTC)

Would adding color to the FASTQ versions test make it clearer?

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126
  0........................26...31.......40                                
                           -5....0........9.............................40 
                                 0........9.............................40 
                                    3.....9.............................40 
  0........................26...31........41                               

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold) 
    (Note: See discussion above).
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)

Colors picked at random, and I don't absolutely guarantee that the alignment is correct. And there appears to be a problem with the J alignment in the original figure.

Tnabtaf (talk) 02:17, 22 October 2012 (UTC)

Got no comments; posting to page.

Tnabtaf (talk) 05:59, 22 January 2013 (UTC)
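To make the legend concrete, here is a minimal sketch of decoding a quality string for each variant in the figure, plus converting Solexa scores to the Phred scale. The variant keys are hypothetical labels chosen for this example. The conversion follows from the two definitions, Q_solexa = -10 log10(p / (1 - p)) and Q_phred = -10 log10(p), which give Q_phred = 10 log10(10^(Q_solexa/10) + 1).

```python
import math
from typing import List

# ASCII offsets for the variants in the figure (keys are illustrative labels).
OFFSETS = {
    "sanger": 33,       # S: Phred+33
    "solexa": 64,       # X: Solexa+64
    "illumina1.3": 64,  # I: Phred+64
    "illumina1.5": 64,  # J: Phred+64
    "illumina1.8": 33,  # L: Phred+33
}


def decode_quality(qual: str, variant: str) -> List[int]:
    """Turn a quality string into integer scores on that variant's own scale."""
    offset = OFFSETS[variant]
    return [ord(c) - offset for c in qual]


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scale score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(decode_quality("!+5?I", "sanger"))  # [0, 10, 20, 30, 40]
print(decode_quality(";@h", "solexa"))    # [-5, 0, 40]
print(round(solexa_to_phred(-5), 2))      # 1.19
```

The two scales agree closely at high scores (Solexa 40 is about Phred 40.0004) and diverge only at the low end, which is why Solexa can go negative while Phred cannot.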

The Sanger FASTQ format is not limited to 40: it goes all the way up to ~ (93). After all, there is no upper limit on either the Phred or the Solexa quality scale. The same is probably true of the Solexa/Illumina <1.8 versions too, albeit that the sequencing machines never gave a value above X because they could never be *that* confident. It is unlikely that X is 40 for all of these tools. Moreover, it's incorrect to say that the FORMAT doesn't support values larger than 40 just because the tools that produced the data do not. - Preceding unsigned comment added by 2A02:8071:B1C0:C01:84E:7023:C079:6527 (talk) 17:45, 11 December 2016 (UTC)

The alignment of the figure is extremely unclear: it suggests that "I" represents Phred scores of both 40 and 41 in the two different Phred+33 lines.
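Since, as noted above, the upper end of the range is not fixed, the lowest character observed is the more useful clue when guessing which encoding a file uses. The sketch below is a heuristic based only on the lower bounds in the figure (';' = 59 for Solexa+64, '@' = 64 for Phred+64); the function name is hypothetical. Its "phred+64" answer is ambiguous: such data could also be Solexa+64 with no negative scores, or Phred+33 where every score happens to be 31 or higher.

```python
from typing import Iterable


def guess_encoding(quality_strings: Iterable[str]) -> str:
    """Heuristically guess the FASTQ quality encoding from the lowest character seen."""
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    if lowest < ord("!"):
        raise ValueError("character below '!' is not a valid quality character")
    if lowest < ord(";"):  # below 59: only the +33 encodings use these
        return "phred+33"
    if lowest < ord("@"):  # 59-63: only Solexa+64 (scores -5 to -1)
        return "solexa+64"
    return "phred+64"      # 64 and up: ambiguous, see the note above


print(guess_encoding(["II5!", "IIII"]))  # phred+33
```

In practice one would scan a reasonable number of reads before deciding, since a small sample of high-quality reads can easily look like a +64 encoding.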

Sequence letter definitions?

I'm writing a FASTQ parser for Illumina exome data, and I found this article very useful! Thanks for writing it. The only information missing from this article that would help me complete the parser is a definition of the sequence letters. I see ACGT throughout the Illumina data, which makes sense, but I don't know what 'N' stands for. I'll figure it out, but it would be cool if the sequence letters were documented here. WaywardGeek (talk) 12:00, 5 August 2013 (UTC)
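For reference, sequence letters in FASTQ files generally follow the IUPAC nucleotide codes, where N means "any base" (A, C, G or T); in Illumina reads it typically marks a position where no base could be called. A minimal validation sketch follows; the function names are hypothetical, and it assumes DNA input (no U for RNA, no gap characters).

```python
from typing import Dict, List

# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def check_sequence(seq: str) -> None:
    """Raise if the sequence contains letters outside the IUPAC DNA alphabet."""
    bad = sorted({c for c in seq.upper() if c not in IUPAC})
    if bad:
        raise ValueError(f"unexpected sequence letters: {bad}")


def ambiguous_positions(seq: str) -> List[int]:
    """Return 0-based positions whose letter stands for more than one base."""
    check_sequence(seq)
    return [i for i, c in enumerate(seq.upper()) if len(IUPAC[c]) > 1]


print(ambiguous_positions("ACNGR"))  # [2, 4]
```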

Hello fellow Wikipedians,

I have just added archive links to one external link on FASTQ format. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. cyberbot II (Talk to my owner: Online) 19:21, 27 January 2016 (UTC)