Talk:FASTQ format: Difference between revisions

From Wikipedia, the free encyclopedia
Hena wp (talk | contribs)
IonTorrent quality range question
m top: removing unsupported parameters in WikiProject banners
 
(12 intermediate revisions by 10 users not shown)
{{talkheader}}
{{Talk header}}
{{WikiProject Computational Biology|class=B|importance=mid}}
{{WikiProject banner shell|class=B|
{{WikiProject Molecular Biology|COMPBIO=yes|COMPBIO-importance=mid}}
}}
{{Connected contributor|User1=Divon lan |U1-declared=yes| I have added information about Genozip for the benefit of the community. I am the author of Genozip.}}


==Untitled==
Technically fastq format is multi-lined, but the use of it in short-read sequencing obviously disguises this issue.




=== IonTorrent quality range ===
I've seen some IonTorrent quality values and they seem to have a different range from Sanger or Illumina. However, I don't have access to such a machine or its output, so I can't be sure. Can anyone with the machine confirm and put the range up? <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Hena wp|Hena wp]] ([[User talk:Hena wp|talk]] • [[Special:Contributions/Hena wp|contribs]]) 18:25, 30 April 2013 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->


== Would adding color to the FASTQ versions test make it clearer? ==


[[User:Tnabtaf|Tnabtaf]] ([[User talk:Tnabtaf|talk]]) 05:59, 22 January 2013 (UTC)

The Sanger FASTQ format has no limit on the range - it goes all the way up to ~ (93). After all, there is no limit on either the Phred or Solexa quality scale. The same is probably true of the Solexa/Illumina<1.8 versions too, although the sequencing machines never gave a value above X because they could never be *that* confident. It is unlikely that X is 40 for all of these tools. Moreover, it's incorrect to say that the FORMAT doesn't support values larger than 40 just because the tools that produced them do not. <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/2A02:8071:B1C0:C01:84E:7023:C079:6527|2A02:8071:B1C0:C01:84E:7023:C079:6527]] ([[User talk:2A02:8071:B1C0:C01:84E:7023:C079:6527#top|talk]]) 17:45, 11 December 2016 (UTC)</small> <!--Autosigned by SineBot-->

The alignment of the figure is extremely unclear - it suggests that "I" represents Phred scores of both 40 and 41 in the two different Phred+33 lines. <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/138.253.68.174|138.253.68.174]] ([[User talk:138.253.68.174#top|talk]]) 14:28, 18 October 2018 (UTC)</small> <!--Autosigned by SineBot-->

== Sequence letter definitions? ==

I'm writing a fastq parser for Illumina exome data, and I found this article very useful! Thanks for writing it. The only data I see missing from this article that would aid me in completing the parser is sequence letter definitions. I see ACTG throughout the Illumina data, which makes sense, but I don't know what 'N' stands for. I'll figure it out, but it would be cool if sequence letters were documented here. [[User:WaywardGeek|WaywardGeek]] ([[User talk:WaywardGeek|talk]]) 12:00, 5 August 2013 (UTC)

== External links modified ==

Hello fellow Wikipedians,

I have just added archive links to {{plural:1|one external link|1 external links}} on [[FASTQ format]]. Please take a moment to review [https://en.wikipedia.org/enwiki/w/index.php?diff=prev&oldid=701986650 my edit]. If necessary, add {{tlx|cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{tlx|nobots|deny{{=}}InternetArchiveBot}} to keep me off the page altogether. I made the following changes:
*Added archive https://web.archive.org/20100610232559/http://genomecenter.ucdavis.edu/dna_technologies/documents/pipeline_1_4.pdf to http://genomecenter.ucdavis.edu/dna_technologies/documents/pipeline_1_4.pdf

When you have finished reviewing my changes, please set the ''checked'' parameter below to '''true''' to let others know.

{{sourcecheck|checked=false}}

Cheers.—[[User:Cyberbot II|<sup style="color:green;font-family:Courier;">cyberbot II</sup>]]<small><sub style="margin-left:-14.9ex;color:green;font-family:Comic Sans MS;">[[User talk:Cyberbot II|<span style="color:green;">Talk to my owner</span>]]:Online</sub></small> 19:21, 27 January 2016 (UTC)

Latest revision as of 02:04, 19 May 2024

Untitled


Technically fastq format is multi-lined, but the use of it in short-read sequencing obviously disguises this issue.

Hence sequences may be line-wrapped, and quality values too. Given that @ is a legal quality value and it may occur just after a newline in a line-wrapped quality string, care must be taken when parsing it. The ideal solution here is simply to count the number of bases in the sequence lines and then parse with the expectation of the same number of bases in the quality lines. (If after this there isn't a new sequence header immediately starting after the quality then the format is in error.)

Unfortunately, many people have implemented broken parsers, so you'll sometimes see ghastly messes where the first quality value on each line has been changed to zero (ASCII '!'). This is just a bug!

193.62.203.214 (talk) 15:36, 16 April 2009 (UTC) jkb
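
The counting approach described above can be sketched as follows. This is only a minimal illustration, not a reference parser: the function name `read_fastq` is made up for this example, and it assumes that sequence lines never begin with '+' and that blank lines between records can be skipped.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a possibly line-wrapped FASTQ stream.

    Hypothetical helper for illustration; not taken from any existing tool.
    """
    line = handle.readline()
    while line:
        line = line.rstrip("\r\n")
        if not line:
            # Tolerate blank lines between records (an assumption of this sketch).
            line = handle.readline()
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got {line!r}")
        name = line[1:]

        # Collect sequence lines until the '+' separator line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError(f"record {name!r} has no '+' separator")
        seq = "".join(seq_parts)

        # Read quality lines by COUNT, never by looking for '@':
        # '@' is a legal quality character and may start a wrapped line.
        qual_parts = []
        n_qual = 0
        while n_qual < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError(f"record {name!r} has truncated quality")
            chunk = line.rstrip("\r\n")
            qual_parts.append(chunk)
            n_qual += len(chunk)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError(f"record {name!r}: quality and sequence lengths differ")

        yield name, seq, qual
        line = handle.readline()


if __name__ == "__main__":
    wrapped = "@r1\nACGT\nAC\n+\nIIII\n@I\n@r2\nGG\n+r2\n!!\n"
    for record in read_fastq(io.StringIO(wrapped)):
        print(record)
```

In the sample input the second quality line of r1 starts with '@', which a parser that treats every line beginning with '@' as a header would misread as a new record.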

The Celera Assembler implements yet another quality format based on this theme...


The input for the Celera Assembler is a 'frg' file [1]

Apparently they take the (presumably Phred-style) quality score and add 48 before converting it to ASCII for storage in the frg file, i.e. "chr(ord(0)+$qual)".

--Dan|(talk) 15:27, 30 July 2009 (UTC)
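
For comparison with the other offsets, here is a small sketch of the +48 encoding as described in the comment above. It is based only on that description, not on the frg specification, and the function names are invented for this example.

```python
# Offset described in the comment above: the score is added to the code of '0'.
FRG_OFFSET = ord("0")  # 48


def encode_frg_quality(scores):
    """Turn a list of integer quality scores into a +48 encoded string."""
    return "".join(chr(FRG_OFFSET + q) for q in scores)


def decode_frg_quality(text):
    """Turn a +48 encoded quality string back into integer scores."""
    return [ord(c) - FRG_OFFSET for c in text]


if __name__ == "__main__":
    print(encode_frg_quality([0, 10, 40]))
    print(decode_frg_quality("0:X"))
```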

The AMOS .afg format uses the same encoding


IonTorrent quality range


I've seen some IonTorrent quality values and they seem to have a different range from Sanger or Illumina. However, I don't have access to such a machine or its output, so I can't be sure. Can anyone with the machine confirm and put the range up? — Preceding unsigned comment added by Hena wp (talk • contribs) 18:25, 30 April 2013 (UTC)

Would adding color to the FASTQ versions test make it clearer?

  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126
  0........................26...31.......40                                
                           -5....0........9.............................40 
                                 0........9.............................40 
                                    3.....9.............................40 
  0........................26...31........41                               

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
    with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold) 
    (Note: See discussion above).
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)

Colors picked at random, and I don't absolutely guarantee that the alignment is correct. And there appears to be a problem with the J alignment in the original figure.

Tnabtaf (talk) 02:17, 22 October 2012 (UTC)
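
The offsets in the legend above can be checked with a few lines of code. This is only a sketch: the variant names used as dictionary keys are labels invented for this example, and the function simply subtracts the offset without validating the typical ranges.

```python
# ASCII offsets taken from the legend above; the key names are made up here.
OFFSETS = {
    "sanger": 33,       # S - Phred+33
    "solexa": 64,       # X - Solexa+64
    "illumina1.3": 64,  # I - Phred+64
    "illumina1.5": 64,  # J - Phred+64
    "illumina1.8": 33,  # L - Phred+33
}


def decode_quality(qual, variant):
    """Return the integer scores encoded in a quality string for one variant."""
    offset = OFFSETS[variant]
    return [ord(c) - offset for c in qual]


if __name__ == "__main__":
    print(decode_quality("!I", "sanger"))
    print(decode_quality(";h", "solexa"))
    print(decode_quality("!J", "illumina1.8"))
```

The characters used here are the boundary characters marked in the figure: '!' (33), ';' (59), 'I' (73), 'J' (74) and 'h' (104).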

Got no comments; posting to page.

Tnabtaf (talk) 05:59, 22 January 2013 (UTC)

The Sanger FASTQ format has no limit on the range - it goes all the way up to ~ (93). After all, there is no limit on either the Phred or Solexa quality scale. The same is probably true of the Solexa/Illumina<1.8 versions too, although the sequencing machines never gave a value above X because they could never be *that* confident. It is unlikely that X is 40 for all of these tools. Moreover, it's incorrect to say that the FORMAT doesn't support values larger than 40 just because the tools that produced them do not. — Preceding unsigned comment added by 2A02:8071:B1C0:C01:84E:7023:C079:6527 (talk) 17:45, 11 December 2016 (UTC)
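
The figure of 93 follows from the printable ASCII range, as this short sketch shows. It assumes that '~' (126) is the last usable character, as in the figure; the helper name is invented for this example.

```python
LAST_PRINTABLE = ord("~")  # 126, the last character shown in the figure


def max_encodable_score(offset):
    """Largest score that still maps to a printable ASCII character."""
    return LAST_PRINTABLE - offset


if __name__ == "__main__":
    print(max_encodable_score(33))  # Phred+33
    print(max_encodable_score(64))  # Phred+64 / Solexa+64
```

So the encoding itself leaves room above 40 for both offsets; the typical upper values in the figure describe what instruments emit, not what the format can hold.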

The alignment of the figure is extremely unclear - it suggests that "I" represents Phred scores of both 40 and 41 in the two different Phred+33 lines. — Preceding unsigned comment added by 138.253.68.174 (talk) 14:28, 18 October 2018 (UTC)

Sequence letter definitions?


I'm writing a fastq parser for Illumina exome data, and I found this article very useful! Thanks for writing it. The only data I see missing from this article that would aid me in completing the parser is sequence letter definitions. I see ACTG throughout the Illumina data, which makes sense, but I don't know what 'N' stands for. I'll figure it out, but it would be cool if sequence letters were documented here. WaywardGeek (talk) 12:00, 5 August 2013 (UTC)

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on FASTQ format. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

  • Added archive https://web.archive.org/20100610232559/http://genomecenter.ucdavis.edu/dna_technologies/documents/pipeline_1_4.pdf to http://genomecenter.ucdavis.edu/dna_technologies/documents/pipeline_1_4.pdf

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot II (Talk to my owner: Online) 19:21, 27 January 2016 (UTC)