Pileup format
Pileup format is a text-based format for summarizing the base calls of reads aligned to a reference sequence. It facilitates SNP/indel calling and quick visual inspection of alignments by eye. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, but became widely known through its implementation within the SAMtools software suite. [1]
Format
Example
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<<
The columns
Each line consists of 5 (or optionally 6) tab-separated columns:
- Sequence identifier
- Position in sequence (starting from 1)
- Reference nucleotide at that position
- Number of aligned reads covering that position (depth of coverage)
- Bases at that position from aligned reads
- Base qualities of those bases (OPTIONAL)
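The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of any library: the names `PileupRecord` and `parse_pileup_line` are hypothetical, and the code assumes a well-formed, tab-separated line with at least the five mandatory columns.

```python
from typing import NamedTuple, Optional


class PileupRecord(NamedTuple):
    """One pileup line (hypothetical container, for illustration only)."""
    seq_id: str
    position: int              # 1-based position in the sequence
    ref_base: str
    depth: int                 # number of reads covering the position
    bases: str                 # column 5, the bases string
    qualities: Optional[str]   # column 6, or None when it is absent


def parse_pileup_line(line: str) -> PileupRecord:
    """Split one tab-separated pileup line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    seq_id, pos, ref, depth, bases = fields[:5]
    quals = fields[5] if len(fields) > 5 else None
    return PileupRecord(seq_id, int(pos), ref, int(depth), bases, quals)


# Line 276 of the example, written with explicit tab separators.
record = parse_pileup_line(
    "seq1\t276\tG\t22\t...T,,.,.,...,,,.,....\t33;+<<7=7<<7<&<<1;<<6<\n"
)
print(record.seq_id, record.position, record.ref_base, record.depth)
```

Keeping the optional sixth column as `None` when it is missing lets callers distinguish a five-column line from one with an empty quality string.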
Column 5: The bases string
- . (dot) means a base that matched the reference on the forward strand
- , (comma) means a base that matched the reference on the reverse strand
- AGTCN denotes a base that did not match the reference on the forward strand
- agtcn denotes a base that did not match the reference on the reverse strand
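- +[0-9]+[ACGTNacgtn]+ denotes an insertion of one or more bases; the integer gives the number of inserted bases that follow
- -[0-9]+[ACGTNacgtn]+ denotes a deletion of one or more bases; the integer gives the number of deleted bases that follow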
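- ^ (caret) marks the start of a read segment; the character immediately after it encodes the mapping quality of the read (ASCII value minus 33) and is not a base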
- $ (dollar) marks the end of a read segment
- ~ (tilde)
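The symbols listed above can be tallied with a small scanner. The sketch below is illustrative only: `summarize_bases` is a hypothetical helper, and it assumes the SAMtools convention that the character immediately after `^` encodes the read's mapping quality rather than a base. Symbols not described in the list above, such as `~`, are counted under `other` without interpretation.

```python
from collections import Counter


def summarize_bases(bases: str) -> Counter:
    """Count the event types found in a pileup bases string (column 5)."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Start of a read segment; the next character is assumed to be
            # the mapping-quality character, so skip it as well.
            counts["read_start"] += 1
            i += 2
            continue
        if c in "+-":
            # Indel: sign, decimal length, then exactly that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            if j == i + 1:
                raise ValueError(f"indel without a length at offset {i}")
            length = int(bases[i + 1:j])
            counts["insertion" if c == "+" else "deletion"] += 1
            i = j + length
            continue
        if c == "$":
            counts["read_end"] += 1
        elif c == ".":
            counts["match_forward"] += 1
        elif c == ",":
            counts["match_reverse"] += 1
        elif c in "ACGTN":
            counts["mismatch_forward"] += 1
        elif c in "acgtn":
            counts["mismatch_reverse"] += 1
        else:
            counts["other"] += 1
        i += 1
    return counts


# Bases string of position 276 in the example (depth 22).
summary = summarize_bases("...T,,.,.,...,,,.,....")
print(dict(summary))
```

For a position without indels or other symbols, the match and mismatch counts should add up to the depth in column 4, which gives a quick consistency check on a parsed line.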
Column 6: The base quality string
This is an optional column. If present, it contains one character per base in column 5, and the ASCII value of each character minus 33 gives the Phred quality of the corresponding base. This is the same encoding used for qualities in the FASTQ format.
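The ASCII-minus-33 decoding can be sketched as follows. The function names are hypothetical; the conversion to an error probability uses the standard Phred definition, P = 10^(-Q/10), which is not stated in the text above and is included here only for illustration.

```python
from typing import List


def decode_qualities(quals: str, offset: int = 33) -> List[int]:
    """Convert a quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quals]


def error_probability(phred: int) -> float:
    """Standard Phred relation: probability that the call is wrong."""
    return 10 ** (-phred / 10)


# '<' is ASCII 60, '+' is ASCII 43, '&' is ASCII 38.
scores = decode_qualities("<+&")
print(scores)  # [27, 10, 5]
```

A score of 10 corresponds to an error probability of about 0.1, and a score of 30 to about 0.001.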
File extension
There is no standard file extension for a Pileup file, but .pileup is commonly used.