
I've used BWA to map my NGS reads against the hg38 genome, and I have a BAM file. I'm not doing genome assembly, and my reference genome file has the human chromosomes. Thus, I shouldn't have "contigs". But…
https://broadinstitute.github.io/picard/command-line-overview.html#ReorderSam
and quote:
ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs
Q: What does "contig ordering" mean for my whole-genome-sequencing experiment? In particular, what does matching the contig against a reference file mean?
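I'm not familiar with Picard and its ReorderSam function, but as far as I understand from the documentation, "contig ordering" means the order in which the contigs appear in the reference sequence. In this context a "contig" is simply any sequence listed in the reference file, so for hg38 each chromosome counts as one contig. The term itself comes from genome assembly, like this: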
Figure 5: Anatomy of whole-genome assembly. In whole-genome assembly, the BAC fragments (red line segments) and the reads from five individuals (black line segments) are combined to produce a contig and a consensus sequence (green line). The contigs are connected into scaffolds, shown in red, by pairing end sequences, which are also called mates. If there is a gap between consecutive contigs, it has a known size. Next, the scaffolds are mapped to the genome (gray line) using sequence tagged site (STS) information, represented by blue stars. © 2001 American Association for the Advancement of Science Venter, C. et al. The sequence of the human genome. Science 291, 1304-1351 (2001). All rights reserved. (source)
ReorderSam (Picard)
So in Picard you have your INPUT (File); the reads in this file are then written out following the order of the contigs in the REFERENCE (File). This can also be seen in their code:
    // write the reads in contig order
    for (final SAMSequenceRecord contig : refDict.getSequences()) {
        final SAMRecordIterator it = in.query(contig.getSequenceName(), 0, 0, false);
        writeReads(out, it, newOrder, contig.getSequenceName());
    }
(code source)
ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file
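To make the loop above more concrete, here is a minimal sketch of the same idea using plain Java collections instead of htsjdk. It is a toy under stated assumptions, not Picard's implementation: the `ReorderSketch` class, the `Read` type and the `reorder` method are hypothetical names invented for this illustration, reads within one contig simply keep their input order, and unmapped reads are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: none of these names come from Picard or htsjdk.
final class ReorderSketch {

    /** A read reduced to the two fields that matter here (hypothetical type). */
    static final class Read {
        final String name;
        final String contig;

        Read(String name, String contig) {
            this.name = name;
            this.contig = contig;
        }
    }

    /**
     * Returns the reads grouped by contig, in the order in which the contigs
     * are listed in the reference. Reads on contigs that the reference does
     * not list are dropped.
     */
    static List<Read> reorder(List<Read> reads, List<String> referenceContigs) {
        // One bucket per reference contig; LinkedHashMap keeps the reference order.
        Map<String, List<Read>> buckets = new LinkedHashMap<>();
        for (String contig : referenceContigs) {
            buckets.put(contig, new ArrayList<>());
        }
        for (Read read : reads) {
            List<Read> bucket = buckets.get(read.contig);
            if (bucket != null) { // null means the contig is absent from the reference
                bucket.add(read);
            }
        }
        // Concatenate the buckets in reference order.
        List<Read> out = new ArrayList<>();
        for (List<Read> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Read> reads = List.of(
            new Read("r1", "chr2"),
            new Read("r2", "chrUn_extra"),
            new Read("r3", "chr1"),
            new Read("r4", "chr2"));
        // The reference lists chr1 before chr2 and has no chrUn_extra.
        for (Read r : reorder(reads, List.of("chr1", "chr2"))) {
            System.out.println(r.contig + "\t" + r.name);
        }
    }
}
```

With this input the chr1 read comes out first, followed by the two chr2 reads, and the read on `chrUn_extra` disappears because that name is not in the reference list. No read is aligned again; only the output order changes.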
Some more background
There are two "main" approaches to obtaining a genome sequence: de novo assembly and reference mapping.
Second-generation sequencing technologies produce millions of short (a few hundred bp) strings of nucleotides (reads), which is ideal for resequencing when reads are mapped to a reference genome (reference-based assembly). De novo genome assembly based on second-generation sequencing is challenging due to difficulties with GC- or AT-rich and homonucleotide DNA stretches, which are under-represented in the sequencing output (source)
The characteristics of these are:
de novo
- no bias towards a reference genome
- no template to adapt to
- the assembly is normally more fragmented
- it normally works better for large-scale/medium-scale differences (source)
reference mapping
- fewer contigs
- in most methods the reads that don't map are not used in the final sequence (this is also the case with ReorderSam: "Reads mapped to contigs absent in the new reference are dropped")
- you look at what is similar to your reference genome
- SNPs and very small variations are more easily positioned and compared among groups (source)
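As a rough illustration of the reference-mapping side, the sketch below places short reads on a reference by exact string matching and leaves out reads that do not match anywhere. This is a deliberate simplification: real aligners such as BWA tolerate mismatches and gaps and use an index rather than a linear scan. The `ToyMapper` class, the `mapReads` method and the sequences are all made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: exact matching stands in for a real aligner.
final class ToyMapper {

    /**
     * Maps each read to the 0-based position of its first exact match in the
     * reference. Reads with no match are left out of the result.
     */
    static Map<String, Integer> mapReads(String reference, List<String> reads) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (String read : reads) {
            int pos = reference.indexOf(read);
            if (pos >= 0) { // -1 means the read does not occur in the reference
                positions.put(read, pos);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String reference = "ACGTTGCAAGGCT";
        List<String> reads = List.of("GTTG", "AAGG", "TTTT");
        // TTTT does not occur in the reference, so it is not reported.
        System.out.println(mapReads(reference, reads));
    }
}
```

Here `GTTG` lands at position 2 and `AAGG` at position 7, while `TTTT` is not used at all. That is the same bias towards the reference described in the list above.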
I would highly recommend watching this short animation to tell these two approaches apart and to understand what reference genome mapping is.