Basic Pipeline
alignment
- build index (if needed - how to check?)
$ bowtie2-build -f <reference.fa>
- align reads
$ bowtie2 -p <num_cores> \
-x <index> \
-U <unpaired_reads.fq> \
-S <output.sam>
- QC alignment
- samtools flagstat etc.
- %ages mapped, unmapped, duplicated, etc.
- http://jura.wi.mit.edu/bio/education/hot_topics/NGS_QC_mapping_Feb2014/NGS2014.pdf
- http://bioinformatics.org.au/ws14/wp-content/uploads/ws14/sites/5/2014/07/Felicity-Newell_presentation.pdf
- indel realignment?
- SAM -> BAM
$ samtools view -@ <num_cores> \
-b <input.sam> \
-o <output.bam>
- sort BAM
$ samtools sort -@ <num_cores> <input.bam> <output.sorted.bam>
- index BAM
$ samtools index <input.sorted.bam>
Call variants
- filter by BED (optional)
- call using Freebayes
- quite sensitive
- actual variants will have quite high score
- possibly parallelisable, using freebayes-parallel.sh
$ freebayes -f <ref.fa> <alignment.sorted.bam> > <output.vcf>
Filter results
- Apply a simple filter to the results - remove all variants with a QUAL scores less than 30
- 2-step process:
- apply filter to VCF file - does not remove any lines
$ cat <variants.vcf> | vcf-annotate -f Q=30 > <variants.filtered.vcf>
- Can then remove all failing variants
$ grep -E "^#" <variants.filtered.vcf> > > <final.vcf> # select lines starting with '#' - VCF header lines
$ grep "PASS" <variants.filtered.vcf> >> <final.vcf> # vcf-annotate adds "PASS" to FILTER column of variants scoring above threshold
# the double redirect ">>" appends an existing file.
page revision: 13, last edited: 11 Jun 2015 13:39