WRGL Genotyping Pipeline

Part of the WRGL pipeline for NGS analysis.

GenotypingPipelineWrapper class

  • Created once for an entire run that has a "G" in the sample sheet.
  • Samples can be mixed G/P; later stages double-check for G before continuing.
  • GenotypingPipelineWrapper()
    • constructor
    • gets some parameters as arguments (from Programme.cs main method)
    • triggers ExecutePanelPipeline()
  • ExecutePanelPipeline()
    • Creates the genotyping analysis directory in the local run folder (<run>\Genotyping_<pipeline_version>)
    • Writes to log to indicate start of pipeline
    • If directory can't be created, throws an error.
    • Calls GetGenotypingRegions()
    • For each sample, creates a GenerateGenotypingVCFs object called genotypeAnalysis.
    • Starts a new threaded process for each sample to align reads using genotypeAnalysis.MapReads()
    • Waits until all tasks are completed (i.e. all samples are aligned)
    • Calls GenerateGenotypingVCFs.CallSomaticVariants()
      • NOTE: called directly from the class, not the individual sample genotypeAnalysis objects
    • Calls WriteGenotypingReport()
    • Copies files to network (i.e. to Z:\ from local D:\)
    • Sends run completion email using AuxillaryFunctions.SendRunCompletionEmail()
  • GetGenotypingRegions()
    • Creates a BED file?
    • Pulls information from sampleSheet.getAnalyses - worth checking!
    • sampleSheet is a ParseSampleSheet object
    • Contains an ROIfile (possibly also a BED file? So reads a BED to make a BED…)
    • Works on objects at the class level, so there are no args or returns.
  • WriteGenotypingReport()
    • Uses GenerateGenotypingVCFs.CompressVariants() to make a file listing all QC-passing variants in the run
    • Calls GenerateGenotypingVCFs.CallSNPEff() to create a ParseVCF object, annotatedVCFFile, with annotations
    • Calls AnalyseCoverageFromAlignerOutput()
    • Writes a report header (columns for sample info, variant info, annotation…)
    • Iterates over each sample for genotyping (with "G" in samplesheet)
      • Then over each variant in that sample (using VCFFile.getVCFRecords["SampleID"])
        • Checks the vcf to confirm 1) filter passes 2) quality >= 30 3) depth >= 1000 (genotyping can get much higher depths than panels)
        • If OK, the variant is added to the mutantAmplicons list
        • If the variant is annotated (i.e. annotatedVCFFile.getSnpEffAnnotations.ContainsKey(tempGenomicVariant) is true), writes each annotation entry for the variant to the report file (there may be more than one, possibly due to multiple transcripts?).
        • If no annotation is available, writes a shortened version to the file (this split is necessary to account for the different number of fields, and the source of the annotated data)
      • Checks every BED record for the sample - from BEDRecords
        • If not found in mutantAmplicons, writes to the report that the region has failed.
  • AnalyseCoverageFromAlignerOutput()
    • For each genotyping sample, opens the stats file produced by amplicon aligner v2
    • extracts a field containing depth information from each line of that file (CHECK THE FILE??)
    • Records the value in the ampliconMinDP list.
  • AnalyseCoverageData()
    • WHERE IS THIS GETTING CALLED FROM? DOESN'T SEEM TO BE ANYWHERE IN THE GENOTYPING PIPELINE…
    • Uses samtools depth and the GenotypingRegions BED file produced by GetGenotypingRegions()
    • Runs samtools depth for each genotyping sample
    • NOT CALLED IN GENOTYPING PIPELINE - SEEMS TO BE BASED ON FUNCTION OF SAME NAME IN PANELS PIPELINE
    • POSSIBLY COPIED IN AS A STARTING POINT FOR AnalyseCoverageFromAlignerOutput BUT NOT REMOVED???
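
The variant checks and failed-region reporting described under WriteGenotypingReport() can be sketched as follows. This is a minimal Python illustration, not the pipeline's own C# code. The record layout (region, filter, qual, depth), the function names and the way a variant is tied to a BED region are all assumptions made for the example; only the thresholds (quality >= 30, depth >= 1000) and the "not in mutantAmplicons means failed" rule come from the notes above.

```python
from dataclasses import dataclass

MIN_QUAL = 30      # quality threshold quoted in the notes
MIN_DEPTH = 1000   # depth threshold quoted in the notes


@dataclass
class Variant:
    # Hypothetical record layout; the real ParseVCF fields may differ.
    region: str    # name of the BED region the variant falls in (assumed)
    filter: str
    qual: float
    depth: int


def passes_checks(v: Variant) -> bool:
    """Apply the three checks: filter passes, quality >= 30, depth >= 1000."""
    return v.filter == "PASS" and v.qual >= MIN_QUAL and v.depth >= MIN_DEPTH


def split_regions(variants, bed_regions):
    """Return (regions with a passing variant, regions to report as failed)."""
    mutant_amplicons = [v.region for v in variants if passes_checks(v)]
    failed = [r for r in bed_regions if r not in mutant_amplicons]
    return mutant_amplicons, failed


sample_variants = [
    Variant("regionA", "PASS", 99.0, 2500),
    Variant("regionB", "PASS", 45.0, 400),     # fails on depth
    Variant("regionC", "LowGQ", 12.0, 3000),   # fails on filter and quality
]
mutant, failed = split_regions(sample_variants, ["regionA", "regionB", "regionC"])
print(mutant)
print(failed)
```

With the made-up records above, only regionA ends up in the mutant list; regionB and regionC are reported as failed.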

GenerateGenotypingVCFs class

  • Object created for each genotyping sample in samplesheet
  • GenerateGenotypingVCFs()
    • constructor
    • loads some parameters that are passed as args on creation
    • No calls to any functions - just creates the object, all calls are external
  • MapReads()
    • Constructs a set of commands for alignment tools:
      • alignmentParameters
      • samtoolsSamtoBamParameters
      • samtoolsSortBamParameters
      • realignerTargetCreatorParameters
      • indelRealignerParameters
    • Uses Amplicon Aligner v2 (Source???) and GATK
    • Calls aligner using Process module - ? better handling of resources when object being spun up for each sample ?
    • Then processes SAM file - converts to BAM, sorts, and realigns around indels
    • Deletes all files except the final sorted, realigned BAM
    • Some processes are closed explicitly, but others aren't. Is there any good reason for this?
  • CallSomaticVariants()
    • Run level? Not called through the GenerateGenotypingVCFs object created for each sample - called through the class
    • Sets up the Illumina somatic variant caller (built in to the MiSeq Reporter software)
    • Could list params, but not going to. Look it up.
    • The somatic variant caller is designed to detect low-level (~5%) variants
    • Higher read depth leads to higher confidence in calls, hence 1000x minimum depth in GenotypingPipelineWrapper.WriteGenotypingReport()
  • CompressVariants()
    • Run level? Not called through the GenerateGenotypingVCFs object created for each sample - called through the class
    • For each genotyping sample, reads in the VCF file
    • Generates a list of all filter-passing variants for that sample
    • uniqueGenomicVariants is a HashSet, so duplicate variants are not added
    • Prints each unique variant to a file in VCF format
    • NOTE: This VCF is not yet annotated
  • CallSNPEff()
    • Run level? Not called through the GenerateGenotypingVCFs object created for each sample - called through the class
    • Generates the command to run SNPEff
    • writes to a VCF file, and also to a local object
    • uses ParseVCF.ParseVCF() to interpret this object, and returns the ParseVCF object annotatedVCFFile
    • This object is used by GenotypingPipelineWrapper.WriteGenotypingReport() to make the final report output.
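
The de-duplication step described under CompressVariants() can be sketched as below. This is a minimal Python illustration, not the pipeline's C# code. The tuple key (chrom, pos, ref, alt), the function names, the in-memory input and the sorted output order are assumptions; the notes only say that filter-passing variants from each sample go into a hash set and are then printed in VCF format. The coordinates are made up.

```python
def unique_passing_variants(per_sample_records):
    """Collect filter-passing variants across samples, dropping duplicates.

    per_sample_records maps a sample ID to a list of
    (chrom, pos, ref, alt, filter) tuples (a hypothetical layout).
    """
    unique = set()  # plays the role of uniqueGenomicVariants
    for records in per_sample_records.values():
        for chrom, pos, ref, alt, flt in records:
            if flt == "PASS":
                unique.add((chrom, pos, ref, alt))
    # Sorted for stable output; the order used by the real file is not documented.
    return sorted(unique)


def to_vcf_lines(variants):
    """Render minimal, unannotated VCF data lines (ID, QUAL, INFO left as '.')."""
    return [f"{c}\t{p}\t.\t{r}\t{a}\t.\tPASS\t." for c, p, r, a in variants]


run = {
    "sample1": [("chr1", 100, "A", "G", "PASS"), ("chr2", 200, "C", "T", "PASS")],
    "sample2": [("chr1", 100, "A", "G", "PASS"), ("chr3", 300, "G", "A", "q30")],
}
variants = unique_passing_variants(run)
for line in to_vcf_lines(variants):
    print(line)
```

The chr1 variant is seen in both samples but written once, and the chr3 variant is dropped because its filter is not PASS.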