
How to use GATK best practices for calling variants?

Learn to efficiently call variants using GATK's best practices, from environment setup and data preparation to variant evaluation and documentation.

 

Set Up Your Environment

 

  • Install a current GATK 4 release; the commands in this guide use GATK 4 syntax. You can download it from the GATK website.

  • Install a Java runtime that matches your GATK release: GATK 4.4 and later require Java 17, while earlier GATK 4 releases run on Java 8.

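  • Gather the reference files: a human genome reference sequence (e.g., hg19 or hg38) and known sites of variation for the same genome build (e.g., dbSNP, Mills and 1000G gold standard indels). The reference needs a BWA index, a .fai index, and a .dict sequence dictionary, and each known-sites VCF needs its own index.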
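
The supporting index files only need to be built once per reference. The following is a minimal sketch, assuming bwa, samtools, and a recent GATK 4 are on the PATH. The helper name prepare_reference and the file names are placeholders, and older GATK 4 releases used a different input flag for IndexFeatureFile, so check the help output of your version.

```shell
# Build the index files that BWA and GATK expect next to the reference FASTA.
# prepare_reference is a hypothetical helper name; file names are placeholders.
prepare_reference() {
  ref=$1            # e.g. reference.fa
  shift             # remaining arguments: known-sites VCFs

  bwa index "$ref" || return 1                          # BWA index files
  samtools faidx "$ref" || return 1                     # writes reference.fa.fai
  gatk CreateSequenceDictionary -R "$ref" || return 1   # writes reference.dict

  # Each known-sites VCF needs an index before BaseRecalibrator can read it.
  for vcf in "$@"; do
    gatk IndexFeatureFile -I "$vcf" || return 1
  done
}

# Usage (not run here):
#   prepare_reference reference.fa dbsnp.vcf mills.vcf
```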

 

Prepare Your Data

 

  • Obtain high-quality FASTQ files from your sequencer, ensuring the data is correctly paired (for paired-end reads).

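  • Use BWA (Burrows-Wheeler Aligner) to map the reads to the reference genome, adding a read group with -R because GATK tools require one. The command line might look like this, with sample1 as a placeholder sample name: bwa mem -t 4 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' reference.fa sample_R1.fastq sample_R2.fastq > alignment.sam.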

  • Sort the SAM file and write it as BAM using samtools, then index the result: samtools sort alignment.sam -o sorted_alignment.bam, followed by samtools index sorted_alignment.bam.
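
Alignment and sorting can also be chained so that no intermediate SAM file is written. This is a sketch under stated assumptions: align_and_sort is a hypothetical helper name, the read-group values are placeholders, and the thread count mirrors the command above.

```shell
# Align paired-end reads, attach a read group, and write a sorted, indexed BAM.
# align_and_sort is a hypothetical helper; the read-group values are placeholders.
align_and_sort() {
  sample=$1; ref=$2; r1=$3; r2=$4; out_bam=$5

  # bwa converts the literal \t sequences in -R into tabs in the @RG header line.
  rg="@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA\tLB:${sample}_lib"

  # Stream alignments straight into the sorter ("-" means read from stdin).
  # Note: without pipefail, only a failure of samtools sort is detected here.
  bwa mem -t 4 -R "$rg" "$ref" "$r1" "$r2" |
    samtools sort -o "$out_bam" - || return 1

  samtools index "$out_bam"
}

# Usage (not run here):
#   align_and_sort sample1 reference.fa sample_R1.fastq sample_R2.fastq sorted_alignment.bam
```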

 

Mark Duplicates

 

  • Use GATK's MarkDuplicates to mark duplicate reads in the BAM file: gatk MarkDuplicates -I sorted_alignment.bam -O deduped.bam -M metrics.txt.

  • This step helps to reduce bias in variant calling. PCR and optical duplicates are flagged rather than removed, and downstream GATK tools skip flagged reads by default.
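
The metrics file written by -M is worth a quick look before moving on. The sketch below pulls the duplication fraction per library out of it. It assumes the usual Picard-style layout (tab-delimited, a header row starting with LIBRARY, a PERCENT_DUPLICATION column, and a blank line before the histogram section); percent_duplication is a hypothetical helper name.

```shell
# Print "library<TAB>fraction of duplicate reads" from a MarkDuplicates metrics file.
# percent_duplication is a hypothetical helper; the layout assumptions are in the lead-in.
percent_duplication() {
  awk -F '\t' '
    /^#/ { next }                      # skip comment and section lines
    $1 == "LIBRARY" {                  # header row: locate the column by name
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      next
    }
    col && NF == 0 { exit }            # blank line ends the metrics table
    col { print $1 "\t" $col }         # one row per library
  ' "$1"
}

# Usage (not run here):
#   percent_duplication metrics.txt
```

Looking the column up by name rather than by position keeps the helper working if the column order differs between releases.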

 

Base Quality Score Recalibration (BQSR)

 

  • Create a recalibration table using known sites of variation that match the reference build: gatk BaseRecalibrator -I deduped.bam -R reference.fa --known-sites dbsnp.vcf --known-sites mills.vcf -O recal_data.table.

  • Apply the recalibration to your BAM file: gatk ApplyBQSR -R reference.fa -I deduped.bam --bqsr-recal-file recal_data.table -O recalibrated.bam.
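
As an optional check, the recalibrated BAM can be run through BaseRecalibrator a second time and the two tables compared with AnalyzeCovariates. This is a sketch, not a required step: check_bqsr is a hypothetical helper name, the output file names are placeholders, and plotting assumes an R installation with the packages GATK expects.

```shell
# Optional QC: build a second recalibration table from the recalibrated BAM
# and plot it against the first. check_bqsr is a hypothetical helper name.
check_bqsr() {
  ref=$1; recal_bam=$2; before_table=$3
  shift 3                              # remaining arguments: known-sites VCFs

  # Turn "a.vcf b.vcf" into "--known-sites a.vcf --known-sites b.vcf".
  # The loop list is expanded once, so appending while shifting is safe.
  for vcf in "$@"; do
    set -- "$@" --known-sites "$vcf"
    shift
  done

  gatk BaseRecalibrator -R "$ref" -I "$recal_bam" "$@" \
    -O post_recal_data.table || return 1

  gatk AnalyzeCovariates -before "$before_table" \
    -after post_recal_data.table -plots recalibration_plots.pdf
}

# Usage (not run here):
#   check_bqsr reference.fa recalibrated.bam recal_data.table dbsnp.vcf mills.vcf
```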

 

Variant Calling

 

  • Use HaplotypeCaller to call variants on the recalibrated data: gatk HaplotypeCaller -R reference.fa -I recalibrated.bam -O raw_variants.vcf.

  • This generates a VCF file containing raw variant calls.
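
When more samples may be added later, HaplotypeCaller is often run in GVCF mode and genotyped in a second step. The sketch below shows that variant for a single sample; call_gvcf is a hypothetical helper name and the output names are placeholders. With several samples, the per-sample GVCFs would first need to be combined (for example with CombineGVCFs or GenomicsDBImport) before GenotypeGVCFs.

```shell
# Two-step calling for one sample: emit a GVCF, then genotype it.
# call_gvcf is a hypothetical helper; output names are placeholders.
call_gvcf() {
  ref=$1; bam=$2; sample=$3

  # -ERC GVCF records reference confidence at non-variant sites as well.
  gatk HaplotypeCaller -R "$ref" -I "$bam" \
    -O "${sample}.g.vcf.gz" -ERC GVCF || return 1

  # Convert the GVCF into final genotype calls.
  gatk GenotypeGVCFs -R "$ref" -V "${sample}.g.vcf.gz" -O "${sample}.vcf.gz"
}

# Usage (not run here):
#   call_gvcf reference.fa recalibrated.bam sample1
```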

 

Variant Filtering

 

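  • Use VariantFiltration to apply filters to your raw VCF file, based on quality metrics: gatk VariantFiltration -R reference.fa -V raw_variants.vcf --filter-expression "QUAL < 30.0" --filter-name "LowQual" -O filtered_variants.vcf. Records that match the expression are kept in the output and labeled LowQual in the FILTER column; the rest are marked PASS.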

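  • For large cohorts with suitable training resources (e.g., HapMap, 1000G, dbSNP), GATK's VariantRecalibrator and ApplyVQSR can replace hard filters. Run them separately for SNPs and indels, since each variant type uses its own annotations and training sets.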
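
A single QUAL cutoff is a coarse filter. A common alternative for small datasets is to split out the SNPs and filter on several annotations at once. The sketch below does this with thresholds commonly cited as starting points in GATK's hard-filtering documentation; treat them as assumptions to tune for your data. hard_filter_snps is a hypothetical helper name, and indels would need their own pass with different thresholds.

```shell
# Select SNPs and flag those that fail any of several annotation thresholds.
# hard_filter_snps is a hypothetical helper; thresholds are starting points only.
hard_filter_snps() {
  in_vcf=$1; snps_vcf=$2; out_vcf=$3

  gatk SelectVariants -V "$in_vcf" \
    --select-type-to-include SNP -O "$snps_vcf" || return 1

  # Each expression is paired with the label written to the FILTER column.
  gatk VariantFiltration -V "$snps_vcf" \
    --filter-expression "QD < 2.0" --filter-name "QD2" \
    --filter-expression "FS > 60.0" --filter-name "FS60" \
    --filter-expression "MQ < 40.0" --filter-name "MQ40" \
    --filter-expression "SOR > 3.0" --filter-name "SOR3" \
    --filter-expression "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
    --filter-expression "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
    -O "$out_vcf"
}

# Usage (not run here):
#   hard_filter_snps raw_variants.vcf raw_snps.vcf filtered_snps.vcf
```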

 

Evaluate and Validate Variants

 

  • Ensure the integrity of your variant calls by comparing against a gold-standard dataset if available.

  • Use tools such as vcftools or bcftools to analyze your VCF file and extract insights or generate reports.
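
A quick first report is the number of records per FILTER value. The sketch below counts them with awk for an uncompressed VCF; count_by_filter is a hypothetical helper name. The comments list bcftools commands that give related summaries.

```shell
# Count VCF records per FILTER value (column 7), e.g. PASS vs. LowQual.
# count_by_filter is a hypothetical helper; it expects a plain-text (not gzipped) VCF.
count_by_filter() {
  awk -F '\t' '
    /^#/ { next }                      # skip header lines
    { n[$7]++ }                        # tally the FILTER column
    END { for (f in n) print f "\t" n[f] }
  ' "$1" | sort
}

# Related bcftools commands (not run here):
#   bcftools stats filtered_variants.vcf > stats.txt
#   bcftools view -H -f PASS filtered_variants.vcf | wc -l

# Usage (not run here):
#   count_by_filter filtered_variants.vcf
```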

 

Document and Archive

 

  • Keep detailed records of the parameters and versions of software used in your variant calling process for replication and review purposes.

  • Store your results, along with reference data and BAM files, in a secure location in compliance with data management policies.
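
Version records are easy to forget, so it can help to capture them alongside the results. The sketch below writes the versions of the tools used in this guide to a text file; record_versions is a hypothetical helper name, and the exact text each tool prints varies between releases.

```shell
# Write the versions of the tools used in this guide to a text file.
# record_versions is a hypothetical helper; output wording depends on each tool.
record_versions() {
  out=$1
  {
    printf '# recorded %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    printf '## gatk\n';     gatk --version 2>&1
    printf '## samtools\n'; samtools --version 2>&1 | head -n 1
    # bwa has no --version flag; its usage text (on stderr) includes a Version line.
    printf '## bwa\n';      bwa 2>&1 | grep '^Version' || true
  } > "$out"
}

# Usage (not run here):
#   record_versions software_versions.txt
```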

 
