schoolwiki

Reference Generation

Trim Galore

No References

DNA Alignment

Human Ref:

mkdir humanref
cp GRCh38.fa humanref/genome.fa
cd humanref
bwa index -a bwtsw genome.fa
cd ..
tar cfz humanref.tar.gz humanref

Virus Ref:

mkdir virusref
cp GRCh38.fa virusref/genome.fa
cd virusref
bwa index -a bwtsw genome.fa
cd ..
tar cfz virusref.tar.gz virusref

Mark Duplicates

mkdir humanref
cp GRCh38.fa humanref/genome.fa
cd humanref
bwa index -a bwtsw genome.fa
cd ..
tar cfz humanref.tar.gz humanref

DNA QC

Reference Genome

mkdir references
cp GRCh38.fa references/genome.fa
samtools faidx references/genome.fa
cut -f 1,2 references/genome.fa.fai > genomefile.txt
tar cfz ref.tar.gz references

Panel Reference

tar cfz panel.tar.gz targetpanel.bed

GATK BQSR

Reference Genome

mkdir reference
mv GRCh38.fa reference/genome.fa

SNV and Indel

Reference Databases

Assembly-Based Reference

mkdir reference
mv GRCh38.fa reference/genome.fa
cd reference
java -jar picard.jar CreateSequenceDictionary R=genome.fa O=genome.dict
samtools faidx genome.fa
cut -f 1,2 genome.fa.fai > genomefile.txt
bedtools makewindows -g genomefile.txt -w 5000000 | awk '{print $1":"$2"-"$3}' | sed 's/:0-/:1-/' > genomefile.5M.txt
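
The region list holds one `contig:start-end` entry per line. As a minimal sketch of the same windowing idea without bedtools, the awk helper below builds non-overlapping, 1-based windows from a `.fai` index. The function name `make_regions` is hypothetical, and its output is not identical to the bedtools pipeline (each window here starts one base after the previous window ends).

```shell
# make_regions FAI WINDOW
# Hypothetical helper: prints 1-based, non-overlapping regions as contig:start-end.
# FAI is a samtools faidx index (column 1 = contig name, column 2 = contig length).
make_regions() {
  awk -F'\t' -v w="$2" '{
    for (s = 1; s <= $2; s += w) {
      e = s + w - 1
      if (e > $2) e = $2      # clip the last window to the contig length
      print $1 ":" s "-" e
    }
  }' "$1"
}
```

Possible usage, with the file names from the commands above: `make_regions genome.fa.fai 5000000 > genomefile.5M.txt`.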

Create Tar Gzip File

cd ..
tar cfz ref.tar.gz reference

Panel Reference

mv 1000g_pon.hg38.vcf.gz mutect.pon.vcf.gz
tar cfz panel.tar.gz mutect.pon.vcf.gz targetpanel.bed

SV Calling

Reference Genome

mkdir reference
mv GRCh38.fa reference/genome.fa
cd reference
java -jar picard.jar CreateSequenceDictionary R=genome.fa O=genome.dict
samtools faidx genome.fa
cut -f 1,2 genome.fa.fai > genomefile.txt
fasta_generate_regions.py genome.fa.fai 5000000 > genomefile.5M.txt
# optional: place pindel_genes.bed here to limit PINDEL to ITD genes (see the note below)
cd ..
tar cfz ref.tar.gz reference

Panel Reference

tar cfz panel.tar.gz cnvkit.targets.bed cnvkit.antitargets.bed pon.cnn targetpanel.bed

CNVkit reference files can be generated using the cnvkit_createpanelref app.

PINDEL can be very slow. If you only need PINDEL for ITD detection, you can create a file with the positions of the relevant genes (pindel_genes.bed in the reference directory above).
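
One way such a gene-position file might be built is sketched below. It assumes a GTF annotation for the same assembly and a plain-text list of gene symbols, one per line (for example FLT3). The function name `make_pindel_bed` and both input files are hypothetical and are not part of the original workflow.

```shell
# make_pindel_bed GENE_LIST GTF
# Hypothetical helper: prints BED lines (contig, 0-based start, end, gene symbol)
# for every GTF "gene" record whose gene_name appears in GENE_LIST.
make_pindel_bed() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { want[$1] = 1; next }          # first file: wanted gene symbols
    $3 == "gene" && match($9, /gene_name "[^"]+"/) {
      # strip the leading 11 characters (gene_name ") and the closing quote
      name = substr($9, RSTART + 11, RLENGTH - 12)
      if (name in want) print $1, $4 - 1, $5, name   # GTF is 1-based, BED is 0-based
    }
  ' "$1" "$2"
}
```

Possible usage: `make_pindel_bed itd_genes.txt genes.gtf > reference/pindel_genes.bed`, where both input file names are placeholders.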

Union

Reference

java -jar picard.jar CreateSequenceDictionary R=genome.fa O=genome.dict