jpayne@69: #!/bin/bash jpayne@69: jpayne@69: usage(){ jpayne@69: echo " jpayne@69: Written by Brian Bushnell jpayne@69: Last modified October 6, 2020 jpayne@69: jpayne@69: Description: Calls variants from sam or bam input. jpayne@69: In default mode, all input files are combined and treated as a single sample. jpayne@69: In multisample mode, each file is treated as an individual sample, jpayne@69: and gets its own column in the VCF file. Unless overridden, input file jpayne@69: names are used as sample names. jpayne@69: Please read bbmap/docs/guides/CallVariantsGuide.txt for more information, jpayne@69: or bbmap/pipelines/variantPipeline.sh for a usage example. jpayne@69: jpayne@69: Usage: callvariants.sh in= ref= vcf= jpayne@69: jpayne@69: Input may be sorted or unsorted. jpayne@69: The reference should be fasta. jpayne@69: jpayne@69: I/O parameters: jpayne@69: in= Input; may be one file or multiple comma-delimited files. jpayne@69: list= Optional text file containing one input file per line. jpayne@69: Use list or in, but not both. jpayne@69: out= Output variant list in var format. If the name ends jpayne@69: with .vcf then it will be vcf format. jpayne@69: vcf= Output variant list in vcf format. jpayne@69: outgff= Output variant list in gff format. jpayne@69: ref= Reference fasta. Required to display ref alleles. jpayne@69: Variant calling wil be more accurate with the reference. jpayne@69: vcfin= Force calls at these locations, even if allele count is 0. jpayne@69: shist= (scorehist) Output for variant score histogram. jpayne@69: zhist= (zygosityhist) Output for zygosity histogram. jpayne@69: qhist= (qualityhist) Output for variant base quality histogram. jpayne@69: overwrite=f (ow) Set to false to force the program to abort rather than jpayne@69: overwrite an existing file. jpayne@69: extended=t Print additional variant statistics columns. jpayne@69: sample= Optional comma-delimited list of sample names. jpayne@69: multisample=f (multi) Set to true if there are multiple sam/bam files, jpayne@69: and each should be tracked as an individual sample. jpayne@69: vcf0= Optional comma-delimited list of per-sample outputs. jpayne@69: Only used in multisample mode. jpayne@69: bgzip=t Use bgzip for gzip compression. jpayne@69: samstreamer=t (ss) Load reads multithreaded to increase speed. jpayne@69: Disable to reduce the number of threads used. The number of jpayne@69: streamer threads can be set with e.g. 'ss=4'; default is 6. jpayne@69: streamermf=8 (ssmf) Allow multiple sam files to be read simultaneously. jpayne@69: Set ssmf=X to specify the maximum number or ssmf=f jpayne@69: to disable. jpayne@69: jpayne@69: Processing Parameters: jpayne@69: prefilter=f Use a Bloom filter to exclude variants seen fewer than jpayne@69: minreads times. Doubles the runtime but greatly reduces jpayne@69: memory usage. The results are identical. jpayne@69: coverage=t (cc) Calculate coverage, to better call variants. jpayne@69: ploidy=1 Set the organism's ploidy. jpayne@69: rarity=1.0 Penalize the quality of variants with allele fraction jpayne@69: lower than this. For example, if you are interested in jpayne@69: 4% frequency variants, you could set both rarity and jpayne@69: minallelefraction to 0.04. This is affected by ploidy - jpayne@69: a variant with frequency indicating at least one copy jpayne@69: is never penalized. jpayne@69: covpenalty=0.8 (lowcoveragepenalty) A lower penalty will increase the jpayne@69: scores of low-coverage variants, and is useful for jpayne@69: low-coverage datasets. jpayne@69: useidentity=t Include average read identity in score calculation. jpayne@69: usepairing=t Include pairing rate in score calculation. jpayne@69: usebias=t Include strand bias in score calculation. jpayne@69: useedist=t Include read-end distance in score calculation. jpayne@69: homopolymer=t Penalize scores of substitutions matching adjacent bases. jpayne@69: nscan=t Consider the distance of a variant from contig ends when jpayne@69: calculating strand bias. jpayne@69: callsub=t Call substitutions. jpayne@69: calldel=t Call deletions. jpayne@69: callins=t Call insertions. jpayne@69: calljunct=f Call junctions (in development). jpayne@69: nopassdot=f Use . as genotype for variations failing the filter. jpayne@69: jpayne@69: Coverage Parameters (these mainly affect speed and memory use): jpayne@69: 32bit=f Set to true to allow coverage tracking over depth 65535, jpayne@69: which increases memory use. Variant calls are impacted jpayne@69: where coverage exceeds the maximum. jpayne@69: atomic=auto Increases multithreaded speed; forces 32bit to true. jpayne@69: Defaults to true if there are more than 8 threads. jpayne@69: strandedcov=f (stranded) Tracks per-strand ref coverage to print the MCOV jpayne@69: and DP4 fields. Requires more memory when enabled. Strand jpayne@69: of variant reads is tracked regardless of this flag. jpayne@69: jpayne@69: Trimming Parameters: jpayne@69: border=5 Trim at least this many bases on both ends of reads. jpayne@69: qtrim=r Quality-trim reads on this end jpayne@69: r: right, l: left, rl: both, f: don't quality-trim. jpayne@69: trimq=10 Quality-trim bases below this score. jpayne@69: jpayne@69: Realignment Parameters: jpayne@69: realign=f Realign all reads with more than a couple mismatches. jpayne@69: Decreases speed. Recommended for aligners other than BBMap. jpayne@69: unclip=f Convert clip symbols from exceeding the ends of the jpayne@69: realignment zone into matches and substitutitions. jpayne@69: repadding=70 Pad alignment by this much on each end. Typically, jpayne@69: longer is more accurate for long indels, but greatly jpayne@69: reduces speed. jpayne@69: rerows=602 Use this many rows maximum for realignment. Reads longer jpayne@69: than this cannot be realigned. jpayne@69: recols=2000 Reads may not be aligned to reference seqments longer jpayne@69: than this. Needs to be at least read length plus jpayne@69: max deletion length plus twice padding. jpayne@69: msa= Select the aligner. Options: jpayne@69: MultiStateAligner11ts: Default. jpayne@69: MultiStateAligner9PacBio: Use for PacBio reads, or for jpayne@69: Illumina reads mapped to PacBio/Nanopore reads. jpayne@69: jpayne@69: Sam-filtering Parameters: jpayne@69: minpos= Ignore alignments not overlapping this range. jpayne@69: maxpos= Ignore alignments not overlapping this range. jpayne@69: minreadmapq=4 Ignore alignments with lower mapq. jpayne@69: contigs= Comma-delimited list of contig names to include. These jpayne@69: should have no spaces, or underscores instead of spaces. jpayne@69: secondary=f Include secondary alignments. jpayne@69: supplimentary=f Include supplimentary alignments. jpayne@69: duplicate=f Include reads flagged as duplicates. jpayne@69: invert=f Invert sam filters. jpayne@69: jpayne@69: Variant-Calling Cutoffs: jpayne@69: minreads=2 (minad) Ignore variants seen in fewer reads. jpayne@69: maxreads=BIG (maxad) Ignore variants seen in more reads. jpayne@69: mincov=0 Ignore variants in lower-coverage locations. jpayne@69: maxcov=BIG Ignore variants in higher-coverage locations. jpayne@69: minqualitymax=15 Ignore variants with lower max base quality. jpayne@69: minedistmax=20 Ignore variants with lower max distance from read ends. jpayne@69: minmapqmax=0 Ignore variants with lower max mapq. jpayne@69: minidmax=0 Ignore variants with lower max read identity. jpayne@69: minpairingrate=0.1 Ignore variants with lower pairing rate. jpayne@69: minstrandratio=0.1 Ignore variants with lower plus/minus strand ratio. jpayne@69: minquality=12.0 Ignore variants with lower average base quality. jpayne@69: minedist=10.0 Ignore variants with lower average distance from ends. jpayne@69: minavgmapq=0.0 Ignore variants with lower average mapq. jpayne@69: minallelefraction=0.1 Ignore variants with lower allele fraction. This jpayne@69: should be adjusted for high ploidies. jpayne@69: minid=0 Ignore variants with lower average read identity. jpayne@69: minscore=20.0 Ignore variants with lower Phred-scaled score. jpayne@69: clearfilters Clear all filters. Filter flags placed after jpayne@69: the clearfilters flag will still be applied. jpayne@69: jpayne@69: There are additionally max filters for score, quality, mapq, allelefraction, jpayne@69: and identity. jpayne@69: jpayne@69: Other Parameters: jpayne@69: minvarcopies=0 If set to 0, a genotype (vcf GT field) of 0 or 0/0 jpayne@69: will be called if observed allele frequency suggests jpayne@69: this is a minor allele. If set to 1, GT field will jpayne@69: contain at least one 1. jpayne@69: jpayne@69: Java Parameters: jpayne@69: -Xmx This will set Java's memory usage, overriding autodetection. jpayne@69: -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will jpayne@69: specify 200 megs. The max is typically 85% of physical memory. jpayne@69: -eoom This flag will cause the process to exit if an out-of-memory jpayne@69: exception occurs. Requires Java 8u92+. jpayne@69: -da Disable assertions. jpayne@69: jpayne@69: Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems. jpayne@69: " jpayne@69: } jpayne@69: jpayne@69: #This block allows symlinked shellscripts to correctly set classpath. jpayne@69: pushd . > /dev/null jpayne@69: DIR="${BASH_SOURCE[0]}" jpayne@69: while [ -h "$DIR" ]; do jpayne@69: cd "$(dirname "$DIR")" jpayne@69: DIR="$(readlink "$(basename "$DIR")")" jpayne@69: done jpayne@69: cd "$(dirname "$DIR")" jpayne@69: DIR="$(pwd)/" jpayne@69: popd > /dev/null jpayne@69: jpayne@69: #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/" jpayne@69: CP="$DIR""current/" jpayne@69: jpayne@69: z="-Xmx4g" jpayne@69: z2="-Xms4g" jpayne@69: set=0 jpayne@69: jpayne@69: if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then jpayne@69: usage jpayne@69: exit jpayne@69: fi jpayne@69: jpayne@69: calcXmx () { jpayne@69: source "$DIR""/calcmem.sh" jpayne@69: setEnvironment jpayne@69: parseXmx "$@" jpayne@69: if [[ $set == 1 ]]; then jpayne@69: return jpayne@69: fi jpayne@69: freeRam 4000m 84 jpayne@69: z="-Xmx${RAM}m" jpayne@69: z2="-Xms${RAM}m" jpayne@69: } jpayne@69: calcXmx "$@" jpayne@69: jpayne@69: callvariants() { jpayne@69: local CMD="java $EA $EOOM $z $z2 -cp $CP var2.CallVariants $@" jpayne@69: echo $CMD >&2 jpayne@69: eval $CMD jpayne@69: } jpayne@69: jpayne@69: callvariants "$@"