comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/callvariants.sh @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 17:55:14 -0400
parents
children
comparison
equal deleted inserted replaced
67:0e9998148a16 69:33d812a61356
1 #!/bin/bash
2
3 usage(){
4 echo "
5 Written by Brian Bushnell
6 Last modified October 6, 2020
7
8 Description: Calls variants from sam or bam input.
9 In default mode, all input files are combined and treated as a single sample.
10 In multisample mode, each file is treated as an individual sample,
11 and gets its own column in the VCF file. Unless overridden, input file
12 names are used as sample names.
13 Please read bbmap/docs/guides/CallVariantsGuide.txt for more information,
14 or bbmap/pipelines/variantPipeline.sh for a usage example.
15
16 Usage: callvariants.sh in=<file,file,...> ref=<file> vcf=<file>
17
18 Input may be sorted or unsorted.
19 The reference should be fasta.
20
21 I/O parameters:
22 in=<file> Input; may be one file or multiple comma-delimited files.
23 list=<file> Optional text file containing one input file per line.
24 Use list or in, but not both.
25 out=<file> Output variant list in var format. If the name ends
26 with .vcf then it will be vcf format.
27 vcf=<file> Output variant list in vcf format.
28 outgff=<file> Output variant list in gff format.
29 ref=<file> Reference fasta. Required to display ref alleles.
30 Variant calling wil be more accurate with the reference.
31 vcfin=<file> Force calls at these locations, even if allele count is 0.
32 shist=<file> (scorehist) Output for variant score histogram.
33 zhist=<file> (zygosityhist) Output for zygosity histogram.
34 qhist=<file> (qualityhist) Output for variant base quality histogram.
35 overwrite=f (ow) Set to false to force the program to abort rather than
36 overwrite an existing file.
37 extended=t Print additional variant statistics columns.
38 sample= Optional comma-delimited list of sample names.
39 multisample=f (multi) Set to true if there are multiple sam/bam files,
40 and each should be tracked as an individual sample.
41 vcf0= Optional comma-delimited list of per-sample outputs.
42 Only used in multisample mode.
43 bgzip=t Use bgzip for gzip compression.
44 samstreamer=t (ss) Load reads multithreaded to increase speed.
45 Disable to reduce the number of threads used. The number of
46 streamer threads can be set with e.g. 'ss=4'; default is 6.
47 streamermf=8 (ssmf) Allow multiple sam files to be read simultaneously.
48 Set ssmf=X to specify the maximum number or ssmf=f
49 to disable.
50
51 Processing Parameters:
52 prefilter=f Use a Bloom filter to exclude variants seen fewer than
53 minreads times. Doubles the runtime but greatly reduces
54 memory usage. The results are identical.
55 coverage=t (cc) Calculate coverage, to better call variants.
56 ploidy=1 Set the organism's ploidy.
57 rarity=1.0 Penalize the quality of variants with allele fraction
58 lower than this. For example, if you are interested in
59 4% frequency variants, you could set both rarity and
60 minallelefraction to 0.04. This is affected by ploidy -
61 a variant with frequency indicating at least one copy
62 is never penalized.
63 covpenalty=0.8 (lowcoveragepenalty) A lower penalty will increase the
64 scores of low-coverage variants, and is useful for
65 low-coverage datasets.
66 useidentity=t Include average read identity in score calculation.
67 usepairing=t Include pairing rate in score calculation.
68 usebias=t Include strand bias in score calculation.
69 useedist=t Include read-end distance in score calculation.
70 homopolymer=t Penalize scores of substitutions matching adjacent bases.
71 nscan=t Consider the distance of a variant from contig ends when
72 calculating strand bias.
73 callsub=t Call substitutions.
74 calldel=t Call deletions.
75 callins=t Call insertions.
76 calljunct=f Call junctions (in development).
77 nopassdot=f Use . as genotype for variations failing the filter.
78
79 Coverage Parameters (these mainly affect speed and memory use):
80 32bit=f Set to true to allow coverage tracking over depth 65535,
81 which increases memory use. Variant calls are impacted
82 where coverage exceeds the maximum.
83 atomic=auto Increases multithreaded speed; forces 32bit to true.
84 Defaults to true if there are more than 8 threads.
85 strandedcov=f (stranded) Tracks per-strand ref coverage to print the MCOV
86 and DP4 fields. Requires more memory when enabled. Strand
87 of variant reads is tracked regardless of this flag.
88
89 Trimming Parameters:
90 border=5 Trim at least this many bases on both ends of reads.
91 qtrim=r Quality-trim reads on this end
92 r: right, l: left, rl: both, f: don't quality-trim.
93 trimq=10 Quality-trim bases below this score.
94
95 Realignment Parameters:
96 realign=f Realign all reads with more than a couple mismatches.
97 Decreases speed. Recommended for aligners other than BBMap.
98 unclip=f Convert clip symbols from exceeding the ends of the
99 realignment zone into matches and substitutitions.
100 repadding=70 Pad alignment by this much on each end. Typically,
101 longer is more accurate for long indels, but greatly
102 reduces speed.
103 rerows=602 Use this many rows maximum for realignment. Reads longer
104 than this cannot be realigned.
105 recols=2000 Reads may not be aligned to reference seqments longer
106 than this. Needs to be at least read length plus
107 max deletion length plus twice padding.
108 msa= Select the aligner. Options:
109 MultiStateAligner11ts: Default.
110 MultiStateAligner9PacBio: Use for PacBio reads, or for
111 Illumina reads mapped to PacBio/Nanopore reads.
112
113 Sam-filtering Parameters:
114 minpos= Ignore alignments not overlapping this range.
115 maxpos= Ignore alignments not overlapping this range.
116 minreadmapq=4 Ignore alignments with lower mapq.
117 contigs= Comma-delimited list of contig names to include. These
118 should have no spaces, or underscores instead of spaces.
119 secondary=f Include secondary alignments.
120 supplimentary=f Include supplimentary alignments.
121 duplicate=f Include reads flagged as duplicates.
122 invert=f Invert sam filters.
123
124 Variant-Calling Cutoffs:
125 minreads=2 (minad) Ignore variants seen in fewer reads.
126 maxreads=BIG (maxad) Ignore variants seen in more reads.
127 mincov=0 Ignore variants in lower-coverage locations.
128 maxcov=BIG Ignore variants in higher-coverage locations.
129 minqualitymax=15 Ignore variants with lower max base quality.
130 minedistmax=20 Ignore variants with lower max distance from read ends.
131 minmapqmax=0 Ignore variants with lower max mapq.
132 minidmax=0 Ignore variants with lower max read identity.
133 minpairingrate=0.1 Ignore variants with lower pairing rate.
134 minstrandratio=0.1 Ignore variants with lower plus/minus strand ratio.
135 minquality=12.0 Ignore variants with lower average base quality.
136 minedist=10.0 Ignore variants with lower average distance from ends.
137 minavgmapq=0.0 Ignore variants with lower average mapq.
138 minallelefraction=0.1 Ignore variants with lower allele fraction. This
139 should be adjusted for high ploidies.
140 minid=0 Ignore variants with lower average read identity.
141 minscore=20.0 Ignore variants with lower Phred-scaled score.
142 clearfilters Clear all filters. Filter flags placed after
143 the clearfilters flag will still be applied.
144
145 There are additionally max filters for score, quality, mapq, allelefraction,
146 and identity.
147
148 Other Parameters:
149 minvarcopies=0 If set to 0, a genotype (vcf GT field) of 0 or 0/0
150 will be called if observed allele frequency suggests
151 this is a minor allele. If set to 1, GT field will
152 contain at least one 1.
153
154 Java Parameters:
155 -Xmx This will set Java's memory usage, overriding autodetection.
156 -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will
157 specify 200 megs. The max is typically 85% of physical memory.
158 -eoom This flag will cause the process to exit if an out-of-memory
159 exception occurs. Requires Java 8u92+.
160 -da Disable assertions.
161
162 Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
163 "
164 }
165
166 #This block allows symlinked shellscripts to correctly set classpath.
167 pushd . > /dev/null
168 DIR="${BASH_SOURCE[0]}"
169 while [ -h "$DIR" ]; do
170 cd "$(dirname "$DIR")"
171 DIR="$(readlink "$(basename "$DIR")")"
172 done
173 cd "$(dirname "$DIR")"
174 DIR="$(pwd)/"
175 popd > /dev/null
176
177 #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/"
178 CP="$DIR""current/"
179
180 z="-Xmx4g"
181 z2="-Xms4g"
182 set=0
183
184 if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then
185 usage
186 exit
187 fi
188
189 calcXmx () {
190 source "$DIR""/calcmem.sh"
191 setEnvironment
192 parseXmx "$@"
193 if [[ $set == 1 ]]; then
194 return
195 fi
196 freeRam 4000m 84
197 z="-Xmx${RAM}m"
198 z2="-Xms${RAM}m"
199 }
200 calcXmx "$@"
201
202 callvariants() {
203 local CMD="java $EA $EOOM $z $z2 -cp $CP var2.CallVariants $@"
204 echo $CMD >&2
205 eval $CMD
206 }
207
208 callvariants "$@"