Mercurial > repos > rliterman > csp2
comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/callvariants.sh @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 17:55:14 -0400 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
67:0e9998148a16 | 69:33d812a61356 |
---|---|
1 #!/bin/bash | |
2 | |
3 usage(){ | |
4 echo " | |
5 Written by Brian Bushnell | |
6 Last modified October 6, 2020 | |
7 | |
8 Description: Calls variants from sam or bam input. | |
9 In default mode, all input files are combined and treated as a single sample. | |
10 In multisample mode, each file is treated as an individual sample, | |
11 and gets its own column in the VCF file. Unless overridden, input file | |
12 names are used as sample names. | |
13 Please read bbmap/docs/guides/CallVariantsGuide.txt for more information, | |
14 or bbmap/pipelines/variantPipeline.sh for a usage example. | |
15 | |
16 Usage: callvariants.sh in=<file,file,...> ref=<file> vcf=<file> | |
17 | |
18 Input may be sorted or unsorted. | |
19 The reference should be fasta. | |
20 | |
21 I/O parameters: | |
22 in=<file> Input; may be one file or multiple comma-delimited files. | |
23 list=<file> Optional text file containing one input file per line. | |
24 Use list or in, but not both. | |
25 out=<file> Output variant list in var format. If the name ends | |
26 with .vcf then it will be vcf format. | |
27 vcf=<file> Output variant list in vcf format. | |
28 outgff=<file> Output variant list in gff format. | |
29 ref=<file> Reference fasta. Required to display ref alleles. | |
30 Variant calling wil be more accurate with the reference. | |
31 vcfin=<file> Force calls at these locations, even if allele count is 0. | |
32 shist=<file> (scorehist) Output for variant score histogram. | |
33 zhist=<file> (zygosityhist) Output for zygosity histogram. | |
34 qhist=<file> (qualityhist) Output for variant base quality histogram. | |
35 overwrite=f (ow) Set to false to force the program to abort rather than | |
36 overwrite an existing file. | |
37 extended=t Print additional variant statistics columns. | |
38 sample= Optional comma-delimited list of sample names. | |
39 multisample=f (multi) Set to true if there are multiple sam/bam files, | |
40 and each should be tracked as an individual sample. | |
41 vcf0= Optional comma-delimited list of per-sample outputs. | |
42 Only used in multisample mode. | |
43 bgzip=t Use bgzip for gzip compression. | |
44 samstreamer=t (ss) Load reads multithreaded to increase speed. | |
45 Disable to reduce the number of threads used. The number of | |
46 streamer threads can be set with e.g. 'ss=4'; default is 6. | |
47 streamermf=8 (ssmf) Allow multiple sam files to be read simultaneously. | |
48 Set ssmf=X to specify the maximum number or ssmf=f | |
49 to disable. | |
50 | |
51 Processing Parameters: | |
52 prefilter=f Use a Bloom filter to exclude variants seen fewer than | |
53 minreads times. Doubles the runtime but greatly reduces | |
54 memory usage. The results are identical. | |
55 coverage=t (cc) Calculate coverage, to better call variants. | |
56 ploidy=1 Set the organism's ploidy. | |
57 rarity=1.0 Penalize the quality of variants with allele fraction | |
58 lower than this. For example, if you are interested in | |
59 4% frequency variants, you could set both rarity and | |
60 minallelefraction to 0.04. This is affected by ploidy - | |
61 a variant with frequency indicating at least one copy | |
62 is never penalized. | |
63 covpenalty=0.8 (lowcoveragepenalty) A lower penalty will increase the | |
64 scores of low-coverage variants, and is useful for | |
65 low-coverage datasets. | |
66 useidentity=t Include average read identity in score calculation. | |
67 usepairing=t Include pairing rate in score calculation. | |
68 usebias=t Include strand bias in score calculation. | |
69 useedist=t Include read-end distance in score calculation. | |
70 homopolymer=t Penalize scores of substitutions matching adjacent bases. | |
71 nscan=t Consider the distance of a variant from contig ends when | |
72 calculating strand bias. | |
73 callsub=t Call substitutions. | |
74 calldel=t Call deletions. | |
75 callins=t Call insertions. | |
76 calljunct=f Call junctions (in development). | |
77 nopassdot=f Use . as genotype for variations failing the filter. | |
78 | |
79 Coverage Parameters (these mainly affect speed and memory use): | |
80 32bit=f Set to true to allow coverage tracking over depth 65535, | |
81 which increases memory use. Variant calls are impacted | |
82 where coverage exceeds the maximum. | |
83 atomic=auto Increases multithreaded speed; forces 32bit to true. | |
84 Defaults to true if there are more than 8 threads. | |
85 strandedcov=f (stranded) Tracks per-strand ref coverage to print the MCOV | |
86 and DP4 fields. Requires more memory when enabled. Strand | |
87 of variant reads is tracked regardless of this flag. | |
88 | |
89 Trimming Parameters: | |
90 border=5 Trim at least this many bases on both ends of reads. | |
91 qtrim=r Quality-trim reads on this end | |
92 r: right, l: left, rl: both, f: don't quality-trim. | |
93 trimq=10 Quality-trim bases below this score. | |
94 | |
95 Realignment Parameters: | |
96 realign=f Realign all reads with more than a couple mismatches. | |
97 Decreases speed. Recommended for aligners other than BBMap. | |
98 unclip=f Convert clip symbols from exceeding the ends of the | |
99 realignment zone into matches and substitutitions. | |
100 repadding=70 Pad alignment by this much on each end. Typically, | |
101 longer is more accurate for long indels, but greatly | |
102 reduces speed. | |
103 rerows=602 Use this many rows maximum for realignment. Reads longer | |
104 than this cannot be realigned. | |
105 recols=2000 Reads may not be aligned to reference seqments longer | |
106 than this. Needs to be at least read length plus | |
107 max deletion length plus twice padding. | |
108 msa= Select the aligner. Options: | |
109 MultiStateAligner11ts: Default. | |
110 MultiStateAligner9PacBio: Use for PacBio reads, or for | |
111 Illumina reads mapped to PacBio/Nanopore reads. | |
112 | |
113 Sam-filtering Parameters: | |
114 minpos= Ignore alignments not overlapping this range. | |
115 maxpos= Ignore alignments not overlapping this range. | |
116 minreadmapq=4 Ignore alignments with lower mapq. | |
117 contigs= Comma-delimited list of contig names to include. These | |
118 should have no spaces, or underscores instead of spaces. | |
119 secondary=f Include secondary alignments. | |
120 supplimentary=f Include supplimentary alignments. | |
121 duplicate=f Include reads flagged as duplicates. | |
122 invert=f Invert sam filters. | |
123 | |
124 Variant-Calling Cutoffs: | |
125 minreads=2 (minad) Ignore variants seen in fewer reads. | |
126 maxreads=BIG (maxad) Ignore variants seen in more reads. | |
127 mincov=0 Ignore variants in lower-coverage locations. | |
128 maxcov=BIG Ignore variants in higher-coverage locations. | |
129 minqualitymax=15 Ignore variants with lower max base quality. | |
130 minedistmax=20 Ignore variants with lower max distance from read ends. | |
131 minmapqmax=0 Ignore variants with lower max mapq. | |
132 minidmax=0 Ignore variants with lower max read identity. | |
133 minpairingrate=0.1 Ignore variants with lower pairing rate. | |
134 minstrandratio=0.1 Ignore variants with lower plus/minus strand ratio. | |
135 minquality=12.0 Ignore variants with lower average base quality. | |
136 minedist=10.0 Ignore variants with lower average distance from ends. | |
137 minavgmapq=0.0 Ignore variants with lower average mapq. | |
138 minallelefraction=0.1 Ignore variants with lower allele fraction. This | |
139 should be adjusted for high ploidies. | |
140 minid=0 Ignore variants with lower average read identity. | |
141 minscore=20.0 Ignore variants with lower Phred-scaled score. | |
142 clearfilters Clear all filters. Filter flags placed after | |
143 the clearfilters flag will still be applied. | |
144 | |
145 There are additionally max filters for score, quality, mapq, allelefraction, | |
146 and identity. | |
147 | |
148 Other Parameters: | |
149 minvarcopies=0 If set to 0, a genotype (vcf GT field) of 0 or 0/0 | |
150 will be called if observed allele frequency suggests | |
151 this is a minor allele. If set to 1, GT field will | |
152 contain at least one 1. | |
153 | |
154 Java Parameters: | |
155 -Xmx This will set Java's memory usage, overriding autodetection. | |
156 -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will | |
157 specify 200 megs. The max is typically 85% of physical memory. | |
158 -eoom This flag will cause the process to exit if an out-of-memory | |
159 exception occurs. Requires Java 8u92+. | |
160 -da Disable assertions. | |
161 | |
162 Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems. | |
163 " | |
164 } | |
165 | |
166 #This block allows symlinked shellscripts to correctly set classpath. | |
167 pushd . > /dev/null | |
168 DIR="${BASH_SOURCE[0]}" | |
169 while [ -h "$DIR" ]; do | |
170 cd "$(dirname "$DIR")" | |
171 DIR="$(readlink "$(basename "$DIR")")" | |
172 done | |
173 cd "$(dirname "$DIR")" | |
174 DIR="$(pwd)/" | |
175 popd > /dev/null | |
176 | |
177 #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/" | |
178 CP="$DIR""current/" | |
179 | |
180 z="-Xmx4g" | |
181 z2="-Xms4g" | |
182 set=0 | |
183 | |
184 if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then | |
185 usage | |
186 exit | |
187 fi | |
188 | |
189 calcXmx () { | |
190 source "$DIR""/calcmem.sh" | |
191 setEnvironment | |
192 parseXmx "$@" | |
193 if [[ $set == 1 ]]; then | |
194 return | |
195 fi | |
196 freeRam 4000m 84 | |
197 z="-Xmx${RAM}m" | |
198 z2="-Xms${RAM}m" | |
199 } | |
200 calcXmx "$@" | |
201 | |
202 callvariants() { | |
203 local CMD="java $EA $EOOM $z $z2 -cp $CP var2.CallVariants $@" | |
204 echo $CMD >&2 | |
205 eval $CMD | |
206 } | |
207 | |
208 callvariants "$@" |