comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/tadpole.sh @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 17:55:14 -0400 |
67:0e9998148a16 | 69:33d812a61356 |
---|---|
1 #!/bin/bash | |
2 | |
3 usage(){ | |
4 echo " | |
5 Written by Brian Bushnell | |
6 Last modified February 3, 2021 | |
7 | |
8 Description: Uses kmer counts to assemble contigs, extend sequences, | |
9 or error-correct reads. Tadpole has no upper bound for kmer length, | |
10 but some values are not supported. Specifically, it allows 1-31, | |
11 multiples of 2 from 32-62, multiples of 3 from 63-93, etc. | |
12 Please read bbmap/docs/guides/TadpoleGuide.txt for more information. | |
13 | |
14 Usage: | |
15 Assembly: tadpole.sh in=<reads> out=<contigs> | |
16 Extension: tadpole.sh in=<reads> out=<extended> mode=extend | |
17 Correction: tadpole.sh in=<reads> out=<corrected> mode=correct | |
18 | |
19 Recommended parameters for optimal assembly: | |
20 tadpole.sh in=<reads> out=<contigs> shave rinse pop k=<50-70% of read length> | |
21 | |
22 Extension and correction may be done simultaneously. Error correction on | |
23 multiple files may be done like this: | |
24 | |
25 tadpole.sh in=libA_r1.fq,libA_merged.fq in2=libA_r2.fq,null extra=libB_r1.fq out=ecc_libA_r1.fq,ecc_libA_merged.fq out2=ecc_libA_r2.fq,null mode=correct | |
26 | |
27 Extending contigs with reads could be done like this: | |
28 | |
29 tadpole.sh in=contigs.fa out=extended.fa el=100 er=100 mode=extend extra=reads.fq k=62 | |
30 | |
31 | |
32 Input parameters: | |
33 in=<file> Primary input file for reads to use as kmer data. | |
34 in2=<file> Second input file for paired data. | |
35 extra=<file> Extra files for use as kmer data, but not for error- | |
36 correction or extension. | |
37 reads=-1 Only process this number of reads, then quit (-1 means all). | |
38 NOTE: in, in2, and extra may also be comma-delimited lists of files. | |
39 | |
40 Output parameters: | |
41 out=<file> Write contigs (in contig mode) or corrected/extended | |
42 reads (in other modes). | |
43 out2=<file> Second output file for paired output. | |
44 outd=<file> Write discarded reads, if using junk-removal flags. | |
45 dot=<file> Write a contigs connectivity graph (partially implemented) | |
46 dump=<file> Write kmers and their counts. | |
47 fastadump=t Write kmers and counts as fasta rather than 2-column tsv. | |
48 mincounttodump=1 Only dump kmers with at least this depth. | |
49 showstats=t Print assembly statistics after writing contigs. | |
50 | |
51 Prefiltering parameters: | |
52 prefilter=0 If set to a positive integer, use a countmin sketch | |
53 to ignore kmers with depth of that value or lower. | |
54 prehashes=2 Number of hashes for prefilter. | |
55 prefiltersize=0.2 (pff) Fraction of memory to use for prefilter. | |
56 minprobprefilter=t (mpp) Use minprob for the prefilter. | |
57 prepasses=1 Use this many prefiltering passes; higher may be more thorough | |
58 if the filter is very full. Set to 'auto' to iteratively | |
59 prefilter until the remaining kmers will fit in memory. | |
60 onepass=f If true, prefilter will be generated in same pass as kmer | |
61 counts. Much faster but counts will be lower, by up to | |
62 prefilter's depth limit. | |
63 filtermem=0 Allows manually specifying prefilter memory in bytes, for | |
64 deterministic runs. 0 will set it automatically. | |
65 | |
66 Hashing parameters: | |
67 k=31 Kmer length (1 to infinity). Memory use increases with K. | |
68 prealloc=t Pre-allocate memory rather than dynamically growing; | |
69 faster and more memory-efficient. A float fraction (0-1) | |
70 may be specified; default is 1. | |
71 minprob=0.5 Ignore kmers with overall probability of correctness below this. | |
72 minprobmain=t (mpm) Use minprob for the primary kmer counts. | |
73 threads=X Spawn X worker threads; default is number of logical processors. | |
74 buildthreads=X Spawn X contig-building threads. If not set, defaults to the same | |
75 as threads. Setting this to 1 will make contigs deterministic. | |
76 rcomp=t Store and count each kmer together with its reverse-complement. | |
77 coremask=t All kmer extensions share the same hashcode. | |
78 fillfast=t Speed up kmer extension lookups. | |
79 | |
80 Assembly parameters: | |
81 mincountseed=3 (mcs) Minimum kmer count to seed a new contig or begin extension. | |
82 mincountextend=2 (mce) Minimum kmer count to continue extension of a read or contig. | |
83 It is recommended that mce=1 for low-depth metagenomes. | |
84 mincountretain=0 (mincr) Discard kmers with count below this. | |
85 maxcountretain=INF (maxcr) Discard kmers with count above this. | |
86 branchmult1=20 (bm1) Min ratio of 1st to 2nd-greatest path depth at high depth. | |
87 branchmult2=3 (bm2) Min ratio of 1st to 2nd-greatest path depth at low depth. | |
88 branchlower=3 (blc) Max value of 2nd-greatest path depth to be considered low. | |
89 minextension=2 (mine) Do not keep contigs that did not extend at least this much. | |
90 mincontig=auto (minc) Do not write contigs shorter than this. | |
91 mincoverage=1 (mincov) Do not write contigs with average coverage below this. | |
92 maxcoverage=inf (maxcov) Do not write contigs with average coverage above this. | |
93 trimends=0 (trim) Trim contig ends by this much. Trimming by K/2 | |
94 may yield more accurate genome size estimation. | |
95 trimcircular=t Trim one end of contigs ending in LOOP/LOOP by K-1, | |
96 to eliminate the overlapping portion. | |
97 contigpasses=16 Build contigs with decreasing seed depth for this many iterations. | |
98 contigpassmult=1.7 Ratio between seed depth of two iterations. | |
99 ownership=auto For concurrency; do not touch. | |
100 processcontigs=f Explore the contig connectivity graph. | |
101 popbubbles=t (pop) Pop bubbles; increases contiguity. Requires | |
102 additional time and memory and forces processcontigs=t. | |
103 | |
104 Processing modes: | |
105 mode=contig contig: Make contigs from kmers. | |
106 extend: Extend sequences to be longer, and optionally | |
107 perform error correction. | |
108 correct: Error correct only. | |
109 insert: Measure insert sizes. | |
110 discard: Discard low-depth reads, without error correction. | |
111 | |
112 Extension parameters: | |
113 extendleft=100 (el) Extend to the left by at most this many bases. | |
114 extendright=100 (er) Extend to the right by at most this many bases. | |
115 ibb=t (ignorebackbranches) Do not stop at backward branches. | |
116 extendrollback=3 Trim a random number of bases, up to this many, on reads | |
117 that extend only partially. This prevents the creation | |
118 of sharp coverage discontinuities at branches. | |
119 | |
120 Error-correction parameters: | |
121 ecc=f Error correct via kmer counts. | |
122 reassemble=t If ecc is enabled, use the reassemble algorithm. | |
123 pincer=f If ecc is enabled, use the pincer algorithm. | |
124 tail=f If ecc is enabled, use the tail algorithm. | |
125 eccfull=f If ecc is enabled, use tail over the entire read. | |
126 aggressive=f (aecc) Use aggressive error correction settings. | |
127 Overrides some other flags like errormult1 and deadzone. | |
128 conservative=f (cecc) Use conservative error correction settings. | |
129 Overrides some other flags like errormult1 and deadzone. | |
130 rollback=t Undo changes to reads that have lower coverage for | |
131 any kmer after correction. | |
132 markbadbases=0 (mbb) Any base fully covered by kmers with count below | |
133 this will have its quality reduced. | |
134 markdeltaonly=t (mdo) Only mark bad bases adjacent to good bases. | |
135 meo=t (markerrorreadsonly) Only mark bad bases in reads | |
136 containing errors. | |
137 markquality=0 (mq) Set quality scores for marked bases to this. | |
138 A level of 0 will also convert the base to an N. | |
139 errormult1=16 (em1) Min ratio between kmer depths to call an error. | |
140 errormult2=2.6 (em2) Alternate ratio between low-depth kmers. | |
141 errorlowerconst=3 (elc) Use mult2 when the lower kmer is at most this deep. | |
142 mincountcorrect=3 (mcc) Don't correct to kmers with count under this. | |
143 pathsimilarityfraction=0.45 (psf) Max difference ratio considered similar. | |
144 Controls whether a path appears to be continuous. | |
145 pathsimilarityconstant=3 (psc) Absolute differences below this are ignored. | |
146 errorextensionreassemble=5 (eer) Verify this many kmers before the error as | |
147 having similar depth, for reassemble. | |
148 errorextensionpincer=5 (eep) Verify this many additional bases after the | |
149 error as matching current bases, for pincer. | |
150 errorextensiontail=9 (eet) Verify additional bases before and after | |
151 the error as matching current bases, for tail. | |
152 deadzone=0 (dz) Do not try to correct bases within this distance of | |
153 read ends. | |
154 window=12 (w) Length of window to use in reassemble mode. | |
155 windowcount=6 (wc) If more than this many errors are found within | |
156 a window, halt correction in that direction. | |
157 qualsum=80 (qs) If the sum of the qualities of corrected bases within | |
158 a window exceeds this, halt correction in that direction. | |
159 rbi=t (requirebidirectional) Require agreement from both | |
160 directions when correcting errors in the middle part of | |
161 the read using the reassemble algorithm. | |
162 errorpath=1 (ep) For debugging purposes. | |
163 | |
164 Junk-removal parameters (to only remove junk, set mode=discard): | |
165 tossjunk=f Remove reads that cannot be used for assembly. | |
166 This means they have no kmers above depth 1 (2 for paired | |
167 reads) and the outermost kmers cannot be extended. | |
168 Pairs are removed only if both reads fail. | |
169 tossdepth=-1 Remove reads containing kmers at or below this depth. | |
170 Pairs are removed if either read fails. | |
171 lowdepthfraction=0 (ldf) Require at least this fraction of kmers to be | |
172 low-depth to discard a read; range 0-1. 0 still | |
173 requires at least 1 low-depth kmer. | |
174 requirebothbad=f (rbb) Only discard pairs if both reads are low-depth. | |
175 tossuncorrectable (tu) Discard reads containing uncorrectable errors. | |
176 Requires error-correction to be enabled. | |
177 | |
178 Shaving parameters: | |
179 shave=t Remove dead ends (aka hair). | |
180 rinse=t Remove bubbles. | |
181 wash= Set shave and rinse at the same time. | |
182 maxshavedepth=1 (msd) Shave or rinse kmers at most this deep. | |
183 exploredist=300 (sed) Quit after exploring this far. | |
184 discardlength=150 (sdl) Discard shavings up to this long. | |
185 Note: Shave and rinse can produce substantially better assemblies | |
186 for low-depth data, but they are very slow for large metagenomes. | |
187 | |
188 Overlap parameters (for overlapping paired-end reads only): | |
189 merge=f Attempt to merge overlapping reads prior to | |
190 kmer-counting, and again prior to correction. Output | |
191 will still be unmerged pairs. | |
192 ecco=f Error correct via overlap, but do not merge reads. | |
193 testmerge=t Test kmer counts around the read merge junctions. If | |
194 it appears that the merge created new errors, undo it. | |
195 | |
196 Java Parameters: | |
197 -Xmx This will set Java's memory usage, overriding autodetection. | |
198 -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. | |
199 The max is typically 85% of physical memory. | |
200 -eoom This flag will cause the process to exit if an | |
201 out-of-memory exception occurs. Requires Java 8u92+. | |
202 -da Disable assertions. | |
203 " | |
204 } | |
205 | |
206 #This block allows symlinked shellscripts to correctly set classpath. | |
207 pushd . > /dev/null | |
208 DIR="${BASH_SOURCE[0]}" | |
209 while [ -h "$DIR" ]; do | |
210 cd "$(dirname "$DIR")" | |
211 DIR="$(readlink "$(basename "$DIR")")" | |
212 done | |
213 cd "$(dirname "$DIR")" | |
214 DIR="$(pwd)/" | |
215 popd > /dev/null | |
216 | |
217 #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/" | |
218 CP="$DIR""current/" | |
219 | |
220 z="-Xmx14g" | |
221 z2="-Xms14g" | |
222 set=0 | |
223 | |
224 if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then | |
225 usage | |
226 exit | |
227 fi | |
228 | |
229 calcXmx () { | |
230 source "$DIR""/calcmem.sh" | |
231 setEnvironment | |
232 parseXmx "$@" | |
233 if [[ $set == 1 ]]; then | |
234 return | |
235 fi | |
236 freeRam 15000m 84 | |
237 z="-Xmx${RAM}m" | |
238 z2="-Xms${RAM}m" | |
239 } | |
240 calcXmx "$@" | |
241 | |
242 tadpole() { | |
243 local CMD="java $EA $EOOM $z $z2 -cp $CP assemble.Tadpole $@" | |
244 echo $CMD >&2 | |
245 eval $CMD | |
246 } | |
247 | |
248 tadpole "$@" |
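
The usage text above says that Tadpole allows kmer lengths of 1-31, multiples of 2 from 32-62, multiples of 3 from 63-93, "etc.", and recommends a k of 50-70% of read length for assembly. The following is a minimal sketch of how a caller might pick a k that fits that rule before invoking the script. The function names `is_supported_k`, `round_down_k`, and `suggest_k` are hypothetical and are not part of tadpole.sh or BBMap. The sketch assumes the "etc." continues the same pattern (multiples of n within the n-th block of 31 values), and the 60% midpoint is an arbitrary choice within the recommended range.

```shell
#!/bin/bash
# Hypothetical helpers for choosing a kmer length; not part of tadpole.sh.

# Succeeds (exit status 0) if k fits the pattern described in the usage text:
# 1-31, multiples of 2 from 32-62, multiples of 3 from 63-93, and so on.
# Assumption: the pattern continues the same way beyond 93.
is_supported_k() {
    local k=$1
    (( k >= 1 )) || return 1
    # Block multiplier: 1 for 1-31, 2 for 32-62, 3 for 63-93, ...
    local mult=$(( (k + 30) / 31 ))
    (( k % mult == 0 ))
}

# Prints the largest value <= the requested k that passes is_supported_k.
round_down_k() {
    local k=$1
    while (( k > 1 )) && ! is_supported_k "$k"; do
        k=$(( k - 1 ))
    done
    echo "$k"
}

# Prints a candidate k at about 60% of the read length (the middle of the
# 50-70% range recommended in the usage text), rounded down to a value
# that fits the pattern.
suggest_k() {
    local readlen=$1
    round_down_k $(( readlen * 60 / 100 ))
}
```

With these helpers sourced, an assembly run on 150 bp reads could be launched as `tadpole.sh in=reads.fq out=contigs.fa shave rinse pop k=$(suggest_k 150)`, where `reads.fq` and `contigs.fa` are placeholder file names.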