jpayne@68
|
1 BBTools changelog and todo list.
|
jpayne@68
|
2
|
jpayne@68
|
3 V39.
|
jpayne@68
|
4 39.00
|
jpayne@68
|
5 Refactored Bloom Filter code and added increment-by-amount function.
|
jpayne@68
|
6 Eliminated some old versions of Bloom Filters; renamed KmerCount7MTA class to ReadCounter.
|
jpayne@68
|
7 Added kmer count output to bloomfilter.sh, and supported mincount=0.
|
jpayne@68
|
8 Fixed a crash when trimming mapped sam files.
|
jpayne@68
|
9 Filtersam no longer automatically trims qname and rname after the first whitespace.
|
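The increment-by-amount function added to the Bloom filter code in 39.00 can be pictured with a toy counting filter. This is a minimal sketch, not the BBTools implementation: the class name `CountingBloomFilter`, its parameters, and the blake2b-based hashing are all assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter with saturating cells (hypothetical, for illustration)."""

    def __init__(self, cells=1024, hashes=3, bits=4):
        self.cells = cells
        self.hashes = hashes
        self.max_count = (1 << bits) - 1  # cells saturate instead of overflowing
        self.table = [0] * cells

    def _slots(self, key):
        # Derive one cell index per hash function from a salted digest of the key.
        for salt in range(self.hashes):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=bytes([salt])
            ).digest()
            yield int.from_bytes(digest, "big") % self.cells

    def increment(self, key, amount=1):
        """Add `amount` to each distinct cell for `key`, clamping at max_count."""
        for slot in set(self._slots(key)):
            self.table[slot] = min(self.max_count, self.table[slot] + amount)

    def count(self, key):
        """Estimated count: the minimum over the key's cells."""
        return min(self.table[slot] for slot in self._slots(key))
```

Incrementing by an amount in one call avoids looping over single increments when a kmer's count is already known, and the clamp keeps a cell at its maximum once the configured number of bits is exhausted.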
jpayne@68
|
10 39.01
|
jpayne@68
|
11 Accelerated SortByName sequence mode using kmer prefixes.
|
jpayne@68
|
12 Adjusted some SortByName memory parameters to be more conservative; fixed pairing-related assertions.
|
jpayne@68
|
13 Added a Clumpify-like sorting mode.
|
jpayne@68
|
14 Added a BloomFilter results parser for collating bulk output.
|
jpayne@68
|
15 Read Streams can now report their input source.
|
jpayne@68
|
16
|
jpayne@68
|
17 TODO: randomgenome percent repeat and repeat length
|
jpayne@68
|
18
|
jpayne@68
|
19 V38.
|
jpayne@68
|
20 38.00
|
jpayne@68
|
21 Moved ByteBuilder to Structures.
|
jpayne@68
|
22 Added some formatting and comments to SuperLongList.
|
jpayne@68
|
23 JsonObject printing now has an inArray state that prevents newlines from arrays of JsonObjects.
|
jpayne@68
|
24 Improved JsonParser handling of booleans.
|
jpayne@68
|
25 Added a JsonParser validate command.
|
jpayne@68
|
26 Wrote TaxClient for internally doing tax lookups from the TaxServer.
|
jpayne@68
|
27 Added post mode to TaxClient and TaxServer, for URLs over 2000 characters.
|
jpayne@68
|
28 Moved StringNum to Structures.
|
jpayne@68
|
29 Accession loader now sorts files in ascending order of size and can load some before others.
|
jpayne@68
|
30 Fixed a flaw in the hash function for accession numbers that may have allowed collisions.
|
jpayne@68
|
31 TaxTree.parseNodeFromHeader will now try harder for headers with certain formatting.
|
jpayne@68
|
32 Fixed potential overflows by changing Integer.MAX_VALUE to Shared.MAX_ARRAY_LEN.
|
jpayne@68
|
33 SketchTool now has a custom, low-garbage loader instead of relying on ByteFile.
|
jpayne@68
|
34 RQCFilter2 now uses half as many threads for pigz as logical cores.
|
jpayne@68
|
35 Wrote BloomFilter and BloomFilterWrapper.
|
jpayne@68
|
36 Added BloomFilter support into BBMap and RQCFilter.
|
jpayne@68
|
37 Wrote a better available memory estimation function for BloomFilter.
|
jpayne@68
|
38 Accelerated BloomFilter lookup when minConsecutiveMatches>1.
|
jpayne@68
|
39 Fixed logging of BBSplit vs BBMap in RQCFilter2.
|
jpayne@68
|
40 Bloom filter creation from BBMap index now uses multiple threads per chunk.
|
jpayne@68
|
41 Fixed a null pointer in TextStringWriter.
|
jpayne@68
|
42 Fixed a static variable (ef) persisting in RQCFilter, which slowed human removal.
|
jpayne@68
|
43 38.01
|
jpayne@68
|
44 Added support for lowercase letters in accessions.
|
jpayne@68
|
45 gi2ncbi now supports streaming and some other options like shrinknames in server mode.
|
jpayne@68
|
46 Sketch can now return json format from a curl call.
|
jpayne@68
|
47 Sketch server no longer crashes from invalid symbols in sequence in local mode.
|
jpayne@68
|
48 SketchMaker now has a local cache of SketchHeaps per thread in per-taxa mode, allowing a 6x speedup by reducing synchronization and rework.
|
jpayne@68
|
49 RefSeq now uses a 250-species blacklist limit with sizemult=2 instead of 300.
|
jpayne@68
|
50 Wrote MergeSorted and mergesorted.sh to resume SortByName runs that crashed or were killed during merging.
|
jpayne@68
|
51 Removed DumpCount from SortByName and CrisContainer. It was too confusing. To shuffle large datasets, they can be merged round-robin.
|
jpayne@68
|
52 Fixed an error message when autodetecting quality encoding.
|
jpayne@68
|
53 RefSeq sketch server is now double the normal resolution (sizemult=2).
|
jpayne@68
|
54 SendSketch defaults to sizemult=2 for RefSeq.
|
jpayne@68
|
55 Sketch server startup script now sets sizemult=2 for refseq.
|
jpayne@68
|
56 Added logscale peak calling.
|
jpayne@68
|
57 Added peaks file GC annotation.
|
jpayne@68
|
58 Fixed an array out of bounds in EntropyTracker.
|
jpayne@68
|
59 CallVariants now ignores duplicates by default (0x400 bit).
|
jpayne@68
|
60 StatsWrapper will now append to the gc output if there are multiple assemblies.
|
jpayne@68
|
61 Wrote AnalyzeAccession and analyzeaccession.sh to reduce the memory footprint of accessions in the tax server.
|
jpayne@68
|
62 Added entropy filter flag to RQCFilter2.
|
jpayne@68
|
63 BloomFilter can now act as a highpass filter.
|
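The 0x400 bit mentioned for CallVariants in 38.01 is the SAM FLAG bit marking a PCR or optical duplicate. A minimal sketch of filtering on that bit follows; the function names are hypothetical and this is not the CallVariants code.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(sam_line):
    """Return True if the alignment line's FLAG field has the duplicate bit set."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second tab-separated column
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Yield header lines and alignment lines that are not flagged as duplicates."""
    for line in sam_lines:
        if line.startswith("@") or not is_duplicate(line):
            yield line
```

For example, a record with FLAG 1040 (0x400 plus 0x10) is dropped, while one with FLAG 16 is kept.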
jpayne@68
|
64 38.02
|
jpayne@68
|
65 BloomFilter can now do error correction, using the Tadpole algorithm.
|
jpayne@68
|
66 Added merge and unmerge to Tadpole and BloomFilter for dramatic error correction improvements.
|
jpayne@68
|
67 Improved BloomFilter error correction defaults and added smoothing.
|
jpayne@68
|
68 Improved BloomFilter's memory management and added a memfraction flag.
|
jpayne@68
|
69 Fixed tuc not working.
|
jpayne@68
|
70 Tadpole.BloomFilter ECC_ROLLBACK will now roll back merges also (but not ecco currently).
|
jpayne@68
|
71 Wrote Rollback object to simplify rollbacks during error correction.
|
jpayne@68
|
72 Spun BloomFilterCorrectorWrapper off from BloomFilterWrapper.
|
jpayne@68
|
73 Spun bbcms.sh off of bloomfilter.sh.
|
jpayne@68
|
74 Fixed a bug in msa.sh handling of reverse-complements.
|
jpayne@68
|
75 Improved msa.sh to fully expand undefined bases, accept fasta files, and name the output such that it is clear whether an alignment was forward or reverse.
|
jpayne@68
|
76 msa.sh now allows a cutoff for min identity.
|
jpayne@68
|
77 Improved bbcms smoothing.
|
jpayne@68
|
78 bbcms now allows a minimum fraction of kmers above a certain count to be specified.
|
jpayne@68
|
79 bbcms now prints more statistics about the loaded bloom filter.
|
jpayne@68
|
80 38.03
|
jpayne@68
|
81 Fixed broken interleaving in bbcms output.
|
jpayne@68
|
82 Added seed flag to bbcms and bloomfilter.
|
jpayne@68
|
83 Added BBMerge vstrict and ustrict flags to bbcms.
|
jpayne@68
|
84 Added mergeOK and testmerge flags to BBMerge.
|
jpayne@68
|
85 Added BloomFilter support to BBMerge.
|
jpayne@68
|
86 BBMerge now automatically writes both mergeable and unmergeable pairs to out if ecco=t and mix is unset.
|
jpayne@68
|
87 testmerge flag now works with ecco.
|
jpayne@68
|
88 Fixed indentation for Tadpole/bbcms results.
|
jpayne@68
|
89 38.04
|
jpayne@68
|
90 bbcms and bloom filter now allow random seeds.
|
jpayne@68
|
91 Changed version printing to not repeat arguments.
|
jpayne@68
|
92 Eliminated redundant copies of mergeOK functions.
|
jpayne@68
|
93 Fixed bbcms testmerge flag.
|
jpayne@68
|
94 Fixed trim/qtrim flag in BBSplit help.
|
jpayne@68
|
95 Added relative error threshold for mergeOK. TODO: Does not seem to help in my test; try on single cell data.
|
jpayne@68
|
96 Added variable smooth width to bbcms.
|
jpayne@68
|
97 Changed bbcms default bits to 4 after testing.
|
jpayne@68
|
98 Fixed bbcms extra flag.
|
jpayne@68
|
99 38.05
|
jpayne@68
|
100 Fixed interleaving detection in SortByName.
|
jpayne@68
|
101 Changed interleaving detection in FileFormat to autodetect more aggressively.
|
jpayne@68
|
102 Fixed a bug with RQCFilter2 interleaving settings carrying over from BBMerge to FilterByTaxa.
|
jpayne@68
|
103 38.06
|
jpayne@68
|
104 Changed KmerArray to collide all possible kmer extensions into the same cell.
|
jpayne@68
|
105 Wrote FillFast to grab all 4 possible kmer extensions with a single modulo operation.
|
jpayne@68
|
106 Simplified some of BBDuk pair-tracking and discarding logic.
|
jpayne@68
|
107 Added trimfailures bbduk flag.
|
jpayne@68
|
108 Fixed a division by zero bug in SortByName.mergeRecursive.
|
jpayne@68
|
109 Fixed an array-out-of-bounds in CallPeaks.
|
jpayne@68
|
110 Made dual-kmer ANI estimation from Sketch more accurate.
|
jpayne@68
|
111 Added loglog support to BBMerge and Seal.
|
jpayne@68
|
112 Added loglogout support to BBMerge, BBDuk, and Seal.
|
jpayne@68
|
113 RQCFilter2 status.log now tracks kmers.
|
jpayne@68
|
114 Removed RQCFilter and pointed rqcfilter.sh to rqcfilter2.sh.
|
jpayne@68
|
115 38.07
|
jpayne@68
|
116 Changed KmerTable increment functions to require an incr value.
|
jpayne@68
|
117 Added sortbuffer flag to Tadpole, but speed was barely improved on high-depth Clumpified data.
|
jpayne@68
|
118 Migrated coremask and fillfast to tadpole2, but they make it slower for some reason.
|
jpayne@68
|
119 Migrated shave and rinse improvements to Tadpole2; these can make those steps dramatically faster in metagenomes.
|
jpayne@68
|
120 Added BloomFilter serialization.
|
jpayne@68
|
121 Increased default k and minhits of Bloom filter in RQCFilter2 and added serialized filters.
|
jpayne@68
|
122 Reduced RandomReads default quality.
|
jpayne@68
|
123 Made gaussian insert size distribution default for RandomReads.
|
jpayne@68
|
124 Wrote FastaShredInputStream for faster Bloom filter loading with lower memory consumption.
|
jpayne@68
|
125 Fixed number of threads allocated to Bloom filter loading from index.
|
jpayne@68
|
126 38.08
|
jpayne@68
|
127 FilterByTaxa and RQCFilter no longer crash if a header cannot be parsed and the accession tables are not loaded.
|
jpayne@68
|
128 38.09
|
jpayne@68
|
129 bbcms default bits changed from 1 to 2.
|
jpayne@68
|
130 Improved bbcms tossjunk function.
|
jpayne@68
|
131 Added documentation to bbcms and Tadpole.
|
jpayne@68
|
132 Added fixextensions flag, and enabled it for CallVariants, BBDuk, Reformat, RQCFilter, BBNorm, BBMerge, BBMap, Tadpole, and bbcms.
|
jpayne@68
|
133 RQCFilter now extends reads prior to merging if there is enough memory. This means the insert size histogram will take longer, but will allow non-overlapping inserts.
|
jpayne@68
|
134 BBMap now tracks statistics correctly when Bloom filter is enabled.
|
jpayne@68
|
135 Fixed Children flag in TaxServer.
|
jpayne@68
|
136 Shave and rinse no longer checks owner for initial high kmers.
|
jpayne@68
|
137 Shave and rinse now ignores initial high kmers above the isJunction trigger for extension in some cases, for a large speedup in isolates (uses shaveFast flag).
|
jpayne@68
|
138 Changed RandomReads default insert size distribution to more closely match JGI fragment library targets.
|
jpayne@68
|
139 Multithreaded KmerCountArray/KmerCountArrayU ownership array allocation via OwnershipThread for a large speed increase in assembly.
|
jpayne@68
|
140 Added 2passresize flag to Tadpole but it didn't seem to speed things up.
|
jpayne@68
|
141 Added Constellation-like output option for CompareSketch.
|
jpayne@68
|
142 Major changes to Kmer table sizing - a premade resize schedule is now used. Only for Kmer so far, not UKmer.
|
jpayne@68
|
143 38.10
|
jpayne@68
|
144 Merged dev python changes.
|
jpayne@68
|
145 38.11
|
jpayne@68
|
146 Ported schedule to UKmer.
|
jpayne@68
|
147 Fixed a bytesPerKmer bug in KmerCountExact for k>31.
|
jpayne@68
|
148 Accelerated kmer lookups for k>31.
|
jpayne@68
|
149 Condensed code for shave/rinse, but no speed increase.
|
jpayne@68
|
150 Changed default exploredist from 100 to 300.
|
jpayne@68
|
151 38.12
|
jpayne@68
|
152 Stats now omits the first size bracket if it is less than minscaf.
|
jpayne@68
|
153 Fixed problems with extended stats in format 4-6.
|
jpayne@68
|
154 Fixed a bug in reporting amount of spike-in removed in RQCFilter.
|
jpayne@68
|
155 Multithreaded kmer frequency histogram generation using kmer and ukmer packages.
|
jpayne@68
|
156 mutate.sh now outputs vcf files.
|
jpayne@68
|
157 Fixed processing of sam files with M, =, and X in cigar string.
|
jpayne@68
|
158 Fixed a bloom filter BBMap bug in counting reads.
|
jpayne@68
|
159 Updated some pipelines shell scripts.
|
jpayne@68
|
160 Started writing a new KCountArray class, but abandoned it as the current one looks as efficient as possible.
|
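The 38.12 fix for sam files with M, =, and X in the cigar string concerns the three operators that describe aligned bases: M (match or mismatch), = (sequence match), and X (sequence mismatch). As a sketch of how the three can be treated uniformly, the function below computes reference and query lengths following the SAM specification's operator table. The function name is hypothetical and this is not the BBTools parser.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")    # operators that advance along the reference
CONSUMES_QUERY = set("MIS=X")  # operators that advance along the read


def cigar_lengths(cigar):
    """Return (reference_length, query_length) implied by a CIGAR string."""
    ref_len = query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in CONSUMES_REF:
            ref_len += n
        if op in CONSUMES_QUERY:
            query_len += n
    return ref_len, query_len
```

Under this treatment `10M` and `4=1X5=` describe the same span, which is the equivalence a parser has to preserve when a file mixes the two styles.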
jpayne@68
|
161 38.13
|
jpayne@68
|
162 Fixed a casting exception in Shared.sort.
|
jpayne@68
|
163 Fixed missing column from mutate.sh vcf output.
|
jpayne@68
|
164 Addslash for RandomReads now works with the illuminanames flag.
|
jpayne@68
|
165 Fixed mutate.sh VCF files.
|
jpayne@68
|
166 Wrote Contig and Edge classes.
|
jpayne@68
|
167 Wrote ContigLengthComparator.
|
jpayne@68
|
168 Transitioned Tadpole from building Reads to building Contigs.
|
jpayne@68
|
169 Wrote ProcessContigThread.
|
jpayne@68
|
170 Tadpole now writes additional information about contig ends to headers.
|
jpayne@68
|
171 Tadpole now strictly uses F_BRANCH and B_BRANCH instead of just BRANCH (TODO: D_BRANCH).
|
jpayne@68
|
172 Tadpole output should now have canonical orientation, order, and names (apart from circular contigs).
|
jpayne@68
|
173 Tadpole1 now has a preliminary contig graph processing phase (in progress).
|
jpayne@68
|
174 Tadpole now supports preliminary dot output (not yet correct).
|
jpayne@68
|
175 Added appendln to some ByteBuilder methods.
|
jpayne@68
|
176 Added print(Contig) to bsw.
|
jpayne@68
|
177 38.14-38.15
|
jpayne@68
|
178 Integrated dev Python changes; merging Git branches.
|
jpayne@68
|
179 38.16
|
jpayne@68
|
180 Ported Tadpole1 ProcessContigThread to Tadpole2.
|
jpayne@68
|
181 Added perfile flag to CompareSketch, which allows multithreaded loading.
|
jpayne@68
|
182 Added prealloc flag to CompareSketch.
|
jpayne@68
|
183 Revised TaxServer to use Sketch index, and typically run 1 thread per sketch.
|
jpayne@68
|
184 Added outsketch flag to CompareSketch.
|
jpayne@68
|
185 Modified RandomGenome to be faster and more flexible, and added a shell script.
|
jpayne@68
|
186 38.17
|
jpayne@68
|
187 Added Sketch minLevelExtended flag.
|
jpayne@68
|
188 Fixed bbcms loglog using quality scores from the wrong read.
|
jpayne@68
|
189 Wrote MergeSketch and mergesketch.sh.
|
jpayne@68
|
190 Fixed a major bug in TaxTree.getNodeAtLevel and restarted all servers.
|
jpayne@68
|
191 Wrote KmerLimit and kmerlimit.sh.
|
jpayne@68
|
192 Wrote Shuffle2 and shuffle2.sh.
|
jpayne@68
|
193 Changed blacklist_nt_species_1000.sketch to blacklist_nt_species_500.sketch
|
jpayne@68
|
194 38.18
|
jpayne@68
|
195 Modified RQCFilter and BBMap to correctly track and report unmapped reads and bases when using the Bloom filter.
|
jpayne@68
|
196 Wrote RQCFilterStats for tracking relevant RQCFilter stats. This is printed to filterStats2.txt.
|
jpayne@68
|
197 Added some columns to BBMap scafstats/refstats where a read is assigned to at most a single reference.
|
jpayne@68
|
198 All classes that used ThreadLocalRandom now call Shared.threadLocalRandom() to comply with Java 6.
|
jpayne@68
|
199 Wrote KmerLimit and kmerlimit.sh to restrict a randomly-ordered file to a specific number of unique kmers.
|
jpayne@68
|
200 Wrote KmerLimit2 and kmerlimit2.sh to restrict an arbitrarily-ordered file to a specific number of unique kmers via subsampling.
|
jpayne@68
|
201 Updated /pipelines/ scripts for fetching and sketching.
|
jpayne@68
|
202 38.19
|
jpayne@68
|
203 Updated RQCFilterData tar.
|
jpayne@68
|
204 Updated wrapper shellscripts to handle Cori error messages.
|
jpayne@68
|
205 Fixed a bug in tracking duplicate reads in RQCFilter.
|
jpayne@68
|
206 38.20
|
jpayne@68
|
207 Added logsum and powsum to stats.sh gc output format 5.
|
jpayne@68
|
208 Fixed a bug in tracking reads in RQCFilter.
|
jpayne@68
|
209 Fixed a basic to extended taxonomy translation routine in TaxTree.
|
jpayne@68
|
210 Added JSON (format 8) to stats.sh.
|
jpayne@68
|
211 Fixed(?) BBMap tracking of trimmed/untrimmed bases for mapped and unmapped reads.
|
jpayne@68
|
212 Fixed bugs in RQCFilter tracking of trimmed/untrimmed mapped bases.
|
jpayne@68
|
213 38.21
|
jpayne@68
|
214 Wrote JsonLiteral and modified Stats to not put quotes around formatted floats.
|
jpayne@68
|
215 Added support for accession, gi, and header lookups to RenameGiToNcbi.
|
jpayne@68
|
216 --help or --version now exit with status 0 rather than 1.
|
jpayne@68
|
217 Updated some documentation.
|
jpayne@68
|
218 Added BBDuk trimpolyg flag.
|
jpayne@68
|
219 FlowCell MicroTiles now track more data and have more methods.
|
jpayne@68
|
220 Wrote PlotFlowCell and plotflowcell.sh, to look at the distribution of polyG in NovaSeq runs.
|
jpayne@68
|
221 Fixed a broken if-else in AccessionToTaxId that was causing TaxServer to start with prealloc false.
|
jpayne@68
|
222 Fixed a bug in verifying other mapped stats in RQCFilter2.
|
jpayne@68
|
223 38.22
|
jpayne@68
|
224 Added getters for sketch.Comparison and sketch.CompareBuffer, and made fields private.
|
jpayne@68
|
225 Fixed bug causing Sketch unique count to display incorrectly - bitsetbits had been changed from 2 to 1. It should be 2; made static final.
|
jpayne@68
|
226 Fixed an array size bug in Tadpole caused by increasing the range of termination codes.
|
jpayne@68
|
227 Fixed a problem of Kmers being appended to ByteBuilders reverse-complemented. This impacted Shaver2.
|
jpayne@68
|
228 Fixed a static variable (MASK_CORE) hangover from Tadpole1 into Tadpole2 with TadWrapper.
|
jpayne@68
|
229 Added more BBDuk polyG options.
|
jpayne@68
|
230 Added polyG options and tracking to RQCFilter.
|
jpayne@68
|
231 Fixed an incident where a new KmerComparator was created unnecessarily.
|
jpayne@68
|
232 Clumpify now correctly counts the number of reads when a temp file is streamed without being clumped.
|
jpayne@68
|
233 38.23
|
jpayne@68
|
234 Wrote hiseq.CycleTracker.
|
jpayne@68
|
235 Fixed a parse error in AnalyzeFlowCell.
|
jpayne@68
|
236 Added preliminary G-bubble-detection and elimination to AnalyzeFlowCell, but it is not clear if it is working correctly.
|
jpayne@68
|
237 Wrote hiseq.IlluminaHeaderParser.
|
jpayne@68
|
238 Revised A_Sample, A_SampleMT, and A_SampleByteFile with additional submethods to reduce the length of long methods.
|
jpayne@68
|
239 Removed JNI path flag from BBMerge, BBMap, and RQCFilter shell scripts.
|
jpayne@68
|
240 Fixed a bug in reading adaptersOut.fa from RQCFilter2.
|
jpayne@68
|
241 Changed the way path is appended to output files in RQCFilter2.
|
jpayne@68
|
242 Added poly-C flags to BBDuk.
|
jpayne@68
|
243 Wrote PolymerTracker.
|
jpayne@68
|
244 Added polymer count tracking to BBDuk and RQCFilter.
|
jpayne@68
|
245 Added clipfilter to Reformat.
|
jpayne@68
|
246 38.24
|
jpayne@68
|
247 Skipped this version.
|
jpayne@68
|
248 38.25
|
jpayne@68
|
249 Added maxcov flag to Tadpole.
|
jpayne@68
|
250 Seal now supports filenames without the ref= flag to allow wildcard expansion.
|
jpayne@68
|
251 Removed calcmem.sh Perl dependency on Genepool, since Genepool is gone.
|
jpayne@68
|
252 Fixed a logging bug in RQCFilter.
|
jpayne@68
|
253 Added optical alias to RQCFilter.
|
jpayne@68
|
254 Modified mergesorted.sh.
|
jpayne@68
|
255 SortByName and MergeSorted buffer-resizing logic made safer.
|
jpayne@68
|
256 Fixed leftRatio calculation in Tadpole for printing in contig headers.
|
jpayne@68
|
257 Fixed an unwanted print statement in Tadpole dot generation.
|
jpayne@68
|
258 Fixed a crash in Clumpify when handling Ns.
|
jpayne@68
|
259 BBMap bloomserial now defaults to true.
|
jpayne@68
|
260 Deleted normandcorrectwrapper.sh.
|
jpayne@68
|
261 Updated removehuman, removehuman2, etc. to use Bloom filters and clarified that the scripts are for NERSC.
|
jpayne@68
|
262 Wrote PercentEncoding for translating URLs, and made it more efficient by removing String functions.
|
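The PercentEncoding entry in 38.25 refers to translating characters that are unsafe in URLs into `%XX` escapes. The sketch below works directly on bytes rather than through string helpers, in the spirit of that entry. It assumes the RFC 3986 unreserved set is left unescaped; which characters BBTools actually escapes is not stated here, and the function names are hypothetical.

```python
# Bytes left unescaped: the RFC 3986 "unreserved" characters (an assumption).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text):
    """Escape every byte outside UNRESERVED as %XX (uppercase hex)."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(byte)
        else:
            out += b"%%%02X" % byte
    return out.decode("ascii")


def percent_decode(text):
    """Reverse percent_encode; a '%' without two following characters is kept as-is."""
    raw = text.encode("ascii")
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == ord("%") and i + 2 < len(raw):
            out.append(int(raw[i + 1:i + 3], 16))  # two hex digits -> one byte
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return out.decode("utf-8")
```

A header such as `Homo sapiens/x` becomes `Homo%20sapiens%2Fx` and decodes back to the original.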
jpayne@68
|
263 38.26
|
jpayne@68
|
264 Improved Blacklist name translation.
|
jpayne@68
|
265 Data internmap is now faster and takes less memory.
|
jpayne@68
|
266 Made prok package for prok gene-calling.
|
jpayne@68
|
267 Moved LOGICAL_PROCESSORS to Shared to avoid an initialization order problem.
|
jpayne@68
|
268 Fixed a bug in FastaReadInputStream with buffer resizing logic.
|
jpayne@68
|
269 Disabled some assertions in BBIndex that do not appear to be valid with a long maxindel and many short contigs.
|
jpayne@68
|
270 Added nl() and tab() to ByteBuilder.
|
jpayne@68
|
271 Reduced memory prealloc request for kmer tables on high memory (>120G) nodes.
|
jpayne@68
|
272 Fixed CallVariants reporting of deletion count.
|
jpayne@68
|
273 Clarified CallVariants SamStreamer flag, and capped it at Shared.threads().
|
jpayne@68
|
274 Clarified callvariants2.sh purpose and function.
|
jpayne@68
|
275 Wrote AnalyzeGenes, CallGenes, and CompareGff.
|
jpayne@68
|
276 Added amino acid output to CallGenes.
|
jpayne@68
|
277 38.27
|
jpayne@68
|
278 Bugfixes and improvements to gene calling.
|
jpayne@68
|
279 Began adding RNA models to gene calling.
|
jpayne@68
|
280 Refactored gene-caller to allow more flexibility with models; pgm format changed.
|
jpayne@68
|
281 Adjusted default gene model.
|
jpayne@68
|
282 38.28
|
jpayne@68
|
283 Multithreaded AnalyzeGenes.
|
jpayne@68
|
284 Wrote FloatList.
|
jpayne@68
|
285 Fixed a bug in Tools.reverseInPlace for partial arrays.
|
jpayne@68
|
286 Added trimcircular flag to Tadpole to trim ends of loop-loop contigs, which are presumably circular.
|
jpayne@68
|
287 Finished tRNA and rRNA models and calling functions.
|
jpayne@68
|
288 38.28
|
jpayne@68
|
289 Fixed a bug in 3-column Sketch colors.
|
jpayne@68
|
290 38.29
|
jpayne@68
|
291 Calibration of gene models.
|
jpayne@68
|
292 Fixed a bug with chloroOutFile/fbtOutFile name in RQCFilter2.
|
jpayne@68
|
293 Sketch now allows integrated gene-calling for nucleotide to protein translation.
|
jpayne@68
|
294 Added minsize and maxsize to RepresentativeSet.
|
jpayne@68
|
295 38.30
|
jpayne@68
|
296 More calibration of gene models.
|
jpayne@68
|
297 Fixed some misassumptions in percent encoding.
|
jpayne@68
|
298 Modified GatherKapaStats to output raw data.
|
jpayne@68
|
299 Generated a minimal representation of RefSeq Microbial... achieved 80% size reduction.
|
jpayne@68
|
300 Changed the way pileup calculates coverage from soft-clipped bases; they are now ignored.
|
jpayne@68
|
301 Changed the way samtools/sambamba exclusion flags are processed to be more flexible and faster.
|
jpayne@68
|
302 Pileup now uses samtools to parse the header and sambamba to parse the reads, since sambamba is slow at reading headers.
|
jpayne@68
|
303 Added key=value pair output to pileup.
|
jpayne@68
|
304 Wrote ScoreTracker to track scores of accepted and rejected ORFs when calling genes.
|
jpayne@68
|
305 38.31
|
jpayne@68
|
306 Added long kmer support to RNA calling in CallGenes.
|
jpayne@68
|
307 Added BBMerge flags maxmismatches and forcemerge.
|
jpayne@68
|
308 Added Tadpole flag filtermem.
|
jpayne@68
|
309 38.32
|
jpayne@68
|
310 Tadpole now refuses to run with no input files.
|
jpayne@68
|
311 BBMerge now supports filtermemory flag.
|
jpayne@68
|
312 Wrote KmerFilterSetMaker and kmerfilterset.sh to generate small covering sets of kmers for use with BBDuk.
|
jpayne@68
|
313 Added silent flags to suppress screen messages from BBDuk, Reformat, and KmerTableSet-related classes.
|
jpayne@68
|
314 Added reformat padding flags.
|
jpayne@68
|
315 38.33
|
jpayne@68
|
316 Shred now validates input files.
|
jpayne@68
|
317 Reformat now has options for padding sequences.
|
jpayne@68
|
318 ****KmerFilterSet now accepts an initial kmer set.
|
jpayne@68
|
319 Wrote IntList3.
|
jpayne@68
|
320 Wrote HashArrayHybridFast.
|
jpayne@68
|
321 Changed HashArray bulk add contract.
|
jpayne@68
|
322 Back-ported HashArrayHybridFast changes to KmerNode2D.
|
jpayne@68
|
323 Seal now uses HashArrayHybrid; indexing Silva became >100x faster.
|
jpayne@68
|
324 Sketch now uses HashArrayHybrid; indexing speed increased somewhat.
|
jpayne@68
|
325 Added amino support to BBDuk.
|
jpayne@68
|
326 Added amino support to KmerCountExact.
|
jpayne@68
|
327 Added amino support to EntropyTracker.
|
jpayne@68
|
328 Modified entropy defaults for amino acid mode with Sketch(?) and BBDuk(?)
|
jpayne@68
|
329 Fixed tracking of PercentOfPairs for insert size statistics.
|
jpayne@68
|
330 CompareSketch now automatically sets the protein, fungi, or mito path on NERSC.
|
jpayne@68
|
331 Mutate.sh now works on amino acid sequences.
|
jpayne@68
|
332 Validated CompareSketch on raw reads in protein space; it works amazingly well.
|
jpayne@68
|
333 38.34
|
jpayne@68
|
334 Wrote MetagenomeDataWriter to produce some stats for Brian Foster.
|
jpayne@68
|
335 Modified PreParser and Shared to deal with determining the original command line.
|
jpayne@68
|
336 TODO: (Brian Foster) report base and read counts exactly, not rounded to the nearest million.
|
jpayne@68
|
337 Refactored and commented IntList classes.
|
jpayne@68
|
338 Added merge and ecco to CallGenes.
|
jpayne@68
|
339 Wrote MetadataWriter to allow unified reads in and out nomenclature for certain programs.
|
jpayne@68
|
340 Removed a constructor from PreParser.
|
jpayne@68
|
341 Fixed MetadataWriter for AssemblyStats.
|
jpayne@68
|
342 Added support for protein Sketch server.
|
jpayne@68
|
343 Fixed some printing errors in CallGenes.
|
jpayne@68
|
344 Added recode and retranslate to CallGenes.
|
jpayne@68
|
345 Increased SendSketch default sizemult for RefSeq and proteins to 2.2.
|
jpayne@68
|
346 38.35
|
jpayne@68
|
347 Added sketchonly flag to CompareSketch, allowing it to just sketch and write files but not actually run comparisons.
|
jpayne@68
|
348 Protein sketch server is now active.
|
jpayne@68
|
349 Added TaxTree.descendsFrom(child, parent).
|
jpayne@68
|
350 TaxTree now classifies species-attached no-rank archaeal nodes as strains in addition to bacteria.
|
jpayne@68
|
351 pigz --version is now recorded to determine whether -11 and -I flags are supported.
|
jpayne@68
|
352 Added sketch sixframes flag, for dealing with indels. This works surprisingly well but bloats the genome size. Probably the size should be divided by 6.
|
jpayne@68
|
353 Added prokprot sketch to RQCFilter.
|
jpayne@68
|
354 Sketch now ignores AA kmers spanning stop codons in sixframes mode.
|
jpayne@68
|
355 Fixed a flaw in rkmer generation following Ns, in many classes.
|
jpayne@68
|
356 Added Sketch toValue2 function to process dual kmers in an unbiased manner. This yields more accurate ANI.
|
jpayne@68
|
357 Added comparison logic for tracking k1 and k2 matches independently.
|
jpayne@68
|
358 toValue2 now handles aminos as well.
|
jpayne@68
|
359 Changed default kmer lengths from 31,0 to 32,23, and 10,7 to 11,7.
|
jpayne@68
|
360 Simplified some parts of Sketch, like removing aniFromWkid flag.
|
jpayne@68
|
361 Changed an assertion in TaxTree to a warning, because the latest version of NCBI taxdump contains errors.
|
jpayne@68
|
362 Validation of K and hash version between sketches is now more robust.
|
jpayne@68
|
363 Fixed all instances of kmer bitmasks to work correctly with k=32; prior limit was k=31.
|
jpayne@68
|
364 Added 1-bit antialiasing to Sketch hashcodes.
|
jpayne@68
|
365 Bumped hash version to 2.
|
jpayne@68
|
366 Increased amino default kmer length to 12,8 to increase specificity.
|
jpayne@68
|
367 Fixed an assertion failure in comparesketch perfile mode.
|
jpayne@68
|
368 Increased size of prokprot blacklist.
|
jpayne@68
|
369 Added Sketch refhits flag, to indicate the number of references sharing kmers with keys hitting a reference.
|
jpayne@68
|
370 Remade prokprot blacklists at a higher taxonomic level to deal with high conservation.
|
jpayne@68
|
371 Fixed an assertion with regards to sketchonly mode in comparesketch.
|
jpayne@68
|
372 avgrefhits is now weakly factored into score.
|
jpayne@68
|
373 Modified some rqcfilter2 sketch flags such as minprob.
|
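The 38.35 entry about rkmer generation following Ns refers to the reverse-complement kmer that is maintained alongside the forward kmer while rolling along a sequence. One way such a flaw can arise is stale bits surviving an undefined base. The sketch below resets both values at an N; it illustrates the general technique with 2-bit encoding and is not the BBTools code. The function name and return shape are assumptions.

```python
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}  # complement of x is 3 - x


def rolling_kmers(seq, k):
    """Yield (start, kmer, rkmer) for every window of k consecutive defined bases."""
    mask = (1 << (2 * k)) - 1
    shift = 2 * (k - 1)
    kmer = rkmer = 0
    length = 0  # consecutive defined bases seen since the last N
    for i, base in enumerate(seq.upper()):
        x = BASE_TO_NUM.get(base)
        if x is None:
            # Undefined base: restart both kmers so no stale bits leak forward.
            kmer = rkmer = 0
            length = 0
            continue
        kmer = ((kmer << 2) | x) & mask            # append base on the right
        rkmer = (rkmer >> 2) | ((3 - x) << shift)  # prepend complement on the left
        length += 1
        if length >= k:
            yield i - k + 1, kmer, rkmer
```

With `k=2`, the sequence `ACNGT` yields only the windows `AC` and `GT`; no window spans the N.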
jpayne@68
|
374 38.36
|
jpayne@68
|
375 Increased Sketch minprob to 0.0008. Q7 (80% accurate) areas will be used but Q6 (75%) will be ignored; before it was 0.0001 (Q6.1). This slightly increases accuracy with raw reads.
|
jpayne@68
|
376 Trimrname now works on sam headers.
|
jpayne@68
|
377 Trimrname is now automatically set to the same as trd unless explicitly overridden with the trimrname flag.
|
jpayne@68
|
378 Added small RNA adapters to adapters.fa (thanks to Daniel N.)
|
jpayne@68
|
379 Sketch now reports the number of unique kmers indexed.
|
jpayne@68
|
380 BBTools can now read embl and gbk formats.
|
jpayne@68
|
381 Added support for subcohort taxonomic level.
|
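The minprob figures in 38.36 can be checked with a short calculation. The sketch assumes that minprob is compared against the probability that an entire kmer is error-free, that this probability is the product of per-base accuracies, and that k=32 (the default nucleotide kmer length set in 38.35). None of these assumptions is stated in the entry itself, and the function names are hypothetical.

```python
def base_accuracy(q):
    """Phred quality q -> probability the base call is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def kmer_correct_prob(q, k):
    """Probability that k bases of uniform quality q are all correct."""
    return base_accuracy(q) ** k


if __name__ == "__main__":
    for q in (6.0, 6.1, 7.0):
        print(f"Q{q}: base={base_accuracy(q):.4f} kmer(k=32)={kmer_correct_prob(q, 32):.6f}")
```

Under these assumptions a uniform-Q7 region clears the new 0.0008 threshold while a Q6 region does not, and Q6.1 sits just above the old 0.0001 threshold, which agrees with the numbers quoted in the entry.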
jpayne@68
|
382 38.37
|
jpayne@68
|
383 Fixed a bug in BBDuk JSON readsOut reporting.
|
jpayne@68
|
384 BBSketch format 3 now prints taxID.
|
jpayne@68
|
385 Fixed broken qin flag (was being overridden by autodetection).
|
jpayne@68
|
386 Improved quality autodetection for out-of-range quality scores.
|
jpayne@68
|
387 FastqReadInputStream now correctly inherits interleaving from FileFormat rather than running internal tests.
|
jpayne@68
|
388 Added JsonParser.parseJsonObjectStatic.
|
jpayne@68
|
389 Added Blacklist.toBlacklist.
|
jpayne@68
|
390 Added SendSketch.toAddress, .setFromAddress, and .sendSketch (static).
|
jpayne@68
|
391 Simplified SendSketch parsing.
|
jpayne@68
|
392 TestFormat now automatically tries to detect organism with SendSketch.
|
jpayne@68
|
393 ReadStats bhist is now faster by formatting with ByteBuilder.
|
jpayne@68
|
394 Added TestFormat bhistlen flag to disable gigantic bhists.
|
jpayne@68
|
395 38.38
|
jpayne@68
|
396 Fixed a parsing error in SendSketch.
|
jpayne@68
|
397 Wrote docs/RestartingServers.txt
|
jpayne@68
|
398 Fixed CallGenes load failure with under 9 threads.
|
jpayne@68
|
399 Added a 100k limit to SendSketch queries per instance, and added reference tars to the website.
|
jpayne@68
|
400 Increased buffer sizes of SendSketch.
|
jpayne@68
|
401 Reduced number of threads per session for Sketch servers.
|
jpayne@68
|
402 Added trackers for number of Sketches processed, bytes received, and bytes sent to Sketch server.
|
jpayne@68
|
403 38.39
|
jpayne@68
|
404 Fixed a bug in phist (required polysymbol to be set).
|
jpayne@68
|
405 Fixed a bug in BBDuk amino mode (failure to support k=12).
|
jpayne@68
|
406 Fixed a bug in bhist (no newlines!).
|
jpayne@68
|
407 Sketch and Tax servers now track single versus bulk queries.
|
jpayne@68
|
408 Converted several ReadStats histograms from TextStreamWriter to ByteStreamWriter.
|
jpayne@68
|
409 38.40
|
jpayne@68
|
410 Replaced some obsolete StringBuilder methods (mainly for read printing) with ByteBuilder.
|
jpayne@68
|
411 Deleted obsolete classes ReadStreamStringWriter and SortByMapping.
|
jpayne@68
|
412 Replaced many instances of StringBuilder with ByteBuilder.
|
jpayne@68
|
413 Moved some fields from Gene to Shared.
|
jpayne@68
|
414 Made Header class.
|
jpayne@68
|
415 Fixed a float-to-int rounding-down problem making BBMerge not strictly obey the maxmismatches flag.
|
jpayne@68
|
416 Redid RandomReads naming format to be pair-capable in sam format.
|
jpayne@68
|
417 Converted all known header-parsing functions to use the new format.
|
jpayne@68
|
418 Wrote SuperLongList.toString
|
jpayne@68
|
419 Added Reformat prioritizelength flag for subsampling variable-length reads.
|
jpayne@68
|
420 Fixed trailing whitespace in bhist.
|
jpayne@68
|
421 38.41
|
jpayne@68
|
422 Fixed a compile error.
|
jpayne@68
|
423 38.42
|
jpayne@68
|
424 Wrote SubSketch and subsketch.sh to pull partial sketches out of larger sketches (e.g. to shrink RefSeq).
|
jpayne@68
|
425 Added stats handler to TaxServer, with version and quantity tracking.
|
jpayne@68
|
426 Added bbversion field to sendsketch header.
|
jpayne@68
|
427 Fixed SendSketch address parsing.
|
jpayne@68
|
428 Added p and q suffixes to parseKMG.
|
jpayne@68
|
429 Added PacBio read length modelling to RandomReads.
|
jpayne@68
|
430 Fixed a CallVariants assertion with SamLine.RNAME_AS_BYTES.
|
jpayne@68
|
431 Fixed a major bug in vcf line reading that misinterpreted variant types and prevented BBDuk from parsing vcf properly.
|
jpayne@68
|
432 Wrote SamStreamerMF, a multifile SamStreamer.
|
jpayne@68
|
433 Integrated SamStreamerMF into CallVariants. Now, with 8 sam.gz files, CallVariants is about 5x as fast on a 32-core node.
|
jpayne@68
|
434 Fixed CallVariants vcf output MCOV reporting -1 when out= is set instead of vcf=.
|
jpayne@68
|
435 Fixed ihist not working in BBDuk.
|
jpayne@68
|
436 38.43
|
jpayne@68
|
437 Wrote var2.VarKey for hashing. May not use it.
|
jpayne@68
|
438 Added indel processing to fixVars, and Read.containsVars().
|
jpayne@68
|
439 Fixed bugs in reading insertions from VCF files.
|
jpayne@68
|
440 TaxServer usage no longer displays stats (stats are on the /stats page).
|
jpayne@68
|
441 Added ref flag to CompareVCF.
|
jpayne@68
|
442 Added shist to FilterVCF (for vars passing filter).
|
jpayne@68
|
443 FilterVCF no longer requires a reference (in most cases) if the VCF has a correct header.
|
jpayne@68
|
444 CallVariants modified to reduce negative impact of strand bias and read bias on score, in cases that otherwise appear fine.
|
jpayne@68
|
445 Demuxbyname can now do 1 file per sequence header, but it does not close the streams as soon as a sequence is written. This would be better as a custom program.
|
jpayne@68
|
446 Removed a mysterious automatic newline from Read.toSam(bb).
|
jpayne@68
|
447 Wrote CoverageArray3A, Atomic version.
|
jpayne@68
|
448 Added atomic flag to CallVariants, which increases speed by up to 300 percent.
|
jpayne@68
|
449 Increased speed of multithreaded coverage calculation even without atomic flag.
|
jpayne@68
|
450 Fixed stranded coverage default to false.
|
jpayne@68
|
451 Added CoverageArray.incrementRangeSynchronized.
|
jpayne@68
|
452 CallVariants trackstrand now correctly defaults to false, which disables the DP4 field.
|
jpayne@68
|
453 CalcTrueQuality should now ignore indels declared in a VCF.
|
jpayne@68
|
454 38.44
|
jpayne@68
|
455 Fixed a bug in Tools.parseKMG.
|
jpayne@68
|
456 Added qualhist to CallVariants.
|
jpayne@68
|
457 Added code in CallVariants to deal with recalibrated base quality.
|
jpayne@68
|
458 CallVariants no longer needs ref= prefix before fasta reference.
|
jpayne@68
|
459 FilterVCF can now split alleles.
|
jpayne@68
|
460 Modified mutate.sh to allow variable-length indels, and not put them too close together (to allow better grading).
|
jpayne@68
|
461 Major: Fixed a BBDuk/Seal/Clumpify failure to correctly reverse-complement some kmers.
|
jpayne@68
|
462 38.45
|
jpayne@68
|
463 Last restarted timestamp fixed for TaxServer stats page.
|
jpayne@68
|
464 Clarified randomreads.sh description of generating twin files versus interleaved.
|
jpayne@68
|
465 Added Read.countVars, CallVariants.findUniqueVars.
|
jpayne@68
|
466 Added support for indels and border to FilterSam.
|
jpayne@68
|
467 CallVariants can now force calls of specific alleles with an input vcf.
|
jpayne@68
|
468 VarMap is now iterable over values.
|
jpayne@68
|
469 Modified ShrinkAccession to optionally retain GI numbers.
|
jpayne@68
|
470 Fixed VCF genotype call of 1 for haploids failing filters.
|
jpayne@68
|
471 Updated GiToNcbi to read gi numbers from accession files since gi files will disappear soon.
|
jpayne@68
|
472 Clarified bbduk.sh comment on maxlength.
|
jpayne@68
|
473 Added unzip.sh script.
|
jpayne@68
|
474 Split Sketch displayfname into rfname and qfname.
|
jpayne@68
|
475 Fixed file column being enabled by default for sendsketch.
|
jpayne@68
|
476 Changed VarMap WAYS to 8, allowing 16 billion variants.
|
jpayne@68
|
477 Short match strings no longer generate consecutive symbols like mm because they are hard to parse.
|
jpayne@68
|
478 MSA.score() now accepts short or long match strings.
|
jpayne@68
|
479 CallVariants no longer generates long match strings prior to trimming, for perfect matches; 5-10% faster.
|
jpayne@68
|
480 FilterVCF can now split long substitutions into SNPs with the splitsubs flag.
|
jpayne@68
|
481 Fixed CalcTrueQuality ploidy unset warning.
|
jpayne@68
|
482 Added ls to testfilesystem. May be inaccurate due to cache effects.
|
jpayne@68
|
483 Added amino acid codes B and Z, mapped to ANY (same as X).
|
jpayne@68
|
484 CallVariants now integrated into FilterSam.
|
jpayne@68
|
485 BBCMS now supports sam files, if error-correction is disabled (depth filtering is allowed).
|
jpayne@68
|
486 Added some columns to CallVariants screen output for average allele depth.
|
jpayne@68
|
487 Added taxonomic levels series and section.
|
jpayne@68
|
488 Added RenameGiToTaxid badheaders flag for logging.
|
jpayne@68
|
489 Added RenameGiToTaxid maxbadheaders flag for early termination when exceeded, and included it in the download scripts (at 5000 since recent nt contains 2440 headers with no TaxID).
|
jpayne@68
|
490 Removed sharedVarMap from CallVariants2; replaced with forcedVars1 or forcedVars2 for the two passes.
|
jpayne@68
|
491 FungalRelease agp generation now uses ByteStreamWriter over tsw and Read.breakAtGaps uses ByteBuilder over sb to save memory.
|
jpayne@68
|
492 Fully commented MSA11ts fullUnlimited.
|
jpayne@68
|
493 38.46
|
jpayne@68
|
494 Added Unzip.java and fixed unzip.sh. It is pretty resource-intensive, though, for a program that does nothing. This is possible to improve.
|
jpayne@68
|
495 Added KID and WKID to Sketch format 3, and flags to disable them.
|
jpayne@68
|
496 CompareVCF now prints results to screen correctly when there is no output file.
|
jpayne@68
|
497 TaxServer now defaults to 200k max reads in local mode.
|
jpayne@68
|
498 In local mode, TaxServer no longer reads files with pigz.
|
jpayne@68
|
499 FilterVCF now correctly observes del and ins flags.
|
jpayne@68
|
500 Added Var.COMPOUND type for multiallelic variations.
|
jpayne@68
|
501 Added VCFLine.trimPrefix() and trimSuffix().
|
jpayne@68
|
502 Fixed bugs in trimToCanonical handling of compound variations.
|
jpayne@68
|
503 VCFLines split by allele now split INFO fields as well.
|
jpayne@68
|
504 Wrote demuxbyname2, to support massively multiplexed NovaSeq runs.
|
jpayne@68
|
505 Splitting alleles now also splits the info field of VCFLines.
|
jpayne@68
|
506 38.47
|
jpayne@68
|
507 Added demuxbyname2 hamming distance support.
|
jpayne@68
|
508 Renamed Var.COMPOUND to Var.MULTI and added Var.COMPLEX.
|
jpayne@68
|
509 Modified demuxbyname2.sh to use pigz.
|
jpayne@68
|
510 Increased compiler error level (@Override, shadowing) and fixed resulting errors.
|
jpayne@68
|
511 Wrote MultiCros3, which supports concurrent streams; this makes DemuxByName2 faster.
|
jpayne@68
|
512 Made BufferedMultiCros an abstract superclass of MultiCros2 and MultiCros3.
|
jpayne@68
|
513 38.48
|
jpayne@68
|
514 Added samline field to Read. obj field is no longer used for SamLines. Caused substantial refactoring; may have introduced bugs when processing sam files (they will not be subtle if present).
|
jpayne@68
|
515 BufferedMultiCros now offers a threaded mode, but this has not improved performance.
|
jpayne@68
|
516 BufferedMultiCros now supports minReadsToDump and puts residual reads into unknown.
|
jpayne@68
|
517 Fixed DemuxByName2 hamming distance code, and improved it to only remove colliding keys.
|
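The DemuxByName2 hamming distance entries above (38.47 and 38.48) can be illustrated with a small Python sketch. It is not the BBTools code: the function name, the ACGTN alphabet, the fixed distance of 1, and the rule that an exact barcode always maps to itself are all assumptions made for illustration. It shows the idea of expanding each barcode to its one-mismatch neighbours and removing only the neighbours that collide between barcodes.

```python
from collections import defaultdict

def expand_barcodes(barcodes, alphabet="ACGTN"):
    """Map each sequence within Hamming distance 1 of a barcode to that barcode.

    Hypothetical helper: variants reachable from more than one barcode are
    ambiguous, so only those colliding keys are dropped.
    """
    barcodes = set(barcodes)
    owners = defaultdict(set)
    for bc in barcodes:
        for i in range(len(bc)):
            for base in alphabet:
                if base != bc[i]:
                    owners[bc[:i] + base + bc[i + 1:]].add(bc)

    # Exact barcodes always map to themselves (an assumption of this sketch).
    table = {bc: bc for bc in barcodes}
    for variant, sources in owners.items():
        if variant in barcodes:
            continue
        if len(sources) == 1:
            table[variant] = next(iter(sources))
        # else: colliding key, left out of the table
    return table

table = expand_barcodes(["AAA", "AAT"])
```

With the two barcodes above, `AAC` is one mismatch away from both, so it is the kind of key that gets removed, while `CAA` and `CAT` each remain assigned to a single barcode.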
jpayne@68
|
518 38.49
|
jpayne@68
|
519 Fully commented DemuxByName2, BufferedMultiCros, MultiCros2, and MultiCros3.
|
jpayne@68
|
520 Fixed a bug in MultiCros3 that created some duplicate reads. Speed is now >950MB/s for twin files.
|
jpayne@68
|
521 38.50
|
jpayne@68
|
522 Added bgzip control flags and version parsing.
|
jpayne@68
|
523 .vcf.gz files now default to being written and read by bgzip.
|
jpayne@68
|
524 All gzip files now default to being read with bgzip over pigz.
|
jpayne@68
|
525 Non-vcf files will only be written with bgzip if the bgzip flag is added (for now).
|
jpayne@68
|
526 Added alternate Sketch addresses via vm flag.
|
jpayne@68
|
527 minProb and minQual moved from SketchObject to DisplayParams, requiring the modification of many methods.
|
jpayne@68
|
528 Simplified some Sketch method signatures by allowing DisplayParams to substitute for multiple parameters.
|
jpayne@68
|
529 Added Locale to all String formatting without it.
|
jpayne@68
|
530 Refactored DemuxByName2.
|
jpayne@68
|
531 Improved commenting of DemuxByName2 and related classes.
|
jpayne@68
|
532 Added PacBio subread support to PartitionReads (partition.sh).
|
jpayne@68
|
533 Disabled ByteFile1 being forced outside of JGI. ByteFile2 caused some problems, but those should be resolved now, I think...
|
jpayne@68
|
534 Added loglog and barcode flags to DemuxByName2.
|
jpayne@68
|
535 Fixed order of SendSketch setting server address to allow alternate (VM) server use.
|
jpayne@68
|
536 Fixed DemuxByName2 order of parsing parser args, allowing the barcode flag to trigger.
|
jpayne@68
|
537 Unified DemuxByName2 modes under a single mode field.
|
jpayne@68
|
538 Fixed maxrecords not being observed in Sketch JSON format.
|
jpayne@68
|
539 TaxServer sketch handler now does full parsing of URL arguments.
|
jpayne@68
|
540 Added D3 support to Sketch results.
|
jpayne@68
|
541 38.51
|
jpayne@68
|
542 Changed handling of same-name JSON keys; by default they are now replaced.
|
jpayne@68
|
543 Improved Sketch D3 output - added more keys, fixed depth handling.
|
jpayne@68
|
544 Subprocess testing now returns false for exit codes 126 and higher (missing libraries yield 127).
|
jpayne@68
|
545 Turned bgzip and pigz on by default for all programs.
|
jpayne@68
|
546 Made bgzip the default for RQCFilter.
|
jpayne@68
|
547 Modified TaxServer sketch portion to prevent carryover of parameters to subsequent queries.
|
jpayne@68
|
548 Fixed Sketch header reporting observed depth as actual depth.
|
jpayne@68
|
549 Wrote IceCreamFinder, IceCreamAligner, and icecreamfinder.sh.
|
jpayne@68
|
550 Wrote A_Sample_Generator, IceCreamMaker, and icecreammaker.sh.
|
jpayne@68
|
551 Moved A_Sample classes to new templates package.
|
jpayne@68
|
552 Changed some new Random() calls to Shared.threadLocalRandom().
|
jpayne@68
|
553 Added jsonarrays flag to Sketch.
|
jpayne@68
|
554 Wrote IceCreamGrader and icecreamgrader.sh.
|
jpayne@68
|
555 Renamed demuxbyname2.sh to demuxbyname.sh.
|
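The 38.51 entry on same-name JSON keys describes a replace-by-default policy. The sketch below uses Python's standard json module, not the BBTools JsonObject or JsonParser classes, to contrast that policy with one that keeps every value. The function names are hypothetical, and the changelog does not say what the previous BBTools behavior was.

```python
import json

def load_replacing(text):
    """Later values replace earlier ones for a repeated key (json's default)."""
    return json.loads(text)

def load_keeping_all(text):
    """Alternative policy: collect every value seen for a repeated key."""
    def hook(pairs):
        out = {}
        for key, value in pairs:
            out.setdefault(key, []).append(value)
        return out
    # The hook is applied to every object in the document, including nested ones.
    return json.loads(text, object_pairs_hook=hook)

doc = '{"name": "first", "name": "second"}'
replaced = load_replacing(doc)
kept = load_keeping_all(doc)
```

Replacement keeps the output a plain key-to-value map, at the cost of silently discarding the earlier value.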
jpayne@68
|
556 38.52
|
jpayne@68
|
557 Made IceCreamFinder ~50% faster by debranching loops and optimizing cache footprint.
|
jpayne@68
|
558 Added IceCreamFinder junction output.
|
jpayne@68
|
559 Simplified shell scripts by centralizing path-setting commands.
|
jpayne@68
|
560 Moved JNI library loading to Shared.
|
jpayne@68
|
561 Wrote IceCreamAligner JNI version.
|
jpayne@68
|
562 38.53
|
jpayne@68
|
563 Made IceCreamAligner JNI faster by adding functions for all alignments and adding 16-bit versions.
|
jpayne@68
|
564 Fixed bugs in calcmem.sh path setting and module loading on Cori.
|
jpayne@68
|
565 Automated jni library path setting (-Djava.library.path flag is no longer required).
|
jpayne@68
|
566 Disabled BBMerge attempt to load JNI libraries.
|
jpayne@68
|
567 Added magic number detection for .gz files.
|
jpayne@68
|
568 Disabled bgzip reading of non-bgzip .gz files, awaiting new bgzip release, because current bgzip breaks on concatenated gzip files (supposedly addressed after v1.9).
|
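The 38.53 entries on magic number detection and on bgzip versus plain .gz files can be sketched as follows. This is an illustrative Python check, not the BBTools implementation: it relies only on the gzip header layout (magic bytes 1f 8b, FEXTRA flag bit 0x04) and on the BGZF convention of a "BC" extra subfield. The function names are hypothetical.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"

def is_gzip(head: bytes) -> bool:
    """True if the buffer starts with the gzip magic number."""
    return head[:2] == GZIP_MAGIC

def is_bgzf(head: bytes) -> bool:
    """True if the first gzip member carries the BGZF 'BC' extra subfield."""
    if not is_gzip(head) or len(head) < 12 or not (head[3] & 0x04):
        return False
    (xlen,) = struct.unpack_from("<H", head, 10)
    pos, end = 12, min(12 + xlen, len(head))
    while pos + 4 <= end:
        si1, si2, slen = struct.unpack_from("<BBH", head, pos)
        if (si1, si2) == (66, 67):  # 'B', 'C'
            return True
        pos += 4 + slen
    return False

# The 28-byte BGZF end-of-file block, used here only as a known BGZF header.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000")

plain = gzip.compress(b"ACGT\n")
# A concatenated gzip file is just several members back to back.
concatenated = gzip.compress(b"read1\n") + gzip.compress(b"read2\n")
```

Checking the header before choosing a decompressor avoids handing a plain or concatenated gzip file to a tool that expects BGZF blocks.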
jpayne@68
|
569 38.54
|
jpayne@68
|
570 Changed a method to avoid a Java 11 dependency.
|
jpayne@68
|
571 Added ZMW stats to IceCreamFinder.
|
jpayne@68
|
572 Added preliminary adapter detection to IceCreamFinder.
|
jpayne@68
|
573 38.55
|
jpayne@68
|
574 Fixed a JNI bug in RQCFilter with BBMerge.
|
jpayne@68
|
575 38.56
|
jpayne@68
|
576 Improved and accelerated IceCreamFinder adapter detection.
|
jpayne@68
|
577 Reduced discarding of reads with adapters only at the tips.
|
jpayne@68
|
578 38.57
|
jpayne@68
|
579 Greatly improved IceCreamFinder adapter detection sensitivity by aligning to more reads.
|
jpayne@68
|
580 Increased speed of adapter aligner.
|
jpayne@68
|
581 Added less-specific adapter-screening phases to reduce calls to the adapter aligner.
|
jpayne@68
|
582 Added ambig output stream and changed the logic for determining ambiguous inverted repeats.
|
jpayne@68
|
583 Adapter-containing inverted repeats no longer go to junctions output.
|
jpayne@68
|
584 Improved timeless adapter aligner and made it default.
|
jpayne@68
|
585 Added start location to low bits of timeless aligner score, but it does not seem to work.
|
jpayne@68
|
586 38.58
|
jpayne@68
|
587 Fixed PreParser failure when encountering a standalone equals sign.
|
jpayne@68
|
588 Fixed a bug in automatically setting Sketch blacklists for known databases.
|
jpayne@68
|
589 Updated server-starting shell scripts to point to the new URLs.
|
jpayne@68
|
590 Renamed Missing Adapter as Absent Adapter.
|
jpayne@68
|
591 Changed ambiguity logic to better classify reads when there are 2 passes.
|
jpayne@68
|
592 Adapter alignment is slightly more lenient when an inverted repeat is detected.
|
jpayne@68
|
593 Slightly accelerated adapter detection by changing conditionals to array lookups in the inner loop.
|
jpayne@68
|
594 SendSketch can now load TaxTree.
|
jpayne@68
|
595 Increased Sketch number of comparisons returned, to compensate for potential losses during TaxFilter.
|
jpayne@68
|
596 38.59
|
jpayne@68
|
597 Added json output and stats redirection to IceCreamFinder.
|
jpayne@68
|
598 Added preliminary SamStreamer support to IceCreamFinder.
|
jpayne@68
|
599 SamStreamer now supports a limited number of reads.
|
jpayne@68
|
600 Added libbbtools.dylib (Mac version) to jni folder. Thanks Jie Wang for compiling it!
|
jpayne@68
|
601 Updated makefile.osx and jni readme.
|
jpayne@68
|
602 CoveragePileup now detects and aborts when a scaffold is specified multiple times with different lengths.
|
jpayne@68
|
603 Added ByteBuilder.print(float x, int decimals).
|
jpayne@68
|
604 Added asrhist and irsrhist to IceCreamFinder.
|
jpayne@68
|
605 Fixed an unnecessary array copy in adapter detection; X is now properly added to reads with adapters detected.
|
jpayne@68
|
606 Added trim support to IceCreamFinder.
|
jpayne@68
|
607 38.60
|
jpayne@68
|
608 maxReads is now a required parameter for SamStreamer; this allows acceleration of some other tools when reads are limited.
|
jpayne@68
|
609 Redid Sketch taxfilter. Now there are two different taxfilters, white (include) and black (exclude). The flags have changed.
|
jpayne@68
|
610 Organism names are now acceptable for TaxFilter.
|
jpayne@68
|
611 JNI mode for IceCreamFinder and BBMap is now automatic on NERSC or Mac/Linux AMD64 systems.
|
jpayne@68
|
612 Moved OS/CPU environment detection from Data to Shared.
|
jpayne@68
|
613 Restarted Sketch servers; they will no longer handle the old taxonomy filtering flags.
|
jpayne@68
|
614 Added reformat complement flag.
|
jpayne@68
|
615 Fixed spelling of complement in some cases.
|
jpayne@68
|
616 Added Sketch taxID to Sketch lookup table.
|
jpayne@68
|
617 Added Sketch server reference mode.
|
jpayne@68
|
618 Sketch taxonomy and metadata filtering are now handled by DisplayParams, and done prior to comparison, exactly once, and in threads.
|
jpayne@68
|
619 38.61
|
jpayne@68
|
620 Added Sketch KID, WKID, and hits comparators.
|
jpayne@68
|
621 Revised TaxonomyGuide.txt.
|
jpayne@68
|
622 Wrote ThreadWaiter and simplified A_SampleMT.
|
jpayne@68
|
623 Fixed an accidental use of bgzip for decompression.
|
jpayne@68
|
624 Fixed an erroneous error message (header with no bases) from splitting reads of target length in FastaReadInputStream.
|
jpayne@68
|
625 Added fixsra and addpairnum flags to RenameReads.
|
jpayne@68
|
626 Modified and moved ncbi and sketch scripts to pipelines/fetch and pipelines/server.
|
jpayne@68
|
627 38.62
|
jpayne@68
|
628 Added ACGT count tracking and printgc to Sketch.
|
jpayne@68
|
629 Sketch JSON format now caps decimal places of some numbers.
|
jpayne@68
|
630 MergeSorted can now use subprocess for decompression.
|
jpayne@68
|
631 Added linear sketch sizing via the density flag.
|
jpayne@68
|
632 Added polyploid support for MutateGenome (ploidy and hetrate flags).
|
jpayne@68
|
633 Added nohomopolymers flag to MutateGenome.
|
jpayne@68
|
634 Updated calcmem.sh with path for pigz.
|
jpayne@68
|
635 Revised fetch pipeline scripts again.
|
jpayne@68
|
636 Added plasmids to prokprot; removed viroids which no longer exist.
|
jpayne@68
|
637 Sambamba should no longer print the banner.
|
jpayne@68
|
638 Wrote FetchProks and fetchproks.sh, for downloading one genome assembly and gff per prokaryotic genus.
|
jpayne@68
|
639 Updated model.pgm with all archaea and one bacterium per genus.
|
jpayne@68
|
640 Deleted spurious copy of GffLine.
|
jpayne@68
|
641 Split VcfToGff off of GffLine.
|
jpayne@68
|
642 Moved Gff-related classes to gff package.
|
jpayne@68
|
643 Wrote GbffFile, GbffLocus, and GbffFeature.
|
jpayne@68
|
644 Fixed equals method in StringNum.
|
jpayne@68
|
645 Rewrote CompareGff to take sequence name and type into account.
|
jpayne@68
|
646 Generated pgms for plastid and plasmid, but they made bacterial calling worse.
|
jpayne@68
|
647 Enabled 5S long kmer support (9-mers) for CallGenes. Might be worthwhile ignoring the 1-count kmers.
|
jpayne@68
|
648 38.63
|
jpayne@68
|
649 Gene-calling long kmers are now uncompressed.
|
jpayne@68
|
650 tRNA and 5S now use 10-mers instead of 9-mers; plastid, plasmid, and viral sources are included.
|
jpayne@68
|
651 Fixed some remaining crash bugs from adding GC content to Sketches.
|
jpayne@68
|
652 Updated RefSeq protein sketching pipeline.
|
jpayne@68
|
653 38.64
|
jpayne@68
|
654 Added TaxTree methods for determining if a node descends from unclassified or environmental samples.
|
jpayne@68
|
655 Added banUnclassified and banVirus Sketch flags.
|
jpayne@68
|
656 Reduced TaxServer startup time by around 60% by multithreading AccessionToTaxid per-file reading.
|
jpayne@68
|
657 Wrote GlocalAligner to perform flat alignments for SSU identity calculation, and integrated it into sketch.Comparison.
|
jpayne@68
|
658 Wrote AddSSU and addssu.sh.
|
jpayne@68
|
659 Wrote AlignmentJob and AlignmentThreadPool to maintain a growable, limited pool of shared threads for aligning SSUs.
|
jpayne@68
|
660 Added GeneModel length restrictions for RNAs based on empirical data.
|
jpayne@68
|
661 IceCreamFinder trim flag now adjusts read coordinates in headers.
|
jpayne@68
|
662 Fixed a bug where IceCreamFinder trimming SamLines corrupted their quality.
|
jpayne@68
|
663 CallVariants now automatically converts IUPAC symbols to N; before it crashed when encountering them.
|
jpayne@68
|
664 iupacton flag now works on all programs and happens during read validation.
|
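The iupacton behavior mentioned in the 38.64 entries (converting IUPAC ambiguity symbols to N) can be sketched in Python as below. This is an illustration only: the function name is hypothetical, and mapping lowercase symbols to an uppercase N is an assumption of this sketch, not something the changelog states.

```python
# IUPAC nucleotide ambiguity codes other than N itself.
_AMBIGUOUS = "RYSWKMBDHV"
_TO_N = str.maketrans(_AMBIGUOUS + _AMBIGUOUS.lower(),
                      "N" * (2 * len(_AMBIGUOUS)))

def iupac_to_n(bases: str) -> str:
    """Replace every IUPAC ambiguity symbol with N; leave A, C, G, T, N alone."""
    return bases.translate(_TO_N)

cleaned = iupac_to_n("ACGTRYKMN")
```

Doing the conversion once, during read validation, means downstream code only ever has to handle A, C, G, T, and N.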
jpayne@68
|
665 38.65
|
jpayne@68
|
666 Updated FindPrimers to use a different, faster aligner and perform reverse alignments.
|
jpayne@68
|
667 38.66
|
jpayne@68
|
668 Removed a debug print statement.
|
jpayne@68
|
669 Fixed a crash bug in RepresentativeSet.
|
jpayne@68
|
670 38.67
|
jpayne@68
|
671 Fixed some ssu-related issues in Sketch.
|
jpayne@68
|
672 CompareSketch and SendSketch no longer try to grab SSUs if printssu=f.
|
jpayne@68
|
673 38.68
|
jpayne@68
|
674 Added subkmer support for BloomFilter via kbig.
|
jpayne@68
|
675 Added subkmer support for BloomFilterCorrector via ksmall.
|
jpayne@68
|
676 BBCMS subkmer support improves accuracy at very high load.
|
jpayne@68
|
677 Improved FetchProks to prefer reference assemblies when available.
|
jpayne@68
|
678 FetchProks now retries when connections time out.
|
jpayne@68
|
679 Added maxidfilter to Reformat.
|
jpayne@68
|
680 Wrote A_SampleSamStreamer.
|
jpayne@68
|
681 Added comma() and under() to ByteBuilder.
|
jpayne@68
|
682 Multithreaded AnalyzeAccession; now roughly 4x faster (190 -> 42 seconds). Speed limited by largest file.
|
jpayne@68
|
683 Wrote consensus package (BaseGraphPart, BaseNode, BaseGraph, ConsensusMaker) and consensus.sh.
|
jpayne@68
|
684 Added SamLine.calcIdentity().
|
jpayne@68
|
685 Added SamFilter minid and maxid flags.
|
jpayne@68
|
686 Added SamLine leftmost() method.
|
jpayne@68
|
687 Wrote FixScaffoldGaps and fixgaps.sh for resizing scaffold gaps based on mapped pair insert sizes.
|
jpayne@68
|
688 IceCreamFinder now discards short reads after trimming.
|
jpayne@68
|
689 Added ConsensusMaker mindepth, mafsub, mafdel, mafins, mafn, and usemapq flags.
|
jpayne@68
|
690 Wrote Lilypad and lilypad.sh (scaffolder).
|
jpayne@68
|
691 Tested ConsensusMaker with Quast and changed default parameters.
|
jpayne@68
|
692 Added ConsensusMaker nonly and noindels flags.
|
jpayne@68
|
693 Fixed deletions sometimes being counted as bases in pileup.sh when delcov=f.
|
jpayne@68
|
694 Fixed a depth-0 assertion error in Tadpole.
|
jpayne@68
|
695 Added popbubbles to Tadpole.
|
jpayne@68
|
696 38.69
|
jpayne@68
|
697 Improved Tadpole bubble-popping and fixed some assertions.
|
jpayne@68
|
698 Multithreaded FindPrimers (msa.sh).
|
jpayne@68
|
699 Improved speed of SingleStateAligner by removing mode tracking (maybe 15% impact). But it does not produce correct tracebacks.
|
jpayne@68
|
700 Fixed a bug in FindPrimers with not swapping bases in the output.
|
jpayne@68
|
701 FindPrimers in swap mode now produces sam headers.
|
jpayne@68
|
702 Added multiple debubbling passes; may incur small misassemblies (default off).
|
jpayne@68
|
703 Added dead-end debranching; may incur small misassemblies (default off).
|
jpayne@68
|
704 PopBubbles now resets LOOP end conditions if new loops are created.
|
jpayne@68
|
705 BBSketch now automatically disables minprob if 0-quality reads are detected (due to PacBio).
|
jpayne@68
|
706 Fixed some scenarios in which Sketch would not spawn an adequate number of threads, particularly for long reads.
|
jpayne@68
|
707 TestFormat (testformat2.sh) now counts and makes a histogram of ZMWs, and provides better taxonomic identification for PacBio.
|
jpayne@68
|
708 38.70
|
jpayne@68
|
709 Fixed redundant comparisons in FindPrimers.
|
jpayne@68
|
710 Fixed incorrect reverse-complement in FindPrimers.
|
jpayne@68
|
711 Added Reformat flipsam flag, to disable flipping sam records into the correct orientation upon loading.
|
jpayne@68
|
712 Fixed a consensus bug in BaseNode.
|
jpayne@68
|
713 Added trimdepthfraction and trimns to ConsensusMaker.
|
jpayne@68
|
714 Added alternate aligner (msa2 flag) to FindPrimers.
|
jpayne@68
|
715 Fixed a bug in FindPrimers regarding max columns needed in swap mode.
|
jpayne@68
|
716 Wrote scripts for automatically determining consensus sequences of ribosomal components.
|
jpayne@68
|
717 SingleStateAlignerFlat now correctly produces N/m/S or M/=/X symbols in match/cigar strings.
|
jpayne@68
|
718 SingleStateAlignerFlat2 now works correctly.
|
jpayne@68
|
719 Added flags controlling RNA consensus alignment for CallGenes.
|
jpayne@68
|
720 Wrote 1D version of FlatAligner.
|
jpayne@68
|
721 Fixed SketchMaker running out of memory with SSU alignments:
|
jpayne@68
|
722 Prevented rRNA candidates from being created at over 8x the expected length (retries with an increased bias).
|
jpayne@68
|
723 Prevented rRNA alignments from being performed at over 15x expected length. Seems to mainly affect 16S search in plants.
|
jpayne@68
|
724 Adjusted RNA score cutoffs and adjusted algorithm for RNA gene-calling; more rRNAs and tRNAs will be called now.
|
jpayne@68
|
725 MergePGM now supports a per-file multiplication factor for asymmetric merges (e.g. in=bact.pgm@1,arch.pgm@5).
|
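The MergePGM argument form quoted above (in=bact.pgm@1,arch.pgm@5) can be parsed as in this Python sketch. It is not the MergePGM code: the function names, the default factor of 1 when no @ is given, and the use of simple count dictionaries in place of real pgm models are assumptions made for illustration.

```python
def parse_weighted_inputs(value):
    """Split 'a.pgm@1,b.pgm@5' into [(filename, factor), ...]."""
    pairs = []
    for item in value.split(","):
        name, sep, factor = item.partition("@")
        # Assumption: a file listed without '@' gets a factor of 1.
        pairs.append((name, float(factor) if sep else 1.0))
    return pairs

def merge_counts(models, factors):
    """Weighted sum of per-key counts, standing in for an asymmetric merge."""
    merged = {}
    for counts, factor in zip(models, factors):
        for key, count in counts.items():
            merged[key] = merged.get(key, 0.0) + count * factor
    return merged

inputs = parse_weighted_inputs("bact.pgm@1,arch.pgm@5")
merged = merge_counts([{"ATG": 10, "GTG": 2}, {"ATG": 1}], [f for _, f in inputs])
```

A per-file factor lets a small model (here the second one) contribute as much as a much larger one.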
jpayne@68
|
726 Began adding 18S support to CallGenes.
|
jpayne@68
|
727 AnalyzeGenes now supports the alignribo flag, enabling it to discard
|
jpayne@68
|
728 Refactored Prok package, adding ProkObject with statics.
|
jpayne@68
|
729 LSU and SSU start and stop slop are all independently configurable, and optimized.
|
jpayne@68
|
730 Rewrote CutGff:
|
jpayne@68
|
731 CutGff is now multithreaded per file.
|
jpayne@68
|
732 CutGff now supports alignment.
|
jpayne@68
|
733 CutGff can rename by taxID.
|
jpayne@68
|
734 FetchProks now attempts to find the best assembly of each species (longest scaffold).
|
jpayne@68
|
735 Connection retries for FetchProks now wait for an increasing amount of time after each failure.
|
jpayne@68
|
736 Many other changes; to be documented.
|
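The FetchProks retry behavior described in 38.70 (waiting longer after each failed connection) follows a common pattern, sketched here in Python. This is not the FetchProks code: the function name, the attempt limit, and the linearly growing delay are assumptions, since the changelog only says the wait increases after each failure.

```python
import time

def fetch_with_retries(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a connection failure, wait longer before each retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * attempt)  # 1x, 2x, 3x ... the base delay

# Usage with a stand-in fetch that fails twice, and a recorded sleep.
delays = []
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("timed out")
    return "ok"

result = fetch_with_retries(flaky_fetch, sleep=delays.append)
```

Passing the sleep function in as a parameter keeps the sketch testable without real waiting.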
jpayne@68
|
737 38.71
|
jpayne@68
|
738 Fixed an issue with containments of paired reads in Clumpify.
|
jpayne@68
|
739 Started AWS servers for Taxonomy and Sketch, and added code to enable them (in SendSketch, etc) via the aws flag.
|
jpayne@68
|
740 Some default resource paths are now set automatically based on the presence of the environment variable EC2_HOME.
|
jpayne@68
|
741 Added seed flag to AnalyzeFlowCell (filterbytile.sh).
|
jpayne@68
|
742 38.72
|
jpayne@68
|
743 SortByName now deletes recursive temp files after a full pass rather than incrementally, for easier resuming.
|
jpayne@68
|
744 Added BBDuk entropytrim flag.
|
jpayne@68
|
745 Increased buffer limits for SortByName; now it should rely on the data limit rather than sequence limit.
|
jpayne@68
|
746 Improved available memory estimation.
|
jpayne@68
|
747 Modified SortByName memory management to hopefully avoid merging too many files simultaneously.
|
jpayne@68
|
748 Added Unite support for taxonomy header parsing.
|
jpayne@68
|
749 Reduced BlacklistMaker memory consumption to prevent crashes with long sequences, by reducing buffers and threads. This may reduce speed slightly.
|
jpayne@68
|
750 Reduced default Sketch keyfraction from 0.2 to 0.16. This should only impact tiny partial ribo sequences, which are not very useful, but should make blacklists more efficient.
|
jpayne@68
|
751 Moved BBTool_ST to templates and changed access modifiers of some methods.
|
jpayne@68
|
752 Fixed EntropyTracker failing at k=1.
|
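The entropy-related entries in 38.72 (the BBDuk entropytrim flag and the EntropyTracker fix at k=1) rest on measuring kmer entropy in a window. The Python sketch below computes a normalized Shannon entropy over the kmers of one window. It is an illustration, not EntropyTracker: the function name and the normalization by log2(min(kmers in window, 4^k)) are assumptions, and the BBTools formula may differ.

```python
import math
from collections import Counter

def kmer_entropy(window: str, k: int) -> float:
    """Normalized Shannon entropy (0..1) of the kmers in one window."""
    n = len(window) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(window[i:i + k] for i in range(n))
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Largest entropy reachable given the window size and kmer space.
    max_h = math.log2(min(n, 4 ** k))
    return h / max_h if max_h > 0 else 0.0

low = kmer_entropy("AAAAAAAA", 1)   # a homopolymer has no diversity
high = kmer_entropy("ACGTACGT", 1)  # all four bases equally common
```

At k=1 the kmers are single bases, so the measure reduces to the base composition of the window.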
jpayne@68
|
753 38.73
|
jpayne@68
|
754 Halved MSA2PBA aligner weights to allow longer sequences (like ITS).
|
jpayne@68
|
755 Added Sketch requiressu flag.
|
jpayne@68
|
756 Moved IntListCompressor to its own file in structures.
|
jpayne@68
|
757 Wrote BlacklistMaker2, which makes blacklists from sketches rather than sequences.
|
jpayne@68
|
758 Modified SubSketch to do bulk operations (on # symbol in filenames), apply a blacklist, and use autosize.
|
jpayne@68
|
759 More Sketch output columns use KMG for big numbers.
|
jpayne@68
|
760 Modified Sketch scripts to use sketchblacklist2.sh and subsketch.sh, plus merged genus/family blacklists. Blacklists may be a bit too strict now.
|
jpayne@68
|
761 38.74
|
jpayne@68
|
762 Trimmed ~300ms off Sketch startup by eliminating redundant hash mask generation and antialiasing.
|
jpayne@68
|
763 Added Sketch printcommonancestor and printcommonancestorlevel flags.
|
jpayne@68
|
764 Fixed display of decimal numbers in json.
|
jpayne@68
|
765 Fixed json array output form for multiple query sketches.
|
jpayne@68
|
766 Fixed SendSketch perfile mode losing input file order.
|
jpayne@68
|
767 Added DoubleList and related static functions.
|
jpayne@68
|
768 Wrote AnalyzeSketchResults.java.
|
jpayne@68
|
769 Added Sketch recordsperlevel flag.
|
jpayne@68
|
770 Added multiple Sketch output file option in perfile mode.
|
jpayne@68
|
771 Refactored AnalyzeSketchResults by spinning off ResultLineParser, Record, and RecordSet into files.
|
jpayne@68
|
772 Added AnalyzeSketchResults shrinkonly mode to remove unnecessary records.
|
jpayne@68
|
773 MergeRibo now makes a consensus per taxID and selects the best sequence on the basis of alignment to that consensus.
|
jpayne@68
|
774 Added align() to BaseGraph.
|
jpayne@68
|
775 Wrote CompareSSU for all-to-all SSU alignment.
|
jpayne@68
|
776 Multithreaded AnalyzeSketchResults alignment phase.
|
jpayne@68
|
777 Threads now accepts floats like 0.5 to use half of the logical processors.
|
jpayne@68
|
778 FetchProks now puts the taxID in the filename, which AnalyzeSketchResults now expects (but does not require).
|
jpayne@68
|
779 ukmer package now observes rcomp, thus KmerCountExact now works for k>31 with rcomp=f.
|
jpayne@68
|
780 TaxTree will now hash names on demand, allowing flags like exclude=Thermus for CompareSketch.
|
jpayne@68
|
781 RenameGiToTaxid should now produce proper statistics for gff files.
|
jpayne@68
|
782 gi2taxid.sh now reports the cause when it aborts due to bad headers.
|
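The 38.74 entry stating that threads accepts floats like 0.5 can be read as in this Python sketch. It is not the BBTools parser: the function name is hypothetical, and treating any value containing a decimal point as a fraction of the logical processors (with a floor of one thread) is an assumption of this sketch.

```python
import os

def resolve_threads(arg: str, logical=None) -> int:
    """Turn a threads argument into a thread count.

    '8' means eight threads; '0.5' means half of the logical processors.
    """
    if logical is None:
        logical = os.cpu_count() or 1
    if "." in arg:
        return max(1, int(float(arg) * logical))
    return max(1, int(arg))

half = resolve_threads("0.5", logical=32)
fixed = resolve_threads("8", logical=32)
```

A fractional value lets the same command line scale across machines with different core counts.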
jpayne@68
|
783 38.75
|
jpayne@68
|
784 DemuxByName should be fixed now, via disabling the closefast option.
|
jpayne@68
|
785 Added CompareSketch refseqbig and prokprotbig flags for internal use.
|
jpayne@68
|
786 Adjusted size and sensitivity of nt, RefSeq, and prokprot blacklists.
|
jpayne@68
|
787 Rewrote Sketch Comparison score calculation to give more weight to hit count, particularly for low hit counts.
|
jpayne@68
|
788 Added printssulen flag.
|
jpayne@68
|
789 SubSketch no longer sets the filename field to an input sketch file.
|
jpayne@68
|
790 Implemented comparison SSU printing in json format (previously it was just for the query).
|
jpayne@68
|
791 38.76
|
jpayne@68
|
792 Wrote prok.SplitRibo and splitribo.sh for splitting mixed sequence types (e.g. Silva) into individual files.
|
jpayne@68
|
793 Added an exception to reporting an error when terminating a subprocess with exit code 141 (sigpipe).
|
jpayne@68
|
794 Split Sketch SSU tracking into 16S and 18S; 18S is chosen when available, and displayed with a * symbol.
|
jpayne@68
|
795 Added MergeRibo 16S and 18S flags, and made them mutually exclusive.
|
jpayne@68
|
796 stdin and .sketch should no longer appear in the sketch filename field.
|
jpayne@68
|
797 SubSketch now retains counts.
|
jpayne@68
|
798 SketchMaker now calls CompareSketch for ONE_SKETCH mode in addition to PER_FILE.
|
jpayne@68
|
799 sketch.sh now supports multiple files in persequence mode.
|
jpayne@68
|
800 Removed/disabled perheader mode as it was not clear how it differed from persequence, and it was not documented anywhere.
|
jpayne@68
|
801 Fixed missing tabs in stats.sh format=4.
|
jpayne@68
|
802 Added ConsensusMaker auto-loading of rRNA subunit consensuses.
|
jpayne@68
|
803 Moved non-BBMap alignment classes to aligner package, and IceCream-related classes to icecream package; redirected/recompiled related C code.
|
jpayne@68
|
804 Fixed IceCreamFinder modification of synthetic headers with whitespace-delimited extra content.
|
jpayne@68
|
805 Added SplitRibo support for mitochondrial and plastid rRNA output.
|
jpayne@68
|
806 Updated Silva script to remove mito and chloroplast sequences by name (chloroplast 16S cannot be distinguished from cyanobacteria).
|
jpayne@68
|
807 Added iupacton flag to Silva formatting script.
|
jpayne@68
|
808 Modified AddSSU to allow SSU deletions and more precise control over choices between new and preexisting sequences.
|
jpayne@68
|
809 Wrote FilterSilva and filtersilva.sh to eliminate some troublesome sequences with ambiguous names that were getting misclassified.
|
jpayne@68
|
810 SketchMaker now allows 16S calling to be skipped for Eukaryotes or when not desired.
|
jpayne@68
|
811 Changed MergeRibo formula to value sequences closer to the consensus length rather than the longest.
|
jpayne@68
|
812 SketchHeap now also favors SSUs closer to the consensus length.
|
jpayne@68
|
813 Possibly accelerated mergeRibo by sorting lists by length, descending.
|
jpayne@68
|
814 Requiressu flag now properly supports both 16S and 18S.
|
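The exit codes named in this changelog (126 and 127 in 38.51, 141 in 38.76) follow the usual shell conventions: 126 means the command was found but could not be executed, 127 means it could not be found or loaded, and 128+n means the process was ended by signal n. SIGPIPE is signal 13 on Linux and macOS, hence 141. The Python sketch below classifies codes on that basis. The function name and category labels are hypothetical, and this is not the BBTools subprocess-testing code.

```python
SIGPIPE = 13  # signal number on Linux and macOS

def classify_exit(code: int) -> str:
    """Label a subprocess exit code using common shell conventions."""
    if code == 0:
        return "ok"
    if code == 128 + SIGPIPE:
        # The reader closed the pipe early; often not a real failure.
        return "sigpipe"
    if code in (126, 127):
        return "cannot-run"
    if code > 128:
        return "signal-%d" % (code - 128)
    return "error"

labels = [classify_exit(c) for c in (0, 1, 127, 137, 141)]
```

Treating 141 separately matters when a subprocess is deliberately stopped after enough of its output has been read.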
jpayne@68
|
815 38.77
|
jpayne@68
|
816 AddSSU now accepts legacy files with #SSU instead of #16S/#18S.
|
jpayne@68
|
817 Updated variantPipeline.sh.
|
jpayne@68
|
818 Wrote BaseGraph.score for grading alignments using a model.
|
jpayne@68
|
819 Enabled BaseGraph serialization via ConsensusMaker outmodel flag.
|
jpayne@68
|
820 Fixed a bug in Aligner classes that replaced I with C at match position 0 or 1.
|
jpayne@68
|
821 ConsensusMaker now accepts unaligned fasta/fastq input if there is only a single reference sequence.
|
jpayne@68
|
822 38.78
|
jpayne@68
|
823 Made a floating point aligner version allowing positional weights.
|
jpayne@68
|
824 Fixed json null pointer exception on addLiteral.
|
jpayne@68
|
825 Added json output for CallGenes.
|
jpayne@68
|
826 stderr can now be specified multiple times for file output.
|
jpayne@68
|
827 Added length histogram support to CallGenes.
|
jpayne@68
|
828 Removed unnecessary assertions from BaseGraph.score, which assumed a consensus was being used as the reference (therefore, indels would be under 50% probability).
|
jpayne@68
|
829 Fixed a bug with underscores and no prefix in rename.sh.
|
jpayne@68
|
830 Loading consensus sequence will now try fq first then fa.
|
jpayne@68
|
831 Added and removed indel weights from the float aligner, as they didn't seem to help.
|
jpayne@68
|
832 Split ITS by clade and made consensus for fungi, plant, animal, and other.
|
jpayne@68
|
833 Implemented restrictleft/restrictright in Seal.
|
jpayne@68
|
834 Added BBMap readgroup autonaming based on input filename.
|
jpayne@68
|
835 Added BBWrap fofn support.
|
jpayne@68
|
836 Modified Dedupe to allow concatenation of headers of absorbed sequences.
|
jpayne@68
|
837 Made some changes to consensus model scoring to improve separation.
|
jpayne@68
|
838 38.79
|
jpayne@68
|
839 Fixed a weird compiler static issue.
|
jpayne@68
|
840 38.80
|
jpayne@68
|
841 Extensive changes; unfortunately, log is currently incomplete as a result of COVID-related workplace changes. This will be updated.
|
jpayne@68
|
842 Improvements to cardinality estimation (LogLog-related structures and programs).
|
jpayne@68
|
843 Fixes and additions to variant-calling-related programs for the purpose of COVID genotyping.
|
jpayne@68
|
844 Fixed trimclip not working (BBDuk).
|
jpayne@68
|
845 38.81
|
jpayne@68
|
846 Add pileup tip ignore and qtrimming to match coverage from CallVariants.
|
jpayne@68
|
847 Wrote summarizecoverage.sh to summarize coverage basecov files for multiple samples.
|
jpayne@68
|
848 Fixed a testExecute bug causing bzip2 to hang.
|
jpayne@68
|
849 38.82
|
jpayne@68
|
850 CallVariants and Pileup now apply border to reads mapped to the ends of a chromosome.
|
jpayne@68
|
851 Filtersam no longer discards sam headers.
|
jpayne@68
|
852 Wrote A_SampleBasic to act as a template for programs that do not do stream processing.
|
jpayne@68
|
853 Tweaked some variant-calling scripts.
|
jpayne@68
|
854 38.83
|
jpayne@68
|
855 Fixed fastq interleaving detection for a rare failure with PacBio reads.
|
jpayne@68
|
856 Added entropy histogram support to BBDuk.
|
jpayne@68
|
857 Extended EntropyTracker to always track monomer counts in a window.
|
jpayne@68
|
858 Added EntropyTracker longestLowEntropyBlock function.
|
jpayne@68
|
859 Added entropy/monomer fraction filter to IceCreamFinder.
|
jpayne@68
|
860 38.84
|
jpayne@68
|
861 Removed some IceCreamFinder debug code causing a crash.
|
jpayne@68
|
862 38.85
|
jpayne@68
|
863 Fixed fastq interleaving detection for a rare failure with PacBio reads.
|
jpayne@68
|
864 Added entropy histogram support to BBDuk.
|
jpayne@68
|
865 Pileup modifications and a bugfix.
|
jpayne@68
|
866 Wrote ReformatPacBio, reformatpb.sh, and various support classes (ZMW, ZMWStreamer, PBHeader).
|
jpayne@68
|
867 Modified BaseGraph to do alignment.
|
jpayne@68
|
868 Modified BaseGraph to do piecewise alignment and handle overlaps correctly.
|
jpayne@68
|
869 Added CCS support to ReformatPacBio, but it currently only works well on synthetic data.
|
jpayne@68
|
870 Added artic3 primers to resources.
|
jpayne@68
|
871 Wrote FlatAligner2.
|
jpayne@68
|
872 Added hmm package and classes for parsing hmmsearch results.
|
jpayne@68
|
873 Added jasper package for intern.
|
jpayne@68
|
874 Added KmerPosition, KmerPosition3, and kmerposition.sh for positional kmer counts. These were written by Jasper Toscani Field.
|
jpayne@68
|
875 Added FlatAligner2 with flatter weights than FlatAligner.
|
jpayne@68
|
876 Added ApplyVariants support for renaming, excluding certain indels, and better handling of variations in low-coverage regions.
|
jpayne@68
|
877 Updated Covid scripts.
|
jpayne@68
|
878 Added A_SampleSummary template.
|
jpayne@68
|
879 Added BBDuk entropy histogram.
|
jpayne@68
|
880 Fixed IceCreamMaker reference loading.
|
jpayne@68
|
881 Fixed Tadpole1 ownership reinitialization bug.
|
jpayne@68
|
882 Added total sub/var count to FilterSam.
|
jpayne@68
|
883 Added CallVariants/VCF support for NearbyVarCount and Flag fields.
|
jpayne@68
|
884 Added seed to RandomGenome.
|
jpayne@68
|
885 Added TrimRead handling of aligned reads without attached SamLines.
|
jpayne@68
|
886 38.86
|
jpayne@68
|
887 Bump due to git glitch.
|
jpayne@68
|
888 38.87
|
jpayne@68
|
889 SamStreamer now correctly sets the header for sam files with no reads, fixing a hang.
|
jpayne@68
|
890 LoadSharedHeader wait delay reduced from 1000 to 100ms.
|
jpayne@68
|
891 Removed leftover BBDuk entropy-trimming print statement.
|
jpayne@68
|
892 38.88
|
jpayne@68
|
893 Refactored many instances of numeric array initialization to use KillSwitch.
|
jpayne@68
|
894 Added donefile to RQCFilter2.
|
jpayne@68
|
895 Wrote initial Walker class for kmer set iteration.
|
jpayne@68
|
896 Added in-place condense to IntList/LongList/etc.
|
jpayne@68
|
897 Wrote LongListSet for extending LongList beyond 2b elements.
|
jpayne@68
|
898 Wrote LongListSetIterator.
|
jpayne@68
|
899 Fixed BBDuk speed flag; was limited to range 0-7.
|
jpayne@68
|
900 Changed BBDuk rcomp calls to accommodate amino mode.
|
jpayne@68
|
901 Added SuperLongList functionality and checks.
|
jpayne@68
|
902 Integrated SuperLongList into ReadStats for length histograms; read length histograms no longer have an upper length limit.
|
jpayne@68
|
903 ApplyVariants should correctly handle truncated scaffold names now.
|
jpayne@68
|
904 RQCFilter/ReadStats static variable hangover fixed.
|
jpayne@68
|
905 Wrote KExpand and kmutate.sh for creating sets of mutant kmers.
|
jpayne@68
|
906 Split Parse class off from Tools.
|
jpayne@68
|
907 38.89
|
jpayne@68
|
908 Added support for degenerate amino acid symbols B, J, Z.
|
jpayne@68
|
909 38.90
|
jpayne@68
|
910 Added workaround for BubblePopper 640 assertion failure.
|
jpayne@68
|
911 RQCFilter deterministic mode added for mapping phases.
|
jpayne@68
|
912 Wrapped instances of byte array instantiation in allocByte1D.
|
jpayne@68
|
913 38.91
|
jpayne@68
|
914 Fixed SendSketch json array format for very large input, on client side.
|
jpayne@68
|
915 Added maxload flag to BBCMS.
|
jpayne@68
|
916 38.97
|
jpayne@68
|
917 Added trimtips to BBDuk, mainly for trimming adapters on both ends of PacBio reads.
|
jpayne@68
|
918 Changed processing of reads longer than 200bp to force ASCII-33 quality.
|
jpayne@68
|
919 Enable automatic entryfilter in Clumpify to handle libraries with mostly identical reads.
|
jpayne@68
|
920 38.98
|
jpayne@68
|
921 Added bloom filter option lockedincrement, which substantially increases accuracy of overloaded counting Bloom filters, with a ~15% speed reduction. Disabled by default, except for BBCMS.
|
jpayne@68
|
922 Fixed a possible race condition in RQCFilter file writing.
|
jpayne@68
|
923
|
jpayne@68
|
924
|
jpayne@68
|
925 TODO: outshort for fungalrelease.
|
jpayne@68
|
926 TODO: Add sff support.
|
jpayne@68
|
927 TODO: Add TPM - fpkm normalized so sum is 1M. Also add RLTD.
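The TPM item above describes FPKM values rescaled so that they sum to one million. A minimal sketch of that rescaling, assuming the FPKM values have already been computed; the class and method names are hypothetical and are not part of BBTools:

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not a BBTools class. */
class TpmSketch {

    /** Rescales FPKM values so that the results sum to 1,000,000. */
    static double[] fpkmToTpm(double[] fpkm) {
        double sum = 0;
        for (double f : fpkm) { sum += f; }
        double[] tpm = new double[fpkm.length];
        if (sum <= 0) { return tpm; } // nothing expressed: leave all zeros
        final double mult = 1_000_000.0 / sum;
        for (int i = 0; i < fpkm.length; i++) { tpm[i] = fpkm[i] * mult; }
        return tpm;
    }

    public static void main(String[] args) {
        double[] tpm = fpkmToTpm(new double[] {10, 30, 60});
        System.out.println(Arrays.toString(tpm));
    }
}
```

Because the scaling is a single multiplier, the relative ordering of features is unchanged; only the total is fixed.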
|
jpayne@68
|
928
|
jpayne@68
|
929 TODO: Variant whitelist for callvariants
|
jpayne@68
|
930 TODO: filtervcf multiple position interval support.
|
jpayne@68
|
931 TODO: refnames flag in seal should clearly indicate it outputs one file per ref file. Also kmers should be unified per reference.
|
jpayne@68
|
932 TODO: Understand how BubblePopper 640 assertion can occur.
|
jpayne@68
|
933 TODO: ApplyVariants fails silently if the header mismatches in certain ways - e.g. "NC123" versus "NC123 virus".
|
jpayne@68
|
934 TODO: make universal flag document including e.g. cq (changequality).
|
jpayne@68
|
935 TODO: bzip2 fails on JGI cloud, though lbzip2 works. Test bzip2 on Cori.
|
jpayne@68
|
936 TODO: Since the code knows where the docs are, it should point to them.
|
jpayne@68
|
937 TODO: Complete list of common flags in a file, point every shell script to this.
|
jpayne@68
|
938
|
jpayne@68
|
939 TODO: Allow GT field in VCF to display 1 for low-frequency vars passing filters, or optionally 0/1.
|
jpayne@68
|
940 TODO: reformat strip or set sam header
|
jpayne@68
|
941 TODO: Reformat file list input. And testformat2. Also support paired files with # flag in the list input.
|
jpayne@68
|
942 TODO: clarify k>31 in BBDuk documentation.
|
jpayne@68
|
943 TODO: Add bbcms description to website.
|
jpayne@68
|
944 TODO: Make fusesequence bufferless (for fasta anyway) and independent of bbtoolST.
|
jpayne@68
|
945 TODO: fusesequence does not support amino acids.
|
jpayne@68
|
946 TODO: testformat2 json output
|
jpayne@68
|
947 TODO: protein alignment in sharp fold space (sharpest fold every ~20 AAs)
|
jpayne@68
|
948
|
jpayne@68
|
949 TODO: Add option to tax server:
|
jpayne@68
|
950 if(name.startsWith(parentName+" "))
|
jpayne@68
|
951 { name=name.substring(parentName.length()+1); }
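The snippet above strips a parent name from the front of a child name. A self-contained sketch of the same idea follows; the class and method names are hypothetical, and this is not the TaxServer code:

```java
/** Hypothetical illustration; not the TaxServer implementation. */
class NameTrimSketch {

    /** Drops the parent name and the following space when the child name starts with them. */
    static String trimParentPrefix(String name, String parentName) {
        if (name.startsWith(parentName + " ")) {
            name = name.substring(parentName.length() + 1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(trimParentPrefix("Richardsonius balteatus", "Richardsonius"));
        System.out.println(trimParentPrefix("Escherichia coli", "Bacteria"));
    }
}
```

Requiring the trailing space avoids trimming a name that merely shares leading characters with the parent.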
|
jpayne@68
|
952
|
jpayne@68
|
953 TODO: Validate covid scripts, especially recal, since changes may not have been saved.
|
jpayne@68
|
954 **TODO: testformat2 Bryce changes, and add unique q-scores count and set notation
|
jpayne@68
|
955
|
jpayne@68
|
956
|
jpayne@68
|
957 TODO: Allow banning indels near poly-N in ApplyVariants.
|
jpayne@68
|
958 TODO: NVC, failnearby, and flagnearby seem to work for CallVariants but not CallVariants2.
|
jpayne@68
|
959 TODO: Add all reformat filters to FilterSam.
|
jpayne@68
|
960 TODO: Sketch redlist/whitelist, applied upon kmer eviction.
|
jpayne@68
|
961 TODO: BBDuk anomalies when left-trimming with mink.
|
jpayne@68
|
962 ***TODO: BBMerge should trim terminal poly-G or any homopolymer from adapter sequence.
|
jpayne@68
|
963
|
jpayne@68
|
964 TODO: CallVariants2 is giving one extra var call and 2 failing vars that pass every sample... for the command:
|
jpayne@68
|
965 /global/projectb/sandbox/gaag/bbtools/callvarstest/synth2> callvariants2.sh in=deduped_trimclip.sam.gz,mapped2.sam.gz ref=ref.fa out=vars_fail_multi.vcf -Xmx1g ow strandedcov flagnearby
|
jpayne@68
|
966
|
jpayne@68
|
967 TODO: Fix Pileup to deal with unpaired reads, or filter out reads with mate removed due to junk filter.
|
jpayne@68
|
968 TODO: Fix DedupeByPosition to deal with unpaired reads that say they are paired.
|
jpayne@68
|
969
|
jpayne@68
|
970 TODO: Add minor allele support.
|
jpayne@68
|
971 TODO: ApplyVariants should optionally examine AF to give IUPAC codes for mixed alleles.
|
jpayne@68
|
972
|
jpayne@68
|
973 TODO: Dedupebymapping has trouble with reads marked paired whose mate is missing.
|
jpayne@68
|
974 TODO: Examine interplay of minedist and chromosome ends.
|
jpayne@68
|
975 TODO: Add CallNs to CallVariants. There is a flag for it but the purpose is currently to call N alleles from Ns in reads.
|
jpayne@68
|
976 TODO: Add CallVariants coverage output.
|
jpayne@68
|
977 TODO: Add trimclip to CallVariants implicitly.
|
jpayne@68
|
978 TODO: Fix rqcfilterdata tax data.
|
jpayne@68
|
979 TODO: Fix size-zero SendSketch ribosomal bug.
|
jpayne@68
|
980 TODO: Add amplicon flag to CallVariants.
|
jpayne@68
|
981
|
jpayne@68
|
982 ***TODO: SSAs appear to never perform clipping, and the score function is slightly different than traceback.
|
jpayne@68
|
983 TODO: Add CallVariants option to only produce the top allele at multiallelic sites.
|
jpayne@68
|
984
|
jpayne@68
|
985
|
jpayne@68
|
986 TODO: sketch.sh does not support multiple input files in onesketch or pertaxa modes.
|
jpayne@68
|
987 TODO: comparesketch.sh does not support refid flag, which is pretty useful...
|
jpayne@68
|
988
|
jpayne@68
|
989 TODO: Update sketch-creation and download scripts...
|
jpayne@68
|
990
|
jpayne@68
|
991 TODO: Different sizes of output files for 1rpl and 9999.
|
jpayne@68
|
992 TODO: CompareSketch outputs results in an arbitrary order, for per-file mode.
|
jpayne@68
|
993
|
jpayne@68
|
994 TODO: Rename rdp data by taxa, and figure out how to pull out just 16S, etc.
|
jpayne@68
|
995
|
jpayne@68
|
996 TODO: Consider adding gene caller support for ITS.
|
jpayne@68
|
997 TODO: Rename Sketch SSU column as Ribo and use ITS/16S depending on clade (or best for organisms that seem to have both... maybe 16S and ITS Sketch fields, but only one column.)
|
jpayne@68
|
998
|
jpayne@68
|
999 TODO: Document AnalyzeGenes new flags.
|
jpayne@68
|
1000 TODO: Filter and map reads to universal 16S in realtime to make a consensus.
|
jpayne@68
|
1001
|
jpayne@68
|
1002 TODO: Why does SketchMaker run at ~10 cores when t=40? Check typical CPU utilization.
|
jpayne@68
|
1003 TODO: Tool to isolate all 16S sequences from RefSeq based on gene-calling. Then this can be merged with RefSeq, etc and used as a resource when sketching RefSeq.
|
jpayne@68
|
1004 TODO: Make KeepBestCopy multithreaded and align to 16S sequences.
|
jpayne@68
|
1005 TODO: Option to rename CallGenes output by taxID (or add to description field).
|
jpayne@68
|
1006
|
jpayne@68
|
1007 Try minimizing the number of active long sequences to reduce RefSeq memory consumption. Probably a problem with euks. Or, maybe unsorted would be better.
|
jpayne@68
|
1008
|
jpayne@68
|
1009 Seems to run out of memory when it gets to mitos:
|
jpayne@68
|
1010
|
jpayne@68
|
1011 #SZ:420 CD:ADC K:32,24 H:2 GS:16646 GK:16615 GE:14747 GQ:1 BC:4780,4432,3022,4412 ID:86930 NM:Richardsonius balteatus NM0:tid|86930|NC_033945.1 Richardsonius balteatus mitochondrial DNA, complete genome
|
jpayne@68
|
1012
|
jpayne@68
|
1013 TODO: 5000s for prefilter phase - bgzip is at 150% but java is only 100%.
|
jpayne@68
|
1014
|
jpayne@68
|
1015 78.64 vs 78.63 (it varies) for command:
|
jpayne@68
|
1016 /global/projectb/sandbox/gaag/bbtools/prok/auto> time nice msa.sh ref=16S_consensus_sequence.fa in=16S_flipped.fa t=32 -Xmx8g
|
jpayne@68
|
1017
|
jpayne@68
|
1018 Document refid for sendsketch and add it to comparesketch.
|
jpayne@68
|
1019
|
jpayne@68
|
1020 Fix NCBI annotations of 16S.
|
jpayne@68
|
1021
|
jpayne@68
|
1022 Fix NCBI orientation of 16S within the tool that pulls them from gffs, via alignment (make it multithreaded).
|
jpayne@68
|
1023
|
jpayne@68
|
1024 ***
|
jpayne@68
|
1025
|
jpayne@68
|
1026 TODO: Some loop-creating bubble poppings are banned.
|
jpayne@68
|
1027
|
jpayne@68
|
1028 *****TODO: Fix destMap pointers when nodes are merged and so forth.
|
jpayne@68
|
1029 TODO: Allow bubble-popping with midnodes shorter than 2k-1 via truncation.
|
jpayne@68
|
1030 TODO: Inflate collapsed repeats in BubblePopper by duplicating a single fbranch/fbranch midnode. This requires linkage information for the endnode pairs. Does not need to be fully resolved as long as any pair of endnodes can be uniquely associated. Should be done before popping bubbles.
|
jpayne@68
|
1031 TODO: Try a second pass of bubble popping.
|
jpayne@68
|
1032 TODO: Ultrafast aligner. Could be BBMap with alignment disabled, or could use a new data structure...
|
jpayne@68
|
1033 TODO: HashArrayHybridLong, associating a kmer with multiple longs. These can be used to encode scafnum in high bits and pos in low bits.
|
jpayne@68
|
1034 TODO: LilyPad should not require mapped reads; alternatively, make a map of contig end kmers (eg. 31-mers) then scan reads and pairs for those kmers to make edges.
|
jpayne@68
|
1035 TODO: Consensus should pad Ns on ref start and stop, then optionally truncate at the 50th percentile depth (for 16S).
|
jpayne@68
|
1036 TODO: Potentially make 3 kinds of edges in Tadpole: A) built edges, B) captured (in a read) edges, and C) captured (in a pair, unknown length) edges. A and B could perhaps be lumped together.
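The HashArrayHybridLong item above mentions encoding scafnum in the high bits and pos in the low bits of a long. A minimal sketch of such packing, assuming a 40-bit position field (the width, class name, and method names are all hypothetical):

```java
/** Hypothetical illustration of packing a scaffold number and a position into one long. */
class PackedLocSketch {

    static final int POS_BITS = 40;                      // assumed width of the position field
    static final int SCAF_BITS = 64 - POS_BITS;          // remaining high bits for the scaffold number
    static final long POS_MASK = (1L << POS_BITS) - 1;

    /** Packs scafNum into the high bits and pos into the low bits. */
    static long encode(int scafNum, long pos) {
        if (scafNum < 0 || scafNum >= (1 << SCAF_BITS)) {
            throw new IllegalArgumentException("scafNum out of range: " + scafNum);
        }
        if (pos < 0 || pos > POS_MASK) {
            throw new IllegalArgumentException("pos out of range: " + pos);
        }
        return (((long) scafNum) << POS_BITS) | pos;
    }

    /** Unsigned shift so that a set top bit does not sign-extend. */
    static int scafNum(long packed) { return (int) (packed >>> POS_BITS); }

    static long pos(long packed) { return packed & POS_MASK; }

    public static void main(String[] args) {
        long packed = encode(5, 123456789L);
        System.out.println(scafNum(packed) + " " + pos(packed));
    }
}
```

In practice the field widths would presumably be chosen from the largest scaffold length and the number of scaffolds rather than fixed.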
|
jpayne@68
|
1037
|
jpayne@68
|
1038 TODO: Lilypad could track overhanging reads to fill in gap Ns.
|
jpayne@68
|
1039 TODO: Tadpole simple tandem resolution.
|
jpayne@68
|
1040
|
jpayne@68
|
1041 TODO: ByteBuilder toText and append interface.
|
jpayne@68
|
1042
|
jpayne@68
|
1043 TODO: CallVariants should track strand ratio and ignore strandedness if it is highly biased.
|
jpayne@68
|
1044 TODO: On Amazon, Rob had to specify path= for BBMap as otherwise it tries to write to /ref/ which is not allowed.
|
jpayne@68
|
1045 TODO: Flag to Swap stats N/L50.
|
jpayne@68
|
1046 *TODO: Provide a column for ssu comparison results.
|
jpayne@68
|
1047 TODO: SendSketch on a 16S sequence makes a sketch that is too small.
|
jpayne@68
|
1048
|
jpayne@68
|
1049 TODO: Enable auto tax server access for all programs.
|
jpayne@68
|
1050 TODO: Enable server access for taxonomy.sh.
|
jpayne@68
|
1051 TODO: Change taxonomy.sh to use server-style header/name parsing.
|
jpayne@68
|
1052 TODO: Add more checks when parsing for invalid parameters, e.g. k>31 for bbcms.
|
jpayne@68
|
1053 TODO: Rename archaeal and bacterial per-genus downloads with taxID.
|
jpayne@68
|
1054 TODO: Write test harness for Sketch.
|
jpayne@68
|
1055
|
jpayne@68
|
1056
|
jpayne@68
|
1057 TODO: analyzegenes is singlethreaded with 1 input file.
|
jpayne@68
|
1058
|
jpayne@68
|
1059 TODO: Cori memory autodetection detects too little memory.
|
jpayne@68
|
1060 TODO: Default to less memory on Cori head nodes.
|
jpayne@68
|
1061 TODO: fixvcf for left or right justify of indels, etc.
|
jpayne@68
|
1062
|
jpayne@68
|
1063 TODO: Shred input?
|
jpayne@68
|
1064
|
jpayne@68
|
1065 TODO: Java gzip decompressor does not seem to work with multipart files streamed from stdin, for example, RefSeq.
|
jpayne@68
|
1066 TODO: Convert other multithreaded programs to the ThreadWaiter pattern.
|
jpayne@68
|
1067 TODO: Integrate tax server into everything as a replacement for downloading accessions (?).
|
jpayne@68
|
1068 TODO: Allow partial tax tree queries from the tax server. Or even complete tax tree downloading? That would be convenient...
|
jpayne@68
|
1069 TODO: Put taxonomy update pipeline scripts in version control.
|
jpayne@68
|
1070 TODO: Update taxonomy sub-scripts.
|
jpayne@68
|
1071
|
jpayne@68
|
1072 TODO: Per-organism frame ACTG composition; may allow 2-pass better gene-calling.
|
jpayne@68
|
1073 TODO: tRNA folding
|
jpayne@68
|
1074 TODO: Way to bulk download and organize bacteria.
|
jpayne@68
|
1075
|
jpayne@68
|
1076 TODO: Indel bit for alignments. If this bit gets set, there was an indel. Perfect for accelerating alignments using dual arrays with no traceback.
|
jpayne@68
|
1077
|
jpayne@68
|
1078 TODO: Enable silent (and json) flag for more shellscripts.
|
jpayne@68
|
1079 TODO: If JNI init fails, give a useful error message.
|
jpayne@68
|
1080 TODO: Compile JNI for Windows.
|
jpayne@68
|
1081 TODO: SamStreamer does not support ordered output.
|
jpayne@68
|
1082
|
jpayne@68
|
1083 TODO: Look for inverted repeat around suspected PacBio adapters, if there is only 1 adapter.
|
jpayne@68
|
1084 TODO: Generate histograms for IceCreamFinder - ratios, lengths, and subread count.
|
jpayne@68
|
1085 TODO: Re-test increasing entropy cutoff for euk sketches.
|
jpayne@68
|
1086 TODO: Download all bacterial assemblies and automatically select a representative set for gene-caller training.
|
jpayne@68
|
1087
|
jpayne@68
|
1088 TODO: SendSketch depth estimation is too low (e.g. 0.5x data yields 0.435x estimate). Compensate for kmer depth based on number of bases and number of sequences.
|
jpayne@68
|
1089 **TODO: SendSketch crashes and hangs with in=nonexistent file.
|
jpayne@68
|
1090 TODO: Clarify comparesketch single vs perfile flag.
|
jpayne@68
|
1091 TODO: Reduce impact of quality scores on BBMerge.
|
jpayne@68
|
1092 TODO: BBSketch proportional-size mode (linear-size).
|
jpayne@68
|
1093 TODO: Sketch callribo flag (instead of whitelist - only for proks? And only assemblies, not reads.)
|
jpayne@68
|
1094
|
jpayne@68
|
1095 TODO: Trim terminal adapter sequence on PacBio reads prior to looking for inverted repeats and central adapters. Adjust read names as needed.
|
jpayne@68
|
1096 TODO: Test improved RC aligner with reduced gap extension penalties?
|
jpayne@68
|
1097 TODO: Redo ambig classification of 2-pass based on proximity of junction to the other read (inner terminus).
|
jpayne@68
|
1098 TODO: Combine scores of adapter and inverted repeat. e.g., use a lower adapter threshold when inverted repeat is detected.
|
jpayne@68
|
1099 TODO: Debug method of storing start loc in low bits of adapter aligner, which does not seem to work properly.
|
jpayne@68
|
1100 TODO: 2-array, JNI version of adapter aligner.
|
jpayne@68
|
1101
|
jpayne@68
|
1102 TODO: Consider effect of only filling last querylength cells in BBMap with DEL penalty.
|
jpayne@68
|
1103 TODO: Write a local aligner to refine the junction location of IceCream reads.
|
jpayne@68
|
1104
|
jpayne@68
|
1105 TODO: Vary queue size
|
jpayne@68
|
1106 TODO: Break alignment into a few columns
|
jpayne@68
|
1107 TODO: More precisely detect junction by recording highest value in addition to last value
|
jpayne@68
|
1108
|
jpayne@68
|
1109 TODO: Read-based quantification for BBSketch (basically, assign each read to a reference).
|
jpayne@68
|
1110 TODO: Make Depth a better indicator of abundance.
|
jpayne@68
|
1111 TODO: Make big fastas faster to sketch.
|
jpayne@68
|
1112 TODO: /global/projectb/sandbox/rqc/syao/anaconda2/envs/aligners/bin/bgzip
|
jpayne@68
|
1113 TODO: Include taxlevel as a key in D3 mode
|
jpayne@68
|
1114 TODO: Custom fast banded aligner for PacBio triangle read detection.
|
jpayne@68
|
1115 TODO: Examine align2.Block.allowSubprocess (for writing index). Also find similar flag for reading index.
|
jpayne@68
|
1116 TODO: Sketch json format - option to print stuff in arrays.
|
jpayne@68
|
1117 TODO: Base limit (as opposed to read limit) would be nice for sketching PacBio data.
|
jpayne@68
|
1118 TODO: BBTools Mark Duplicates.
|
jpayne@68
|
1119 ***TODO: Put bgzip support in all tool shellscripts.
|
jpayne@68
|
1120 **TODO: Link to latest nt and restart server.
|
jpayne@68
|
1121 *TODO: Add option for full taxonomy in sketch output (JSON).
|
jpayne@68
|
1122 *TODO: Add order option for TaxServer (JSON).
|
jpayne@68
|
1123 TODO: Benchmark DemuxByName with various buffering settings on a shuffled file.
|
jpayne@68
|
1124 TODO: test bgz/gz size on clumped/non-clumped files
|
jpayne@68
|
1125 TODO: Flag for generating bgzf indexes
|
jpayne@68
|
1126 TODO: bgz compress/decompress speed/mem test at various levels with various numbers of threads
|
jpayne@68
|
1127 TODO: Command-line zipthreads setting
|
jpayne@68
|
1128 TODO: Ensure bgz is always preferred for temp files; add a FileFormat temp flag.
|
jpayne@68
|
1129 TODO: Add FileFormat support to ReadWrite.
|
jpayne@68
|
1130 TODO: bgzip BBMap index; allow compression level specification for BBMap index.
|
jpayne@68
|
1131 TODO: Allow streaming refseq to concatenate lines under Xbp long
|
jpayne@68
|
1132 TODO: Sortbyname barcode mode/comparator (can add it to obj field)
|
jpayne@68
|
1133
|
jpayne@68
|
1134
|
jpayne@68
|
1135
|
jpayne@68
|
1136
|
jpayne@68
|
1137 TODO: Clumpify.main is not called by RQCFilter2 so statics are not caught and restored. Easy solution - have a second main function that returns the constructed object?
|
jpayne@68
|
1138
|
jpayne@68
|
1139 TODO: Consider adding tax info to demuxer.
|
jpayne@68
|
1140 TODO: Test bgzip decompression speed.
|
jpayne@68
|
1141 TODO: DemuxByName2 should also split sam files by contig.
|
jpayne@68
|
1142 **TODO: Flag to skip synth contam removal in RQCFilter.
|
jpayne@68
|
1143 TODO: Test to ensure refactoring did not break anything.
|
jpayne@68
|
1144
|
jpayne@68
|
1145 TODO: DemuxByName2 is way faster without output; not clear why.
|
jpayne@68
|
1146
|
jpayne@68
|
1147 TODO: Consider splitting extra fields of VCFLines.
|
jpayne@68
|
1148 TODO: BBMap kfilter float.
|
jpayne@68
|
1149 TODO: Consider making unzip more efficient.
|
jpayne@68
|
1150
|
jpayne@68
|
1151
|
jpayne@68
|
1152 TODO: Modify testfilesystem to test ls on a random directory.
|
jpayne@68
|
1153 TODO: gi2taxid should accept wildcards (shrunk.*.accession2taxid.gz).
|
jpayne@68
|
1154 **TODO: CallVariants3 - multisample variant-caller without writing any intermediate VCF files.
|
jpayne@68
|
1155 *TODO: Write CallVariants usage example that merges independent samples across nodes.
|
jpayne@68
|
1156 TODO: CallVariants2 without merging VCFs.
|
jpayne@68
|
1157 TODO: Update shells with new flags.
|
jpayne@68
|
1158 *TODO: Add meta path flag to TestFilesystem.
|
jpayne@68
|
1159 *TODO: MergeVCF shell wrapper. Also improve it to use hashmaps rather than requiring identical input files.
|
jpayne@68
|
1160
|
jpayne@68
|
1161 TODO: Var - add an optimal 63-bit hashcode for Var, using max scaffold length and num scaffolds for bit widths.
|
jpayne@68
|
1162 TODO: FilterSam - add capability to run without VCF file, both via Bloom filter and calling variants internally.
|
jpayne@68
|
1163 TODO: FilterSam - consider soft-clipping and ignoring soft-clipped areas.
|
jpayne@68
|
1164 TODO: Error-correct indel-free sam reads with kmers (and ref provided)... as long as a read has only mismatches, it does not need realignment.
|
jpayne@68
|
1165 ***TODO: Sketch multi-JSON fix (Adam R).
|
jpayne@68
|
1166 TODO: BBDuk var-based filtering does not support indels or border like filtersam.
|
jpayne@68
|
1167 TODO: Consider scanning the index for only long kmers, at least optionally. Or only indexing long kmers.
|
jpayne@68
|
1168
|
jpayne@68
|
1169 TODO: CallVariants stranded and 32bit should default to auto and be enabled based on memory.
|
jpayne@68
|
1170 TODO: Investigate variant-calling after removing trash reads.
|
jpayne@68
|
1171 TODO: CompareVCF should be able to process and produce var files, or else, CompareVar should be written.
|
jpayne@68
|
1172 TODO: In strandBiasScore, correct for bias of all reads in addition to just mapped reads, when mcov is being tracked.
|
jpayne@68
|
1173 TODO: CallVariants first base is capital, later bases may be lower case, for deletion ref calls.
|
jpayne@68
|
1174 TODO: Bed output for pileup?
|
jpayne@68
|
1175 TODO: Tool to split multiallelic variants. Or, ignore multiallelic in comparevcf.
|
jpayne@68
|
1176 TODO: size= flag does not work sometimes: comparesketch.sh in=mruber.fa.gz ref=protein blacklist=null index=f translate size=10000
|
jpayne@68
|
1177 TODO: .fa files should cause a warning when processed in amino mode (rather than translate).
|
jpayne@68
|
1178
|
jpayne@68
|
1179 TODO: Mhist seems to only get 0.5 for an indel instead of 1.
|
jpayne@68
|
1180 TODO: Adapter-trimming grading.
|
jpayne@68
|
1181 TODO: BBDuk auto adapters.
|
jpayne@68
|
1182 TODO: Test CallVariants speed with ssmf and raw sam; possibly reduce concurrent files.
|
jpayne@68
|
1183 TODO: Replace Integer.parseInt with Tools.parseIntKMG.
|
jpayne@68
|
1184 TODO: BBMap ecco (for adapter trimming).
|
jpayne@68
|
1185 TODO: RandomReads default insert sizes should probably not have adapter sequences.
|
jpayne@68
|
1186 TODO: RandomReads should add match string to reads.
|
jpayne@68
|
1187 TO DELETE?: assemble toText methods, ErrorCorrect, countFastqSplit,
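The Tools.parseIntKMG item above concerns accepting numeric flags with K/M/G suffixes. The following is not the BBTools implementation; it is a standalone sketch that assumes decimal multipliers (k=1e3, m=1e6, g=1e9) and uses hypothetical names:

```java
import java.util.Locale;

/** Hypothetical illustration of suffix-aware number parsing; not the BBTools method. */
class KmgSketch {

    /** Parses strings such as "31", "10k", "1.5m", or "2g" into a long. */
    static long parseKMG(String s) {
        s = s.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) { throw new NumberFormatException("empty string"); }
        long mult = 1;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') { mult = 1_000L; }
        else if (last == 'm') { mult = 1_000_000L; }
        else if (last == 'g') { mult = 1_000_000_000L; }
        if (mult > 1) { s = s.substring(0, s.length() - 1); }
        return Math.round(Double.parseDouble(s) * mult);
    }

    /** Same as parseKMG, but throws ArithmeticException if the value does not fit in an int. */
    static int parseIntKMG(String s) {
        return Math.toIntExact(parseKMG(s));
    }

    public static void main(String[] args) {
        System.out.println(parseKMG("10k"));
        System.out.println(parseIntKMG("31"));
    }
}
```

Failing loudly on int overflow is a design choice in this sketch; silently truncating a value such as "4g" would be harder to diagnose.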
|
jpayne@68
|
1188
|
jpayne@68
|
1189 TODO: Slow speed of singlethreaded sketching. Largely caused by entropy filtering.
|
jpayne@68
|
1190 TODO: Consider 1-bit encoding for CallGenes, with 10-mers recording only AT vs GC or AC vs GT.
|
jpayne@68
|
1191 TODO: Allow CallGenes to report gain over coding regions with a gff; or, write a new program to do that.
|
jpayne@68
|
1192 TODO: Fix CompareGFF to be robust with multifastas.
|
jpayne@68
|
1193 *TODO: SendSketch is not robust against size-0 sketches. On the server side, they just get reported as Error rather than 0 hits.
|
jpayne@68
|
1194 *TODO: 2 or 3 connection threads for SendSketch.
|
jpayne@68
|
1195 *TODO: SendSketch/CompareSketch only load sketches singlethreaded per file in persequence mode.
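The 1-bit encoding item above (10-mers recording only AT vs GC) can be pictured as a rolling 10-bit value with one bit per base. This is a sketch under that reading only; the class name, the choice of A/T=0 and G/C=1, and the handling of other symbols are assumptions, and it does not model how CallGenes tracks frames:

```java
/** Hypothetical illustration of 1-bit-per-base kmer counting; not CallGenes code. */
class OneBitKmerSketch {

    static final int K = 10;
    static final int MASK = (1 << K) - 1;

    /** Returns 0 for A/T, 1 for G/C, and -1 for any other symbol. */
    static int weakStrongBit(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': case 'T': return 0;
            case 'G': case 'C': return 1;
            default: return -1;
        }
    }

    /** Counts each 1-bit-encoded 10-mer; windows containing other symbols are skipped. */
    static int[] countKmers(String seq) {
        int[] counts = new int[1 << K];
        int kmer = 0, len = 0;
        for (int i = 0; i < seq.length(); i++) {
            int bit = weakStrongBit(seq.charAt(i));
            if (bit < 0) { kmer = 0; len = 0; continue; } // restart after an undefined base
            kmer = ((kmer << 1) | bit) & MASK;
            if (++len >= K) { counts[kmer]++; }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countKmers("AAAAAAAAAAA");
        System.out.println(counts[0]);
    }
}
```

With one bit per base the table has only 1024 entries for k=10, compared with 4^10 for a full 2-bit encoding.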
|
jpayne@68
|
1196
|
jpayne@68
|
1197 TODO: Convert ReadStats formatting fully to ByteBuilder.
|
jpayne@68
|
1198 TODO: Link Seal and BBSketch, and improve Seal per-file ID assignment or per-TaxID.
|
jpayne@68
|
1199 *TODO: Collision-free, reversible hashcodes can be implemented. The reversibility requires masking the selection bits in each mask. May require single-kmer mode, particularly if k>31.
|
jpayne@68
|
1200
|
jpayne@68
|
1201 NOTE: Entropy is disabled in Sketch amino acid mode; might be worth checking the entropy of common amino kmers.
|
jpayne@68
|
1202
|
jpayne@68
|
1203 TODO: There may be no value in indexing protein sketches, due to high conservation.
|
jpayne@68
|
1204 TODO: e-value - track range.
|
jpayne@68
|
1205 TODO: more efficient blacklists and test blacklist efficiency.
|
jpayne@68
|
1206 TODO: It is possible to count the average number of ref sketches sharing hit kmers. A lower average is more specific.
|
jpayne@68
|
1207
|
jpayne@68
|
1208 TODO: Make sure sizemult flag works with servers. (checked, and it does)
|
jpayne@68
|
1209 TODO: index=f did not work: comparesketch.sh in=c.fa.gz silva whitelist tree=auto index=f
|
jpayne@68
|
1210 TODO: KmerCountExact can no longer output sketches correctly for the default dual kmer lengths.
|
jpayne@68
|
1211
|
jpayne@68
|
1212 ***TODO: BBNorm does not like R#.fq notation in 2-pass mode.
|
jpayne@68
|
1213 TODO: Call adjacent tRNAs.
|
jpayne@68
|
1214 TODO: Add sixframes flag to Sketch for instances of frameshifts, e.g. in raw PacBio data.
|
jpayne@68
|
1215 ***TODO: Make sure commonAncestor works correctly for no-rank nodes (if it is important).
|
jpayne@68
|
1216 ***TODO: Tag Sketch hash codes with the lowest bit to indicate whether they are from long or short kmers. Then calculate genome fraction as in notebook.
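The tagging item above proposes using the lowest bit of each Sketch hash code to record whether it came from a long or short kmer. A minimal sketch of that bit manipulation, with hypothetical names and an assumed convention of 1 for long kmers; the genome-fraction calculation referred to in the notebook is not reproduced here:

```java
/** Hypothetical illustration of tagging a hash code in its lowest bit; not Sketch code. */
class HashTagSketch {

    /** Overwrites the lowest bit: 1 if the hash came from a long kmer, 0 otherwise. */
    static long tag(long hash, boolean fromLongKmer) {
        return (hash & ~1L) | (fromLongKmer ? 1L : 0L);
    }

    /** Reads the tag back from the lowest bit. */
    static boolean isFromLongKmer(long tagged) {
        return (tagged & 1L) == 1L;
    }

    public static void main(String[] args) {
        long tagged = tag(42L, true);
        System.out.println(tagged + " " + isFromLongKmer(tagged));
    }
}
```

The cost of this scheme is one bit of hash entropy, since two hashes that differ only in the lowest bit become indistinguishable once tagged.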
|
jpayne@68
|
1217
|
jpayne@68
|
1218 ***TODO: Figure out better dual-kmer ANI estimate. For example, if 24-mers and 31-mers have similar KID, this implies that the differences are not randomly distributed and therefore the ANI is an overestimate.
|
jpayne@68
|
1219 *TODO: Sizemult does not work with sendsketch local flag.
|
jpayne@68
|
1220 *TODO: Default query sketch size of ribo sketch server seems to be too low.
|
jpayne@68
|
1221 ***TODO: Figure out how to filter mito and chloro from RefSeq...
|
jpayne@68
|
1222
|
jpayne@68
|
1223 TODO: tRNAs are often densely packed (30bp apart), but CallGenes will only call one of them.
|
jpayne@68
|
1224 TODO: Amino Acid cardinality, and a handful of other BBDuk trivia...
|
jpayne@68
|
1225 TODO: KmerfilterSet for amino?
|
jpayne@68
|
1226 TODO: Check defaults for amino mode size, ANI, entropy, etc calculations in Sketch.
|
jpayne@68
|
1227
|
jpayne@68
|
1228 TODO: HashArrayHybridFast could be ported to ukmer, but that would be a pain since HashArrayUHybrid is not currently used.
|
jpayne@68
|
1229 ****TODO: Initial kmer set does not appear to work correctly after the first pass... or something. Be sure to clear it and retain the whole thing. It should get copied.
|
jpayne@68
|
1230
|
jpayne@68
|
1231 TODO: Set operations of GFF files.
|
jpayne@68
|
1232 TODO: Cut sequences from gff file (check if already exists).
|
jpayne@68
|
1233 TODO: MakeKmerSet with a blacklist set (do kmer tables support set subtraction?)
|
jpayne@68
|
1234 TODO: Rename MakeKmerSet.
|
jpayne@68
|
1235 TODO: Better documentation for rqcfilter.sh for remote users.
|
jpayne@68
|
1236 TODO: Modify removehuman and removecatdogmousehuman to use the rqcfilterdata flag.
|
jpayne@68
|
1237 TODO: Require multi-hit in BBDuk (remove current set from table before picking new kmers?)
|
jpayne@68
|
1238 TODO: Easy kmer set operations.
|
jpayne@68
|
1239 TODO: Make 5S and tRNA datasets.
|
jpayne@68
|
1240 TODO: Test long kmer sensitivity with various lengths; consider increasing to 16-mers.
|
jpayne@68
|
1241
|
jpayne@68
|
1242 *****TODO: Test RNA-calling long kmer support.
|
jpayne@68
|
1243
|
jpayne@68
|
1244 TODO: How to use rqcfilter.sh and removehuman.sh externally
|
jpayne@68
|
1245
|
jpayne@68
|
1246 TODO: rRNA score needs to be higher, possibly doubled, to compete with CDS.
|
jpayne@68
|
1247 TODO: sketch.sh perfile, or comparesketch.sh ignore ata flag (just outsketch should be fine).
|
jpayne@68
|
1248
|
jpayne@68
|
1249 TODO: maxcount is now a supported flag for kmercountexact, but it does not actually work.
|
jpayne@68
|
1250
|
jpayne@68
|
1251 TODO: MDWalker breaks when there are reads with both N and D in cigar.
|
jpayne@68
|
1252
|
jpayne@68
|
1253 TODO: retain longest isoform and all high-scoring isoforms, but not low-scoring ones.
|
jpayne@68
|
1254 TODO: Try multiplicative model for start probs, not additive.
|
jpayne@68
|
1255 TODO: Score operons via window, and add operon scores to orf scores.
|
jpayne@68
|
1256 TODO: Once optimal path is chosen, refine it by adding and removing orfs.
|
jpayne@68
|
1257 TODO: Apply minscore filter before and after choosing optimal path.
|
jpayne@68
|
1258
|
jpayne@68
|
1259 TODO: Homopolymer density test. This may affect ease of sequencing and assembly.
|
jpayne@68
|
1260 TODO: Correlation between Tadpole and Spades stats.
|
jpayne@68
|
1261
|
jpayne@68
|
1262 TODO:
|
jpayne@68
|
1263 Vasanth:
|
jpayne@68
|
1264 /global/projectb/sandbox/rnaseq/projects/Golovinomyces_orontii_MGH1_Metatranscriptome_1196471/multimap
|
jpayne@68
|
1265 shifter --image=bryce911/bbtools bbmap.sh nodisk=t nzo=f ambig=all deterministic=t maxindel=100000 ref=/global/dna/projectdirs/RD/rnaseq_store/genomes/Golovinomyces_orontii_MGH1/Golor3_AssemblyScaffolds.fasta rpkm=CTWOX_counts.txt in=read1.fq.gz in2=read2.fq.gz machineout=t out=stdout.sam statsfile=CTWOX_counts.txt.summary | shifter --image=rmonti/samtools samtools view -Sb - | shifter --image=rmonti/samtools samtools sort - -o CTWOX_hits.bam
|
jpayne@68
|
1266
|
jpayne@68
|
1267 TODO: Entropyout for BBDuk (Alex Copeland email).
|
jpayne@68
|
1268
|
jpayne@68
|
1269 TODO: Percolate A_SampleMT changes to other A_Samples.
|
jpayne@68
|
1270 TODO: Note that FilterByTile can be run at a very small tile size with perhaps 50 reads each to detect bubbles at high resolution. This is more effective than trying to detect a high G rate in a large tile that extends out of the bubble.
|
jpayne@68
|
1271 TODO: Verify that gton is working in FilterByTile. It does seem to reduce homopolymers in some cases, but in other cases it increases the rate by selectively discarding non-G reads while not fixing the remaining reads... perhaps? The overall rate of homopolymers and Gs does not change much. It may also be prudent to calc stdev from cycles rather than tile averages; it's currently way too small to be useful.
|
jpayne@68
|
1272
|
jpayne@68
|
1273 TODO: Optical deduplication is pretty slow if there are a massive number of duplicates, though it seems to be linear with the size of the file. Not a priority.
|
jpayne@68
|
1274
|
jpayne@68
|
1275 TODO: minLevelExtended flag would be more useful if it could be used to identify hits where there is a higher identity to a different clade than to the same clade.
|
jpayne@68
|
1276 TODO: Add an option to trim depth-1 contig ends in Tadpole (probably just for dead ends).
|
jpayne@68
|
1277 TODO: Add fofn support to FileFormat.
|
jpayne@68
|
1278
|
jpayne@68
|
1279 *TODO: Seems like last element of a sketch has count lower than it should by 1 or more.
|
jpayne@68
|
1280 TODO: Look at Silva names of removed things. Figure out how to deal with them.
|
jpayne@68
|
1281 TODO: see how big a merged Silva sketch is.
|
jpayne@68
|
1282
|
jpayne@68
|
1283 TODO: Report (from Donovan) of a site not reported when a pair maps perfectly to two locations with different insert sizes.
|
jpayne@68
|
1284 TODO: Note: srd=2 seems to improve scores of metagenome assemblies despite removing fewer kmers.
|
jpayne@68
|
1285 TODO: Currently impossible to set ScheduleMaker memRatio except through prefilterFraction flag. There could be a default static memratio flag, in ScheduleMaker, for example.
|
jpayne@68
|
1286
|
jpayne@68
|
1287 TODO: Custom memory settings for Oracle versus Open JDK.
|
jpayne@68
|
1288 TODO: Consider not shaving on a case-by-case basis after looking at extensions (use the passed branchMult2 and minCountExtend fields).
|
jpayne@68
|
1289
|
jpayne@68
|
1290 TODO: BBMerge extend (rsem) mode with 2-bit bloom filter, or maybe even 1-bit.
|
jpayne@68
|
1291 TODO: bloom filter with k>31.
|
jpayne@68
|
1292
|
jpayne@68
|
1293 TODO: RQCFilter2 produce a file similar to status.log but column-delimited.
|
jpayne@68
|
1294 TODO: Modify RQCFilter2 to do entropy filtering in a discrete step.
|
jpayne@68
|
1295 TODO: Update RQCFilter2 reproduce.sh to reflect what actually happens.
|
jpayne@68
|
1296
|
jpayne@68
|
1297 TODO: Retain read names in bbfakereads.
|
jpayne@68
|
1298 *TODO: Sanity check for paired reads being under 1mbp.
|
jpayne@68
|
1299 TODO: Sketch kmer frequency histograms.
|
jpayne@68
|
1300 TODO: Test consect.
|
jpayne@68
|
1301
|
jpayne@68
|
1302 TODO: Refactor BBMerge?
|
jpayne@68
|
1303 TODO: Tadpole reassemble is creating new ByteBuilders instead of re-using them.
|
jpayne@68
|
1304
|
jpayne@68
|
1305 TODO: Write custom Fastq parser.
|
jpayne@68
|
1306 TODO: Test reading kmers, then locking, then writing, to prepopulate cache.
|
jpayne@68
|
1307 TODO: Consider allowing HashBuffers to deposit kmers per way, and only hash them if the deposit is too big. Thus, each way would have 2 swappable LongLists and you'd need to sync on the way to swap them.
|
jpayne@68
|
1308 TODO: Write a nonatomic Bloom filter.
|
jpayne@68
|
1309
|
jpayne@68
|
1310 V37.
|
jpayne@68
|
1311 37.01
|
jpayne@68
|
1312 Fixed crash with Seal qhdist.
|
jpayne@68
|
1313 37.02
|
jpayne@68
|
1314 Added ReadComparatorRandom and shuffling support to SortByName.
|
jpayne@68
|
1315 Compared trimming tile edges before removing duplicates.
|
jpayne@68
|
1316 Added support for taxonomy headers in Silva or comma-delimited format.
|
jpayne@68
|
1317 Added simple mode to PrintTaxonomy.
|
jpayne@68
|
1318 Fixed a bug with stdout stream name detection.
|
jpayne@68
|
1319 Added subsampling for CompareSketch/SendSketch.
|
jpayne@68
|
1320 Improved distribution of sketch file sizes.
|
jpayne@68
|
1321 Wrote MergeSam for concatenating sam files.
|
jpayne@68
|
1322 Bam streaming from the bamscript is now done uncompressed.
|
jpayne@68
|
1323 Added preliminary support for flex-size Sketches via LongList.
|
jpayne@68
|
1324 Fixed an assertion error in Clumpify with consensus.
|
jpayne@68
|
1325 Dedupe now uses pigz by default.
|
jpayne@68
|
1326 Added output ordering to TextStreamWriter and CompareSketch.
|
jpayne@68
|
1327 Changed the way names of uncultured organisms are parsed.
|
jpayne@68
|
1328 Fixed a regex bug when setting tmpdir.
|
jpayne@68
|
1329 Fixed a bug in RandomReads with a print statement.
|
jpayne@68
|
1330 Added trimnonoverlapping flag to BBMerge to produce consensus sequence only.
|
jpayne@68
|
1331 Clumpify should now automatically create extra groups when expected reads exceed 2 billion.
|
jpayne@68
|
1332 Sketch now supports blacklists.
|
jpayne@68
|
1333 37.03
|
jpayne@68
|
1334 Bump.
|
jpayne@68
|
1335 37.04
|
jpayne@68
|
1336 Wrote SamFilter.
|
jpayne@68
|
1337 Added sam positional filtering capabilities to SamStreamerWrapper and CallVariants.
|
jpayne@68
|
1338 SamStreamer now optionally retains sam headers.
|
jpayne@68
|
1339 Wrote Sketch guide.
|
jpayne@68
|
1340 Added VCFLine filtering to SamFilter.
|
jpayne@68
|
1341 Wrote FilterVCF and filtervcf.sh.
|
jpayne@68
|
1342 Added max filters to variants (maxscore, etc).
|
jpayne@68
|
1343 Added sam line mapq filters to SamFilter (SamStreamer and CallVariants).
|
jpayne@68
|
1344 Removed some shellscript module loads for specific versions of samtools.
|
jpayne@68
|
1345 Added quality-trimming to variant-calling.
|
jpayne@68
|
1346 Reference alleles now always use uppercase letters.
|
jpayne@68
|
1347 37.05
|
jpayne@68
|
1348 TaxServer now prints initialization time and memory.
|
jpayne@68
|
1349 Reduced Sketch memory usage for construction in taxa mode with prefilter.
|
jpayne@68
|
1350 Sketch shellscripts now load pigz if not loaded.
|
jpayne@68
|
1351 ReadWrite.readObject now correctly uses the allowSubprocess flag; TaxTree now loads much faster.
|
jpayne@68
|
1352 Changed a couple array allocations in Sketch to use safe allocation.
|
jpayne@68
|
1353 Split SketchHeap genome size into genomeSizeKmers and genomeSizeBases, to be more clear.
|
jpayne@68
|
1354 Fixed some issues with TaxFilter; it was not working properly with default taxlevel.
|
jpayne@68
|
1355 Added taxa sorting to SortByName.
|
jpayne@68
|
1356 Fixed a bug with PrintTaxonomy accession=auto flag.
|
jpayne@68
|
1357 Taxa parsing now supports tid as well as ncbi in sequence headers.
|
jpayne@68
|
1358 RenameGiToNcbi now allows custom prefixes for the taxid number; default is tid.
|
jpayne@68
|
1359 RenameGiToNcbi now supports accessions.
|
jpayne@68
|
1360 Taxa sorting changed a bit. Promoting everything to direct descendants of the common ancestor did not work, so they are now promoted to the same level.
|
jpayne@68
|
1361 Changed BBDuk.RQC_MAP to use Long values instead of Strings; it is now additive.
|
jpayne@68
|
1362 Seal now uses the BBDuk RQC_MAP (for RQCFilter).
|
jpayne@68
|
1363 Added spikein removal and mtst to RQCFilter.
|
jpayne@68
|
1364 37.06
|
jpayne@68
|
1365 Wrote ServerTools to house some functions from TaxServer.
|
jpayne@68
|
1366 Shortened TaxServer functions by breaking off blocks into functions.
|
jpayne@68
|
1367 Added comments to TaxServer.
|
jpayne@68
|
1368 Added kill-old-instance flag to TaxServer.
|
jpayne@68
|
1369 Added ability to print all sequence headers of empty Fasta sequences.
|
jpayne@68
|
1370 37.07
|
jpayne@68
|
1371 Sketch blacklist now supports comma-delimited lists of files.
|
jpayne@68
|
1372 Refactored sketch code to unify location and parsing of shared fields such as k.
|
jpayne@68
|
1373 Added capacity() method to SketchHeap.
|
jpayne@68
|
1374 Implemented graduated sketch size via size=auto.
|
jpayne@68
|
1375 Added a lower cutoff for hashcode values, to reduce blacklist size and increase speed.
|
jpayne@68
|
1376 Codes are now checked against the heap prior to the blacklist, which is faster.
|
jpayne@68
|
1377 BBDuk now supports entropy masking like BBMask, but uses less memory.
|
jpayne@68
|
1378 CompareSketch now supports whitelisting.
|
jpayne@68
|
1379 SketchHeap can now automatically apply the blacklist and whitelist.
|
jpayne@68
|
1380 Fixed bloom filter crashing on unicode symbols in sequence.
|
jpayne@68
|
1381 37.08
|
jpayne@68
|
1382 Fixed a bug in FastaReadInputStream with long headers containing multiple greater-than symbols.
|
jpayne@68
|
1383 37.09
|
jpayne@68
|
1384 Added SketchObject keyFraction flag and changed default to 0.2.
|
jpayne@68
|
1385 Wrote SketchIndex to contain Sketch indexing methods.
|
jpayne@68
|
1386 Changed Blacklist ways to 1.
|
jpayne@68
|
1387 Changed Bloom filters to by default keep duplicate kmers within reads, and rewrote that method to be more efficient.
|
jpayne@68
|
1388 Bloom filter can now apply Sketch hash function to exclude kmers.
|
jpayne@68
|
1389 Made some IntList allocation methods safer.
|
jpayne@68
|
1390 Deprecated SortByTaxa (functionality moved to SortByName).
|
jpayne@68
|
1391 Wrote BlacklistMaker and sketchblacklist.sh
|
jpayne@68
|
1392 sketch.sh now sets -Xms.
|
jpayne@68
|
1393 PrintTaxonomy now ignores the cellular organisms node.
|
jpayne@68
|
1394 Added auto passes to BlacklistMaker.
|
jpayne@68
|
1395 BlacklistMaker no longer requires a gi table.
|
jpayne@68
|
1396 RenameGiToNcbi now handles lines that have already been renamed.
|
jpayne@68
|
1397 Wrote ShrinkAccession.
|
jpayne@68
|
1398 Accession loading will now default to trying a shrunk prefix, then revert to the normal filename.
|
jpayne@68
|
1399 Removed calcmem.sh perl dependency outside of Genepool.
|
jpayne@68
|
1400 Fixed a bug in sketch autosize.
|
jpayne@68
|
1401 Changed sketch comparison sort order to use WKID.
|
jpayne@68
|
1402 Sketch autosize now defaults to true.
|
jpayne@68
|
1403 Added default blacklists.
|
jpayne@68
|
1404 Wrote DisplayParams, which handles parsing of Sketch display parameters.
|
jpayne@68
|
1405 Curl calls can now pass parameters to a sketch server.
|
jpayne@68
|
1406 Tax server now returns an error message if no sketches are loaded.
|
jpayne@68
|
1407 37.10
|
jpayne@68
|
1408 Added utot to Reformat help.
|
jpayne@68
|
1409 Sketch taxlevel now supports strings like species in addition to numbers.
|
jpayne@68
|
1410 Disabled a print statement in CrisContainer.
|
jpayne@68
|
1411 Removed some redundant static fields from SketchObject and CompareSketch to avoid duplication with DisplayOptions.
|
jpayne@68
|
1412 Improved CompareSketch multithreading for small numbers of input sketches.
|
jpayne@68
|
1413 37.11
|
jpayne@68
|
1414 Changed the word reads to pairs in BBMap output header for pairing report.
|
jpayne@68
|
1415 Moved KillSwitch from stream to shared.
|
jpayne@68
|
1416 37.12
|
jpayne@68
|
1417 Added Tools.contains(String a, String b, int start)
|
jpayne@68
|
1418 Fixed an error in Clumpify allduplicates mode; the last in an odd-sized set of duplicates was not detected.
|
jpayne@68
|
1419 Added Clumpify renamebycount mode.
|
jpayne@68
|
1420 SendSketch no longer requires in= before filename.
|
jpayne@68
|
1421 ShrinkAccession now discards lines with no TaxID.
|
jpayne@68
|
1422 37.14
|
jpayne@68
|
1423 Added chloro and mito removal to rqcfilter.
|
jpayne@68
|
1424 Updated tax data and renamed Refseq Microbial records.
|
jpayne@68
|
1425 Single-sketch mode sketches are now named after filename rather than sequence name.
|
jpayne@68
|
1426 TaxServer now does gc() before killing the old server.
|
jpayne@68
|
1427 Sketch Comparison genome size is now estimated from the sketch.
|
jpayne@68
|
1428 37.15
|
jpayne@68
|
1429 Sketch size now accepts kmg symbols.
|
jpayne@68
|
1430 Added aliases for autosizefactor.
|
jpayne@68
|
1431 printname0 has alias pn0 and defaults to false.
|
jpayne@68
|
1432 Shared read buffer settings now use getters and setters.
|
jpayne@68
|
1433 SendSketch and CompareSketch use a larger default read buffer length.
|
jpayne@68
|
1434 Fixed an RQCFilter slowdown due to SendSketch changing read buffer length.
|
jpayne@68
|
1435 Set RQCFilter final ziplevel to 9 from 8.
|
jpayne@68
|
1436 Temporarily (?) set ScafMap.defaultScafMap from CallVariants, for ref-allele Var testing.
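
The 37.15 entry on kmg symbols refers to size values such as 10k or 2m. As a rough illustration only, suffix parsing of this kind might look like the sketch below. The function name parse_kmg and the decimal multipliers (k=1000, m=1000000, g=1000000000) are assumptions made for this example, not taken from the BBTools source.

```python
# Hypothetical helper; the decimal multipliers are an assumption.
_MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_kmg(text: str) -> int:
    """Parse strings such as '5000', '10k', or '1.5m' into an integer."""
    s = text.strip().lower()
    if not s:
        raise ValueError("empty size string")
    mult = _MULTIPLIERS.get(s[-1])
    if mult is None:
        # No recognized suffix: treat the whole string as a plain integer.
        return int(s)
    # Allow fractional prefixes such as '1.5m'.
    return int(round(float(s[:-1]) * mult))
```

For example, parse_kmg("10k") and parse_kmg("10000") give the same value under these assumptions.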
|
jpayne@68
|
1437 37.16
|
jpayne@68
|
1438 Fixed Seal clearing outu flag.
|
jpayne@68
|
1439 Fixed RQCFilter misreporting number of input reads.
|
jpayne@68
|
1440 Modified shellscripts to load samtools 1.4.
|
jpayne@68
|
1441 37.17
|
jpayne@68
|
1442 Added rqcfilter.sh to public distribution for Dockerization.
|
jpayne@68
|
1443 Fixed clipping/trimming bug in CallVariants leading to incorrect variant calls.
|
jpayne@68
|
1444 37.18
|
jpayne@68
|
1445 Wrote standalone realigner using Realigner class.
|
jpayne@68
|
1446 Wrote bbrealign.sh.
|
jpayne@68
|
1447 Fixed a bug in sam output when loading rname as a String.
|
jpayne@68
|
1448 Reads that would be fully quality-trimmed are no longer used for calling variations.
|
jpayne@68
|
1449 Fixed a realigner bug in which length-1 reads had no cigar string.
|
jpayne@68
|
1450 37.19
|
jpayne@68
|
1451 Added zygosity histogram output to CallVariants.
|
jpayne@68
|
1452 Wrote ProbShared for calculating the chance of two sequences sharing a kmer.
|
jpayne@68
|
1453 37.20
|
jpayne@68
|
1454 TaxServer URL parsing now correctly handles all reserved symbols encoded as percent codes, and many common non-reserved symbols.
|
jpayne@68
|
1455 Added addunderscore flag to RenameReads.
|
jpayne@68
|
1456 Added shrinknames flag to RenameGiToNcbi.
|
jpayne@68
|
1457 RenameGiToNcbi now tests input files before loading taxonomy data.
|
jpayne@68
|
1458 Changed sketch Comparison function to incorporate genomic kmer size.
|
jpayne@68
|
1459 Changed removesmartbell to split reads by default rather than masking adapters.
|
jpayne@68
|
1460 Sketch blacklist maker now correctly sets rcomp=f in amino mode.
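
The 37.20 entry on percent codes concerns URLs in which reserved symbols (space, slash, hash, and so on) arrive encoded as %XX. The sketch below shows the general round trip using Python's standard library; it is not the TaxServer parser, and the sample string is made up for illustration.

```python
from urllib.parse import quote, unquote

# Made-up organism name containing a space, a slash, and a hash sign.
raw = "Homo sapiens/strain #1"

# safe="" forces every reserved symbol, including '/', to be percent-encoded.
encoded = quote(raw, safe="")

# Decoding restores the original text.
decoded = unquote(encoded)
```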
|
jpayne@68
|
1461 37.21
|
jpayne@68
|
1462 Sketch blacklist maker yields oddly few keys in amino mode.
|
jpayne@68
|
1463 Modified KmerCount7MTA to correctly observe the rcomp flag, in most cases.
|
jpayne@68
|
1464 Modified KmerCount7MTA to support amino acids.
|
jpayne@68
|
1465 Clumpify Parser.parse moved to end of block.
|
jpayne@68
|
1466 Fixed a false warning for reads that were a multiple of fastareadlen.
|
jpayne@68
|
1467 Added amino8, a reduced representation coding of amino acids.
|
jpayne@68
|
1468 Improved sketch amino acid constants in postParse.
|
jpayne@68
|
1469 Added amino and kmer length tags to sketch headers. This will require server restarts.
|
jpayne@68
|
1470 Fixed a bug causing sketch.sh to write the file twice in single-sketch mode.
|
jpayne@68
|
1471 Increased small sketch sizing for amino acids.
|
jpayne@68
|
1472 Sketch size is now based on genome size estimate rather than genomic kmers.
|
jpayne@68
|
1473 Suppressed writing of length-0 sketches.
|
jpayne@68
|
1474 Fixed overflow when running over 2 billion comparisons.
|
jpayne@68
|
1475 Added bbversion.sh.
|
jpayne@68
|
1476 Sketch servers will now return an error message if incompatible settings were used for SendSketch.
|
jpayne@68
|
1477 Added BBMap deterministic mode.
|
jpayne@68
|
1478 Added SendSketch local mode.
|
jpayne@68
|
1479 SendSketch local mode no longer loads a blacklist.
|
jpayne@68
|
1480 Restarted servers with support for new flags.
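
The 37.21 overflow fix for more than 2 billion comparisons is the usual symptom of counting in a signed 32-bit integer. Python integers do not overflow, so the sketch below simulates 32-bit wraparound with a hypothetical helper, to_int32, to show why an all-to-all comparison count can turn negative. The query and reference counts are invented for the example.

```python
def to_int32(x: int) -> int:
    """Wrap an integer the way a signed 32-bit counter would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


# Invented sizes: 50,000 queries against 50,000 references.
queries = 50_000
refs = 50_000

exact = queries * refs      # what a 64-bit counter would hold
wrapped = to_int32(exact)   # what a 32-bit counter would hold
```

Here the exact count exceeds 2^31 - 1, so the wrapped value is negative; widening the counter to 64 bits is the standard remedy.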
|
jpayne@68
|
1481 37.22
|
jpayne@68
|
1482 DisplayParams now supports a reads flag.
|
jpayne@68
|
1483 samplerate and maxreads removed from SketchObject statics list.
|
jpayne@68
|
1484 Servers now report and continue in the case when there is an error while trying to kill the old server.
|
jpayne@68
|
1485 UseSizeEstimate flag now fully enables or disables size estimate for both scoring and sketch-sizing.
|
jpayne@68
|
1486 Slightly increased sketch size for large genomes over 200Mbp.
|
jpayne@68
|
1487 37.23
|
jpayne@68
|
1488 Fixed module load samtools line in bs.sh to reflect the current version of samtools.
|
jpayne@68
|
1489 Added minlen flag to SortByName (for use with nt/sketch).
|
jpayne@68
|
1490 Added cohort to TaxTree.
|
jpayne@68
|
1491 Improved organization of TaxTree extended level names and synonyms.
|
jpayne@68
|
1492 Wrote a script for fetching and sketching nt.
|
jpayne@68
|
1493 BBDuk now prints an error message when invalid settings of ktrim are used.
|
jpayne@68
|
1494 BBTools now crash rather than hang when quality-score autodetection fails.
|
jpayne@68
|
1495 Clumpify now additionally sorts by lane and tile, making optical duplicate detection much faster for huge clumps.
|
jpayne@68
|
1496 Added soft-clip trimming to BBDuk.
|
jpayne@68
|
1497 Added pipelines directory with scripts for fetching and processing NCBI files.
|
jpayne@68
|
1498 37.24
|
jpayne@68
|
1499 Added support for alapy compression.
|
jpayne@68
|
1500 Added getters and setters for private static errorState fields.
|
jpayne@68
|
1501 Sketch kmer length field made optional, only if k!=31.
|
jpayne@68
|
1502 Added optional sketch HASH_VERSION field.
|
jpayne@68
|
1503 Partially addressed overlapping parameter names in SketchObject/DisplayParams.
|
jpayne@68
|
1504 Added clump.KmerComparator2, X, and Y for axial sorting.
|
jpayne@68
|
1505 Added axial sort to clump.Clump.
|
jpayne@68
|
1506 Added Clumpify sortx and sorty flags to facilitate testing of axial sorting.
|
jpayne@68
|
1507 Added additional XY sorting flags.
|
jpayne@68
|
1508 Improved XY sorting and made it default to true for all optical duplicate removal.
|
jpayne@68
|
1509 Clumpify now additionally sorts by sequence by default, yielding a slight compression improvement.
|
jpayne@68
|
1510 Updated RQCFilter with new Clumpify flags (spany adjacent).
|
jpayne@68
|
1511 Split Clumpify spantiles into spanx and spany.
|
jpayne@68
|
1512 Added adjacent flag to Clumpify for removal of only optical duplicates on adjacent tiles (tile-edge duplicates).
|
jpayne@68
|
1513 SendSketch now allows KMG for number of reads.
|
jpayne@68
|
1514 CalcTrueQuality now supports the CallVariants prefilter flag.
|
jpayne@68
|
1515 Fixed a bug in BBDuk r2 entropy masking.
|
jpayne@68
|
1516 Added BBDuk poly-A trimming.
|
jpayne@68
|
1517 37.25
|
jpayne@68
|
1518 Fixed a bug in tax server processing headers with spaces.
|
jpayne@68
|
1519 Added IMG support to tax server.
|
jpayne@68
|
1520 Finished revised img sketch support on a per-IMG-id basis.
|
jpayne@68
|
1521 Adapted SketchBlacklist for IMG.
|
jpayne@68
|
1522 Tested memory consumption of nt and Silva servers and reduced -Xmx flag.
|
jpayne@68
|
1523 Added kapatags.L40.fasta and blacklist_img_species_300.sketch to resources.
|
jpayne@68
|
1524 Fixed a bug in which BBDuk was not applying the minlength cutoff when no trimming was performed.
|
jpayne@68
|
1525 Removed ecc.sh.
|
jpayne@68
|
1526 37.26
|
jpayne@68
|
1527 Fixed an inequality when checking read length in BBMap.
|
jpayne@68
|
1528 Fixed an error in reporting the number of sketches loaded in all-to-all mode.
|
jpayne@68
|
1529 Replaced int[] with CompareBuffer object.
|
jpayne@68
|
1530 Added Sketch completeness calculation.
|
jpayne@68
|
1531 Added Sketch contamination calculation.
|
jpayne@68
|
1532 Revised Sketch KID calculation to compensate for variable sketch sizes.
|
jpayne@68
|
1533 37.27
|
jpayne@68
|
1534 Added ANI calculation and ANI flag.
|
jpayne@68
|
1535 Added new flags to SendSketch and doubleheaders.
|
jpayne@68
|
1536 Contamination can now be calculated without an index.
|
jpayne@68
|
1537 Changed the way contamination was calculated.
|
jpayne@68
|
1538 Fixed an indexing bug related to autosize mode.
|
jpayne@68
|
1539 37.28
|
jpayne@68
|
1540 Added printscore flag.
|
jpayne@68
|
1541 Wrote AtomicBitSet.
|
jpayne@68
|
1542 Sketch now uses AtomicBitSet for contam tracking, fixing a cache-coherency bug.
|
jpayne@68
|
1543 Wrote RawBitSet.
|
jpayne@68
|
1544 Sketch now uses one RawBitSet per thread to avoid atomic communication.
|
jpayne@68
|
1545 Rewrote Sketch threading; multithreading is now possible with an index.
|
jpayne@68
|
1546 Moved Parser to shared.
|
jpayne@68
|
1547 Moved a few data structures to structures.
|
jpayne@68
|
1548 BBDuk now allows user-specified poly-A length.
|
jpayne@68
|
1549 Added Parser parallelsort flag.
|
jpayne@68
|
1550 Reduced default sketch records to 20.
|
jpayne@68
|
1551 Updated BBSketch guide.
|
jpayne@68
|
1552 37.29
|
jpayne@68
|
1553 Fixed a bug in mutate.sh when giving indel and sub rates using the 0-1 scale.
|
jpayne@68
|
1554 Wrote IntHashSet from LongHashSet and added increment.
|
jpayne@68
|
1555 Wrote IntHashMap.
|
jpayne@68
|
1556 Integrated IntHashMap into SketchIndex and made it the default path (can be disabled with intmap=f flag).
|
jpayne@68
|
1557 Wrote IntHashMapBinary to avoid modulo operation when hashing.
|
jpayne@68
|
1558 Added processIMG.sh to pipelines directory.
|
jpayne@68
|
1559 Added some new assertions and messages to Clumpify's FetchThread loop to diagnose an odd crash.
|
jpayne@68
|
1560 TaxServer query count now ignores usage queries.
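
The IntHashMapBinary entry in 37.29 mentions avoiding a modulo operation when hashing. A common way to do that, assumed here rather than confirmed from the source, is to keep the table size a power of two so that a bit mask selects the slot. The function names below are hypothetical.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def slot_by_modulo(key: int, size: int) -> int:
    """Slot selection with a modulo; works for any positive table size."""
    return key % size


def slot_by_mask(key: int, size: int) -> int:
    """Slot selection with a bit mask; requires size to be a power of two."""
    return key & (size - 1)


size = next_power_of_two(1000)  # rounds the requested capacity up
```

For non-negative keys and a power-of-two size, both functions pick the same slot, but the mask avoids an integer division.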
|
jpayne@68
|
1561 37.30
|
jpayne@68
|
1562 Increased the default number of Clumpify groups by 50% (with 2 fetch threads), and made it scale with the number of fetch threads.
|
jpayne@68
|
1563 Sketch comparison raw fields can now be printed.
|
jpayne@68
|
1564 Wrote SketchResults object, for managing comparison printing methods.
|
jpayne@68
|
1565 Fixed a bug in displaying Sketch hits to ref sketches with no TaxID.
|
jpayne@68
|
1566 Fixed a name0 display bug.
|
jpayne@68
|
1567 Added flowcell and sequence modes to SortByName.
|
jpayne@68
|
1568 37.31
|
jpayne@68
|
1569 Wrote SplitSam6Way.
|
jpayne@68
|
1570 Removed obsolete tryAllExtensions option from TextFile/ByteFile.
|
jpayne@68
|
1571 Added histbefore flag to BBDuk, and option to generate histograms after processing.
|
jpayne@68
|
1572 Added fname metadata to Sketch header.
|
jpayne@68
|
1573 Changed Sketch results query formatting to include more metadata.
|
jpayne@68
|
1574 37.32
|
jpayne@68
|
1575 Added parsing for comment.
|
jpayne@68
|
1576 Clumpify with groups>1 now works with paired fasta files, though interleaved fasta files need interleaved to be explicitly set.
|
jpayne@68
|
1577 Wrote MultiBitSet and improved AbstractBitSet.
|
jpayne@68
|
1578 Refactored comparison formatting into DisplayParams.
|
jpayne@68
|
1579 TaxServer no longer dies when receiving an unexpected parameter.
|
jpayne@68
|
1580 TaxServer no longer terminates when failing to kill an old instance.
|
jpayne@68
|
1581 SendSketch now passes printRefDivisor and so forth.
|
jpayne@68
|
1582 Added Unique, uContam, and noHit Sketch results columns.
|
jpayne@68
|
1583 Added taxonomy-based Sketch results coloring.
|
jpayne@68
|
1584 Added TaxTree.extendedToLevel for reverse lookup.
|
jpayne@68
|
1585 37.33
|
jpayne@68
|
1586 Added counters for tracking TaxServer statistics.
|
jpayne@68
|
1587 Improved server help messages; added Sketch usage info.
|
jpayne@68
|
1588 Added Tadpole extra flag and clarified the documentation.
|
jpayne@68
|
1589 37.34
|
jpayne@68
|
1590 Added some scripts to Pipelines.
|
jpayne@68
|
1591 Wrote SummarizeSketchStats and summarizesketch.sh for making tables of multiple Sketch results files.
|
jpayne@68
|
1592 Added optional hard-coded path flags for CompareSketch to use silva, img, nt, and refseq sketches.
|
jpayne@68
|
1593 Verified that external queries are tracked properly, even though none have been recorded.
|
jpayne@68
|
1594 SendSketch and CompareSketch default to colors=f if outputting to a file rather than stdout.
|
jpayne@68
|
1595 Fixed issue of FungalRelease not writing an AGP file if contig output file was not specified.
|
jpayne@68
|
1596 Fixed a bug when specifying a SendSketch address with a terminal slash.
|
jpayne@68
|
1597 Updated DisplayParams to only send hk and hamino.
|
jpayne@68
|
1598 Improved SendSketch handling of automatic blacklists.
|
jpayne@68
|
1599 Added support for dual kmer lengths in Sketch.
|
jpayne@68
|
1600 gSize calculation now supports dual kmers.
|
jpayne@68
|
1601 ANI now supports dual kmers, but uses linear interpolation of an exponential function. Seems to work, though.
|
jpayne@68
|
1602 Fixed CompareSketch all-to-all including self for contamination detection.
|
jpayne@68
|
1603 Accelerated dual-kmer sketching by choosing a random hashcode rather than the larger one.
|
jpayne@68
|
1604 37.35
|
jpayne@68
|
1605 Dual kmers are now supported in TaxServer's error message for incompatible sketches.
|
jpayne@68
|
1606 Fixed a display bug in LoadReads.
|
jpayne@68
|
1607 Added quality-score binning detection to fastq file memory use estimation.
|
jpayne@68
|
1608 Added lowcomplexity flag to fastq file memory use estimation.
|
jpayne@68
|
1609 37.36
|
jpayne@68
|
1610 Clarified TaxServer error message for incompatible settings.
|
jpayne@68
|
1611 Added deleteinput flag for Reformat and Clumpify.
|
jpayne@68
|
1612 Updated BBSketchGuide.
|
jpayne@68
|
1613 37.37
|
jpayne@68
|
1614 Fixed a read orientation bug in CalcTrueQuality when using a VCF file.
|
jpayne@68
|
1615 Simplified some calls to short and long match string conversion.
|
jpayne@68
|
1616 Added a variant-calling script to /pipelines.
|
jpayne@68
|
1617 Fixed a null pointer exception in Sketch when using sam files.
|
jpayne@68
|
1618 Investigated recalibration of R2. Turns out the graph just looks odd because of low-quality unmapped reads.
|
jpayne@68
|
1619 BBDuk can now accept ref=phix or adapters or artifacts, and automatically locates the file in /resources.
|
jpayne@68
|
1620 Read identity calculation was crashing with fixvariants (from a VCF file).
|
jpayne@68
|
1621 Removed bbduk2.sh as deprecated; only BBDuk is maintained.
|
jpayne@68
|
1622 37.38
|
jpayne@68
|
1623 Adjusted Sketch hash function; cycleMask is now a constant.
|
jpayne@68
|
1624 Made Sketch hashing variables private.
|
jpayne@68
|
1625 Made Sketch hashCycles variable; speeds up shorter kmer lengths and makes k2 codes compatible with k codes of same length.
|
jpayne@68
|
1626 SortByName now uses compression level 2 for temp files.
|
jpayne@68
|
1627 RenameImg now also reports the number of files, sequences, bases, and TaxIDs used.
|
jpayne@68
|
1628 IMG naming is now in the old NCBI style, e.g. tid|1234|img|56789
|
jpayne@68
|
1629 IMG header parsing methods and lookup table moved from TaxServer and ImgRecord2 to TaxTree.
|
jpayne@68
|
1630 IMG header parsing is now automatic.
|
jpayne@68
|
1631 Updated some descriptions in commonMicrobes filter directory.
|
jpayne@68
|
1632 RQCFilter now by default queries nt, RefSeq, and Silva when Sketching.
|
jpayne@68
|
1633 Wrote TestFilesystem and testfilesystem.sh to monitor filesystem performance.
|
jpayne@68
|
1634 SendSketch now automatically sets k and k2 for nt, silva, and refseq.
|
jpayne@68
|
1635 Changed RefSeq and nt sketch servers and scripts to k=31,24 (needs restart).
|
jpayne@68
|
1636 Modified KmerCount7MTA increment routine slightly; it can now store hashed kmers.
|
jpayne@68
|
1637 gi2taxid now runs in silva mode without requiring a gi or accession file.
|
jpayne@68
|
1638 Altered BlacklistMaker to fix an issue of redundant hash codes.
|
jpayne@68
|
1639 Fixed DisplayParams clone method.
|
jpayne@68
|
1640 Fixed order of parsing imghq and setting the default img file.
|
jpayne@68
|
1641 Fixed a bug in taxa coloring using parent instead of current node.
|
jpayne@68
|
1642 Added dark purple to Colors.
|
jpayne@68
|
1643 Taxa coloring now underlines records with the same color but different taxa compared to above.
|
jpayne@68
|
1644 Updated SketchGuide to explain underlining.
|
jpayne@68
|
1645 Added a second genome repeat content estimation method.
|
jpayne@68
|
1646 Genome repeat content now considers one copy of a repeat to be non-repeat. For example, a genome with 1% duplicated would be considered 1% repeat instead of 2%.
|
jpayne@68
|
1647 Added pipelines/assemblyPipeline.sh.
|
jpayne@68
|
1648 Increased maximum samtools compression threads to 64.
|
jpayne@68
|
1649 Clarified descriptions of outm and BBDuk kmer-matching modes.
|
jpayne@68
|
1650 Revised Reformat trimrname handling to include all whitespace, and clarified description to include bam files.
|
jpayne@68
|
1651 Restarted RefSeq and nt servers with k=31,24.
|
jpayne@68
|
1652 CallVariants and FilterVCF can now enable/disable SNPs, insertions, deletions.
|
jpayne@68
|
1653 ReadStats histogram lengths can now be adjusted with the maxhistlen flag.
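
The 37.38 note on genome repeat content (1% duplicated counts as 1% repeat instead of 2%) can be checked with a small worked example. The sketch assumes repeat content is derived from per-element copy counts; that model and both function names are assumptions for illustration, not the BBTools calculation.

```python
def repeat_fraction_all_copies(copy_counts):
    """Older convention: every copy of a repeated element counts as repeat."""
    total = sum(copy_counts)
    repeated = sum(c for c in copy_counts if c > 1)
    return repeated / total


def repeat_fraction_extra_copies(copy_counts):
    """Newer convention: one copy of each element counts as non-repeat."""
    total = sum(copy_counts)
    extra = sum(c - 1 for c in copy_counts)
    return extra / total


# Toy genome: 98 single-copy elements plus one element present twice.
counts = [1] * 98 + [2]
old_fraction = repeat_fraction_all_copies(counts)
new_fraction = repeat_fraction_extra_copies(counts)
```

With this toy input the first convention gives 2% and the second gives 1%, matching the example in the entry.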
|
jpayne@68
|
1654 37.39
|
jpayne@68
|
1655 CompareSketch now allows first parameter to be a file name without in=.
|
jpayne@68
|
1656 Wrote LongHashMap and LongHeapMap.
|
jpayne@68
|
1657 Refactored SketchHeap to support LongHashMap when minkeycount>1.
|
jpayne@68
|
1658 SketchHeap can now be temporarily longer than the desired sketch length when minkeycount>1.
|
jpayne@68
|
1659 37.40
|
jpayne@68
|
1660 Added usage query tracking to TaxServer.
|
jpayne@68
|
1661 Added correct sketch blacklists to public distribution.
|
jpayne@68
|
1662 Fixed incorrect insert size with renamebyinsert flag in RandomReads when reads are longer than insert size.
|
jpayne@68
|
1663 RQCFilter now resets Sketch statics prior to subsequent SendSketch runs.
|
jpayne@68
|
1664 SketchObject minKeyCount moved to DisplayParams.
|
jpayne@68
|
1665 SketchObject minCount field replaced.
|
jpayne@68
|
1666 DisplayParams.minCount renamed minHits.
|
jpayne@68
|
1667 BlacklistMaker.minCount renamed to MinTaxCount.
|
jpayne@68
|
1668 RQCFilter now uses minkeycount=2 for Silva.
|
jpayne@68
|
1669 Changed SketchObject.size to targetSketchSize.
|
jpayne@68
|
1670 TaxServer now makes a new SketchTool as needed when minKeyCount is different in local mode.
|
jpayne@68
|
1671 Made some improvements to assemblyPipeline.sh.
|
jpayne@68
|
1672 37.41
|
jpayne@68
|
1673 Fixed a tiny bug in parsing Sketch single kmer lengths of under 31.
|
jpayne@68
|
1674 37.42
|
jpayne@68
|
1675 Updated BBSketch guide.
|
jpayne@68
|
1676 37.43
|
jpayne@68
|
1677 Changed default IMG path to the k=31,24 version.
|
jpayne@68
|
1678 Renamed minID to minWKID.
|
jpayne@68
|
1679 MutateGenome can now output a smaller genome fraction of the original genome.
|
jpayne@68
|
1680 Fixed a missing newline in Sketch server help info.
|
jpayne@68
|
1681 BBSketch now supports non-multiples-of-4 for k2.
|
jpayne@68
|
1682 Revised assemblyPipeline.sh.
|
jpayne@68
|
1683 Added assembleMito to /pipelines.
|
jpayne@68
|
1684 Increased hashing speed by 4-8% by switching from 2D to 1D matrix.
|
jpayne@68
|
1685 Increased Sketch max kmer length to 32.
|
jpayne@68
|
1686 Enabled pn0 (printseqname) flag for query.
|
jpayne@68
|
1687 Fixed CompareSketch ignoring read limit when loading input files; this was caused by parse order.
|
jpayne@68
|
1688 37.44
|
jpayne@68
|
1689 Reworded code description of maq to indicate it happens after trimming.
|
jpayne@68
|
1690 Added mbq to BBDuk.
|
jpayne@68
|
1691 Added Sketch ANI bisection, enabled by exactani flag, but it made the results less accurate at low ANI than linear interpolation.
|
jpayne@68
|
1692 Fixed a bug in which old 2D matrix was sometimes used instead of 1D matrix.
|
jpayne@68
|
1693 Discovered current K=31,24 server sketches were generated with a bug; regenerating.
|
jpayne@68
|
1694 Updated alapy compression support; speed flags are now enabled.
|
jpayne@68
|
1695 Updated TaxonomyGuide.txt.
|
jpayne@68
|
1696 Added testPlatformQuality.sh.
|
jpayne@68
|
1697 Updated callInsertions.sh.
|
jpayne@68
|
1698 Updated assemblyPipeline.sh.
|
jpayne@68
|
1699 Made a MapPacBio assertion error more explicit, for debugging.
|
jpayne@68
|
1700 TaxServer no longer logs usage queries.
|
jpayne@68
|
1701 37.45
|
jpayne@68
|
1702 Clumpify spanx was controlling both spanx and spany due to a parse error; fixed.
|
jpayne@68
|
1703 Added full range of delimiters to demuxbyname and clarified shellscript help.
|
jpayne@68
|
1704 Added demuxbyname column mode (e.g. column=2 to demux by the 2nd column).
|
jpayne@68
|
1705 Demuxbyname default compression level changed to 1 to cope with slow compression speed.
|
jpayne@68
|
1706 Improved CompareSketch parsing of flags shared by Parser and DisplayParams.
|
jpayne@68
|
1707 Added 3-column Sketch results.
|
jpayne@68
|
1708 Restarted servers with new format support.
|
jpayne@68
|
1709 37.46
|
jpayne@68
|
1710 Fixed a null pointer exception in Sketch format 3.
|
jpayne@68
|
1711 37.47
|
jpayne@68
|
1712 Sketch now supports minANI flag.
|
jpayne@68
|
1713 Added Sketch spid field and allowed spid and imgID to be set from SketchMaker command line.
|
jpayne@68
|
1714 37.48
|
jpayne@68
|
1715 taxid, imgid, spid, name, name0, and fname can now all be set or overridden from the command line of Sketch, CompareSketch, and SendSketch.
|
jpayne@68
|
1716 Fixed a bug in assigning spid to query sketches in the TaxServer. Restarted servers again.
|
jpayne@68
|
1717 37.49
|
jpayne@68
|
1718 For clarity, taxname is now an alias of name for Sketch.
|
jpayne@68
|
1719 Changed flags like useimgname to useImgAsName following feedback.
|
jpayne@68
|
1720 Added invert flag to Reformat.
|
jpayne@68
|
1721 Rewrote ReadStats addToIndelHistogram to increase speed and fix bugs.
|
jpayne@68
|
1722 37.50
|
jpayne@68
|
1723 Documented autosizefactor in sketch shellscripts.
|
jpayne@68
|
1724 Updated BBSketchGuide.txt with information about sketch sizing.
|
jpayne@68
|
1725 Wrote fetchSilva.sh.
|
jpayne@68
|
1726 Improved commenting of many /pipelines/ scripts.
|
jpayne@68
|
1727 Modified RenameIMG to handle dual IMG files.
|
jpayne@68
|
1728 Fixed img name parsing when no taxID is present.
|
jpayne@68
|
1729 Fixed a failure to increment in TaxTree.parseDelimitedNumber.
|
jpayne@68
|
1730 Sketch Amino mode now autosets k2, and a message is suppressed.
|
jpayne@68
|
1731 Fixed a bug with Sketch amino mode parsing (it is parsed in 3 places).
|
jpayne@68
|
1732 Really deleted ecc.sh from public BBTools distribution.
|
jpayne@68
|
1733 Started MakeContaminatedGenomes.
|
jpayne@68
|
1734 37.51
|
jpayne@68
|
1735 Sketch ref= flag can now accept # wildcard.
|
jpayne@68
|
1736 BBDuk Poly-A trimming now occurs before entropy-masking.
|
jpayne@68
|
1737 Documented BBDuk internal order of operations in BBDukGuide.
|
jpayne@68
|
1738 Wrote MakeContaminatedGenomes and makecontaminatedgenomes.sh.
|
jpayne@68
|
1739 Removed a couple references to nonexistent changelogs in shellscripts.
|
jpayne@68
|
1740 Fixed a bug in ConcurrentReadInputStream.getReads (failure to call start).
|
jpayne@68
|
1741 ImgID sketch results header now is padded by spaces.
|
jpayne@68
|
1742 MDWalker can now handle cigar N symbol.
|
jpayne@68
|
1743 37.52
|
jpayne@68
|
1744 Fixed CompareSketch replacing original header filenames with sketch filenames.
|
jpayne@68
|
1745 Fixed a bug in FilterByTile by forcing IntList initial size to at least 1.
|
jpayne@68
|
1746 Added eoom (ExitOnOutOfMemoryException) shellscript support.
|
jpayne@68
|
1747 Added shellscript parsing for degenerate terms like xmx= and ea.
|
jpayne@68
|
1748 Added DisplayParams taxFilter field, and SketchResults autoremoval of nonpassing results.
|
jpayne@68
|
1749 Added TaxTree.parseLevel and called it in many parsing routines.
|
jpayne@68
|
1750 Made shellscript formatting slightly more standardized.
|
jpayne@68
|
1751 Added some error checking to SendSketch; it now uses a nonzero exit code when the connection fails.
|
jpayne@68
|
1752 Updated shell scripts with references to guides.
|
jpayne@68
|
1753 Sketch now supports taxonomic filtering.
|
jpayne@68
|
1754 Deleted an obsolete redundant guide.
|
jpayne@68
|
1755 37.53
|
jpayne@68
|
1756 Sketches now scale heap size with sizemult, and default heap size is doubled.
|
jpayne@68
|
1757 Fixed Refseq server; it was using a whitelist.
|
jpayne@68
|
1758 KillSwitch kill and print methods are now synchronized.
|
jpayne@68
|
1759 Moved parse location of Sketch db names to Searcher.
|
jpayne@68
|
1760 Searcher.refFiles is now a LinkedHashSet.
|
jpayne@68
|
1761 Added Tools.isDigitOrSign and toString(Throwable).
|
jpayne@68
|
1762 TaxServer now returns error messages from doubleheader parsing.
|
jpayne@68
|
1763 TaxFilter now always adds specified nodes regardless of tax level, and stops promoting as soon as the target level is reached.
|
jpayne@68
|
1764 Fixed some taxonomy server issues with tax filtering.
|
jpayne@68
|
1765 Added IntHashSetList for creating concise sets.
|
jpayne@68
|
1766 Added blacklists to Silva and RefSeq server invocations; they were missing.
|
jpayne@68
|
1767 All shellscripts now load oracle-jdk/1.8_144_64bit on Genepool.
|
jpayne@68
|
1768 Sketches now have a count array.
|
jpayne@68
|
1769 Sketch reading and writing now supports the count array.
|
jpayne@68
|
1770 Sketch spid parsing fixed.
|
jpayne@68
|
1771 No more spurious warnings about missing blacklists when they are not being used.
|
jpayne@68
|
1772 Accelerated Sketch writing by around 20% by debranching.
|
jpayne@68
|
1773 37.54
|
jpayne@68
|
1774 Changed TaxServer back to prior Java version because 8u144 is not installed on gpwebs.
|
jpayne@68
|
1775 Changed default startSilvaServer.sh to old style since the silva keyword is conflated.
|
jpayne@68
|
1776 Added Sketch minQual and minProb processing.
|
jpayne@68
|
1777 RefSeq is now the default sketch server, since bigger references are more accurate.
|
jpayne@68
|
1778 Added support for NCBI merged.dmp file in TaxTree (now mandatory). This necessitates a coordinated push since the format changed.
|
jpayne@68
|
1779 TaxServer no longer crashes when there are missing TaxNodes.
|
jpayne@68
|
1780 taxpath now works better with printtaxonomy.sh.
|
jpayne@68
|
1781 Sketch unique and nohit counts are now calculated correctly when printcontam is disabled.
|
jpayne@68
|
1782 BBDuk now correctly removes reads that fail maxlen even when no trimming is performed.
|
jpayne@68
|
1783 TaxServer now correctly tracks external query counts through a proxy (at NERSC).
|
jpayne@68
|
1784 37.55
|
jpayne@68
|
1785 TaxServer now reports average and most recent query time.
|
jpayne@68
|
1786 Making a Sketch from a Heap moved from SketchTool to SketchHeap.
|
jpayne@68
|
1787 Sketch construction now adds counts when available.
|
jpayne@68
|
1788 SketchMaker now parses display params.
|
jpayne@68
|
1789 Fixed an array out of bounds in LongHeapMap.
|
jpayne@68
|
1790 PrintDepth is now working!
|
jpayne@68
|
1791 Swapped minProb and minQual in SketchObject; parsing was bugged.
|
jpayne@68
|
1792 Added #-symbol support for dual fastq files in Sketch.
|
jpayne@68
|
1793 Added contains(key) to LongHeapSet and LongHeapMap.
|
jpayne@68
|
1794 SendSketch now loads fastq files multithreaded. This is up to 6x as fast though slightly less efficient.
|
jpayne@68
|
1795 Reformat now can upsample via samplereadstarget/samplebasestarget.
|
jpayne@68
|
1796 37.56
|
jpayne@68
|
1797 SendSketch now does read validation in-thread and achieves up to 9x the speed of the singlethreaded version and better efficiency.
|
jpayne@68
|
1798 CompareSketch had bufferlen cap removed when processing fastq.
|
jpayne@68
|
1799 SketchMaker has a new fast path for onesketch of a single fastq file, and default bufferlen changed from 1 to 2 to better deal with short sequences. For fastq, speed was quadrupled.
|
jpayne@68
|
1800 SketchMaker no longer prints an error message if there were no output sketches; instead, there is a warning.
|
jpayne@68
|
1801 Sketch now allows internal merging of paired reads.
|
jpayne@68
|
1802 RQCFilter defaults to merging reads and using minprob=0.75.
|
jpayne@68
|
1803 Added Sketch arbitrary metadata tags.
|
jpayne@68
|
1804 Added Sketch depth2 (repeat-compensated depth).
|
jpayne@68
|
1805 Regenerated nt and RefSeq reference sketches with coverage information and restarted the servers.
|
jpayne@68
|
1806 Added Sketch volume column.
|
jpayne@68
|
1807 Added IntHashSetList.toPackedArray.
|
jpayne@68
|
1808 SketchIndex now returns SketchResults with taxHits instead of a raw ArrayList.
|
jpayne@68
|
1809 contam2 now appears to work.
|
jpayne@68
|
1810 SketchMaker now obeys read limit.
|
jpayne@68
|
1811 Sketch results are now sortable by depth and volume.
|
jpayne@68
|
1812 RQCFilter now uses some additional sketch flags like volume.
|
jpayne@68
|
1813 37.57
|
jpayne@68
|
1814 Changed default printOptions content.
|
jpayne@68
|
1815 Wrote MakePolymers.
|
jpayne@68
|
1816 Added period flag to MutateGenome.
|
jpayne@68
|
1817 Tested: Homopolymer blacklisting up to k=9 does not obviously improve sketch depth accuracy.
|
jpayne@68
|
1818 calcmem.sh now supports RQCMEM override flag (in megabytes).
|
jpayne@68
|
1819 BBSketch now supports intersection and printing sketch intersections.
|
jpayne@68
|
1820 Wrote Sketch.invertKey. Note that this requires the reference.
|
jpayne@68
|
1821 Fixed an issue of not including ref= with # flag in SketchSearcher loading.
|
jpayne@68
|
1822 Fixed a bug stemming from a null return in SketchIndex when there are no matches.
|
jpayne@68
|
1823 Fixed an infinite loop in Sketch comparebydepth and volume.
|
jpayne@68
|
1824 Sketch score moved to a field to make sorting faster.
|
jpayne@68
|
1825 Deleted BBMask_noSam.java
|
jpayne@68
|
1826 37.58
|
jpayne@68
|
1827 Added exception handlers for AssertionErrors in cris.
|
jpayne@68
|
1828 Added nucleotide support to sketch files.
|
jpayne@68
|
1829 Added Var.noPassDotGenotype.
|
jpayne@68
|
1830 Wrote EntropyTracker.
|
jpayne@68
|
1831 Modified BBDuk to use EntropyTracker.
|
jpayne@68
|
1832 Modified BBMask to use EntropyTracker.
|
jpayne@68
|
1833 Note that entropy calculation was slightly off prior to EntropyTracker.
|
jpayne@68
|
1834 BBSketch now supports entropy filtering.
|
jpayne@68
|
1835 BBMap now supports sambamba for bam input.
|
jpayne@68
|
1836 37.59
|
jpayne@68
|
1837 Increased memory for RefSeq sketches by 1g.
|
jpayne@68
|
1838 Set default Sketch entropy filter to 0.66.
|
jpayne@68
|
1839 Set default Sketch minprob to 0.0001, which is sufficient for 31bp at 74% (Q5.9).
|
jpayne@68
|
1840 Added EntropyTracker fast, slow, and superslow modes.
|
jpayne@68
|
1841 Added command-line flags for EntropyTracker speed, verify, and Sketch entropyK.
|
jpayne@68
|
1842 out=/dev/null no longer prompts you to delete it in most cases.
|
jpayne@68
|
1843 RQCFilter sketchminprob flag added, and default changed to 0.2 (95%; Q12.9).
|
jpayne@68
|
1844 Fixed a bug in EntropyTracker ns calculation and added it to verify().
|
jpayne@68
|
1845 RQCFilter now suppresses error messages when SendSketch fails due to connectivity issues.
|
jpayne@68
|
1846 Fully commented EntropyTracker.
|
jpayne@68
|
1847 Suppressed a race-condition-induced error message from closing the input stream early in Reformat and BBDuk.
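
Note on the minprob defaults in 37.59: the quoted thresholds follow from treating minprob as the probability that every base of a kmer is correct, i.e. per-base accuracy raised to the power k. The sketch below is only an illustration of that arithmetic under that assumption; the class and method names are hypothetical and are not taken from BBTools.

```java
// Hypothetical helper, not BBTools code: reproduces the arithmetic behind
// "0.0001 is sufficient for 31bp at 74%" and "0.2 (95%)".
class KmerProbSketch {

    /** Probability that all k bases are correct, assuming independent errors. */
    static double kmerCorrectProb(double perBaseAccuracy, int k) {
        return Math.pow(perBaseAccuracy, k);
    }

    /** Phred-scaled quality corresponding to a per-base accuracy. */
    static double phred(double perBaseAccuracy) {
        return -10.0 * Math.log10(1.0 - perBaseAccuracy);
    }

    public static void main(String[] args) {
        // 31-mers at 74% and 95% per-base accuracy.
        for (double acc : new double[] {0.74, 0.95}) {
            System.out.printf("accuracy=%.2f  Q=%.1f  P(31-mer correct)=%.5f%n",
                    acc, phred(acc), kmerCorrectProb(acc, 31));
        }
    }
}
```

Under this model a 31-mer at 74% accuracy comes out just under 0.0001, and one at 95% comes out close to 0.2, which matches the defaults listed above.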
|
jpayne@68
|
1848 37.60
|
jpayne@68
|
1849 Brought back UnicodeToAscii and changed it slightly. Still does not work as intended, but may work in most cases.
|
jpayne@68
|
1850 37.61
|
jpayne@68
|
1851 Made some slight changes to EntropyTracker.
|
jpayne@68
|
1852 Added ribomap flag to RQCFilter.
|
jpayne@68
|
1853 Added default adapter sequences to RandomReads.
|
jpayne@68
|
1854 Suppressed printing some unnecessary verbose stuff from CoveragePileup.
|
jpayne@68
|
1855 Added kmersIn tracking to kmer counters.
|
jpayne@68
|
1856 KmerCountExact now prints average depth.
|
jpayne@68
|
1857 Added Tools.observedToActualCoverage(). This allows conversion of observed kmer counts to average kmer depth.
|
jpayne@68
|
1858 BBMap now has printstats and printsettings flags to suppress verbose output.
|
jpayne@68
|
1859 Revised observedToActualCoverage with a more precise estimate with a reverse curve-fit.
|
jpayne@68
|
1860 Added observedToActualCoverage to BBNorm.
|
jpayne@68
|
1861 Updated BBSketchGuide.txt with entropy, depth, and merging.
|
jpayne@68
|
1862 37.62
|
jpayne@68
|
1863 Average kmer quality is now tracked in Sketch and stored in the header.
|
jpayne@68
|
1864 Fixed a place in SketchTool where genomeSequences was not being reset to 0 (should have no effect).
|
jpayne@68
|
1865 Added synonyms for onesketch and so forth so that the prefix mode= is no longer required.
|
jpayne@68
|
1866 CompareSketch can now use # notation for paired reads.
|
jpayne@68
|
1867 Added unique2 and unique3 flags.
|
jpayne@68
|
1868 Comparesketch now automatically generates an index if required by some columns.
|
jpayne@68
|
1869 Wrote TaxFilter.reviseByBestEffort(file) to allow closest available ancestors as output.
|
jpayne@68
|
1870 Added FilterByTaxa besteffort flag.
|
jpayne@68
|
1871 Improved FilterByTaxa output formatting.
|
jpayne@68
|
1872 TaxTree constructor became private.
|
jpayne@68
|
1873 TaxTree gained a static sharedTree which is used by default.
|
jpayne@68
|
1874 RQCFilter ribomap, chloromap, and mitomap automatically widen filter thresholds when nothing is found.
|
jpayne@68
|
1875 RQCFilter disables chloromap when the organism is not a plant (Viridiplantae), unless no taxa is given.
|
jpayne@68
|
1876 37.63
|
jpayne@68
|
1877 Fixed a bug when using Sketch constructor to pass average kmer quality and restarted servers.
|
jpayne@68
|
1878 Added anifromwkid flag to toggle between calculating ANI from WKID and from KID.
|
jpayne@68
|
1879 Added minbases to filter results, ignoring small reference sequences.
|
jpayne@68
|
1880 Added minsizeratio to filter results. Intended mainly for all-to-all comparisons.
|
jpayne@68
|
1881 Added Strain and Substrain to TaxTree.
|
jpayne@68
|
1882 Added RepresentativeSet and representative.sh for condensing sets of genomes by all-to-all ANI.
|
jpayne@68
|
1883 37.64
|
jpayne@68
|
1884 Fixed a bug in determining which levels to print in PrintTaxonomy.
|
jpayne@68
|
1885 37.65
|
jpayne@68
|
1886 TaxServer now caps sketch load threads at 2 for local files.
|
jpayne@68
|
1887 Added numChildren, minParentLevelExtended, and maxChildLevelExtended fields to TaxNode.
|
jpayne@68
|
1888 Added printChildren and printRange to taxonomy server URL parsing.
|
jpayne@68
|
1889 37.66
|
jpayne@68
|
1890 Changed tax server error response codes from 200 to 400.
|
jpayne@68
|
1891 Rewrote tax server URL parsing to be more flexible; /tax/ is no longer needed (though /sketch/ is).
|
jpayne@68
|
1892 Broke down server timing reports by local, remote, and usage.
|
jpayne@68
|
1893 Added TaxTree.getChildren() using a hashtable.
|
jpayne@68
|
1894 Depth and merge flags now work in sketch server local mode.
|
jpayne@68
|
1895 TaxServer has now enabled multithreaded local fastq sketch loading and capped the threads at 4 instead of 2 by default.
|
jpayne@68
|
1896 TaxServer handlers are now multithreaded, fixing poor response time when loading data in local mode.
|
jpayne@68
|
1897 RQCFilter now adds the original filename and organism name (if known) to sketch query headers.
|
jpayne@68
|
1898 RQCFilter now reports which microbes were used in filtering.
|
jpayne@68
|
1899 37.67
|
jpayne@68
|
1900 Fixed an extended/normal level bug when widening TaxFilter.
|
jpayne@68
|
1901 Updated CoveragePileup (pileup.sh) to give a more detailed summary, and import scaffold names from the reference sequences (default true) or reads (default false).
|
jpayne@68
|
1902 Fixed crash when SamTools version string contains letters.
|
jpayne@68
|
1903 RQCFilter now gathers chloro, ribo, mito references for mapping at the species level by default, rather than order. This dramatically speeds up mapping, by 20x in some cases.
|
jpayne@68
|
1904 Pileup now calculates kmer coverage.
|
jpayne@68
|
1905 BBMap can now output coverage statistics with the cov flag even if there are no coverage files specified.
|
jpayne@68
|
1906 Reformat can now calculate kmer statistics via the k flag.
|
jpayne@68
|
1907 Reformat now ties loglog k to counting k.
|
jpayne@68
|
1908 Setting loglogk now automatically enables loglog.
|
jpayne@68
|
1909 Fixed order of the conditional last column (name0) in Sketch output.
|
jpayne@68
|
1910 Sketch format 3 now prints qsize and rsize instead of size ratio.
|
jpayne@68
|
1911 RepresentativeSet now expects potentially 5 columns, with qsize and rsize.
|
jpayne@68
|
1912 Clarified an assertion error in Seal.
|
jpayne@68
|
1913 Added taxonomic filtering to RepresentativeSet.
|
jpayne@68
|
1914 RepresentativeSet now prints the size of genomes retained and discarded.
|
jpayne@68
|
1915 Strain can now be assigned to children of subspecies.
|
jpayne@68
|
1916 TaxServer now prints children for life node.
|
jpayne@68
|
1917 JsonObject now ignores attempts to add null values, preventing TaxServer from crashing.
|
jpayne@68
|
1918 Comparison.taxID() and imgID() now return -1 rather than 0 if the number is undefined.
|
jpayne@68
|
1919 Tweaked RepresentativeSet sorting to favor larger genomes; yields a slightly smaller output.
|
jpayne@68
|
1920 Added pJET and lambda to BBMap resources.
|
jpayne@68
|
1921 remote_files now additionally lists cat, dog, mouse, and microbial references.
|
jpayne@68
|
1922 Sketch format 3 now prints out query size in bases, to avoid including massive sets of E.coli all listed under the same taxID.
|
jpayne@68
|
1923 Added DedupeProtein, via the amino flag in dedupe.sh.
|
jpayne@68
|
1924 Fixed a bug in Dedupe in which sequences could subsume each other if both contained the other. This mainly happened when they were the same length but differed by substitutions.
|
jpayne@68
|
1925 37.68
|
jpayne@68
|
1926 Added Clumpify allowNs flag.
|
jpayne@68
|
1927 Clumpify can now process containments and affixes.
|
jpayne@68
|
1928 clumpify.sh no longer prints out the java version.
|
jpayne@68
|
1929 Clumpify now supports a dupesubrate flag.
|
jpayne@68
|
1930 Clarified some steps in variantPipeline.sh.
|
jpayne@68
|
1931 TaxFilter can now parse organism names if a tree is loaded.
|
jpayne@68
|
1932 37.69
|
jpayne@68
|
1933 Renamed kapatags.L40.fasta to kapatags.L40.fa and pJET1.2.fasta to pJET1.2.fa.
|
jpayne@68
|
1934 Added kapa support to RQCFilter.
|
jpayne@68
|
1935 Added pjet, lambda, mtst, and kapa keywords to BBDuk.
|
jpayne@68
|
1936 Added pjet, lambda, mtst, kapa, adapters, artifacts, and phix keywords to Seal to mirror BBDuk.
|
jpayne@68
|
1937 Moved breakReads from Reformat to Tools.
|
jpayne@68
|
1938 Wrote PreParser to allow output stream redirection.
|
jpayne@68
|
1939 Converted most classes to using PreParser.
|
jpayne@68
|
1940 Removed MakeCoverageHistogram.
|
jpayne@68
|
1941 Deprecated NormAndCorrectWrapper.
|
jpayne@68
|
1942 Generally got rid of printOptions(); help is in shellscripts, not code. This is handled in PreParser now.
|
jpayne@68
|
1943 37.70
|
jpayne@68
|
1944 Tightened project error and warning levels for compilation; modified a large amount of code to comply.
|
jpayne@68
|
1945 Deleted a redundant copy of KillSwitch.
|
jpayne@68
|
1946 Deleted redundant copies of safe array allocators.
|
jpayne@68
|
1947 37.71
|
jpayne@68
|
1948 Eliminated hyphen-stripping, java flag parsing, and null flag replacement from PreParser classes.
|
jpayne@68
|
1949 outstreams are now always closed in main, except in rare cases like TaxServer.
|
jpayne@68
|
1950 Added outstream to a few classes like BBMerge.
|
jpayne@68
|
1951 Moved some TaxServer parsing to ServerTools.
|
jpayne@68
|
1952 37.72
|
jpayne@68
|
1953 TaxServer no longer allows external file access by default.
|
jpayne@68
|
1954 TaxServer logs ip addresses of malformed queries.
|
jpayne@68
|
1955 Rewrote ServerTools.sendAndRecieve to be more robust.
|
jpayne@68
|
1956 Changed URLConnection to HttpURLConnection to allow error stream access.
|
jpayne@68
|
1957 Fixed a bug not displaying help in RemoveHuman.
|
jpayne@68
|
1958 calcmem.sh now supports SLURM_MEM_PER_NODE. However this is only set when the --mem= flag is specified for job submission.
|
jpayne@68
|
1959 Sketch metadata is now set in SketchMaker for per-taxa and per-sequence modes.
|
jpayne@68
|
1960 Sketch results can now be filtered by optional metadata fields.
|
jpayne@68
|
1961 37.73
|
jpayne@68
|
1962 Re-added libbbtoolsjni.so, which had somehow been removed.
|
jpayne@68
|
1963 Wrote DiskBench.java and diskbench.sh for comparing multithreaded I/O on local and networked disks.
|
jpayne@68
|
1964 Added RQCFilter flags for Clumpify groups and tmpdir.
|
jpayne@68
|
1965 37.74
|
jpayne@68
|
1966 Sketch servers now log the first 3 lines of the body of malformed queries to help diagnose the problem.
|
jpayne@68
|
1967 mouseCatDogHumanPath added to RQCFilter.
|
jpayne@68
|
1968 Changed parse order of silva flag in TaxServer.
|
jpayne@68
|
1969 Added RQCFilter dryrun flag.
|
jpayne@68
|
1970 Split RQCFilter aggressive flag into aggressivehuman and aggressivemicrobe.
|
jpayne@68
|
1971 Sketch servers no longer return error messages when query sketches are size 0.
|
jpayne@68
|
1972 Fixed a parse bug allowing minkeycount to be 0 for sketch processing.
|
jpayne@68
|
1973 Sketch k2 can now only be set via k.
|
jpayne@68
|
1974 Sketch k2 can no longer be set to k.
|
jpayne@68
|
1975 Enabled verbose output from SketchTool (for debugging).
|
jpayne@68
|
1976 37.75
|
jpayne@68
|
1977 Fixed AssemblyStats default outstream and printing Executing... message.
|
jpayne@68
|
1978 37.76
|
jpayne@68
|
1979 Added Shared.threadLocalRandom() to produce a ThreadLocalRandom when supported, and otherwise a Random.
|
jpayne@68
|
1980 Converted some programs to use Shared.threadLocalRandom(), but not BBNorm since it uses .nextLong(long).
|
jpayne@68
|
1981 DiskBench is now much faster in generating random text.
|
jpayne@68
|
1982 TestFilesystem now supports multiple sequential files and is probably generating correct data.
|
jpayne@68
|
1983 ReadWrite can now getRawOutputStream for /dev/null/* and will remove the * portion. This is much faster than writing to /dev/shm/*
|
jpayne@68
|
1984 Removed an invalid assertion from RepresentativeSet.
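
Note on Shared.threadLocalRandom() in 37.76: the entry describes returning a ThreadLocalRandom when supported and a plain Random otherwise. A minimal sketch of one way to do that is shown below; it is an assumption about the approach, not the actual BBTools implementation, and the class name is hypothetical. Reflection is used so the code still loads on a runtime that lacks the class.

```java
import java.util.Random;

// Hypothetical stand-in for the helper described above; not BBTools code.
class RandomSketch {

    /** Returns ThreadLocalRandom.current() if the class exists, else a new Random. */
    static Random threadLocalRandom() {
        try {
            // Look the class up by name to avoid a hard link-time dependency.
            Class<?> c = Class.forName("java.util.concurrent.ThreadLocalRandom");
            return (Random) c.getMethod("current").invoke(null);
        } catch (Exception e) {
            // Class or method unavailable: fall back to an ordinary generator.
            return new Random();
        }
    }

    public static void main(String[] args) {
        Random r = threadLocalRandom();
        System.out.println(r.getClass().getName() + " -> " + r.nextInt(100));
    }
}
```

ThreadLocalRandom extends Random, so callers can use the returned object through the Random API in either case.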
|
jpayne@68
|
1985 37.77
|
jpayne@68
|
1986 Wrote ExplodeTree and explodetree.sh, to create a directory structure mirroring the tax tree.
|
jpayne@68
|
1987 Rearranged parse order in A_SampleByteFile and A_SampleD.
|
jpayne@68
|
1988 Added some convenience methods to TaxTree and TaxNode.
|
jpayne@68
|
1989 Wrote LongLongHashMap.
|
jpayne@68
|
1990 Added sequence path lookup to TaxServer.
|
jpayne@68
|
1991 LongHashMap and LongLongHashMap no longer include invalid entries in toArray().
|
jpayne@68
|
1992 Wrote IntLongHashMap.
|
jpayne@68
|
1993 Wrote TaxSize and taxsize.sh to generate the size of tax nodes.
|
jpayne@68
|
1994 Added Silva header parsing to TaxServer.
|
jpayne@68
|
1995 Added size lookup to TaxServer and created a RefSeq size file.
|
jpayne@68
|
1996 ByteStreamWriter.print methods now return this, to allow chaining.
|
jpayne@68
|
1997 Rewrote Read.validate() to be faster, simpler, and more modular.
|
jpayne@68
|
1998 Read MIN_ and MAX_CALLED_QUALITY are now private, and generally replaced by a remapping array.
|
jpayne@68
|
1999 Read validation no longer turns . - X to N by default.
|
jpayne@68
|
2000 Fixed toSemicolon method in TaxTree.
|
jpayne@68
|
2001 Increased TaxServer default memory to 52G in response to frequent GC during high query volume.
|
jpayne@68
|
2002 ByteFile1 mode is no longer forced on Denovo or Cori.
|
jpayne@68
|
2003 Added Parser.validateStdio() to ensure interleaving and file formats are specified when piping. Currently only enabled for BBDuk, BBMap, and Reformat.
|
jpayne@68
|
2004 Added header and more columns to RepresentativeSet.
|
jpayne@68
|
2005 37.78
|
jpayne@68
|
2006 Updated citation guidelines.
|
jpayne@68
|
2007 Added validatebranchless flag and code path.
|
jpayne@68
|
2008 Improved validatebranchless to use binary instead of boolean or.
|
jpayne@68
|
2009 Removed invalid sequence cre_lox_lib_yadapt1 from reference collections.
|
jpayne@68
|
2010 Changed JsonObject handling of null values to be compliant.
|
jpayne@68
|
2011 Added JsonObject handling for floating-point types.
|
jpayne@68
|
2012 Added Json output for Sketch results.
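
Note on validatebranchless in 37.78: "binary instead of boolean or" refers to accumulating results with the non-short-circuit | operator rather than ||, so the loop body contains no data-dependent branch. The sketch below illustrates the idea with a hypothetical lookup table and accepted alphabet; it is not the BBTools Read.validate() code.

```java
import java.util.Arrays;

// Hypothetical illustration of branchless validation; not BBTools code.
class BranchlessValidateSketch {

    // 0 for accepted symbols, 1 for everything else (assumed alphabet: ACGTN, either case).
    private static final byte[] INVALID = new byte[128];
    static {
        Arrays.fill(INVALID, (byte) 1);
        for (char c : "ACGTNacgtn".toCharArray()) {
            INVALID[c] = 0;
        }
    }

    /** Returns true if every byte is an accepted symbol. */
    static boolean validate(byte[] bases) {
        int bad = 0;
        for (byte b : bases) {
            // Bitwise OR folds each lookup into the accumulator with no per-base branch.
            // (b >> 7) & 1 flags bytes with the high bit set, which the masked index would hide.
            bad |= INVALID[b & 0x7F] | ((b >> 7) & 1);
        }
        return bad == 0;
    }

    public static void main(String[] args) {
        System.out.println(validate("ACGTN".getBytes()));
        System.out.println(validate("ACXT".getBytes()));
    }
}
```

The trade-off is that the loop always scans the whole array instead of stopping at the first bad symbol, which is usually acceptable when invalid input is rare.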
|
jpayne@68
|
2013 37.79
|
jpayne@68
|
2014 RenameGiToNcbi now accepts multiple input files.
|
jpayne@68
|
2015 TaxServer now handles favicon.ico requests.
|
jpayne@68
|
2016 Modified SortByName to better handle large numbers of temp files with long sequences, by reducing buffers and adding a mem mult.
|
jpayne@68
|
2017 Redid JsonObject to remove name field.
|
jpayne@68
|
2018 Wrote JsonParser.
|
jpayne@68
|
2019 Added stopcov option to Pileup.
|
jpayne@68
|
2020 Fixed a bug with reporting invalid bases in Read.
|
jpayne@68
|
2021 Regenerated RefSeq and nt sketches from the latest versions.
|
jpayne@68
|
2022 37.80
|
jpayne@68
|
2023 Fixed hidden compile errors.
|
jpayne@68
|
2024 37.81
|
jpayne@68
|
2025 Fixed a Json display error for duplicate names.
|
jpayne@68
|
2026 Added Json parsing and printing support for escape characters and exponent numbers.
|
jpayne@68
|
2027 Added Json parsing and printing support for arrays.
|
jpayne@68
|
2028 Fixed a bug in ReadWrite failing to strip path correctly.
|
jpayne@68
|
2029 37.82
|
jpayne@68
|
2030 Fixed BBMap producing X8 (insert size) tag for improper pairs (on different contigs).
|
jpayne@68
|
2031 Added an early test for BBMap invalid input files.
|
jpayne@68
|
2032 Added a triple switch to shellscripts for genepool/cori/denovo.
|
jpayne@68
|
2033 37.83
|
jpayne@68
|
2034 Merged a branch.
|
jpayne@68
|
2035 37.84
|
jpayne@68
|
2036 Bump.
|
jpayne@68
|
2037 37.85
|
jpayne@68
|
2038 Removed an obsolete module name from shellscripts.
|
jpayne@68
|
2039 Fixed BBMap bug in which files with uppercase letters were erroneously not found.
|
jpayne@68
|
2040 Modified TetramerFrequencies to comply with stricter compilation rules.
|
jpayne@68
|
2041 37.86
|
jpayne@68
|
2042 Modified TetramerFrequencies to make k a variable.
|
jpayne@68
|
2043 Changed TetramerFrequencies printing to use ByteBuilder.
|
jpayne@68
|
2044 Wrote TestFormat and testformat2.sh.
|
jpayne@68
|
2045 Undefined amino acids are now assigned X instead of a period (.).
|
jpayne@68
|
2046 Fixed a race condition in ByteFile2 via a defensive copy.
|
jpayne@68
|
2047 37.87
|
jpayne@68
|
2048 Fixed a ByteBuilder overflow bug in append(long).
|
jpayne@68
|
2049 Changed TetramerFrequencies to use ByteBuilder.
|
jpayne@68
|
2050 Fixed missing else in CalcTrueQuality parser.
|
jpayne@68
|
2051 Added a new switch case to shellscripts to handle Shifter environment variables on Cori/Denovo.
|
jpayne@68
|
2052 Wrote multithreaded version of TestFormat.
|
jpayne@68
|
2053 Added merge and trim to TestFormat.
|
jpayne@68
|
2054 Better error message for ByteStreamWriter to read-only file.
|
jpayne@68
|
2055 TestFormat no longer crashes when trying to write to a read-only directory.
|
jpayne@68
|
2056 37.88
|
jpayne@68
|
2057 SummarizeSketch now supports colors.
|
jpayne@68
|
2058 Wrote CallVariants.findUniqueSubs to help locate bad NovaSeq reads.
|
jpayne@68
|
2059 Added variant-based read filtering to BBDuk.
|
jpayne@68
|
2060 Read.countSubs now supports shortmatch.
|
jpayne@68
|
2061 Fixed Read.countMatchSymbols().
|
jpayne@68
|
2062 Fixed clearfilters flag not clearing SamFilter, only VarFilter.
|
jpayne@68
|
2063 Var now parses depth, minusdepth, r1p, r2p, r1m, and r2m from VCF.
|
jpayne@68
|
2064 Added AD field to primary fields of VCF output for ease of parsing.
|
jpayne@68
|
2065 Wrote VcfLoader, a multithreaded VCF or var-format loader.
|
jpayne@68
|
2066 37.89
|
jpayne@68
|
2067 Wrote ByteBuilder.appendFast(double, int).
|
jpayne@68
|
2068 Changed Var to perform calculations with doubles instead of floats.
|
jpayne@68
|
2069 Fixed nondeterminism in RevisedAlleleFraction calculation. This was not due to the use of floats vs doubles, so the doubles can be changed back.
|
jpayne@68
|
2070 VCF/Var files are now written much faster, at around 55 MB/s up from 10 MB/s.
|
jpayne@68
|
2071 ByteStreamWriter now supports multithreaded input.
|
jpayne@68
|
2072 FileFormat now detects VCF and Var files.
|
jpayne@68
|
2073 Added some information to Var header.
|
jpayne@68
|
2074 Wrote VcfWriter class to write VCF/Var files multithreaded, at up to 630 MB/s.
|
jpayne@68
|
2075 Wrote Tools.isDigit, isLetter, toUpperCase, etc. Character.isDigit is slow.
|
jpayne@68
|
2076 ByteBuilder now implements CharSequence, allowing it to be used with TextStreamWriter.
|
jpayne@68
|
2077 Changed several instances of StringBuilder and String.format to ByteBuilder.
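
Note on ByteBuilder implementing CharSequence in 37.89: a byte-backed builder can be passed to any API that accepts a CharSequence once it provides length(), charAt(), subSequence(), and toString(). The class below is a minimal hypothetical sketch of that pattern, assuming ASCII content; it is not the BBTools ByteBuilder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical byte-backed builder; illustrates the CharSequence contract only.
class ByteBufferSketch implements CharSequence {
    private byte[] array = new byte[16];
    private int length = 0;

    /** Appends ASCII text and returns this, so calls can be chained. */
    ByteBufferSketch append(String s) {
        byte[] src = s.getBytes(StandardCharsets.US_ASCII);
        if (length + src.length > array.length) {
            // Grow geometrically, but always enough to hold the new data.
            array = Arrays.copyOf(array, Math.max(array.length * 2, length + src.length));
        }
        System.arraycopy(src, 0, array, length, src.length);
        length += src.length;
        return this;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return (char) (array[index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().substring(start, end);
    }

    @Override
    public String toString() {
        return new String(array, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ByteBufferSketch bb = new ByteBufferSketch().append("chr1").append("\t").append("100");
        // Any CharSequence consumer accepts it directly, e.g. StringBuilder.append(CharSequence).
        System.out.println(new StringBuilder().append(bb));
    }
}
```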
|
jpayne@68
|
2078 37.90
|
jpayne@68
|
2079 Multithreaded TetramerFrequencies.
|
jpayne@68
|
2080 Fixed some printing errors.
|
jpayne@68
|
2081 Multithreaded var2.MergeSamples.
|
jpayne@68
|
2082 37.91
|
jpayne@68
|
2083 Multithreaded FilterVCF. Poor speedup with vcfline.toVar, for reasons that are hard to diagnose.
|
jpayne@68
|
2084 Fixed a bug in ScafMap.loadVcfHeader.
|
jpayne@68
|
2085 Wrote Tools.parseDelimited.
|
jpayne@68
|
2086 Var.fromVCF now optionally imports extended information.
|
jpayne@68
|
2087 Added maxReads, minCov, and maxCov to VarFilter.
|
jpayne@68
|
2088 Reordered VCF info fields for faster parsing.
|
jpayne@68
|
2089 Added code to convince compiler some possible null pointer accesses were safe.
|
jpayne@68
|
2090 Added ConcurrentReadInputStream.returnList(ListNum) with internal null check.
|
jpayne@68
|
2091 Added an assertion to most parsing statements expecting a non-null b.
|
jpayne@68
|
2092 Fixed several other potential null accesses.
|
jpayne@68
|
2093 Made AccessionToTaxid/RenameGiToNcbi somewhat faster; running multiple concurrent unpigz processes makes it slow.
|
jpayne@68
|
2094 Fixed taxpath setting failure in RenameGiToNcbi and other programs.
|
jpayne@68
|
2095 Added G.species name format support for TaxServer and taxonomy in general.
|
jpayne@68
|
2096 PreParser now supports printexecuting flag for command-line suppression of repeating the parameters.
|
jpayne@68
|
2097 Wrote SuperLongList.
|
jpayne@68
|
2098 Column needed for percent of library in sketch output, something like depth * genome size.
|
jpayne@68
|
2099 TestFormat2 now works better with negative numbers for quality and broken quality scores.
|
jpayne@68
|
2100 TestFormat2 supports additional fields like length mode and stddev.
|
jpayne@68
|
2101 37.92
|
jpayne@68
|
2102 Bump for Jenkins.
|
jpayne@68
|
2103 37.93
|
jpayne@68
|
2104 RQCFilter now defaults to auto for taxTreeFile.
|
jpayne@68
|
2105 Fixed BBSplit crashing when parsing flags without an = symbol.
|
jpayne@68
|
2106 Fixed some missing accession numbers in TaxServer.
|
jpayne@68
|
2107 TaxServer now timestamps queries and displays the number of NotFound queries.
|
jpayne@68
|
2108 37.94
|
jpayne@68
|
2109 Found and replaced some instances of z2=Xmx with z2=Xms in shells.
|
jpayne@68
|
2110 Reimplemented ByteFile.pushBack(line) to sidestep a NERSC slowdown in multithreaded java reading.
|
jpayne@68
|
2111 Fixed VcfLine.type().
|
jpayne@68
|
2112 Wrote GffLine and vcf2gff.sh.
|
jpayne@68
|
2113 Added CallVariants gff output.
|
jpayne@68
|
2114 Fixed pairLength() and pairCount() swap.
|
jpayne@68
|
2115 Fixed the way sambamba was being called.
|
jpayne@68
|
2116 Re-tested bcftools 1.7 and BBMap 37.94. CallVariants is 14x more efficient and 180x faster.
|
jpayne@68
|
2117 *It is now difficult to replicate the memory/timing bug in 37.94 with CompareSketch bf1, but partially replicates with bf2.
|
jpayne@68
|
2118 TaxTree now checks for the auto keyword just before tree load.
|
jpayne@68
|
2119 Moved TaxNode size tracking from TaxServer to TaxTree.
|
jpayne@68
|
2120 Wrote SummarizeContamReport and summarizecontam.sh.
|
jpayne@68
|
2121 Fixed an off-by-one error in Var to GFF translation.
|
jpayne@68
|
2122 Added match generation from cigar, bases, and reference with no MDTags.
|
jpayne@68
|
2123 Fixed bug in MDWalker for substitutions immediately after deletions.
|
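The 37.94 entry on generating match strings from cigar, bases, and reference without MD tags can be illustrated with a small sketch. It assumes a simplified match-string alphabet (`m` match, `S` substitution, `I` insertion, `D` deletion, `C` soft clip); the function name `match_from_cigar` and its parameters are hypothetical, and this is not BBTools' actual implementation.

```python
import re

def match_from_cigar(cigar, bases, ref, ref_start):
    """Build a per-column match string from a CIGAR, read bases, and reference.

    Hypothetical helper. ref_start is the 0-based reference position of the
    first aligned (non-clipped) base.
    """
    out = []
    qpos, rpos = 0, ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":  # aligned columns: compare read base to reference base
            for i in range(n):
                same = bases[qpos + i].upper() == ref[rpos + i].upper()
                out.append("m" if same else "S")
            qpos += n
            rpos += n
        elif op == "I":  # bases present only in the read
            out.append("I" * n)
            qpos += n
        elif op in "DN":  # reference bases absent from the read
            out.append("D" * n)
            rpos += n
        elif op == "S":  # soft-clipped read bases
            out.append("C" * n)
            qpos += n
        # H and P consume neither read bases nor reference bases
    return "".join(out)
```

Because mismatches are found by direct comparison against the reference, no MD tag is needed; the cost is that the reference must be loaded.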
jpayne@68
|
2124 37.95
|
jpayne@68
|
2125 Reformat is now able to generate match strings from a reference instead of an MDTag.
|
jpayne@68
|
2126 Default SamStreamer threads increased to 6, to deal with match string generation from sam 1.3.
|
jpayne@68
|
2127 ref added as a flag for various programs to enable MD-free sam line processing.
|
jpayne@68
|
2128 Fixed an assertion preventing # replacement for BBMap input.
|
jpayne@68
|
2129 Fixed handling of assertion errors during fastq quality encoding autodetection during initialization, for paired files in which file 2 has corrupted quality scores.
|
jpayne@68
|
2130 Program now prints a warning instead of terminating when quality format is specified but it seems wrong, in at least one case.
|
jpayne@68
|
2131 Attempted to accelerate FASTQ.quadToRead; the attempt failed.
|
jpayne@68
|
2132 37.96
|
jpayne@68
|
2133 Wrote FindHiCJunctions and processhi-c.sh for identifying and trimming junctions.
|
jpayne@68
|
2134 Added formatting functions in Tools to handle printing reads and bases processed.
|
jpayne@68
|
2135 Fixed a crash bug in CallVariants realign mode.
|
jpayne@68
|
2136 Fixed missing sample names in CallVariants multisample mode.
|
jpayne@68
|
2137 Fastawrap now supports kmg extensions.
|
jpayne@68
|
2138 Fixed assemblers trying to get stats from stdout.fa.
|
jpayne@68
|
2139 Fuse now allows length limits of fused output.
|
jpayne@68
|
2140 Wrote preliminary junction detection for CallVariants.
|
jpayne@68
|
2141 Made new RiboKmers files from Silva 132, and made a script for replicating the creation process (in /pipelines/).
|
jpayne@68
|
2142 Wrote var2.SoftClipper.
|
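The "kmg extensions" noted for Fastawrap in 37.96 presumably refer to numeric arguments with a k, m, or g suffix. A minimal sketch of such parsing follows; the function name `parse_kmg` is hypothetical, and the decimal multipliers (1000 rather than 1024) are an assumption.

```python
def parse_kmg(text):
    """Parse an integer with an optional k/m/g suffix, e.g. '70', '4k', '2.5m'.

    Hypothetical helper; assumes decimal multipliers (k=10^3, m=10^6, g=10^9).
    """
    multipliers = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    s = text.strip().lower()
    if s and s[-1] in multipliers:
        # Allow fractional prefixes such as '2.5m', rounding to the nearest integer
        return int(round(float(s[:-1]) * multipliers[s[-1]]))
    return int(s)
```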
jpayne@68
|
2143 37.97
|
jpayne@68
|
2144 Added FilterByTile to RQCFilter.
|
jpayne@68
|
2145 Fixed a Clumpify crash-hang with low memory.
|
jpayne@68
|
2146 Made a Clumpify KmerSort superclass to reduce code redundancy between KmerSort versions.
|
jpayne@68
|
2147 Changed an exception handler in FastaReadInputStream to handle null-pointer exceptions as well.
|
jpayne@68
|
2148 Wrote RQCFilter2, with dependencies in a single path set by the rqcfilterdata flag.
|
jpayne@68
|
2149 37.98
|
jpayne@68
|
2150 Fixed a bug in RQCFilter2 mousecatdoghuman mode with read-only files.
|
jpayne@68
|
2151 Added ksplit to BBDuk.
|
jpayne@68
|
2152 37.99
|
jpayne@68
|
2153 Improved error message when processing sam files with no MD tag in Reformat.
|
jpayne@68
|
2154 Possibly fixed a crash-hang during OutOfMemory exception handling in ConcurrentGenericReadInputStream.
|
jpayne@68
|
2155 Merged DNA and RNA artifact files for RQCFilter2; modified the primary artifact files, and removed redundancies.
|
jpayne@68
|
2156 Adapters are no longer present in Illumina.artifacts, only in adapters.fa.
|
jpayne@68
|
2157 Nextera linkers are no longer present in Illumina.artifacts.
|
jpayne@68
|
2158 PolyA is now a flag.
|
jpayne@68
|
2159 Created a second RQCFilterData - RQCFilterData_Local, identical but with unmasked sequence names.
|
jpayne@68
|
2160 Added ploidy flag to CallPeaks documentation.
|
jpayne@68
|
2161 Added polyA.fa.gz to resources.
|
jpayne@68
|
2162 Modified resources/sequencing_artifacts.fa.gz to remove adapter sequences and Nextera linkers.
|
jpayne@68
|
2163 Changed Read constructors to ensure amino acid flag is passed correctly.
|
jpayne@68
|
2164 Fixed an array length overflow in ByteBuilder.
|
jpayne@68
|
2165
|
jpayne@68
|
2166 TODO: Counting Cuckoo filter as Bloom filter replacement.
|
jpayne@68
|
2167 TODO: SamToRoc is broken.
|
jpayne@68
|
2168 TODO: Rob example of NM tag not counting deletions adjacent to soft-clipping.
|
jpayne@68
|
2169
|
jpayne@68
|
2170 TODO: Put new RiboKmers on gdrive or something.
|
jpayne@68
|
2171 TODO: Fasta input, when auto-split, misses sequences of exactly the split length. Remove fasta auto-split.
|
jpayne@68
|
2172 TODO: CallVariants line 861 and 883 assertion.
|
jpayne@68
|
2173 TODO: Anomaly with soft-clipped reads at junctions for BBMap in local mode - analyze left-clipped reads.
|
jpayne@68
|
2174 TODO: N rate reported by BBMap seems odd in local mode.
|
jpayne@68
|
2175
|
jpayne@68
|
2176 TODO: Since move to floating-point trimming, Reformat does not speed up with bf2 when trimming.
|
jpayne@68
|
2177 TODO: Move all packages with RQCFilter dep
|
jpayne@68
|
2178 TODO: Change RQCFilter to use a config file for paths.
|
jpayne@68
|
2179 TODO: BBMap: Scaffold ends have suspiciously low coverage, with few reads mapping that extend past the ends.
|
jpayne@68
|
2180 TODO: Local mode does not increase mapping rate.
|
jpayne@68
|
2181 TODO: Clumpify pick pivot from both reads in a pair.
|
jpayne@68
|
2182 TODO: Move all hard-coded references to a single umbrella directory that can be tarred.
|
jpayne@68
|
2183 *TODO: TaxServer Children does not work if there are no children.
|
jpayne@68
|
2184 TODO: Column needed for percent of library in sketch output, something like depth * genome size.
|
jpayne@68
|
2185 TODO: For some reason, java after 1.7 release 51 is slow in multithreaded reading gzipped files with pigz. Submit a bug report.
|
jpayne@68
|
2186
|
jpayne@68
|
2187 TODO: TestFormat overrepresented kmers? Adapter sequence? Probable organisms?
|
jpayne@68
|
2188
|
jpayne@68
|
2189 TODO: ExplodeTree does not, strictly speaking, NEED to have a directory-making phase...
|
jpayne@68
|
2190
|
jpayne@68
|
2191 TODO: Add variable for BBNorm RAM to cell ratio modifier.
|
jpayne@68
|
2192
|
jpayne@68
|
2193 TODO: Better document server update and restart.
|
jpayne@68
|
2194 ******TODO: Stats N90/L90 is broken in some cases with huge genomes.
|
jpayne@68
|
2195 TODO: Modify RepresentativeSet to allow Strings as keys, or somehow add numbers to sequence names.
|
jpayne@68
|
2196 TODO: For mapping to ribo (and for filtering in general) try reducing BBMap's sites2 and test resulting speed.
|
jpayne@68
|
2197
|
jpayne@68
|
2198 TODO: ab indicates failed connections when concurrently accessing help.
|
jpayne@68
|
2199 TODO: Add taxa-level barriers to edges in RepresentativeSet.
|
jpayne@68
|
2200 TODO: Document changes due to tax tree adjustments
|
jpayne@68
|
2201
|
jpayne@68
|
2202 TODO: DBDate TaxServer field.
|
jpayne@68
|
2203 TODO: Sketching nt is slow and has poor threading. Reason is unclear but may relate to huge numbers of tiny sequences sharing taxIDs.
|
jpayne@68
|
2204
|
jpayne@68
|
2205 TODO: File sketching for duplicate detection on filesystem.
|
jpayne@68
|
2206
|
jpayne@68
|
2207 ***TODO: Race condition causes spurious error message when reads= is set in some cases. E.g. reformat.sh in=P1.fastq.gz out=stdout.fa reads=5. But this cannot be replicated on Genepool.
|
jpayne@68
|
2208 TODO: Sketch nucleotide encoding support? Would require sorting after hashing...
|
jpayne@68
|
2209
|
jpayne@68
|
2210 TODO: Allow sketch number field in output. For this purpose sketch number would need to be deterministic.
|
jpayne@68
|
2211
|
jpayne@68
|
2212 TODO: Try euk ribo blacklisting for depth accuracy.
|
jpayne@68
|
2213 TODO: Some RefSeq TaxIDs such as M.Ruber are duplicated which messes up depth2.
|
jpayne@68
|
2214
|
jpayne@68
|
2215 TODO: KmerCountExact and KCompress guides.
|
jpayne@68
|
2216 TODO: Autosize becomes unnaturally small in conjunction with whitelist.
|
jpayne@68
|
2217 TODO: Sketch should autodetect k from input sketches.
|
jpayne@68
|
2218 TODO: Write a new program for per-file sketches.
|
jpayne@68
|
2219 ***TODO: Incorrect dual kmer ANI is probably due to short kmer not being rcomp'ed.
|
jpayne@68
|
2220
|
jpayne@68
|
2221 TODO: Figure out how to parse amino in only 1 place, and add amino8 DisplayParams support.
|
jpayne@68
|
2222
|
jpayne@68
|
2223 TODO: For some reason DemuxByName is slow with compressed output files. They seem to be written serially.
|
jpayne@68
|
2224 TODO: Circular realigner.
|
jpayne@68
|
2225 TODO: Move more scripts to pipelines.
|
jpayne@68
|
2226 TODO: Explanation on BBMap stats output.
|
jpayne@68
|
2227 TODO: Multiple input file support for Reformat.
|
jpayne@68
|
2228
|
jpayne@68
|
2229 TODO: CompareSketch should try to parse the name of input fasta sequences, and include it by default in the Query header. *Actually it does - maybe this is only for IMG?
|
jpayne@68
|
2230 TODO: Sketch minID setting instead of/in addition to current minWKID setting. Needs server reboot and new distro, and DisplayParams changes.
|
jpayne@68
|
2231 TODO: Add BBMap gff output for spliced genes? Or maybe from a sam file makes more sense.
|
jpayne@68
|
2232
|
jpayne@68
|
2233
|
jpayne@68
|
2234 TODO: Move targetSketchSize to displayParams.
|
jpayne@68
|
2235
|
jpayne@68
|
2236 TODO: Dedupe ignores minoverlap for merging.
|
jpayne@68
|
2237
|
jpayne@68
|
2238 TODO: Shuffle is being slow and using a lot of memory.
|
jpayne@68
|
2239
|
jpayne@68
|
2240 TODO: Clumpify consensus should warn/quit when used with paired reads or deduplication.
|
jpayne@68
|
2241
|
jpayne@68
|
2242 TODO: Investigate "rescued" counts. R1 seems abnormally high with NovaSeq data.
|
jpayne@68
|
2243
|
jpayne@68
|
2244 TODO: Relaunch sketch servers with support for hk and hamino in doubleheader.
|
jpayne@68
|
2245 TODO: RQCFilter filterbytile.
|
jpayne@68
|
2246
|
jpayne@68
|
2247 TODO: BBMap trd default for sam output
|
jpayne@68
|
2248 TODO: Chongle request
|
jpayne@68
|
2249 TODO: IMG/NR servers
|
jpayne@68
|
2250 TODO: BBMap samtools in container
|
jpayne@68
|
2251
|
jpayne@68
|
2252 TODO: More informative TaxServer message when file is not present in URL.
|
jpayne@68
|
2253 TODO: Classify tax nodes under species as subspecies when possible.
|
jpayne@68
|
2254
|
jpayne@68
|
2255 TODO: Modify consect to print list of errors fixed/not fixed.
|
jpayne@68
|
2256
|
jpayne@68
|
2257 *TODO: BBMask should remove fully-masked sequences, and trim sequences after masking (on the right side, at least) to eliminate masked ends.
|
jpayne@68
|
2258 TODO: Modify Clump.removeDuplicates_inner to make a consensus of duplicates.
|
jpayne@68
|
2259 TODO: Add CallVariants SNP output format (?)
|
jpayne@68
|
2260
|
jpayne@68
|
2261 V36.
|
jpayne@68
|
2262 36.00
|
jpayne@68
|
2263 BBDuk now prints the exact number of reads removed.
|
jpayne@68
|
2264 Refactored BBDuk and Seal to consolidate some shared code.
|
jpayne@68
|
2265 Removed more colorspace-related code and comments.
|
jpayne@68
|
2266 Added renamebyinsert flag to RandomReads.
|
jpayne@68
|
2267 Fixed a custom parsing bug in BBMerge; it was being skipped.
|
jpayne@68
|
2268 Added outputonlyincorrect (ooi) flag to BBMerge.
|
jpayne@68
|
2269 Added TAG_CUSTOM field to BBMerge, to allow annotating reads with a feature vector.
|
jpayne@68
|
2270 Wrote ProcessBBMergeHeaders for converting feature vectors to tsv format.
|
jpayne@68
|
2271 Added minsecondratio flag to BBMerge.
|
jpayne@68
|
2272 Slightly increased default pfilter value of BBMerge.
|
jpayne@68
|
2273 Added quality-trimming to Dedupe to deal with contigs that have leading or trailing Ns.
|
jpayne@68
|
2274 Dedupe now defaults to ignoring kmers containing Ns to prevent extreme slowdown. Noted by Shijie.
|
jpayne@68
|
2275 Added tossjunk flag to Tadpole. Requested by Matt N.
|
jpayne@68
|
2276 Added tossdepth flag to Tadpole. Requested by Torbin N.
|
jpayne@68
|
2277 Disabled BBMerge JNI mode until the new code is ported.
|
jpayne@68
|
2278 36.01
|
jpayne@68
|
2279 Added bf1 flag (and parseCommonStatic) support to Pileup.
|
jpayne@68
|
2280 Temporarily defaulted bf1 flag to false until race condition is resolved. This does not manifest on Genepool, only external computers.
|
jpayne@68
|
2281 Added rem/rsem flags to BBMerge, for extra stringency. Large improvement in false-positives.
|
jpayne@68
|
2282 Added outd flag to Tadpole for discarded reads.
|
jpayne@68
|
2283 Modified tadpole.sh to remove references to ine/oute since they are confusing and no longer matched the documentation.
|
jpayne@68
|
2284 Modified order of table allocation and memory messages in AbstractKmerTable to better understand out-of-memory states.
|
jpayne@68
|
2285 Reduced available memory estimation for kmer tables from 0.75 to 0.72 when Xms is unset to avoid overallocation.
|
jpayne@68
|
2286 RQCFilter now specifies -Xms flag to prevent khist running out of memory. Noted by Seung-jin.
|
jpayne@68
|
2287 36.02
|
jpayne@68
|
2288 Disabled a debugging message and closed output stream for Tadpole tossjunk/outd flags.
|
jpayne@68
|
2289 36.03
|
jpayne@68
|
2290 Added support for RQCFilter microbial detection without removal. Stats are identical to removal.
|
jpayne@68
|
2291 Changed Tadpole tossdepth flag; it now discards reads with depth at or below the setting, rather than just below.
|
jpayne@68
|
2292 Fixed a bug with Tadpole2 tossdepth flag; it was not working properly with k>31.
|
jpayne@68
|
2293 Sped up BBMerge in rsem mode.
|
jpayne@68
|
2294 36.04
|
jpayne@68
|
2295 Added adapter search to BBMerge.
|
jpayne@68
|
2296 Changed efilter to allow a value of zero (perfect overlaps); -1 now disables it.
|
jpayne@68
|
2297 Modified BBMerge ecct such that when used in conjunction with rem/rsem, it only activates when extension fails.
|
jpayne@68
|
2298 Updated BBMerge guide.
|
jpayne@68
|
2299 36.05
|
jpayne@68
|
2300 Removed a condition that forced Tadpole2 to be used over Tadpole1.
|
jpayne@68
|
2301 Increased BBMerge adapter detection sensitivity.
|
jpayne@68
|
2302 36.06
|
jpayne@68
|
2303 BBMask now defaults to using all available memory instead of half.
|
jpayne@68
|
2304 Fixed BBMerge read2 adapter output.
|
jpayne@68
|
2305 Added poly-A removal to BBMerge adapter output, since that implies no signal.
|
jpayne@68
|
2306 Added renamebyinsert flag to BBMap.
|
jpayne@68
|
2307 Re-enabled bf2 mode on Genepool, where it does not appear to cause problems.
|
jpayne@68
|
2308 Changed BBSplit default output format to fastq.
|
jpayne@68
|
2309 Fixed parsecustom toggle deactivating in BBMerge.
|
jpayne@68
|
2310 Changed naming format of renamebyinsert to omit the slash.
|
jpayne@68
|
2311 36.07
|
jpayne@68
|
2312 Wrote MutateGenome and mutate.sh to create genome clones with a specified percent identity.
|
jpayne@68
|
2313 Added -Xms flag to bbmask.sh.
|
jpayne@68
|
2314 36.08
|
jpayne@68
|
2315 Added opfn flag to CrossBlock.
|
jpayne@68
|
2316 36.09
|
jpayne@68
|
2317 Wrote ParseCrossblockResults.
|
jpayne@68
|
2318 Wrote SummarizeCrossblock and summarizecrossblock.sh.
|
jpayne@68
|
2319 Analyzed Crossblock specificity - it appears that very little useful sequence is discarded, even with 20 same-species libraries.
|
jpayne@68
|
2320 Added minprob flag to CalcUniqueness. Requested by Alex S.
|
jpayne@68
|
2321 36.10
|
jpayne@68
|
2322 Added support for unlimited kmer lengths to CrossBlock.
|
jpayne@68
|
2323 SummarizeCrossblock can now complete despite missing files.
|
jpayne@68
|
2324 FilterByName now supports whitespace trimming.
|
jpayne@68
|
2325 Tax package can now handle missing IDs without crashing, if assertions are disabled. Requested by Jeff F.
|
jpayne@68
|
2326 DemuxByName now looks for longer affixes first. This allows for a longest-affix match, so it will prioritize e.g. sample10 over sample1.
|
jpayne@68
|
2327 Added keephumanreads flag to RQCFilter.
|
jpayne@68
|
2328 Wrote removecatdogmousehuman.sh.
|
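The longest-affix matching described for DemuxByName in 36.10 can be sketched as follows. This is only an illustration of the idea (try longer names before shorter ones); `build_matcher` and its parameters are hypothetical names, not the tool's code.

```python
def build_matcher(names, prefix_mode=False):
    """Return a function mapping a read header to the longest matching affix.

    Hypothetical helper: names are tried longest-first, so 'sample10'
    is preferred over 'sample1' when both match.
    """
    ordered = sorted(set(names), key=len, reverse=True)

    def match(header):
        for name in ordered:
            hit = header.startswith(name) if prefix_mode else header.endswith(name)
            if hit:
                return name
        return None  # no affix matched; the read would go to an "unmatched" output

    return match
```

Without the length ordering, a header beginning with `sample10` could be assigned to `sample1` simply because that name was tested first.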
jpayne@68
|
2329 36.11
|
jpayne@68
|
2330 Added alignment-quality filtering for Reformat. This includes idfilter, subfilter, etc.
|
jpayne@68
|
2331 Moved some functions for indel counting from BBMap to Read.
|
jpayne@68
|
2332 Tested Crossblock specificity with a longer kmer of 48. Specificity improves substantially.
|
jpayne@68
|
2333 36.12
|
jpayne@68
|
2334 Validated CrossBlock's sensitivity up to K=62 with benchmark data. At K=75 some contamination slips through (0.04%).
|
jpayne@68
|
2335 Wrote PartitionReads and partition.sh. Only supports round-robin currently.
|
jpayne@68
|
2336 Added Tadpole lowdepthfraction, requirebothbad, and tossuncorrectable flags for filtering.
|
jpayne@68
|
2337 Added Tadpole deadzone flag for controlling the amount of read tip left uncorrected.
|
jpayne@68
|
2338 Split Tadpole aggressive error correction flag into aggressive (use aggressive parameters) and eccfull (use tail correction on the entire read).
|
jpayne@68
|
2339 Rewrote SummarizeSealStats to use objects and calculate the total contamination rate.
|
jpayne@68
|
2340 Wrote RenameAndMerge and muxbyname.sh, a standalone implementation of the mux phase of Crossblock.
|
jpayne@68
|
2341 Multithreaded RenameAndMerge. It now scales with the number of input files.
|
jpayne@68
|
2342 Changed RandomReads illuminanames to use space-num-colon rather than slash-num for pair number indicator.
|
jpayne@68
|
2343 Changed RenameAndMerge to use space-num-colon also.
|
jpayne@68
|
2344 Fixed LogLog not clearing Kmer objects between re-use for K>31.
|
jpayne@68
|
2345 Modified BBMap to track last known numbers of mapped and unmapped reads in a static field.
|
jpayne@68
|
2346 RQCFilter now prints additional messages to stderr and log indicating the number of reads removed or remaining after each stage.
|
jpayne@68
|
2347 RQCFilter final log line changed to indicate what completed.
|
jpayne@68
|
2348 Fixed array-out-of-bounds parsing single-digit gi numbers and taxids in TaxTree.
|
jpayne@68
|
2349 Added correctfirst option to normAndCorrectWrapper.
|
jpayne@68
|
2350 Changed BBDuk to print 0.00% instead of NaN% when there are no input reads.
|
jpayne@68
|
2351 Changed RQCFilter stderr message format.
|
jpayne@68
|
2352 Added insert size histogram to Reformat from sam/bam files.
|
jpayne@68
|
2353 Changed RQCFilter pre-human qtrim from r to rl to match microbes. Note that the base count will be slightly off.
|
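The round-robin partitioning mentioned for PartitionReads in 36.12 can be sketched in a few lines. The function name `partition_round_robin` is hypothetical, and in-memory lists stand in for the output files a real tool would write.

```python
def partition_round_robin(records, ways):
    """Distribute records across `ways` bins in rotating order.

    Hypothetical helper: record i goes to bin i % ways, so bin sizes
    differ by at most one.
    """
    if ways < 1:
        raise ValueError("ways must be at least 1")
    bins = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        bins[i % ways].append(rec)
    return bins
```

For paired reads, each pair would need to be treated as a single record so that mates land in the same partition.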
jpayne@68
|
2354 36.13
|
jpayne@68
|
2355 Added Tadpole conservative error correction flag.
|
jpayne@68
|
2356 Added minhash sketch creation to KmerCountExact.
|
jpayne@68
|
2357 Wrote MinHashSketch and comparesketch.sh to manipulate and compare sketches.
|
jpayne@68
|
2358 Fixed CrossBlock making Tadpole default to assembly rather than correction.
|
jpayne@68
|
2359 Multithreaded sketch creation.
|
jpayne@68
|
2360 Added sketch delta compression.
|
jpayne@68
|
2361 Added FileFormat support for int1d, long1d, and bitset extensions.
|
jpayne@68
|
2362 Improved hexadecimal parsing.
|
jpayne@68
|
2363 Added substring mode to demuxbyname; still needs improvement.
|
jpayne@68
|
2364 Added filterbyname prefix mode; also needs improvement.
|
jpayne@68
|
2365 *Paused work on Clumpify after modifying methods to allow return of multiple consensus reads from a single clump, in order to commit.
|
jpayne@68
|
2366 BBNorm prints more informative error messages when crashing due to invalid characters.
|
jpayne@68
|
2367 ConcurrentGenericReadOutputStream now exits successfully with an error state rather than crashing when closed prematurely (in some cases).
|
jpayne@68
|
2368 *Temporarily disabled JNI mode for BandedAligner due to differences in results versus Java version.
|
jpayne@68
|
2369 Increased max length of BBMap key buffers from 128 to 256 to prevent crashes when using very short kmers with vslow and long reads.
|
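The sketch delta compression added in 36.13 is not described in detail here. A common approach, assumed in this illustration, is to store a sorted list of hash values as differences between neighbors, which yields smaller numbers that encode in fewer characters. The function names are hypothetical.

```python
def delta_encode(sorted_hashes):
    """Replace each value with its difference from the previous value.

    Assumes the input is sorted ascending; the first delta is taken from 0.
    """
    prev = 0
    out = []
    for h in sorted_hashes:
        out.append(h - prev)
        prev = h
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out
```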
jpayne@68
|
2370 36.14
|
jpayne@68
|
2371 LogLog changed to allow first argument to be an input filename without in= flag.
|
jpayne@68
|
2372 Fixed a reversed condition causing some streams to indicate they finished unsuccessfully. The output was in fact valid.
|
jpayne@68
|
2373 Fixed an issue in which BBMap, with very short K and short reads, could throw an assertion error.
|
jpayne@68
|
2374 36.15
|
jpayne@68
|
2375 Made sketch package.
|
jpayne@68
|
2376 Renamed MinHashSketch to SketchTool.
|
jpayne@68
|
2377 Wrote Sketch.
|
jpayne@68
|
2378 Wrote SketchMaker and sketch.sh.
|
jpayne@68
|
2379 36.18
|
jpayne@68
|
2380 BBMerge now checks input adapter sequences to ensure they are valid nucleotides.
|
jpayne@68
|
2381 Capped BBMerge input adapter length at 21bp.
|
jpayne@68
|
2382 36.19
|
jpayne@68
|
2383 Fixed a couple of bugs in Sketch implementation.
|
jpayne@68
|
2384 Added better Sketch name support in single-sketch mode.
|
jpayne@68
|
2385 Sketches made by KmerCountExact now avoid duplicate hashcodes via LongHeapSet.
|
jpayne@68
|
2386 BBMerge no longer loads terminal Ns in adapter sequences.
|
jpayne@68
|
2387 BBMerge outadapter now trims trailing Ns.
|
jpayne@68
|
2388 Added an early exit condition to BBMergeOverlapper.
|
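The 36.19 note about avoiding duplicate hashcodes via LongHeapSet suggests a bounded heap paired with a set. The sketch below shows that general pattern under the assumption that a sketch keeps the smallest distinct hash values (a bottom-k MinHash); the class `BoundedHeapSet` is hypothetical and does not mirror the Java class.

```python
import heapq

class BoundedHeapSet:
    """Keep the `limit` smallest distinct values seen so far (hypothetical helper)."""

    def __init__(self, limit):
        self.limit = limit
        self._heap = []        # max-heap emulated by storing negated values
        self._members = set()  # rejects duplicates in O(1)

    def add(self, value):
        if value in self._members:
            return False  # duplicate hashcode, ignore
        if len(self._heap) < self.limit:
            heapq.heappush(self._heap, -value)
            self._members.add(value)
            return True
        largest = -self._heap[0]
        if value >= largest:
            return False  # not among the smallest `limit` values
        heapq.heapreplace(self._heap, -value)  # evict the current largest
        self._members.discard(largest)
        self._members.add(value)
        return True

    def sorted_values(self):
        return sorted(self._members)
```

The set is what prevents a repeated kmer hash from occupying several heap slots, which would otherwise shrink the effective sketch size.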
jpayne@68
|
2389 36.20
|
jpayne@68
|
2390 Temporarily disabled MPI to allow easier compilation.
|
jpayne@68
|
2391 36.21
|
jpayne@68
|
2392 Fixed a condition in config file parsing.
|
jpayne@68
|
2393 Added PrintTaxonomy support for sequence files, and generally improved robustness.
|
jpayne@68
|
2394 Updated sketch package to handle IMG headers.
|
jpayne@68
|
2395 Sketches can now be toggled between hex and ASCII-48 with the a48 flag.
|
jpayne@68
|
2396 Sketch delta-compression is controlled with the delta flag.
|
jpayne@68
|
2397 Sketches are now written in ASCII-48 by default.
|
jpayne@68
|
2398 Sketch loading now uses a ConcurrentLinkedQueue to decouple the number of files and threads.
|
jpayne@68
|
2399 36.22
|
jpayne@68
|
2400 Verified that Phix and E.coli K-12 do not share any 27-mers.
|
jpayne@68
|
2401 Renamed BBMerge normalmode to flatmode.
|
jpayne@68
|
2402 Fixed a few issues in BBMerge related to combinations of flags for error-correction, extension, and quality processing.
|
jpayne@68
|
2403 Added BBMerge minapproxoverlap flag.
|
jpayne@68
|
2404 Added RQCFilter removelambda flag.
|
jpayne@68
|
2405 Moved a TaxTree construction wrapper into TaxTree class.
|
jpayne@68
|
2406 Wrote CompareSketch, a more formal tool for comparing sketches.
|
jpayne@68
|
2407 Wrote Comparison class for tracking comparisons.
|
jpayne@68
|
2408 Added rollover capability to Heap.
|
jpayne@68
|
2409 Added sketch genome size field.
|
jpayne@68
|
2410 Made a LongHeapSet subclass, SketchHeap, with additional fields, including genome size.
|
jpayne@68
|
2411 Added raw mode to Sketch output (instead of hex or ASCII-48).
|
jpayne@68
|
2412 SketchMaker now names sketches by NCBI name if a TaxTree is loaded.
|
jpayne@68
|
2413 CalcUniqueness now defaults to 25-mers instead of 20-mers.
|
jpayne@68
|
2414 Sketch headers now contain an encoding field (CD).
|
jpayne@68
|
2415 Sketch headers support 2-character abbreviations.
|
jpayne@68
|
2416 Added multithreading and taxonomy support to CompareSketch.
|
jpayne@68
|
2417 36.23
|
jpayne@68
|
2418 FileFormat now supports sketch detection.
|
jpayne@68
|
2419 36.24
|
jpayne@68
|
2420 Removed a debugging assertion in BBMerge.
|
jpayne@68
|
2421 Fixed ConcurrentHashMap use in Sketch package.
|
jpayne@68
|
2422 Finished IMG taxonomy support for Sketch package.
|
jpayne@68
|
2423 36.25
|
jpayne@68
|
2424 Added Bisulfite adapters to adapters.fa.
|
jpayne@68
|
2425 Added name0 (original name) field to sketches. Name is now for official taxonomic name.
|
jpayne@68
|
2426 Fixed some locations where ByteFile2 was being used off of Genepool.
|
jpayne@68
|
2427 Removed a debugging assertion in BBMerge that somehow did not get removed in 36.24.
|
jpayne@68
|
2428 36.26
|
jpayne@68
|
2429 Split RQCFilter kmer filtering into a short (k=19) and long (k=31) section, to increase accuracy.
|
jpayne@68
|
2430 36.27
|
jpayne@68
|
2431 Added N/L90 to stats output.
|
jpayne@68
|
2432 Wrote TadpolePipe and tadpipe.sh, for optimal Tadpole assemblies.
|
jpayne@68
|
2433 Revised TadpoleWrapper. Added many new options including early exit, search-space bisection, and search-space expansion.
|
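For the N/L90 statistic added to stats output in 36.27, a minimal sketch of the usual calculation is given below. The function name `nx_lx` is hypothetical. Tools differ on which letter denotes the length and which the count, so the function simply returns both values.

```python
def nx_lx(lengths, fraction):
    """Return (length, count) at which the running total reaches `fraction` of the sum.

    Hypothetical helper: sequences are visited longest-first. With fraction=0.9
    the result is the N90/L90 pair; with fraction=0.5 it is the N50/L50 pair.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return 0, 0  # empty input
```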
jpayne@68
|
2434 36.28
|
jpayne@68
|
2435 Fixed Reformat ihist output; it was using reads classified as improper pairs.
|
jpayne@68
|
2436 Fixed a BBMerge bug; it was outputting extended reads with extend+ecco.
|
jpayne@68
|
2437 Removed an unnecessary array copy from BBMerge.
|
jpayne@68
|
2438 Added a new error-correction method to Tadpole (reassemble).
|
jpayne@68
|
2439 BBMerge now passes read limit to Tadpole if a read limit is used.
|
jpayne@68
|
2440 Added countErrors() to Tadpole to skip error-correction on error-free reads.
|
jpayne@68
|
2441 Made ErrorTracker class for Tadpole to use instead of an array.
|
jpayne@68
|
2442 Fixed a bug in error-tracking statistics.
|
jpayne@68
|
2443 BBMerge now supports tail, pincer, and reassemble flags.
|
jpayne@68
|
2444 BBMerge and Tadpole both default to reassemble for error correction, instead of pincer and tail.
|
jpayne@68
|
2445 Changed Tadpole default deadzone to 0.
|
jpayne@68
|
2446 Added Tadpole flags controlling reassemble window limits.
|
jpayne@68
|
2447 Removed mpi package from all versions.
|
jpayne@68
|
2448 36.29
|
jpayne@68
|
2449 Added -Xms support to bbmerge.sh.
|
jpayne@68
|
2450 Fixed a bug with extra flag in bbmerge.
|
jpayne@68
|
2451 Disabled read validation and increased buffers for LogLog speedup.
|
jpayne@68
|
2452 Added LogLog non-atomic mode support, but speed difference is minor.
|
jpayne@68
|
2453 Added Tadpole rollback capability for reads that cause problems during error correction.
|
jpayne@68
|
2454 Increased Tadpole ability to avoid correcting indels.
|
jpayne@68
|
2455 36.30
|
jpayne@68
|
2456 KmerTable regenerate now removes kmers less than or equal to a specified limit.
|
jpayne@68
|
2457 Tadpole now removes all low-depth kmers missed by the prefilter.
|
jpayne@68
|
2458 Added quality-related window flags for Tadpole.
|
jpayne@68
|
2459 Fixed some incorrect assertions regarding degenerate bases in BBNorm.
|
jpayne@68
|
2460 Revised Tadpole error-correction defaults to be more aggressive.
|
jpayne@68
|
2461 Included quality value in Tadpole error detection.
|
jpayne@68
|
2462 Accelerated Tadpole reassemble count regeneration.
|
jpayne@68
|
2463 Accelerated Tadpole initial error correction count filling, leading to a 50% speedup.
|
jpayne@68
|
2464 Added a missing consensus flag to Clumpify.
|
jpayne@68
|
2465 readlength.sh now defaults to nzo=t.
|
jpayne@68
|
2466 Added a default adapter list to BBMerge, usable with the adapters=default flag.
|
jpayne@68
|
2467 Added probabilityErrorFree() to Read class.
|
jpayne@68
|
2468 Added CalcUniqueness columns for average quality and probability of error-free reads.
|
jpayne@68
|
2469 Added fixspikes flag to CalcUniqueness to enforce monotonicity.
|
jpayne@68
|
2470 Wrote AnalyzeFlowCell and filterbytile.sh to remove low-quality reads using positional information.
|
jpayne@68
|
2471 Added minprob flag to LogLog.
|
jpayne@68
|
2472 Fixed a bug in FilterByName in which the parse order prevented minlen from being applied.
|
jpayne@68
|
2473 Added random mode to Shred. Requested by Shijie.
|
jpayne@68
|
2474 Fixed pileup ignoring nzo flag.
|
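The probabilityErrorFree() method added to the Read class in 36.30 is not spelled out here. Under the standard Phred model, assumed in this sketch, a base with quality q has error probability 10^(-q/10), and a read is error-free only if every base is correct. The Python function below is a hypothetical stand-in, not the Java method.

```python
def probability_error_free(quals):
    """Probability that all bases are correct, given Phred quality scores.

    Assumes independent per-base errors with P(error) = 10 ** (-q / 10).
    """
    p = 1.0
    for q in quals:
        p *= 1.0 - 10.0 ** (-q / 10.0)
    return p
```

A threshold on this value is one plausible reading of the minprob flags mentioned for CalcUniqueness and LogLog, though that link is an inference rather than something stated in these notes.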
jpayne@68
|
2475 36.31
|
jpayne@68
|
2476 Clumpify now uses low compression for creation of temp files.
|
jpayne@68
|
2477 KmerSort now uses all threads for compression instead of half.
|
jpayne@68
|
2478 Clumpify now supports ecco flag.
|
jpayne@68
|
2479 Added qfout support to BBDuk, but only for primary output stream.
|
jpayne@68
|
2480 Added oneline format to FileFormat.
|
jpayne@68
|
2481 Added oneline output support to most tools using the .oneline extension.
|
jpayne@68
|
2482 Added bisulfite flag to RQCFilter.
|
jpayne@68
|
2483 Modified TadPipe to use adapters=default with BBMerge.
|
jpayne@68
|
2484 Modified TadPipe to use Clumpify for speed.
|
jpayne@68
|
2485 36.32
|
jpayne@68
|
2486 Tools other than BBMerge using the ecco flag now default to using a static adapter list.
|
jpayne@68
|
2487 Added contamination references to public distribution.
|
jpayne@68
|
2488 Fixed issue with Ns in error-correction mode in BBNorm.
|
jpayne@68
|
2489 Added trimming option to filterbytile.
|
jpayne@68
|
2490 Sketch now uses ByteStreamWriter instead of TextStreamWriter.
|
jpayne@68
|
2491 Fixed a bug in reporting the number of sketches created by SketchMaker.
|
jpayne@68
|
2492 Changed hash function of Clumpify and Sketch.
|
jpayne@68
|
2493 Clumpify now reports the number of clumps. Interestingly, unhashed kmers result in fewer clumps and better compression.
|
jpayne@68
|
2494 36.33
|
jpayne@68
|
2495 Finished Clumpify pivot-split function.
|
jpayne@68
|
2496 Added Clump.toStringStaggered().
|
jpayne@68
|
2497 Improved Clumpify consensus generation.
|
jpayne@68
|
2498 Fixed a crash for Tadpole error correcting fasta reads.
|
jpayne@68
|
2499 Changed Clumpify comparator to reverse the order of sorted clumps.
|
jpayne@68
|
2500 Implemented preliminary error correction in Clumpify. It works well but requires multiple passes.
|
jpayne@68
|
2501 Added Clumpify bloom filter and fixed mincount functionality.
|
jpayne@68
|
2502 Added Clumpify minprob for pivots. Does not help much.
|
jpayne@68
|
2503 Added Clumpify border for restricting pivots to near the middle of reads. Very slight increase in clumps, decent increase in error correction.
|
jpayne@68
|
2504 Tested and noted that 0 hashes increases correction efficiency in the first pass.
|
jpayne@68
|
2505 Accelerated in-memory multipass error-correction.
|
jpayne@68
|
2506 Rewrote paired read name test to be faster.
|
jpayne@68
|
2507 Added unpair and repair options to Clumpify. Repair currently only works with 1 group.
|
jpayne@68
|
2508 Fixed a bug in KCountArray hashing when threads=1.
|
jpayne@68
|
2509 Added Clumpify multipass depth-filter regeneration.
|
jpayne@68
|
2510 Clumpify can now retain pairing and clumping for free in single-group mode, with unpair and repair flags.
|
jpayne@68
|
2511 Wrote SortByName and sortbyname.sh, an out-of-memory sort program.
|
jpayne@68
|
2512 Wrote MergeSorted as a wrapper to merge multiple name-sorted files into a single name-sorted file.
|
jpayne@68
|
2513 Wrote CrisContainer.
|
jpayne@68
|
2514 Clumpify now has all functions enabled (such as restoring read pairing) when groups>1, but single-group is faster.
|
jpayne@68
|
2515 Clumpify will now autodetect the number of groups needed based on the input file size.
|
jpayne@68
|
2516 36.34
|
jpayne@68
|
2517 Added parallel sort option to Clumpify. This uses Reflection to verify whether Arrays has a parallel sort method.
|
jpayne@68
|
2518 Fixed a bug with Clumpify assigning 0 quality scores to corrected bases.
|
jpayne@68
|
2519 Parallelized clump formation.
|
jpayne@68
|
2520 clumpify.sh now loads Java 1.8 for parallel sorting.
|
jpayne@68
|
2521 Moved Clumpify read verification to worker threads for faster loading.
|
jpayne@68
|
2522 Wrote Splitter for Clump-splitting methods.
|
jpayne@68
|
2523 Updated Genepool build scripts to use Java 8.
|
jpayne@68
|
2524 Updated Genepool shellscripts to load Java 8.
|
jpayne@68
|
2525 36.35
|
jpayne@68
|
2526 Added some new parallel sort wrappers to Shared.
|
jpayne@68
|
2527 36.36
|
jpayne@68
|
2528 Fixed granularity in gc plot downsampling.
|
jpayne@68
|
2529 Changed pileup coverage variable column order. Requested by Jasmine.
|
jpayne@68
|
2530 Fixed some issues in TadPipe, and added error correction to the Clumpify phase.
|
jpayne@68
|
2531 Added several flags to Clumpify such as minci and minqi.
|
jpayne@68
|
2532 36.37
|
jpayne@68
|
2533 Fixed a sort bug.
|
jpayne@68
|
2534 36.38
|
jpayne@68
|
2535 Fixed a clumpify pivot bug.
|
jpayne@68
|
2536 36.39-36.44
|
jpayne@68
|
2537 Refactored and accelerated Clumpify.
|
jpayne@68
|
2538 36.45
|
jpayne@68
|
2539 Added sort package for holding sort programs and comparators.
|
jpayne@68
|
2540 Added verbose2 flag to SortByName for better verbosity control.
|
jpayne@68
|
2541 Wrote KmerSort3, which does fetching asynchronously in multiple threads.
|
jpayne@68
|
2542 Made SortByName slightly more efficient.
|
jpayne@68
|
2543 Fixed a bug in which SortByName was resetting interleaved status when merging.
|
jpayne@68
|
2544 Fixed a bug in interleaved testing.
|
jpayne@68
|
2545 Fixed bug in Pileup: Bases on zero-depth contigs were being excluded from histogram.
|
jpayne@68
|
2546 Made BBMerge mix flag nonstatic.
|
jpayne@68
|
2547 36.48
|
jpayne@68
|
2548 Fixed a bug in randomreads; it was not parsing the out flag.
|
jpayne@68
|
2549 Improved FilterByTile read name parsing to support more Illumina software versions.
|
jpayne@68
|
2550 36.49
|
jpayne@68
|
2551 RQCFilter now writes intermediate files in ASCII-33.
|
jpayne@68
|
2552 Read.failsBarcode should now work properly for old Illumina reads with no barcodes.
|
jpayne@68
|
2553 Added Clumpify minratioqmult flag and quality-sensitive error ratio formula.
|
jpayne@68
|
2554 Clumpify error correction uses a more conservative minidentity on the first pass.
|
jpayne@68
|
2555 Fixed an RQCFilter bug in which a log file (KmerStats2) was not being generated.
|
jpayne@68
|
2556 36.50
|
jpayne@68
|
2557 Clumpify now stores per-read pivot information in new ReadKey class instead of a long[].
|
jpayne@68
|
2558 Clumpify sorting now takes strand into account, which slightly increases compression (~1% for paired reads).
|
jpayne@68
|
2559 BBTools no longer use multithreaded sorting when threads are set to 1.
|
jpayne@68
|
2560 Clumps are now hashable and comparable.
|
jpayne@68
|
2561 Clumpify can now temporarily merge reads for pivot calculation with the flag mergefirst. This increases compression slightly but gives odd results for correction.
|
jpayne@68
|
2562 Wrote SummarizeQuast for combining Quast reports into a box plot.
|
jpayne@68
|
2563 Added BBMerge option to not change quality scores.
|
jpayne@68
|
2564 Clumpify now uses more groups by default, to prevent running out of memory with highly unbalanced groups.
|
jpayne@68
|
2565 36.51
|
jpayne@68
|
2566 Added ordered flag to RQCFilter.
|
jpayne@68
|
2567 BBMerge now tracks and displays the number of errors corrected in ecco mode.
|
jpayne@68
|
2568 Tweaked clump.Splitter to better handle polymorphism, reducing chimeric corrections.
|
jpayne@68
|
2569 KmerComparator now resolves ties using read name.
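Breaking ties by read name makes the sort order independent of input order. A minimal sketch of that idea follows; `KeyedRead` and its fields are hypothetical stand-ins, and the real KmerComparator may compare additional fields before falling back to the name.

```python
from typing import NamedTuple

class KeyedRead(NamedTuple):
    kmer: int   # hypothetical pivot kmer, encoded as an integer
    name: str   # read name, used only to break ties

def sort_reads(reads):
    """Sort by pivot kmer, then by read name so that ties are deterministic."""
    return sorted(reads, key=lambda r: (r.kmer, r.name))

reads = [KeyedRead(7, "readB"), KeyedRead(3, "readZ"), KeyedRead(7, "readA")]
ordered = sort_reads(reads)
```

With the name as a secondary key, any permutation of the same input produces the same output order.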
|
jpayne@68
|
2570 Wrote Consect, a tool for making a consensus of multiple error-correction tools.
|
jpayne@68
|
2571 Wrote consect.sh.
|
jpayne@68
|
2572 Added Clumpify allele correlation calculator to Splitter.
|
jpayne@68
|
2573 Finished Clumpify biallelic split function.
|
jpayne@68
|
2574 Clumpify can now sort clumps for increased compression with the resort flag.
|
jpayne@68
|
2575 Clumpify can also sort clumps of paired reads for even more compression with the resortpaired flag.
|
jpayne@68
|
2576 Added support for pigz level 11 compression.
|
jpayne@68
|
2577 Clumpify now accepts the flag changequality (cq) and defaults to false.
|
jpayne@68
|
2578 RQCFilter now uses zl6 for intermediate and chaff files and zl8 at the end.
|
jpayne@68
|
2579 Removed obsolete code for tracking of validReadsWritten and validBasesWritten.
|
jpayne@68
|
2580 Wrote LoadReads and loadreads.sh to test the predicted and actual memory usage of compressed files.
|
jpayne@68
|
2581 Revised some constants to improve memory usage prediction.
|
jpayne@68
|
2582 Improved KillSwitch memory-kill functionality, and added more protected memory allocation points in read streams.
|
jpayne@68
|
2583 Clumpify now takes into account per-read and per-Clump overhead when predicting memory usage.
|
jpayne@68
|
2584 All calls to Arrays.copyOfRange now go through KillSwitch.
|
jpayne@68
|
2585 Many calls to Arrays.copyOf now go through KillSwitch.
|
jpayne@68
|
2586 36.52
|
jpayne@68
|
2587 Added Reformat input file tests.
|
jpayne@68
|
2588 Reformat now tests qual files as well.
|
jpayne@68
|
2589 Wrote StreamToOutput; when a Clumpify group is too big, it can stream the group to the output file(s) with no processing.
|
jpayne@68
|
2590 KmerSort3 now tests group size and streams overly-large groups to output.
|
jpayne@68
|
2591 KmerSplit and KmerSort3 now track memory bytes read to help estimate memory requirements.
|
jpayne@68
|
2592 Calls to Sort are now wrapped by KillSwitch because they use additional memory.
|
jpayne@68
|
2593 Fixed a bug in which SortByName did not work for external sorts of empty files.
|
jpayne@68
|
2594 Fixed a bug in quality-format detection in which N with quality 2 was not correctly being flagged as ASCII-64.
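For context, under an offset of 64 a quality score of 2 is encoded as 'B', while under an offset of 33 it is encoded as '#'. The sketch below shows decoding with either offset plus a deliberately naive offset guess. It is not the detection logic used by BBTools; the threshold of 59 and the function names are assumptions for illustration.

```python
def decode_quality(quality_string, offset):
    """Convert an ASCII quality string to a list of integer scores."""
    return [ord(c) - offset for c in quality_string]

def guess_offset(quality_string):
    """Naive guess: characters below code 59 do not occur in offset-64 data."""
    # Limitation: ASCII-33 data containing only high scores is misclassified.
    return 33 if min(map(ord, quality_string)) < 59 else 64

q64 = decode_quality("B", 64)
q33 = decode_quality("#", 33)
```

A real detector has to handle ambiguous input more carefully, since a string such as "BBBB" is valid under both offsets.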
|
jpayne@68
|
2595 Improved some quality-format warning messages.
|
jpayne@68
|
2596 Clumpify now parses quality flags directly rather than passing them down.
|
jpayne@68
|
2597 KmerCountExact and Tadpole now support a GC column in the kmer frequency histogram.
|
jpayne@68
|
2598 36.53
|
jpayne@68
|
2599 Clumpify error-correction is now done in conservative mode for the first half of passes.
|
jpayne@68
|
2600 Fixed a bug with indels in CalcTrueQuality.
|
jpayne@68
|
2601 36.54
|
jpayne@68
|
2602 Fixed MDTag parsing, which was completely broken with indels.
|
jpayne@68
|
2603 36.55
|
jpayne@68
|
2604 Wrote new half-open Var class for masking heterozygous variants when recalibrating quality scores.
|
jpayne@68
|
2605 Wrote CallVariants for use with the new Var class.
|
jpayne@68
|
2606 Added ByteBuilder.append(ByteBuilder), which was missing.
|
jpayne@68
|
2607 Wrote SamStreamer, which doubles the speed of reading sam files.
|
jpayne@68
|
2608 Wrote SamLineStreamer for streaming SamLines rather than Reads.
|
jpayne@68
|
2609 CallVariants now calculates coverage.
|
jpayne@68
|
2610 Added SamLineStreamer support to Pileup, which doubles the speed.
|
jpayne@68
|
2611 Added SamStreamer support to CalcTrueQuality, which increases the speed moderately.
|
jpayne@68
|
2612 Added SamLine.PARSE_OPTIONAL_MD_ONLY flag to accelerate parsing of sam lines.
|
jpayne@68
|
2613 SamLineStreamer and SamReadStreamer are now subclasses of SamStreamer.
|
jpayne@68
|
2614 36.56
|
jpayne@68
|
2615 Fixed failure to remove bam processes from the process table.
|
jpayne@68
|
2616 Added ByteFile.pushBack() to make it easier to process sam headers in a separate function.
|
jpayne@68
|
2617 Disabled some assertions in FastaReadInputStream that fired in race conditions.
|
jpayne@68
|
2618 36.57
|
jpayne@68
|
2619 Cleared some additional static fields after BBMap/BBSplit termination. These interfered with RQCFilter.
|
jpayne@68
|
2620 Fixed some invalid assertions and masked an exception in FastaReadInputStream; these are harmless and due to a known race condition.
|
jpayne@68
|
2621 36.58
|
jpayne@68
|
2622 Slightly improved Clumpify allele-pair selection by factoring in distance.
|
jpayne@68
|
2623 Fixed a crash due to unexpected whitespace in RQCFilter.
|
jpayne@68
|
2624 Added coverage target and metagenome mode to RandomReads.
|
jpayne@68
|
2625 36.59
|
jpayne@68
|
2626 Adjusted RandomReads coverage flag for paired reads.
|
jpayne@68
|
2627 36.60
|
jpayne@68
|
2628 Fixed BBMap issue in which secondary alignments of read 2 would not have their names changed to match read 1.
|
jpayne@68
|
2629 Added addcolon flag to Reformat and RandomReads, to add 1: and 2: to read names.
|
jpayne@68
|
2630 SplitPairsAndSingles (repair.sh) now identifies pair numbers from sam files.
|
jpayne@68
|
2631 Moved Shuffle to sort package.
|
jpayne@68
|
2632 Added ReadQualityComparator.
|
jpayne@68
|
2633 Added setAscending to some read comparators.
|
jpayne@68
|
2634 SortByName can now sort ascending or descending, by name, length, or quality.
|
jpayne@68
|
2635 Rewrote the calctruequality.sh usage information.
|
jpayne@68
|
2636 Added qap matrix to CalcTrueQuality. Works pretty well for 1-pass.
|
jpayne@68
|
2637 Moved quality score quantization to Quantizer.
|
jpayne@68
|
2638 Changed Quantizer to never assign 0 to non-zero scores.
|
jpayne@68
|
2639 Added quantization option to Clumpify.
|
jpayne@68
|
2640 Changed the default quant matrix to match current NextSeq bins.
|
jpayne@68
|
2641 Added slash option for Quantizer; e.g. quantize=/2 will quantize to even numbers only.
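The two Quantizer rules above (quantize=/2 keeps only even scores, and a non-zero score is never mapped to 0) can be sketched as follows. Rounding down, and promoting a would-be zero to the divisor itself, are assumptions of this example; the changelog does not say which direction the Quantizer rounds.

```python
def quantize(q, divisor=2):
    """Map a quality score to a multiple of divisor, keeping non-zero scores non-zero."""
    if q <= 0:
        return 0
    rounded = (q // divisor) * divisor  # assumption: round down
    # Assumption: the smallest non-zero bin is the divisor itself.
    return rounded if rounded > 0 else divisor

quantized = [quantize(q) for q in (0, 1, 7, 40)]
```

Fewer distinct quality values generally compress better, which is the usual motivation for quantization.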
|
jpayne@68
|
2642 Fixed a bug in RandomReads - errors added from quality scores were biased toward certain bases and had a 33% chance of remaining as the original base.
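One way to avoid both problems described above is to draw the replacement uniformly from the bases that differ from the original. This is a sketch of that approach, not the RandomReads code; the function name and the use of Python's `random.Random` are choices made for the example.

```python
import random

BASES = "ACGT"

def substitute(base, rng):
    """Return a uniformly chosen base that differs from the original."""
    choices = [b for b in BASES if b != base]
    return rng.choice(choices)

rng = random.Random(0)  # seeded only to make the example repeatable
mutated = [substitute("A", rng) for _ in range(1000)]
```

Excluding the original base guarantees that every simulated error actually changes the sequence.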
|
jpayne@68
|
2643 Added sticky quality score quantization. Slightly increases compression.
|
jpayne@68
|
2644 Wrote Var functions for quality scores based on various metrics.
|
jpayne@68
|
2645 Added CallVariants variant filtering flags.
|
jpayne@68
|
2646 Increased default number of compression threads to all available threads.
|
jpayne@68
|
2647 Q0 and Q1 error probabilities are now assigned fixed values of 0.75 and 0.7.
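The standard Phred relation is p = 10^(-Q/10), which gives 1.0 for Q0 and about 0.79 for Q1. Both exceed 0.75, the error rate of guessing one of four bases at random, which is presumably why fixed values are used. A minimal sketch with the fixed values from this entry (the function and table names are hypothetical):

```python
FIXED_PROBABILITIES = {0: 0.75, 1: 0.7}  # values stated in the changelog entry

def error_probability(q):
    """Error probability for a Phred score, with fixed values for Q0 and Q1."""
    if q in FIXED_PROBABILITIES:
        return FIXED_PROBABILITIES[q]
    return 10 ** (-q / 10)  # standard Phred formula

probs = [error_probability(q) for q in (0, 1, 10, 20)]
```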
|
jpayne@68
|
2648 Quality score position histogram now better reflects no-calls.
|
jpayne@68
|
2649 Variant-calling is now integrated into CalcTrueQuality (for substitutions).
|
jpayne@68
|
2650 BBDuk can now accept a varfile and ignore those variants when making quality-related histograms from sam files.
|
jpayne@68
|
2651 CalcTrueQuality can now use a varfile also.
|
jpayne@68
|
2652 Increased default filter thresholds for CallVariants.
|
jpayne@68
|
2653 36.61
|
jpayne@68
|
2654 Added VCF input support.
|
jpayne@68
|
2655 Added VCF output support.
|
jpayne@68
|
2656 Wrote ScafMap for storing scaffold information.
|
jpayne@68
|
2657 Wrote VarMap for concurrent variation processing.
|
jpayne@68
|
2658 Wrote VarFilter for variant filtering.
|
jpayne@68
|
2659 Accelerated variant calling and filtering.
|
jpayne@68
|
2660 Made coverage a mandatory column for variants.
|
jpayne@68
|
2661 Ploidy and pairingRate are now required header lines for variant files.
|
jpayne@68
|
2662 Modified Clumpify to support sam headers. However, it is unusable because Clumpify needs the obj field.
|
jpayne@68
|
2663 Added positional sorting and sam support to SortByName. May not correspond to sam recommendations for pairs. Needs to alter sam header to indicate sorted.
|
jpayne@68
|
2664 ScafMap and CallVariants can now load a fasta reference.
|
jpayne@68
|
2665 VarMap variant filtering is now multithreaded.
|
jpayne@68
|
2666 Added identity tracking to Var.
|
jpayne@68
|
2667 Improved functionality and correctness of Read.identitySkewed.
|
jpayne@68
|
2668 Fixed a parse error in TaxFilter.
|
jpayne@68
|
2669 36.62
|
jpayne@68
|
2670 Deleted CalcTrueQuality_single.
|
jpayne@68
|
2671 Eliminated a difference between sam parsing via CallVariants and CalcTrueQuality.
|
jpayne@68
|
2672 Increased CalcTrueQuality default memory allocation to handle variant calling.
|
jpayne@68
|
2673 Implemented trimming for mapped reads with match strings, for CallVariants.
|
jpayne@68
|
2674 CallVariants now has a border flag, default 5.
|
jpayne@68
|
2675 Reformat can now trim sam files, though the optional fields may become incorrect.
|
jpayne@68
|
2676 36.63
|
jpayne@68
|
2677 Moved sam streamers to stream package.
|
jpayne@68
|
2678 Wrote SamStreamerWrapper for fast sam -> fastq conversion.
|
jpayne@68
|
2679 Fixed compression level getting reset in RQCFilter.
|
jpayne@68
|
2680 Changed max compression threads for ziplevel 6 from 16 to 24.
|
jpayne@68
|
2681 Fixed RQCFilter failure to parse null.
|
jpayne@68
|
2682 Added clumpify option to RQCFilter.
|
jpayne@68
|
2683 Changed default length sort to descending (except for SortByName which defaults to ascending).
|
jpayne@68
|
2684 Reformat can now change cigar strings to 1.4 format.
|
jpayne@68
|
2685 Improved Reformat sam=1.3 converter; it was allowing adjacent M operations.
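Converting a 1.4-style cigar string to 1.3 replaces '=' and 'X' with 'M', which can leave neighboring M operations that need to be merged. The following sketch shows one way to do that; it is not the Reformat converter, and the function name is hypothetical.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_14_to_13(cigar):
    """Replace '=' and 'X' with 'M' and merge adjacent identical operations."""
    merged = []
    for length, op in _CIGAR_OP.findall(cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # extend the previous run
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)

converted = cigar_14_to_13("10=1X5=2I3=")
```

Without the merge step the same input would yield 10M1M5M2I3M, which is the adjacent-M output this entry describes.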
|
jpayne@68
|
2686 Added pairedonly and unpairedonly flags to Reformat.
|
jpayne@68
|
2687 Fixed pairing count in CallVariants.
|
jpayne@68
|
2688 Added support for pigz blocksize and iterations parameters.
|
jpayne@68
|
2689 36.64
|
jpayne@68
|
2690 Added qdhist and qfhist as aliases for qchist.
|
jpayne@68
|
2691 Added an optional bloom filter to CallVariants.
|
jpayne@68
|
2692 36.65
|
jpayne@68
|
2693 Added a CallVariants scoring function for substitutions in homopolymers.
|
jpayne@68
|
2694 Variant quality score is now further penalized for being below average base quality.
|
jpayne@68
|
2695 Added some columns to variant output files.
|
jpayne@68
|
2696 36.66
|
jpayne@68
|
2697 Fixed a bug in Clumpify noted by WDC: when called (non-N) bases had quality scores of 0, Clumpify with the reorder option failed an assertion.
|
jpayne@68
|
2698 Added bzip2 and pbzip2 toggles for enabling/disabling those subprocesses.
|
jpayne@68
|
2699 36.67
|
jpayne@68
|
2700 Verified that bzip2 works correctly for either bzip2 or pbzip2.
|
jpayne@68
|
2701 bzip2 now always uses compression level 9, since lower is not faster.
|
jpayne@68
|
2702 Added allowziplevelchange flag.
|
jpayne@68
|
2703 Increased default number of pigz threads allowed per compression level.
|
jpayne@68
|
2704 Bzip compression now defaults to threads, not threads-1.
|
jpayne@68
|
2705 Fixed a bug in SamStreamer; it did not work for long headers.
|
jpayne@68
|
2706 Clumpify reorder was renaming reads improperly if reads were single-ended.
|
jpayne@68
|
2707 36.68
|
jpayne@68
|
2708 Fixed an issue in which FilterByTaxa was capping RQCFilter buffers at 4.
|
jpayne@68
|
2709 Fixed var file header bug.
|
jpayne@68
|
2710 36.69
|
jpayne@68
|
2711 Added ordered flag to Dedupe and Dedupe2.
|
jpayne@68
|
2712 Made Colors class for colored text.
|
jpayne@68
|
2713 Made changes to VCF output to increase compatibility.
|
jpayne@68
|
2714 RQCFilter now pulls output read and base counts from BBMerge, the final step.
|
jpayne@68
|
2715 Improved Dedupe sorting.
|
jpayne@68
|
2716 36.70
|
jpayne@68
|
2717 Modified BBMap bs option to work with both samtools 1.x and 0.x.
|
jpayne@68
|
2718 Fixed a crash in Dedupe sorting. Possibly a Java 1.8 bug since it is not clear what the problem is.
|
jpayne@68
|
2719 Wrote Realigner for realigning reads during variant calling.
|
jpayne@68
|
2720 Added fqz support to BBTools.
|
jpayne@68
|
2721 Added delimiter support to DemuxByName.
|
jpayne@68
|
2722 DemuxByName now disables pigz if the number of output streams grows beyond 8.
|
jpayne@68
|
2723 Added soft-clipping to realigner when it goes out of bounds.
|
jpayne@68
|
2724 Added optional unclipping as well.
|
jpayne@68
|
2725 Fixed Read.calcMatchLength, which was incorrect for reads with Y match symbols.
|
jpayne@68
|
2726 Paired read percent is now tracked independently of proper pair percent.
|
jpayne@68
|
2727 36.72
|
jpayne@68
|
2728 Added TaxServer class, with help from Shijie.
|
jpayne@68
|
2729 Added JsonObject for Json formatting.
|
jpayne@68
|
2730 36.73
|
jpayne@68
|
2731 Made improvements to TaxServer.
|
jpayne@68
|
2732 Upgraded tax server to support accessions.
|
jpayne@68
|
2733 Fixed a bug in BBMap/MapPacBio related to Y symbols during realignment.
|
jpayne@68
|
2734 Started CallVariants2, capable of multisample pileup.
|
jpayne@68
|
2735 36.74
|
jpayne@68
|
2736 Finished CallVariants2.
|
jpayne@68
|
2737 36.75
|
jpayne@68
|
2738 Added input file list support to CallVariants and CallVariants2.
|
jpayne@68
|
2739 Removed a print statement from TaxServer.
|
jpayne@68
|
2740 Added trimrname flag to Reformat.
|
jpayne@68
|
2741 Added covpenalty and rarity flags to CallVariants.
|
jpayne@68
|
2742 36.76
|
jpayne@68
|
2743 Added CallVariants score and pass/fail per sample.
|
jpayne@68
|
2744 36.77
|
jpayne@68
|
2745 Wrote CallVariants guide.
|
jpayne@68
|
2746 Increased speed of samtools processing mostly unmapped bam files with CallVariants.
|
jpayne@68
|
2747 Fixed a bug in ScafMap.getScaffold handling of whitespace.
|
jpayne@68
|
2748 Assorted changes to CallVariants handling of multiple sample names.
|
jpayne@68
|
2749 36.78
|
jpayne@68
|
2750 Sketch now works with nt if the prefilter flag is used.
|
jpayne@68
|
2751 36.79
|
jpayne@68
|
2752 Added Clump.removeDuplicates.
|
jpayne@68
|
2753 Wrote class hiseq.FlowcellCoordinate.
|
jpayne@68
|
2754 Added KmerComparator compareSequence and compareQuality.
|
jpayne@68
|
2755 Added Clumpify deduplication modes.
|
jpayne@68
|
2756 Inter-cluster distance calculation may now span tiles.
|
jpayne@68
|
2757 Added keepsingletons flag to DedupeByMapping.
|
jpayne@68
|
2758 Added multipass support to Clumpify deduplication.
|
jpayne@68
|
2759 Wrote class MultiLogLog.
|
jpayne@68
|
2760 Wrote class MultiKmerCounter.
|
jpayne@68
|
2761 Wrote multiloglog.sh.
|
jpayne@68
|
2762 Fixed some text in loglog.sh.
|
jpayne@68
|
2763 Changed Read.expectedErrors to allow reads with incorrect quality values.
|
jpayne@68
|
2764 Made a sketch package superclass, SketchObject, for static fields.
|
jpayne@68
|
2765 Wrote SketchMakerMini for lightweight sketch generation.
|
jpayne@68
|
2766 CompareSketch can now accept fasta files.
|
jpayne@68
|
2767 Removed duplicate version strings from BBWrap output.
|
jpayne@68
|
2768 Changed RenameReads pair identifier from /1 to 1:.
|
jpayne@68
|
2769 Fixed logic causing DedupeByMapping to not detect reverse-complementary duplicates in ipo mode.
|
jpayne@68
|
2770 Added flowcell coordinate filtering to BBDuk.
|
jpayne@68
|
2771 Created package shared and moved some utility classes there.
|
jpayne@68
|
2772 36.80
|
jpayne@68
|
2773 CallVariants rarity flag will now automatically reduce minAlleleFraction if rarity is lower.
|
jpayne@68
|
2774 Fixed a bug in CallVariants multisample mode; samples missing a variant would get assigned the sum of all samples rather than 0.
|
jpayne@68
|
2775 Added minallelefraction to VCF header.
|
jpayne@68
|
2776 36.81
|
jpayne@68
|
2777 Fixed some VCF header issues.
|
jpayne@68
|
2778 Changed VCF QUAL column to 2 decimal places.
|
jpayne@68
|
2779 36.82
|
jpayne@68
|
2780 Fixed a parse error for allduplicates in Clumpify.
|
jpayne@68
|
2781 36.83
|
jpayne@68
|
2782 Fixed a bug in inter-tile distance calculation in Clumpify.
|
jpayne@68
|
2783 36.84
|
jpayne@68
|
2784 Fixed Reformat failure to correctly generate insert size histogram from sam files.
|
jpayne@68
|
2785 36.85
|
jpayne@68
|
2786 Fixed N contribution to error rate in qhist; it was underestimated by 50%.
|
jpayne@68
|
2787 Clumpify now supports twin files.
|
jpayne@68
|
2788 Added Histogram to the names of some histogram-statistics methods.
|
jpayne@68
|
2789 Fixed a bug in weighted-average calculation by KmerCountMulti.
|
jpayne@68
|
2790 Reduced contribution of paired score to overall variant score.
|
jpayne@68
|
2791 Reduced effect of paired score near contig ends.
|
jpayne@68
|
2792 Multisample VCF info field is now the sum of samples rather than the best sample.
|
jpayne@68
|
2793 Added bgzip support.
|
jpayne@68
|
2794 Added variant score histogram output to CallVariants.
|
jpayne@68
|
2795 Fixed variant sorting issue for insertions after subs.
|
jpayne@68
|
2796 Fixed a bug in clipTipIndels() related to variant realignment.
|
jpayne@68
|
2797 Added homopolymer score calculation for indels.
|
jpayne@68
|
2798 Wrote VCFFile.
|
jpayne@68
|
2799 Added VCFLine compare and hashing.
|
jpayne@68
|
2800 Added VCFLine String caching.
|
jpayne@68
|
2801 Wrote CompareVCF and comparevcf.sh.
|
jpayne@68
|
2802 36.86
|
jpayne@68
|
2803 Fixed a bug in Var initialization from vcf; alleles were not being canonicalized.
|
jpayne@68
|
2804 Added kill flag to TaxServer.
|
jpayne@68
|
2805 Fixed CalcTrueQuality crash with multiple sam files.
|
jpayne@68
|
2806 FlowCell now retains total read count when written to disk.
|
jpayne@68
|
2807 Added average read length tracking to CallVariants.
|
jpayne@68
|
2808 Added insertion-length allele-frequency adjustment. Substantially improves insertion calling.
|
jpayne@68
|
2809 Fixed some issues with RandomReads adding /1 to read headers that interfere with SamToRoc parsing.
|
jpayne@68
|
2810 Added contig end dist field in variant files.
|
jpayne@68
|
2811 Fixed a bug in ScafMap initialization from CompareVCF.
|
jpayne@68
|
2812 36.87
|
jpayne@68
|
2813 Fixed a Sketch bug adding a leading colon to sequence names.
|
jpayne@68
|
2814 Removed an assertion preventing sketch loading without delta compression.
|
jpayne@68
|
2815 Long insertions now reduce nearby SNPs that would be implied by misalignment.
|
jpayne@68
|
2816 Fixed major bug in Sketch with very small genomes.
|
jpayne@68
|
2817 36.88
|
jpayne@68
|
2818 TaxTree now automatically parses headers for name and accession, but only if they contain pipe symbols (rather than replacing with underscore), since underscores may be present in accessions.
|
jpayne@68
|
2819 Adjusted insertion-induced substitution revised frequency reduction down by 50%, as roughly half occur on either side.
|
jpayne@68
|
2820 Added SketchMaker taxlevel, tossjunk, and accession flags.
|
jpayne@68
|
2821 Accession processing now outputs a list of accession symbols.
|
jpayne@68
|
2822 Added ftl, ftr, ftr2 flags to RQCFilter.
|
jpayne@68
|
2823 Added support of - symbols in accession strings.
|
jpayne@68
|
2824 Tested maskmiddle for Sketch; does not change sensitivity.
|
jpayne@68
|
2825 Fixed CompareSketch output formatting.
|
jpayne@68
|
2826 Added support for uppercase taxa level names.
|
jpayne@68
|
2827 Wrote Query class for pulling data from TaxServer.
|
jpayne@68
|
2828 Fixed vcf score at a constant 2 decimal places.
|
jpayne@68
|
2829 Fixed windows-style slashes in taxpath argument.
|
jpayne@68
|
2830 Added accession file input support to SketchMaker and various taxonomy classes.
|
jpayne@68
|
2831 36.89
|
jpayne@68
|
2832 Fixed negative coverage in VCF output when coverage overflows.
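A negative value of this kind is what signed fixed-width arithmetic produces when a counter passes its maximum. The sketch below emulates 32-bit wraparound in Python and shows a saturating alternative. It assumes a 32-bit signed counter, which the changelog does not state, and it is not necessarily how the fix was implemented.

```python
INT_MAX = 2**31 - 1

def wrap_int32(x):
    """Emulate signed 32-bit wraparound, as in Java int arithmetic."""
    return (x + 2**31) % 2**32 - 2**31

def saturating_add(a, b):
    """Add two non-negative counts, clamping at INT_MAX instead of wrapping."""
    return min(a + b, INT_MAX)

wrapped = wrap_int32(INT_MAX + 1)
clamped = saturating_add(INT_MAX, 5)
```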
|
jpayne@68
|
2833 Added suggested resolution to warning when coverage overflows.
|
jpayne@68
|
2834 Changed Clumpify spantiles default to false.
|
jpayne@68
|
2835 36.90
|
jpayne@68
|
2836 Added Tadpole mincontig=auto, which sets mincontig at max(124, 2*k).
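The rule max(124, 2*k) means the floor of 124 applies for any k up to 62, and the threshold grows with k beyond that. As a quick illustration (the function name is hypothetical):

```python
def auto_mincontig(k):
    """Minimum contig length under mincontig=auto, per the formula above."""
    return max(124, 2 * k)

thresholds = {k: auto_mincontig(k) for k in (31, 62, 93)}
```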
|
jpayne@68
|
2837 Added Tadpole trimends=auto flag.
|
jpayne@68
|
2838 Fixed VCF sorting, again.
|
jpayne@68
|
2839 Fixed some VCF header bugs.
|
jpayne@68
|
2840 Reduced native var format precision to 4 decimals maximum.
|
jpayne@68
|
2841 VCF lines are now right-trimmed to a canonical representation from text.
|
jpayne@68
|
2842 Changed TaxTree to use the full taxonomic tree.
|
jpayne@68
|
2843 36.92
|
jpayne@68
|
2844 Added Sketch support to TaxServer.
|
jpayne@68
|
2845 36.93
|
jpayne@68
|
2846 Added Locale.ROOT to String formatting.
|
jpayne@68
|
2847 Fixed module being printed in bamscript output.
|
jpayne@68
|
2848 BBMap no longer prints information about ambig mode when just indexing.
|
jpayne@68
|
2849 Parallel sorting is no longer used when threads are set to 1.
|
jpayne@68
|
2850 BBDuk now automatically turns on findBestMatch in rename mode.
|
jpayne@68
|
2851 HashBuffer can now support HashArray2D and HashArrayHybrid.
|
jpayne@68
|
2852 Improved taxonomy name parsing.
|
jpayne@68
|
2853 SendSketch now sends multiple sketches in a single transaction rather than opening a new connection each time.
|
jpayne@68
|
2854 36.94
|
jpayne@68
|
2855 Revised CompareSketch to use indexed sketches.
|
jpayne@68
|
2856 Regenerated nt sketches and restarted TaxServer with better parsing of names.
|
jpayne@68
|
2857 36.95
|
jpayne@68
|
2858 Fixed tar -xvzf typo in documentation.
|
jpayne@68
|
2859 Wrote ProcessGC for printing gc content by interval.
|
jpayne@68
|
2860 Added BBMap nfilter.
|
jpayne@68
|
2861 36.96
|
jpayne@68
|
2862 Version bump.
|
jpayne@68
|
2863 36.97
|
jpayne@68
|
2864 Minor changes to AccessionToTaxid.
|
jpayne@68
|
2865 Changed default location of taxonomy files to a symlinked directory.
|
jpayne@68
|
2866 Updated taxonomy files.
|
jpayne@68
|
2867 36.98
|
jpayne@68
|
2868 Fixed a crash in read header parsing for flowcell coordinates.
|
jpayne@68
|
2869 Fixed some VCF header lines (FORMAT should have been FILTER).
|
jpayne@68
|
2870 Added RAF (revised allele frequency) and SB (strand bias) columns to VCF.
|
jpayne@68
|
2871 36.99
|
jpayne@68
|
2872 Clumpify now assigns short kmers to short reads or reads with too many Ns to have a legit kmer. Should reduce the issue of large clumps.
|
jpayne@68
|
2873 Reversed clump order to keep low-quality reads at the end rather than beginning of the file.
|
jpayne@68
|
2874 Added 1-based offset option for PlotGC.
|
jpayne@68
|
2875 Removed binary mode from Sketches.
|
jpayne@68
|
2876 Documented some BBMap flags.
|
jpayne@68
|
2877 Added amino acid support to Sketch.
|
jpayne@68
|
2878
|
jpayne@68
|
2879 TODO: Examine impact of duplicates for both de novo genome assembly and for single cell applications
|
jpayne@68
|
2880
|
jpayne@68
|
2881 TODO: More verbose output from FilterByTaxa in RQCFilter pipeline.
|
jpayne@68
|
2882 TODO: Call peaks from raw count, not unique kmers.
|
jpayne@68
|
2883
|
jpayne@68
|
2884 TODO: Test garbage collectors for throughput.
|
jpayne@68
|
2885 TODO: Bisect seems to not work when highest K is best...
|
jpayne@68
|
2886
|
jpayne@68
|
2887 TODO: Add taxpath to everything in tax.
|
jpayne@68
|
2888
|
jpayne@68
|
2889 TODO: minscore2 minallelefrequency2, etc. For "fail" but retain.
|
jpayne@68
|
2890 TODO: S-curves for variant scores rather than linear curves.
|
jpayne@68
|
2891 TODO: "Warning: Zero reads processed.". Possibly, tiles don't get widened with stored flowcells?
|
jpayne@68
|
2892 *TODO: ins homopolymer score always 1...
|
jpayne@68
|
2893 TODO: Tadpole extender
|
jpayne@68
|
2894
|
jpayne@68
|
2895
|
jpayne@68
|
2896 TODO:
|
jpayne@68
|
2897 *Some variants in groups are still overall fail.
|
jpayne@68
|
2898 hiseq clump
|
jpayne@68
|
2899 miseq clump
|
jpayne@68
|
2900 test assembly w/dedupe
|
jpayne@68
|
2901 server for sketch
|
jpayne@68
|
2902 compare how sketches work vs BLAST
|
jpayne@68
|
2903 Write monthly plan
|
jpayne@68
|
2904
|
jpayne@68
|
2905
|
jpayne@68
|
2906 *TODO: Add CallVariants assertion for ref allele = alt allele.
|
jpayne@68
|
2907 TODO: CompareSketch quit early in fast mode.
|
jpayne@68
|
2908 TODO: CompareSketch index by first key.
|
jpayne@68
|
2909
|
jpayne@68
|
2910 *TODO: Tadpole needs an "extra" flag
|
jpayne@68
|
2911
|
jpayne@68
|
2912 TODO: TRD flag for Reformat/BBMap - allow independent versions for fasta headers and sam rname/qname fields.
|
jpayne@68
|
2913 TODO: TranslateSixFrames silent mode (?).
|
jpayne@68
|
2914 TODO: add the date/version of the tax dump file you're using to the doc page
|
jpayne@68
|
2915
|
jpayne@68
|
2916 TODO: Generate match strings from cigar strings with no MD tag when ref is present.
|
jpayne@68
|
2917 TODO: Parse multi-allelic VCF lines.
|
jpayne@68
|
2918 TODO: Consider tracking insert size when calling variants.
|
jpayne@68
|
2919
|
jpayne@68
|
2920
|
jpayne@68
|
2921 TODO: Do quick allele-splitting on clumps where one allele is less than 90% identity to consensus, for compression.
|
jpayne@68
|
2922
|
jpayne@68
|
2923
|
jpayne@68
|
2924 TODO: BBMap output chimerically-mapped reads to a different file. Or, at least, annotate them in some way.
|
jpayne@68
|
2925
|
jpayne@68
|
2926 TODO: Make a SamHeader class.
|
jpayne@68
|
2927 TODO: Add a SamLine field to Read (?).
|
jpayne@68
|
2928 REGRESSION: CallVariants is faster under Java 7 than Java 8.
|
jpayne@68
|
2929
|
jpayne@68
|
2930 TODO: Shijiefilter with taxa and environmental
|
jpayne@68
|
2931 TODO: Streamer should support maxreads like cris does.
|
jpayne@68
|
2932
|
jpayne@68
|
2933 TODO: Investigate TadPipe misassembly rate factors.
|
jpayne@68
|
2934
|
jpayne@68
|
2935 TODO: Annotate peaks.txt with average gc content of kmers in each peak.
|
jpayne@68
|
2936
|
jpayne@68
|
2937 TODO: Have BBMap quit after X mapped reads.
|
jpayne@68
|
2938 TODO: Fix java 8 metaspace problem with something like "-XX:CompressedClassSpaceSize=64m -XX:MaxMetaspaceSize=128m".
|
jpayne@68
|
2939
|
jpayne@68
|
2940
|
jpayne@68
|
2941 TODO: BBTools homebrew
|
jpayne@68
|
2942
|
jpayne@68
|
2943 TODO: Add ability to depth-filter during multipass Clumpify ecc.
|
jpayne@68
|
2944
|
jpayne@68
|
2945 TODO: Microbe filter is printing out "chr14", etc. for scafstats.
|
jpayne@68
|
2946 TODO: Test Clumpify parameters. An empirical metric for false-positives would be handy.
|
jpayne@68
|
2947
|
jpayne@68
|
2948 TODO: Write KmerSplitSort which operates on multiple input and output files in multiple passes.
|
jpayne@68
|
2949 TODO: Consider binning reads by modulo while reading input for quicker sort and easier threading.
|
jpayne@68
|
2950 TODO: Use quality scores to determine whether to correct.
|
jpayne@68
|
2951 TODO: A high raw quality score should yield a lower corrected quality score, and vice-versa. Nice to get some data on this.
|
jpayne@68
|
2952 TODO: MPI version in which data is split across nodes instead of files.
|
jpayne@68
|
2953 TODO: Clumpify pipeline multithreading.
|
jpayne@68
|
2954
|
jpayne@68
|
2955 ****TODO: Move all instances of Parser to the end of the parsing block.
|
jpayne@68
|
2956 *TODO: Tadpole rollback if mincountcorrect is violated.
|
jpayne@68
|
2957
|
jpayne@68
|
2958
|
jpayne@68
|
2959
|
jpayne@68
|
2960 TODO: Check other places where extra flag occurs
|
jpayne@68
|
2961 ***TODO: Add ability to add ftr2=1 to RQCFilter (possibly for only reads that have been trimmed...). (Vasanth)
|
jpayne@68
|
2962
|
jpayne@68
|
2963 TODO: BBMap 500 Mbp max chrom length causes trouble with Wheat 850 Mbp chromosome.
|
jpayne@68
|
2964 ***TODO: Growable LongHeapSet (?). Useful for nt.
|
jpayne@68
|
2965 TODO: Kmer tables quality vector to determine average quality of stored kmers.
|
jpayne@68
|
2966 TODO: CompareSketch taxonomy trace, starting at hit with highest identity.
|
jpayne@68
|
2967 TODO: CompareSketch raw and adjusted identity.
|
jpayne@68
|
2968 TODO: SketchMaker min and max taxlevel.
|
jpayne@68
|
2969 TODO: Add adapter detection to RQCFilter.
|
jpayne@68
|
2970 TODO: Allow quality recalibration Q-change limits.
|
jpayne@68
|
2971 TODO: Test Tadpole assembly with recalibrated Q-scores using gentler minprob.
|
jpayne@68
|
2972 TODO: Filter out reads causing uniqueness spikes.
|
jpayne@68
|
2973 TODO: 20-mer uniqueness -> 25-mer.
|
jpayne@68
|
2974 TODO: sketch loading does not *really* need 1 object per line. 1 object per sketch is better.
|
jpayne@68
|
2975
|
jpayne@68
|
2976
|
jpayne@68
|
2977
|
jpayne@68
|
2978
|
jpayne@68
|
2979 TODO: Kmer -> OldKmer. New Kmer should be packed 64-bit.
|
jpayne@68
|
2980 TODO: New headers: LN: NM: ID: CD:DH, D33-6, etc
|
jpayne@68
|
2981 TODO: KmerCountExact - ensure it supports prefilter.
|
jpayne@68
|
2982 TODO: BBMerge adapter trimming
|
jpayne@68
|
2983
|
jpayne@68
|
2984 *TODO: Genomax question.
|
jpayne@68
|
2985
|
jpayne@68
|
2986 ***TODO: config= overrides tree file for Taxonomy.sh.
|
jpayne@68
|
2987 TODO: BBMerge should not do adapter-detection on extended reads.
|
jpayne@68
|
2988 TODO: KmerCountExact sketchname flag.
|
jpayne@68
|
2989 TODO: Ensure fastq and fasta input generate identical sketches.
|
jpayne@68
|
2990
|
jpayne@68
|
2991 TODO: Multithread sketch compare.
|
jpayne@68
|
2992 TODO: BBMerge adapter-matching based on stringency.
|
jpayne@68
|
2993 TODO: BBDuk adapter detection.
|
jpayne@68
|
2994 TODO: Outa for BBDuk.
|
jpayne@68
|
2995
|
jpayne@68
|
2996 TODO: LogLog minprob.
|
jpayne@68
|
2997 ***TODO: IDMatrix gives wrong answers in some cases (?) probably involving different starting/ending locations.
|
jpayne@68
|
2998 TODO: It gives even worse answers with JNI enabled.
|
jpayne@68
|
2999 TODO: Reformat outu stream, particularly for gc splitting.
|
jpayne@68
|
3000 TODO: BBNorm does not use extra files in second pass.
|
jpayne@68
|
3001 *TODO: LongHashSet
|
jpayne@68
|
3002 TODO: Move IntList and so forth into their own package.
|
jpayne@68
|
3003
|
jpayne@68
|
3004 *TODO: Progressively square major/minor allele ratio for low-depth error-correction up to the max ratio.
|
jpayne@68
|
3005 TODO: Filterbyname prefix, suffix, or even fixed-length substring hashing for super speed.
|
jpayne@68
|
3006
|
jpayne@68
|
3007 TODO: Kmer sketch comparison.
|
jpayne@68
|
3008 TODO: Tadpole diploid mode.
|
jpayne@68
|
3009 TODO: Crossblock remove reads under a certain length during multiplexing or normalization.
|
jpayne@68
|
3010 TODO: Increase multiplexing threads?
|
jpayne@68
|
3011 TODO: Reduce demultiplexing pigz threads?
|
jpayne@68
|
3012
|
jpayne@68
|
3013 TODO: BBDuk allows input and output files to have same name (?)
|
jpayne@68
|
3014
|
jpayne@68
|
3015 TODO: Tadpole full-pass correction
|
jpayne@68
|
3016 TODO: Tadpole count reduction check
|
jpayne@68
|
3017 TODO: De-Partition and target-size serial mode
|
jpayne@68
|
3018 TODO: Seal K>31 true support.
|
jpayne@68
|
3019
|
jpayne@68
|
3020
|
jpayne@68
|
3021 TODO: Option to redirect all BBMap stderr output to a file.
|
jpayne@68
|
3022
|
jpayne@68
|
3023 TODO: Rewrite BBMask so that when sam files are not used, memory needs are reduced. Or, support sorted sam. Requested by Shijie.
|
jpayne@68
|
3024
|
jpayne@68
|
3025 TODO: Make Tadpole outu accept discarded contigs as well.
|
jpayne@68
|
3026 TODO: Find the best 16s copy. Then use only the ~2 copies with highest identity to that for consensus.
|
jpayne@68
|
3027
|
jpayne@68
|
3028 TODO: Use Java library unique filename generator.
|
jpayne@68
|
3029
|
jpayne@68
|
3030 TODO: More accurate LogLog with multiple hash functions (basically, xor masks).
|
jpayne@68
|
3031
|
jpayne@68
|
3032 TODO: Tadpole valid extension byte array. Return as a long to include count.
|
jpayne@68
|
3033 TODO: Write BBSplit guide.
|
jpayne@68
|
3034 TODO: BBMap's scafstats are different from Seal's stats format.
|
jpayne@68
|
3035
|
jpayne@68
|
3036 TODO: Dedupe should not hang when it runs out of memory. (Shijie).
|
jpayne@68
|
3037 TODO: crc (checkrcomp) flag for BBDuk adapter trimming.
|
jpayne@68
|
3038 TODO: Clumpify error correction.
|
jpayne@68
|
3039
|
jpayne@68
|
3040 TODO: Standard way of determining whether a program crashed, or finished, or logging.
|
jpayne@68
|
3041 TODO: Tadpole read exclusion for reads that won't assemble or have bad kmers.
|
jpayne@68
|
3042
|
jpayne@68
|
3043 V35.
|
jpayne@68
|
3044 35.00
|
jpayne@68
|
3045 Changed Gene.toChromosome to return an int rather than a byte.
|
jpayne@68
|
3046 Changed gitable.int2d name to gitable.int1d since it is a 1D array.
|
jpayne@68
|
3047 Added taxa support for ArrayListSet.
|
jpayne@68
|
3048 Added % support for output in Reformat. Requested by Alex Copeland.
|
jpayne@68
|
3049 Added gitable.sh script for generating gitable.int1d.gz tax translator.
|
jpayne@68
|
3050 35.01
|
jpayne@68
|
3051 Fixed BBDuk crash when K>31 and stats output was enabled. Noted by Alex Spunde.
|
jpayne@68
|
3052 Fixed repair.sh failure on fint flag.
|
jpayne@68
|
3053 Fixed SplitPairsAndSingles not working on interleaved input anymore.
|
jpayne@68
|
3054 Split Tadpole's mincount flag into mincountseed and mincountextend (mcs and mce).
|
jpayne@68
|
3055 Added rcomp flag to BBMap. Requested by Bryce Foster.
|
jpayne@68
|
3056 Added merge flag to KmerCountExact.
|
jpayne@68
|
3057 35.02
|
jpayne@68
|
3058 maq flag now accepts 2 arguments: maq=Q,B. If second argument is specified, only the initial B bases will be used to calculate the quality.
|
jpayne@68
|
3059 Added minprob and maq flags to Tadpole and KmerCountExact.
|
jpayne@68
|
3060 Fixed memory detection in calcmem.sh not working when ulimit=unlimited. Thanks for the debugging help from Jason S!
|
jpayne@68
|
3061 Added some getters to KmerForest and KmerNode.
|
jpayne@68
|
3062 Enabled Tadpole kmer harvesting from victim buffer.
|
jpayne@68
|
3063 Greatly accelerated Tadpole by allowing threads to compete for tables, rather than using fixed allocation.
|
jpayne@68
|
3064 Accelerated Tadpole by increasing default number of tables per thread.
|
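The maq=Q,B behavior described under 35.02 can be illustrated with a short sketch. This is not the BBTools implementation: the function name `passes_maq` is hypothetical, and averaging by error probability (rather than averaging the raw Phred scores) is an assumption made for the example.

```python
import math

def passes_maq(quals, min_avg_q, first_bases=None):
    """Return True if the average quality meets min_avg_q.

    quals       -- list of Phred quality scores, one per base
    min_avg_q   -- the Q in maq=Q,B
    first_bases -- the optional B in maq=Q,B; if given, only the
                   initial B bases contribute to the average
    """
    window = quals if first_bases is None else quals[:first_bases]
    if not window:
        return False
    # Assumption: average the error probabilities, then convert back to Phred.
    mean_err = sum(10 ** (-q / 10) for q in window) / len(window)
    avg_q = -10 * math.log10(mean_err)
    return avg_q >= min_avg_q

# A read with a high-quality start and a low-quality tail.
read_quals = [40, 40, 2, 2]
print(passes_maq(read_quals, 30, 2))  # only the first 2 bases are considered
print(passes_maq(read_quals, 30))     # the whole read is considered
```

With a second argument, a read whose tail is poor can still pass as long as its leading bases are good.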
jpayne@68
|
3065 35.03
|
jpayne@68
|
3066 Wrote Shred and shred.sh.
|
jpayne@68
|
3067 BBMap can now output mapping stats to a file with the statsfile= flag. Requested by Vasanth.
|
jpayne@68
|
3068 35.04
|
jpayne@68
|
3069 Integrated extension into BBMerge (extend= flag).
|
jpayne@68
|
3070 BBNorm now does ecc after deciding whether to discard a read, not before.
|
jpayne@68
|
3071 35.05
|
jpayne@68
|
3072 Fixed FilterByCoverage ignoring minCoverage if pre-normalization covstats not given.
|
jpayne@68
|
3073 35.06
|
jpayne@68
|
3074 Added BBMap lengthtag. Requested by Esther Singer.
|
jpayne@68
|
3075 35.07
|
jpayne@68
|
3076 Fixed rbb flag in BBNorm not working (conflated with parser flag).
|
jpayne@68
|
3077 Integrated a shellscript modification that allows shellscripts to be symlinked and still find the correct classpath. Thanks Elmar Pruess!
|
jpayne@68
|
3078 35.08
|
jpayne@68
|
3079 Fixed rcompmate flag; it was triggering an assertion error.
|
jpayne@68
|
3080 Added BBMap showprogress2 flag.
|
jpayne@68
|
3081 Got rid of ReadInputStream.preferBlocks and associated methods.
|
jpayne@68
|
3082 Simplified how Reformat works with in1 vs ffin1, and sam files.
|
jpayne@68
|
3083 Fixed bug in which Reformat was dropping header lines. Reported by Gloria F.
|
jpayne@68
|
3084 Fixed bug in BBMergeOverlapper pfilter for reads of different length.
|
jpayne@68
|
3085 Fixed bug in BBMergeOverlapper for reads of different length.
|
jpayne@68
|
3086 35.09
|
jpayne@68
|
3087 Removed BBMerge hist2 and hist3 which were redundant; added showhiststats flag.
|
jpayne@68
|
3088 Added BBMerge prealloc and prefilter flags.
|
jpayne@68
|
3089 Removed some old BBMerge functionality (outinsert, perfectonly, etc).
|
jpayne@68
|
3090 Changed extend to extend1 and extend2.
|
jpayne@68
|
3091 Completely rewrote BBMerge's code path to break it into small modular functions.
|
jpayne@68
|
3092 Memory allocation exceptions in HashArray are now handled gracefully.
|
jpayne@68
|
3093 BBMerge now uses tadpole for kfilter.
|
jpayne@68
|
3094 BBMerge can now extend before or after merge attempts.
|
jpayne@68
|
3095 Tadpole can now do error correction via pincer.
|
jpayne@68
|
3096 Tadpole can now do error correction via tail also.
|
jpayne@68
|
3097 Added genome size estimation to KmerCountExact (via CallPeaks). This will be printed in the peaks output header.
|
jpayne@68
|
3098 Fixed BBMap slowdown caused by rescue in LMP libraries. Thanks Marc Strous and Xiaoli Dong (Metawatt team) for helping me track it down!
|
jpayne@68
|
3099 35.10
|
jpayne@68
|
3100 Removed a debugging line from Tadpole that made it crash when extending reads.
|
jpayne@68
|
3101 35.11
|
jpayne@68
|
3102 Removed a debug assertion from SamReadInputStream. Found by Kurt LaButti.
|
jpayne@68
|
3103 Improved descriptions in kmercountexact.sh.
|
jpayne@68
|
3104 Centralized memory statistics printing in Shared.
|
jpayne@68
|
3105 Separated Tadpole's load phase (KmerTableSet) from build phase (Tadpole).
|
jpayne@68
|
3106 Added catch blocks for memory exceptions when reading objects from disk.
|
jpayne@68
|
3107 Added catch blocks for memory exceptions when indexing.
|
jpayne@68
|
3108 Added catch blocks for memory exceptions in ChromosomeArrays and CoverageArrays.
|
jpayne@68
|
3109 RandomReads now correctly outputs names in fasta format.
|
jpayne@68
|
3110 RandomReads now has simple names without custom BBMap coordinates.
|
jpayne@68
|
3111 KmerCountExact now uses KmerTableSet.
|
jpayne@68
|
3112 Parsing is more robust for Tadpole, KmerCountExact, and KmerTableSet.
|
jpayne@68
|
3113 Coverage estimate (based on first peak) now in KmerCountExact. Requested by Vasanth.
|
jpayne@68
|
3114 Added ihist PercentOfPairs header line.
|
jpayne@68
|
3115 Added trim support to KmerTableSet.
|
jpayne@68
|
3116 Added triangle filter for smoothing histograms before peak calling. Vastly improves results. Requested by Alex Copeland.
|
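The triangle filter mentioned under 35.11 can be sketched as a weighted moving average whose weights fall off linearly with distance from the center bin. The function name `triangle_smooth`, the `radius` parameter, and the edge handling (renormalizing over the bins that exist) are assumptions for illustration, not the CallPeaks code.

```python
def triangle_smooth(hist, radius):
    """Smooth a histogram with a triangular kernel.

    Each output bin is a weighted mean of its neighbors within `radius`,
    with weight radius + 1 - |distance|. Near the ends, the weights are
    renormalized over the bins that actually exist.
    """
    n = len(hist)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0
        for d in range(-radius, radius + 1):
            j = i + d
            if 0 <= j < n:
                w = radius + 1 - abs(d)
                total += w * hist[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed

# A single-bin spike is spread over its neighbors, which suppresses
# one-bin noise before peaks are called.
print(triangle_smooth([0, 0, 9, 0, 0], 1))
```

Smoothing before peak calling reduces the chance that a noisy single bin is reported as a separate peak.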
jpayne@68
|
3117 35.12
|
jpayne@68
|
3118 Updated shellscripts to have consistent formatting, and fixed various typos.
|
jpayne@68
|
3119 Reimplemented outinsert for BBMerge. Requested by Matt Nolan.
|
jpayne@68
|
3120 35.13
|
jpayne@68
|
3121 Wrote Tadpole.explore.
|
jpayne@68
|
3122 Removed debugging parameters (rid, pos) from Tadpole/KmerTableSet ownership functions.
|
jpayne@68
|
3123 Fixed massive performance problem in KmerArray - victim buffer was being searched for nonexistent kmers.
|
jpayne@68
|
3124 Wrote function to clear and regenerate tables after shaving.
|
jpayne@68
|
3125 Shaving now seems to work correctly.
|
jpayne@68
|
3126 Reduced mcs (minclustersize) in Dedupe from 2 to 1, to match the documentation.
|
jpayne@68
|
3127 Added Shaver bubble-removal and improved statistics tracking.
|
jpayne@68
|
3128 Added Tadpole markBadBases (mbb) flag for turning bases covered by low-count kmers into N.
|
jpayne@68
|
3129 Added Tadpole mode=correct/ecc for correction without extension.
|
jpayne@68
|
3130 Tadpole now uses in/out when in extend/ecc mode and ine/oute are not specified.
|
jpayne@68
|
3131 35.14
|
jpayne@68
|
3132 Added iterative seeding with decreasing depth to Tadpole, via contigpasses and contigpassmult flags.
|
jpayne@68
|
3133 Added Tadpole mdo (markdeltaonly) flag; default true.
|
jpayne@68
|
3134 Tadpole can now do error marking (mbb) without error correction (ecc).
|
jpayne@68
|
3135 Tadpole ownership is now automatic.
|
jpayne@68
|
3136 Added driver.FilterLines and filterlines.sh for filtering text lines.
|
jpayne@68
|
3137 Verified that an issue with transcriptome mapping is due to an incorrect transcriptome rather than a bug.
|
jpayne@68
|
3138 Fixed bug in which BBMap subfilter passed sites with no cigar string. Noted by lankage (SeqAnswers).
|
jpayne@68
|
3139 mapq of reads with primary site filtered out is now very low (under 4).
|
jpayne@68
|
3140 BBDuk can now stop after X outm or outu bases. Requested by R. Westerman.
|
jpayne@68
|
3141 35.15
|
jpayne@68
|
3142 Fixed minor bug in Seal in which unmatched reads were not being incremented, causing unmatched read rate to be displayed as 0.
|
jpayne@68
|
3143 Fixed a bug in parseKMG for decimal values.
|
jpayne@68
|
3144 BBMerge now supports error-correction with Tadpole.
|
jpayne@68
|
3145 BBMerge now supports iterative extension.
|
jpayne@68
|
3146 BBMerge will now always output the original read sequence for reads that don't get merged, rather than the extended or error-corrected sequence.
|
jpayne@68
|
3147 BBMerge minor output formatting bugs fixed.
|
jpayne@68
|
3148 BBMerge now calls Tadpole rather than Tadpole_old.
|
jpayne@68
|
3149 Wrote a shellscript for TaxTree.
|
jpayne@68
|
3150 Wrote Postfilter and postfilter.sh, a wrapper for BBMap and FilterByCoverage to postfilter SPAdes assemblies.
|
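The parseKMG fix for decimal values under 35.15 concerns numbers written with a k/m/g suffix. A minimal sketch of such a parser follows. The name `parse_kmg` and the use of powers of 1000 (rather than 1024) are assumptions for the example and are not taken from the BBTools source.

```python
from decimal import Decimal

# Assumption: suffixes are decimal multipliers.
_MULTIPLIERS = {"k": 1000, "m": 1000 ** 2, "g": 1000 ** 3}

def parse_kmg(text):
    """Parse strings such as '150', '2m', or '1.5k' into an integer."""
    s = text.strip().lower()
    multiplier = 1
    if s and s[-1] in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[s[-1]]
        s = s[:-1]
    # Decimal avoids binary floating-point rounding on inputs like '0.1k'.
    return int(Decimal(s) * multiplier)

print(parse_kmg("1.5k"))
print(parse_kmg("2m"))
```

Parsing the numeric part as a decimal before applying the multiplier is what lets fractional values such as 1.5k work.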
jpayne@68
|
3151 35.16
|
jpayne@68
|
3152 Fixed a bug in FilterByCoverage that was filtering everything if cov0 was not specified.
|
jpayne@68
|
3153 Found and fixed some small bugs in Tadpole, such as not adding the very last base of a contig.
|
jpayne@68
|
3154 Fixed non-determinism in Tadpole by looking for hidden back branches and using extension return codes.
|
jpayne@68
|
3155 Created ukmer package and Tadpole2, which supports unlimited kmer lengths. Tadpole2 will be automatically called by Tadpole if K>31.
|
jpayne@68
|
3156 35.17
|
jpayne@68
|
3157 Made Tadpole an abstract superclass of Tadpole1 and Tadpole2, with massive reduction in duplicate code.
|
jpayne@68
|
3158 BBMerge now supports unlimited kmer lengths.
|
jpayne@68
|
3159 Made AbstractKmerTableSet superclass of KmerTableSet and KmerTableSetU.
|
jpayne@68
|
3160 KmerCountExact now supports unlimited kmer lengths.
|
jpayne@68
|
3161 35.18
|
jpayne@68
|
3162 Made Shaver an abstract superclass for Shaver1 and Shaver2.
|
jpayne@68
|
3163 KmerCountExact now supports shave and rinse operations.
|
jpayne@68
|
3164 Prefilter now works optimally with K>31, thanks to new hash routine.
|
jpayne@68
|
3165 Fixed KmerCountExact not writing peaks without khist set.
|
jpayne@68
|
3166 35.19
|
jpayne@68
|
3167 Fixed a crash in read extension with K>31.
|
jpayne@68
|
3168 35.20
|
jpayne@68
|
3169 Re-added lines for unmerged reads to BBMerge outinsert stream. Requested by Matt Nolan.
|
jpayne@68
|
3170 Added some new header lines to KmerCountExact peaks output - ploidy, het rate, repeat content, etc.
|
jpayne@68
|
3171 35.21
|
jpayne@68
|
3172 Fixed Tadpole contig coverage estimation.
|
jpayne@68
|
3173 Added Tadpole mincoverage flag.
|
jpayne@68
|
3174 Fixed crash in ReadWrite when attempting to create filenames containing the pipe symbol.
|
jpayne@68
|
3175 Fixed an invalid assertion in HashArrayHybrid resize().
|
jpayne@68
|
3176 Added IntList.contains().
|
jpayne@68
|
3177 Fixed a tricky bug with Seal qhdist not looking for matches with substitutions if it first found a match without substitutions. Noted by sdriscoll.
|
jpayne@68
|
3178 Dedupe for some reason had interleaved name detection disabled. This is now enabled. Noted by Bede Constantinides.
|
jpayne@68
|
3179 Fixed new BBMerge crash bug with outinsert. Noted by Matt Nolan.
|
jpayne@68
|
3180 35.22
|
jpayne@68
|
3181 Added truncateheadersymbol flag to filterbyname.
|
jpayne@68
|
3182 Added some postfilter flags with defaults optimized to increase speed.
|
jpayne@68
|
3183 Fixed Seal qhdist again; I had forgotten to sort a list. Noted by sdriscoll.
|
jpayne@68
|
3184 Enabled pigz by default in BBNorm.
|
jpayne@68
|
3185 Made some improvements to peak-calling.
|
jpayne@68
|
3186 35.23
|
jpayne@68
|
3187 Fixed a Tadpole2 assertion error when error-correcting with K>31 and variable-length reads.
|
jpayne@68
|
3188 Added minconsecutivebases flag to Reformat/BBDuk.
|
jpayne@68
|
3189 Added locking and lock testing to HashBuffer; unclear whether speed increased.
|
jpayne@68
|
3190 35.24
|
jpayne@68
|
3191 Added BBDuk maskfullycovered flag.
|
jpayne@68
|
3192 Added SummarizeSealStats ignoresametaxa and ignoresamebarcode flags.
|
jpayne@68
|
3193 Wrote ReduceSilva and reducesilva.sh for shrinking Silva database by removing entries with redundant taxonomy.
|
jpayne@68
|
3194 35.25
|
jpayne@68
|
3195 Added SummarizeSealStats ignoresamelocation flag.
|
jpayne@68
|
3196 35.26
|
jpayne@68
|
3197 Fixed ignoresamelocation pulling from incorrect field.
|
jpayne@68
|
3198 Documented mlf flag. Requested by Alex Copeland.
|
jpayne@68
|
3199 Added kmg support to minlength and maxlength. Requested by Bill A.
|
jpayne@68
|
3200 Fixed a bug in BBMap when handling subfilter on multimapped reads. Noted by vout.
|
jpayne@68
|
3201 Improved BBMap fixXY() function.
|
jpayne@68
|
3202 Fixed major bug in BBMerge; outu read2 was reverse-complemented.
|
jpayne@68
|
3203 Added a function to soft-clip reads with a long indel anchored by very few bases.
|
jpayne@68
|
3204 Added ftm, ftl, ftr, ftr2 flags to BBMerge.
|
jpayne@68
|
3205 Added qtrim2 flag to BBMerge (trim on overlap failure).
|
jpayne@68
|
3206 Fixed implementation of shave and rinse to properly handle backward branches.
|
jpayne@68
|
3207 Fixed shave mindepth at 1 instead of variable.
|
jpayne@68
|
3208 35.27
|
jpayne@68
|
3209 Fixed a crash bug in BBMap tip indel clipping during fast mode.
|
jpayne@68
|
3210 35.28
|
jpayne@68
|
3211 Checksites now verifies correct site ordering.
|
jpayne@68
|
3212 ensureMatchStringOnPrimary now ensures correct ss ordering if it changes anything.
|
jpayne@68
|
3213 ensureMatchStringsOnSiteScores now ensures correct ss ordering if it changes anything.
|
jpayne@68
|
3214 These changes resolved an assertion crash bug.
|
jpayne@68
|
3215 35.29
|
jpayne@68
|
3216 Fixed an assertion bug in tip indel clipping. Noted by Bryce Foster.
|
jpayne@68
|
3217 Fixed a couple places where clipped bases were not counted when calculating match position.
|
jpayne@68
|
3218 Fixed another bug with fast match strings related to clipping tip indels.
|
jpayne@68
|
3219 35.30
|
jpayne@68
|
3220 Fixed another bug related to clipping tip indels and resorting. Noted by Bryce Foster and Matt Nolan.
|
jpayne@68
|
3221 Clipped tip indels are now replaced with matches or mismatches.
|
jpayne@68
|
3222 Fixed a missing else in SamLine.
|
jpayne@68
|
3223 35.31
|
jpayne@68
|
3224 Fixed a bug in toLocalAlignment() when a read has zero matches to reference.
|
jpayne@68
|
3225 35.32
|
jpayne@68
|
3226 Fixed an instance where alignments exceeding window size yielded ss with inconsistent lengths.
|
jpayne@68
|
3227 Improved calculation of amount of padding needed when alignments exceed window.
|
jpayne@68
|
3228 35.33
|
jpayne@68
|
3229 Removed kmersamplerate and readsamplerate from bloom package to simplify code.
|
jpayne@68
|
3230 Corrected handling of minprob in bloom package.
|
jpayne@68
|
3231 Added Tadpole minprobprefilter and minprobmain flags.
|
jpayne@68
|
3232 KmerCountExact now disables minProbMain when prefilter is enabled.
|
jpayne@68
|
3233 Tadpole can now do multipass prefiltering.
|
jpayne@68
|
3234 Tadpole can prefilter for an automatic number of passes.
|
jpayne@68
|
3235 Tadpole now supports 1-bit final-pass prefiltering.
|
jpayne@68
|
3236 Fixed a bug in fixLimitsXY() - only Y needs adjustment, not X.
|
jpayne@68
|
3237 Fixed a bug in generateMatchString in which sorting was not redone when results changed.
|
jpayne@68
|
3238 Added functionality to Bloom prefilter to allow arbitrary cutoffs, rather than just using the filter's max value.
|
jpayne@68
|
3239 35.34
|
jpayne@68
|
3240 Fixed a compile error due to Bloom filter changes.
|
jpayne@68
|
3241 35.35
|
jpayne@68
|
3242 Added MergeBigelow for combining custom Bigelow text files.
|
jpayne@68
|
3243 Updated Shred to add equal flag, to shred reads into equal lengths rather than a fixed length.
|
jpayne@68
|
3244 Tracked down a few bugs regarding ss score-setting order.
|
jpayne@68
|
3245 Temporarily disabled CHECKSITES and SiteScore.setScore() assertions, which are mainly of interest for efficiency, not correctness.
|
jpayne@68
|
3246 35.36
|
jpayne@68
|
3247 Added mlf flag to BBDuk.
|
jpayne@68
|
3248 Fixed qhdist crash with values over 1 (params were reversed).
|
jpayne@68
|
3249 Stripped qfin/qfout support from rqcfilter since nobody will ever use it.
|
jpayne@68
|
3250 Made files of the common kmers found in ribosomes (/global/projectb/sandbox/gaag/bbtools/ribo) using ReduceSilva, Dedupe, and KmerCountExact.
|
jpayne@68
|
3251 Added ribo filtering to RQCFilter.
|
jpayne@68
|
3252 35.37
|
jpayne@68
|
3253 Fixed a ribo filtering flag for RQCFilter.
|
jpayne@68
|
3254 Added RQCFilter ribodb, ribohdist, riboedist flags.
|
jpayne@68
|
3255 Added RQCFilter extend flag (allows BBMerge read extension).
|
jpayne@68
|
3256 Fixed path in file-list (directory was being prepended).
|
jpayne@68
|
3257 35.38
|
jpayne@68
|
3258 Improved Tadpole help info.
|
jpayne@68
|
3259 Added Pileup coverage standard deviation calculation. Requested by Bill A.
|
jpayne@68
|
3260 Fixed one last (?) assertion error in BBMap. Reported by Vasanth and Shijie.
|
jpayne@68
|
3261 35.39
|
jpayne@68
|
3262 Postfilter now unloads Data after mapping.
|
jpayne@68
|
3263 Added trim flag to postfilter and filterbycoverage.
|
jpayne@68
|
3264 35.40
|
jpayne@68
|
3265 Fixed a null pointer in Read.validate().
|
jpayne@68
|
3266 Fixed read extension in Tadpole when K<=31; the wrong method was called, causing a crash. Noted by Westerman.
|
jpayne@68
|
3267 Added Tadpole contig trimming flag.
|
jpayne@68
|
3268 Fixed colossal BBMerge bug - read 2 was being merged as a reverse complement. Not sure when that started...
|
jpayne@68
|
3269 35.41
|
jpayne@68
|
3270 Added spaceslash flag to RandomReads to allow space to be omitted from read names prior to slash pairnum. Requested by Rob Egan.
|
jpayne@68
|
3271 35.42
|
jpayne@68
|
3272 Slightly altered Tadpole1 to allow condensed assembly of kmer sets; added flag ibo (ignore bad owner).
|
jpayne@68
|
3273 Made KmerCompressor and kcompress.sh. Generates a concise fasta representation of the set of kmers occurring at least N times.
|
jpayne@68
|
3274 35.43
|
jpayne@68
|
3275 Fixed crash in BBDuk when using MinKmerFraction (mkf) flag on single-ended reads.
|
jpayne@68
|
3276 Added fuse flag to KmerCompressor.
|
jpayne@68
|
3277 Fixed a crash in BBMapPacBio versions, caused by not percolating over a change in normal BBMap. Noted by Teshome.
|
jpayne@68
|
3278 35.44
|
jpayne@68
|
3279 KmerCountExact khist was overflowing if there were more than 2 billion kmers of a given depth. Converted counts to long array.
|
jpayne@68
|
3280 Wrote AbstractRemoveThread for removing kmers with counts outside of a certain range.
|
jpayne@68
|
3281 Added mincr and maxcr (min count to retain and max count to retain) flags to Tadpole.
|
jpayne@68
|
3282 Fixed incorrect haploid_fold_coverage in peaks.txt. Noted by Kurt.
|
jpayne@68
|
3283 Fixed KmerCountExact not writing peaks file if no khist was specified.
|
jpayne@68
|
3284 Fixed Tadpole differentiation between in/out and ine/oute. Now only in and out are needed.
|
jpayne@68
|
3285 Added Reads and Bases columns to Dedupe output. Requested by Esther S.
|
jpayne@68
|
3286 35.45
|
jpayne@68
|
3287 Added driver.CountSharedLines and countsharedlines.sh. Requested by Esther S.
|
jpayne@68
|
3288 35.46
|
jpayne@68
|
3289 Added smoothing control flags for KmerCountExact.
|
jpayne@68
|
3290 Caught invalid values of K in BBMap.
|
jpayne@68
|
3291 Added some additional header lines in peaks output.
|
jpayne@68
|
3292 35.47
|
jpayne@68
|
3293 Fixed BBMap incorrect NM tags for reads with soft-clipping. Noted by Rob Egan.
|
jpayne@68
|
3294 35.48
|
jpayne@68
|
3295 Disabled second parameter being automatically interpreted as an output file when = is not specified, in most cases. This is ambiguous as the second parameter might be a file for input read 2.
|
jpayne@68
|
3296 Fixed a new bug in newly fixed NM tag gen ^^;. Also noted by Rob Egan.
|
jpayne@68
|
3297 Identity calculations no longer penalize regions skipped as introns if the intronlen flag is set. Suggested by Rob Egan.
|
jpayne@68
|
3298 35.49
|
jpayne@68
|
3299 Clarified error messages for reads failing barcode filter.
|
jpayne@68
|
3300 Added cutprimers flag to include flanking primer sequence.
|
jpayne@68
|
3301 BBMerge trimq can now be an array for multiple attempts.
|
jpayne@68
|
3302 Multithreaded memory allocation for bloom filters; moderate speed increase.
|
jpayne@68
|
3303 Added mingc/maxgc to BBDuk and BBDuk2.
|
jpayne@68
|
3304 Added BBDuk mcf (min covered fraction) flag.
|
jpayne@68
|
3305 35.50
|
jpayne@68
|
3306 Added KmerCompressor max flag.
|
jpayne@68
|
3307 Clarified crossblock help regarding input file lists.
|
jpayne@68
|
3308 Enclosed all iterations of Dedupe overlapLists with a null check.
|
jpayne@68
|
3309 KmerCompressor is not deterministic when multithreaded (kmers may be used more than once); reduced buildthreads to 1.
|
jpayne@68
|
3310 Added LogLog class for cardinality estimation, and loglog.sh.
|
jpayne@68
|
3311 Enabled loglog flag for Reformat and BBDuk.
|
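For readers unfamiliar with the cardinality estimation added under 35.50, here is a sketch of the classic LogLog estimator (bucketed maximum rank, geometric-mean estimate). It illustrates the general technique only. The function names, the choice of BLAKE2b as the hash, and the bucket count are assumptions and do not describe the BBTools LogLog class.

```python
import hashlib

def _hash64(item):
    """Hash a string to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def loglog_estimate(items, bucket_bits=10):
    """Estimate the number of distinct items with the LogLog estimator."""
    m = 1 << bucket_bits
    rest_bits = 64 - bucket_bits
    registers = [0] * m
    for item in items:
        h = _hash64(item)
        bucket = h >> rest_bits                # top bits pick a bucket
        rest = h & ((1 << rest_bits) - 1)      # remaining bits
        # Rank = 1-based position of the first 1-bit in the remaining bits.
        rank = rest_bits - rest.bit_length() + 1
        if rank > registers[bucket]:
            registers[bucket] = rank
    mean_rank = sum(registers) / m
    # 0.39701 is the asymptotic LogLog bias-correction constant.
    return 0.39701 * m * 2 ** mean_rank

kmers = [f"kmer{i}" for i in range(100000)]
print(round(loglog_estimate(kmers)))
```

Because each bucket stores only a maximum, repeated items never change the estimate, which is why memory use stays fixed regardless of input size.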
jpayne@68
|
3312 35.51
|
jpayne@68
|
3313 Added X bit to bamscript generated by BBMap.
|
jpayne@68
|
3314 Loglog can now accept multiple files.
|
jpayne@68
|
3315 Changed settings of removehuman in rqcfilter to be faster (requires 2 hits now).
|
jpayne@68
|
3316 Fixed a null pointer exception in BBMerge with quality.
|
jpayne@68
|
3317 Fixed a bug with BBMerge ecco flag being ignored.
|
jpayne@68
|
3318 Upgraded Seal to allow requiring full containments of ref sequences. Requested by Ernst O.
|
jpayne@68
|
3319 35.52
|
jpayne@68
|
3320 Fixed issue with "ignorebadquality" flag being ignored in some cases. Noted by Alicia C.
|
jpayne@68
|
3321 35.53
|
jpayne@68
|
3322 Added mouse to RQCFilter.
|
jpayne@68
|
3323 Switched RQCFilter and RemoveHuman to k=14 for a 4x speedup.
|
jpayne@68
|
3324 Modified rqcfilter.sh to allow mouse, cat, dog, and human removal concurrently on 40GB nodes.
|
jpayne@68
|
3325 Disabled test for too-high quality scores because it was annoying when dealing with PacBio reads.
|
jpayne@68
|
3326 35.54
|
jpayne@68
|
3327 Added TadpoleWrapper and tadwrapper.sh, which runs Tadpole multiple times with different kmer lengths and recommends the best length. Requested by Alex Copeland.
|
jpayne@68
|
3328 Added normandcorrectwrapper.sh, which runs BBNorm then Tadpole. Requested by Stephan Trong.
|
jpayne@68
|
3329 Added clear() operation to KmerTableSets.
|
jpayne@68
|
3330 35.55
|
jpayne@68
|
3331 Changed Character.isAlphabetic() calls to Character.isLetter().
|
jpayne@68
|
3332 Modified CountBarcodes to add more flags (for dual barcodes).
|
jpayne@68
|
3333 RQCFilter pigz and unpigz now default to true.
|
jpayne@68
|
3334 Moved parsing of threads and recalpairnum from parseCommon to parseCommonStatic.
|
jpayne@68
|
3335 Increased sensitivity of ribo removal (96.6% to 98.8%) by using a larger kmer set.
|
jpayne@68
|
3336 Adjusted BBMap default per-thread memory usage estimate after a crash. Noted by Matt Nolan.
|
jpayne@68
|
3337 35.57
|
jpayne@68
|
3338 Fixed a change to removehuman.sh default memory allocation.
|
jpayne@68
|
3339 35.58
|
jpayne@68
|
3340 Fixed SynthMDA handling of minlen flag.
|
jpayne@68
|
3341 35.59
|
jpayne@68
|
3342 Added sam 1.4 -> 1.3 support to reformat, via sam=1.3 flag.
|
jpayne@68
|
3343 Added RQCFilter filterqhdist flag. Requested by Adam Rivers.
|
jpayne@68
|
3344 Slightly reduced default mininsert in BBDuk from 50 to 40.
|
jpayne@68
|
3345 Added some additional comments to BBDuk.
|
jpayne@68
|
3346 Added # support for BBMap output files. Requested by Adrian Pelin.
|
jpayne@68
|
3347 Fixed rqcfilter.sh not grab enough memory on slot-scheduled Mendel nodes... hopefully.
|
jpayne@68
|
3348 Added name flag to FuseSequence.
|
jpayne@68
|
3349 35.60
|
jpayne@68
|
3350 Data.clear() now also clears scaffold information in BBSplitter.
|
jpayne@68
|
3351 Added removemicrobes flag to RQCFilter.
|
jpayne@68
|
3352 Added removehuman2.sh for aggressive human contaminant removal versus an unmasked reference. Requested by Alicia Clum.
|
jpayne@68
|
3353 Modified RQCFilter to support unmasked mouse, cat, dog, and human references. Requested by Alicia Clum.
|
jpayne@68
|
3354 Allowed entropyfilter bbduk flag instead of just entropy.
|
jpayne@68
|
3355 Unpigz is now used in certain cases where it was prevented before, like reading lists of names in filterbyname.
|
jpayne@68
|
3356 35.61
|
jpayne@68
|
3357 Corrected some names in RQCFilter file-list.txt.
|
jpayne@68
|
3358 35.62
|
jpayne@68
|
3359 Split writeReproduceHeader off from writeReproduceFile.
|
jpayne@68
|
3360 Added BBTools version and RQCFilter command to RQCFilter reproduce.sh log.
|
jpayne@68
|
3361 Added humanpath, catpath, dogpath, mousepath flags to RQCFilter and clarified them in the documentation.
|
jpayne@68
|
3362 Improved documentation of bbduk2.sh.
|
jpayne@68
|
3363 Fixed BBMergeOverlapper.c to match java version.
|
jpayne@68
|
3364 35.63
|
jpayne@68
|
3365 Fixed some BBMergeOverlapper.c syntax errors.
|
jpayne@68
|
3366 35.64
|
jpayne@68
|
3367 Fixed more BBMergeOverlapper.c syntax errors.
|
jpayne@68
|
3368 35.65
|
jpayne@68
|
3369 Fixed a BBMergeOverlapper.c runtime error in quality-free mode.
|
jpayne@68
|
3370 Finally working again!
|
jpayne@68
|
3371 35.66
|
jpayne@68
|
3372 Changed testQuality() to assign ASCII-64 to the specific case of N bases with quality B.
|
jpayne@68
|
3373 Added preliminary support for dsrc compression. However, the program does not appear to work correctly in Windows.
|
jpayne@68
|
3374 Added header output (.header extension). Requested by Matt Nolan.
|
jpayne@68
|
3375 Added coverage calculation ignoring deletions.
|
jpayne@68
|
3376 AssemblyStats (stats.sh) now has fastq support; requested by several people.
|
jpayne@68
|
3377 Added file type and extension documentation as readme_filetypes.txt.
|
jpayne@68
|
3378 Removed obsolete changelogs for BBDuk and Reformat.
|
jpayne@68
|
3379 35.67
|
jpayne@68
|
3380 Added "_part" suffix before the part number to names of automatic-split fasta reads. This fixes a problem with underscore-number-named sequences in BBEst. Noted by Kurt L.
|
jpayne@68
|
3381 Fixed a corner-case in filterbycoverage's handling of trimmed reads that drop below the length cutoff. Noted by Stephan Trong.
|
jpayne@68
|
3382 35.68
|
jpayne@68
|
3383 Wrote KmerComparator and KmerComparator2 for comparing reads by pivot kmers.
|
jpayne@68
|
3384 Wrote KmerSort for sorting reads by pivots.
|
jpayne@68
|
3385 Wrote KmerSplit for binning reads by pivots.
|
jpayne@68
|
3386 Wrote Clump class for storing ordered overlapping reads.
|
jpayne@68
|
3387 Wrote Clumpify to wrap KmerSplit and KmerSort.
|
jpayne@68
|
3388 Wrote ClumpList to turn a list of clumped reads into a list of clumps.
|
jpayne@68
|
3389 Added preliminary consensus operations to Clump and KmerSort.
|
jpayne@68
|
3390 Added KmerReduce to produce the set of pivot kmers.
|
jpayne@68
|
3391 Fixed an out-of-bounds error in CutPrimers. Noted by vmikk (SeqAnswers).
|
jpayne@68
|
3392 Moved UnicodeToAscii (which did not work), TableLoaderLockFreeU, and TableReaderU to z_old, since they cause compilation problems. Noted by Martin M.
|
jpayne@68
|
3393 Fixed prefilter onepass mode causing a crash.
|
jpayne@68
|
3394 Added clump package.
|
jpayne@68
|
3395 Added kmer-count restrictions to Clump pivot selection. Not clear whether it is useful.
|
jpayne@68
|
3396 Added local maximum capability to KmerComparator.
|
jpayne@68
|
3397 Fixed BBNorm to work with kmers>31, for normalization (not error-correction yet). Not fully tested, though. Noted by Kurt L.
|
jpayne@68
|
3398 KmerTableSet read loading now does read validation per thread. This allows better multithreaded scaling.
|
jpayne@68
|
3399 BBDuk and Seal also now do validation per thread.
|
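The pivot-kmer sorting introduced under 35.68 (KmerComparator, KmerSort, Clumpify) can be illustrated roughly as follows. This sketch assumes the pivot is the lexicographically largest canonical kmer in a read. The actual pivot selection in BBTools is not described in this changelog, so treat that rule and every name here as hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent form: the larger of a kmer and its reverse complement."""
    return max(kmer, revcomp(kmer))

def pivot(read, k):
    """Assumed pivot rule: the largest canonical kmer in the read."""
    return max(canonical(read[i:i + k]) for i in range(len(read) - k + 1))

def clump_sort(reads, k):
    """Sort reads by pivot so reads sharing a pivot end up adjacent."""
    return sorted(reads, key=lambda r: pivot(r, k))

reads = ["AACCTTTT", "ACGTACGA", revcomp("AACCTTTT")]
for r in clump_sort(reads, 4):
    print(pivot(r, 4), r)
```

Reads that overlap the same region tend to share a pivot regardless of strand, so sorting by pivot places them next to each other.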
jpayne@68
|
3400 35.69
|
jpayne@68
|
3401 Multithreaded kmer table dumping by KmerCountExact and Tadpole; over 4x speedup.
|
jpayne@68
|
3402 35.70
|
jpayne@68
|
3403 Tadpole now does validation per-thread when error-correcting. Slight speed increase.
|
jpayne@68
|
3404 Fixed a bug in KmerCountExact in which prefilter did not work with K>31, due to using key() instead of xor(). Noted by Kurt L.
|
jpayne@68
|
3405 35.71
|
jpayne@68
|
3406 Added A_SampleMT, with full line-by-line comments.
|
jpayne@68
|
3407 Improved A_Sample's comments.
|
jpayne@68
|
3408 Added kmer histogram generation to rqcfilter (khist flag).
|
jpayne@68
|
3409 Reorganized rqcfilter.sh documentation.
|
jpayne@68
|
3410 Added a_sample_mt.sh.
|
jpayne@68
|
3411 Fixed documentation in shuffle.sh.
|
jpayne@68
|
3412 Running any program with -version, -help, etc now prints a useful message.
|
jpayne@68
|
3413 35.72
|
jpayne@68
|
3414 LogLog now retains last cardinality estimate in a static field.
|
jpayne@68
|
3415 RQCFilter now chooses BBNorm or KmerCountExact for the khist depending on the estimated cardinality.
|
jpayne@68
|
3416 Kmer histograms now have a header by default.
|
jpayne@68
|
3417 35.73
|
jpayne@68
|
3418 HashBufferU now only tries to acquire a lock every 16th time, like HashBuffer.
|
jpayne@68
|
3419 Removed some checks for the literal string null.
|
jpayne@68
|
3420 Fixed some else-if fallthroughs where else was missing.
|
jpayne@68
|
3421 Addressed some compiler warnings in kmer, ukmer packages.
|
jpayne@68
|
3422 Wrote FilterByTaxa for filtering of sequences labelled with their taxonomy (gi number or ncbi taxID).
|
jpayne@68
|
3423 Wrote PrintTaxonomy.
|
jpayne@68
|
3424 Wrote taxonomy.sh and filterbytaxa.sh.
|
jpayne@68
|
3425 35.74
|
jpayne@68
|
3426 Added peaks output to rqcfilter.
|
jpayne@68
|
3427 Made TaxFilter class and revised FilterByTaxa to use it.
|
jpayne@68
|
3428 Added FilterByTaxa support to RQCFilter for microbial decontamination.
|
jpayne@68
|
3429 Fixed a bug in which empty files had their format misdetected.
|
jpayne@68
|
3430 Fixed a couple array-out-of-bounds errors with unicode characters in genetic sequence. They are now converted to N.
|
jpayne@68
|
3431 35.75
|
jpayne@68
|
3432 Enabled worker thread read validation in A_SampleMT.
|
jpayne@68
|
3433 Wrote SplitByTaxa and splitbytaxa.sh.
|
jpayne@68
|
3434 35.76
|
jpayne@68
|
3435 Changed documentation structure. There is now changelog.txt, readme.txt, UsageGuide.txt, and ToolDescriptions.txt.
|
jpayne@68
|
3436 Fixed IntList resize overflow bug. Noted by jazz710 (SeqAnswers).
|
jpayne@68
|
3437 Removed unicode2ascii.sh since it does not work.
|
jpayne@68
|
3438 Wrote FungalRelease and fungalrelease.sh. Requested by Kurt and Jasmyn.
|
jpayne@68
|
3439 35.77
|
jpayne@68
|
3440 BBDuk can now call CalcTrueQuality to generate matrices if given a sam file.
|
jpayne@68
|
3441 Added scaffold name remapping legend to FungalRelease. Requested by Kurt and Jasmyn.
|
jpayne@68
|
3442 Wrote BBDukGuide.
|
jpayne@68
|
3443 Wrote BBMergeGuide.
|
jpayne@68
|
3444 Wrote TadpoleGuide.
|
jpayne@68
|
3445 35.78
|
jpayne@68
|
3446 Fixed infinite recursion when setting threadcount. Found by Matt Nolan.
|
jpayne@68
|
3447 35.79
|
jpayne@68
|
3448 Changed BBNorm defaults to target=100 min=5.
|
jpayne@68
|
3449 Wrote Reformat guide.
|
jpayne@68
|
3450 Wrote Seal guide.
|
jpayne@68
|
3451 Wrote Taxonomy guide.
|
jpayne@68
|
3452 Added Tools.startsWith(byte[], String)
|
jpayne@68
|
3453 Revised GiToNcbi and TaxTree functions to allow gi_ as well as gi|, to avoid pipe symbol.
|
jpayne@68
|
3454 35.79
|
jpayne@68
|
3455 Standardized syntax of gitable and taxtree flags, and added "auto" option.
|
jpayne@68
|
3456 35.80
|
jpayne@68
|
3457 Added FilterBySequence and filterbysequence.sh.
|
jpayne@68
|
3458 Wrote PreprocessingGuide.
|
jpayne@68
|
3459 Wrote DedupeGuide.
|
jpayne@68
|
3460 Wrote BBNormGuide.
|
jpayne@68
|
3461 Removed bbmap20.sh, bbnorm20.sh, bbsplit20.sh, and khist20.sh since they can now have memory set explicitly.
|
jpayne@68
|
3462 35.81
|
jpayne@68
|
3463 Unified shellscripts between private and public release - module load commands now only run if NERSC_HOST==genepool, and "-l" removed from /bin/bash header.
|
jpayne@68
|
3464 JNI mode is now enabled by default if NERSC_HOST==genepool.
|
jpayne@68
|
3465 35.82
|
jpayne@68
|
3466 Fixed a double-print of BBMap version number.
|
jpayne@68
|
3467 Updated projectb pre-deploy version of BBTools compiled jni code and disabled auto-JNI-enable when NERSC_HOST==genepool.
|
jpayne@68
|
3468 Wrote guides for A_Sample, BBMask, Stats, CalcUniqueness, Repair, SplitNextera, Clumpify, and AddAdapters.
|
jpayne@68
|
3469 Wrote BBMapGuide.
|
jpayne@68
|
3470 35.83
|
jpayne@68
|
3471 Added "banns" to RandomReads.
|
jpayne@68
|
3472 35.84
|
jpayne@68
|
3473 Fixed an unnecessary assertion for negative values of pairedScore in Tools.removeLowQualitySitesUnpaired2. Noted by Jason S.
|
jpayne@68
|
3474 Fixed a bug in SamLine.makeMdTag for handling deletions called off the end of a scaffold. Noted by Jason S.
|
jpayne@68
|
3475 35.85
|
jpayne@68
|
3476 Fixed a bug in which BBMap was verifying the presence of the wrong input file. Noted by Adrian P.
|
jpayne@68
|
3477 Added a check for a memory environment variable to calcmem.sh (only affects jobs run at JGI on Genepool)
|
jpayne@68
|
3478 35.87
|
jpayne@68
|
3479 Changed version in shared.java to 35.87, fixed bug in calcmem.sh (was calling itself)
|
jpayne@68
|
3480 35.90
|
jpayne@68
|
3481 Added ProcessSpeed and ProcessFragMerging for collating output in a BBMerge comparison.
|
jpayne@68
|
3482 ByteFile1 had an error related to Windows-formatted (\r\n newline) files. This was fixed by dropping support for legacy Mac (\r) newlines. So now, valid newlines are \n (Unix/Mac OS X) or \r\n (Windows) but not very old Mac (\r), which I have never seen anyway. This change also increased ByteFile1 speed by 10%.
|
jpayne@68
|
3483 Added GC-filter mode toggle between individual reads and pair averages for paired reads with the "pairgc" flag. Requested by Torben.
|
jpayne@68
|
3484 GC histogram mode now also obeys the pairgc flag. Default is true; previously, it was false. So, now the filter and histogram defaults match.
|
jpayne@68
|
3485 35.91
|
jpayne@68
|
3486 Fixed an assertion error that fired on fasta files containing blank lines. Noted by Vasanth S.
|
jpayne@68
|
3487 35.92
|
jpayne@68
|
3488 Fixed interleaving detection when BBDuk generates calibration matrices.
|
jpayne@68
|
3489 Fixed a bug in calcmem that ignored used memory when an SGI flag is set.
|
jpayne@68
|
3490 Removed debug code from quality trimming.
|
jpayne@68
|
3491 35.93
|
jpayne@68
|
3492 Rewrote coverage stats line parsing to be header-defined.
|
jpayne@68
|
3493 35.94
|
jpayne@68
|
3494 Wrote MDWalker to help parse MD tags.
|
jpayne@68
|
3495 Modified sam line parsing to use MD tags and base calls to differentiate between substitutions and nocalls.
|
jpayne@68
|
3496 35.95
|
jpayne@68
|
3497 Added commonAncestor methods to TaxTree.
|
jpayne@68
|
3498 Added positional startsWith method to Tools.
|
jpayne@68
|
3499 Wrote jgi.A_SampleByteFile as a text-stream processing template.
|
jpayne@68
|
3500 Wrote tax.FindAncestor and gi2ancestors.sh for converting sets of GI numbers into a single taxonomy.
|
jpayne@68
|
3501 Wrote driver.ProcessWebcheck and webcheck.sh to calculate statistics on the RQC site uptime. Requested by Bryce.
|
jpayne@68
|
3502 Removed an assertion from KmerCompressor that fired with very long contigs. Noted by Eugene H.
|
jpayne@68
|
3503 35.96
|
jpayne@68
|
3504 Fixed quality trimming of fasta reads. Terminal Ns were supposed to be removed, but it was not happening.
|
jpayne@68
|
3505 Fixed fake quality score generation of N-containing fasta reads.
|
jpayne@68
|
3506 Changed tax package so that tax queries by name return a list of names when there are multiple hits (try bacteria, for example).
|
jpayne@68
|
3507 Modified PrintTaxonomy to accommodate multiple hits.
|
jpayne@68
|
3508 GiToNcbi is now capable of accepting a comma-delimited list of dump files, since NCBI keeps nucleotide and protein sequences in different files.
|
jpayne@68
|
3509 Improved parsing in TaxTree.
|
jpayne@68
|
3510 Changed do loops to while loops in TaxFilter to allow filtering by tree root (life).
|
jpayne@68
|
3511 35.97
|
jpayne@68
|
3512 Added TaxFilter ability to require specific ancestor nodes to be defined, such as phylum.
|
jpayne@68
|
3513 FileFormat now recognizes gzip extension (in addition to gz).
|
jpayne@68
|
3514 Added BBMap excludefraction (ef) flag to manually override the fraction of the kmers discarded as low-information.
|
jpayne@68
|
3515 Added BBMap ignorefrequentkmers (ifk) flag to determine whether to ignore low-information kmers.
|
jpayne@68
|
3516 Added BBMap greedy flag to manually override whether to discard the least informative kmers on a per-read basis.
|
jpayne@68
|
3517 FilterReadsByName can now accept position to only output a fraction of a sequence.
|
jpayne@68
|
3518 Wrote FilterAssemblySummary for processing NCBI assembly_summary_refseq.txt and assembly_summary_genbank.txt files using TaxTree.
|
jpayne@68
|
3519 Wrote filterassemblysummary.sh.
|
jpayne@68
|
3520 35.98
|
jpayne@68
|
3521 Capped "Output buffer became full" messages at one.
|
jpayne@68
|
3522 Modified KCompress to assemble forward kmers only (rcomp=f flag).
|
jpayne@68
|
3523 35.99
|
jpayne@68
|
3524 Wrote RenameByHeader to rename files based on their headers. This is designed for NCBI RefSeq genomes.
|
jpayne@68
|
3525 Re-added missing statsfile flag in BBMap. Noted by Vasanth S.
|
jpayne@68
|
3526 Wrote HeaderInputStream to support input of header files with no sequence.
|
jpayne@68
|
3527 Wrote ReplaceHeaders and replaceheaders.sh to insert headers into a sequence file from a different file.
|
jpayne@68
|
3528 Default FileFormat detection type for files without fasta- or fastq-specific symbols is now DEFAULT.
|
jpayne@68
|
3529 Wrote removemicrobes.sh to use the new common microbe filter.
|
jpayne@68
|
3530 Added removemicrobes build support to RQCFilter.
|
jpayne@68
|
3531 BBMask now supports wildcards for sam file paths.
|
jpayne@68
|
3532
|
jpayne@68
|
3533
|
jpayne@68
|
3534 TODO: Read magic number of potentially gzipped files?
|
jpayne@68
|
3535 TODO: String indicating exit failure.
|
jpayne@68
|
3536 TODO: BBMap does not correctly track semiperfect sites and N rate when read Ns align with reference Ns.
|
jpayne@68
|
3537
|
jpayne@68
|
3538 TODO: RQCFilter runs out of memory during khist for metagenomes.
|
jpayne@68
|
3539
|
jpayne@68
|
3540 TODO: Add minprob to LogLog.
|
jpayne@68
|
3541
|
jpayne@68
|
3542 TODO: Heejung encountered a random null-pointer exception in ByteFile2.run() at "list[loc]=s;". However, I manually examined the code and this state appears to be impossible. Perhaps it is a JVM bug?
|
jpayne@68
|
3543 TODO: Autoset bits and prefilter for khist based on cardinality.
|
jpayne@68
|
3544
|
jpayne@68
|
3545 TODO: KmerSet prefilter=1 onepass does not work (prefilter=2 onepass does work).
|
jpayne@68
|
3546
|
jpayne@68
|
3547 TODO: Validate BBNorm results with k>31.
|
jpayne@68
|
3548 TODO: Add summary of how many reads got removed to BBDuk. (Hemant).
|
jpayne@68
|
3549 TODO: Add Tadpole ability to screen reads with kmers only occurring at most N times, or having errors/Ns after correction. (Torben).
|
jpayne@68
|
3550
|
jpayne@68
|
3551 TODO: Program that can demux a sequence file into multiple sequence files randomly.
|
jpayne@68
|
3552 TODO: SynthMDA with a short reference outputs lots of reads with Ns (Alex Copeland).
|
jpayne@68
|
3553 TODO: Parser.parse should go at the end, not beginning, of parse blocks for all programs.
|
jpayne@68
|
3554 TODO: Tadpole should keep nodes with only outward branches.
|
jpayne@68
|
3555 TODO: Print kmer coverage information after Tadpole assemblies (Alex Copeland).
|
jpayne@68
|
3556 ***TODO: Replace QuadHeap with a heap of longs. The current implementation is very slow on NUMA machines.
|
jpayne@68
|
3557 **TODO: Compare Seal performance and correctness with countvector flag. One may be faster for large numbers of ref sequences.
|
jpayne@68
|
3558 TODO: Seal mcf flag.
|
jpayne@68
|
3559 TODO: Represent covariant depth as a vector with 1.0 for max depth for binning.
|
jpayne@68
|
3560 *TODO: Allow kcompress direct set subtraction and intersection.
|
jpayne@68
|
3561 *TODO: Add outu support to filterbyname
|
jpayne@68
|
3562 *TODO: Speed up BBMap indexing.
|
jpayne@68
|
3563
|
jpayne@68
|
3564 TODO: Print information about which reference sequences hit which locations in which reads, for Seal.
|
jpayne@68
|
3565 TODO: Second extra base for BBDuk edit distance...?
|
jpayne@68
|
3566 TODO: Thoroughly vet the assertions in CHECKSITES and SiteScore.setScore() to ensure they do not incur false-positive error messages.
|
jpayne@68
|
3567 TODO: BBMap shave and rinse are reducing contig length at level 2.
|
jpayne@68
|
3568 TODO: bbcountunique should use a longer K and have an offset rather than just looking at the first K bases.
|
jpayne@68
|
3569 TODO: Pincer could handle arbitrary problems - indels, error bursts, etc.
|
jpayne@68
|
3570 TODO: Tail can handle bursts if it simply continues until X bases concur.
|
jpayne@68
|
3571 TODO: Port pincer/tail over to BBNorm.
|
jpayne@68
|
3572 TODO: Use entropy to determine how many bases to extend past errors.
|
jpayne@68
|
3573
|
jpayne@68
|
3574 TODO: BBMap is not handling pairing when ambig=all. Pairing should be done at a SS level. (Elmar P).
|
jpayne@68
|
3575 TODO: Tadpole multipass prefilter, and auto prefilter passes.
|
jpayne@68
|
3576 TODO: BBMap MPI mode.
|
jpayne@68
|
3577 TODO: Seal needs behavior with qhdist to be toggleable between searching or not searching for mutant kmers if a nonmutant kmer is found.
|
jpayne@68
|
3578 TODO: BBDuk with hdist should reprocess the reference multiple times, first with hdist=0, then hdist=1, etc. That will improve specificity.
|
jpayne@68
|
3579 *TODO: qtrim=r trimq=6 (or even 3) improves BBMerge rate for 2x250, 422 insert library.
|
jpayne@68
|
3580 TODO: BBMerge - Track number of errors detected/corrected and error locations.
|
jpayne@68
|
3581 TODO: Use small heap to reorder HashArray1D to optimize it.
|
jpayne@68
|
3582 TODO: Dump kmers to text by way when max size is exceeded, then reload by way and re-dump low count kmers.
|
jpayne@68
|
3583
|
jpayne@68
|
3584 TODO: Tadpole degenerate contig output.
|
jpayne@68
|
3585 TODO: Locked/managed HashArray expansion.
|
jpayne@68
|
3586 TODO: Fractional (1/4) way allocation per build thread.
|
jpayne@68
|
3587 *TODO: extendToRight should return an exit code, not just true or false. May not need to be released.
|
jpayne@68
|
3588 *TODO: Tadpole - first, classify all kmers as junction or non-junction (via ownership).
|
jpayne@68
|
3589 ***TODO: Always verify that left max yields prev kmer/evicted base. If not, that is a hidden branch.
|
jpayne@68
|
3590 *TODO: Allocation schedule for HashArrays.
|
jpayne@68
|
3591 *TODO: Optional synchronized resize on final schedule slot to minimize memory use.
|
jpayne@68
|
3592
|
jpayne@68
|
3593 TODO: MS state for MSA, always, for M1 state.
|
jpayne@68
|
3594 TODO: extin and extout flags for BBMap.
|
jpayne@68
|
3595 TODO: FastaReadInputStream asserts false for headers with no sequence.
|
jpayne@68
|
3596
|
jpayne@68
|
3597 TODO: Speed up shaving (exploration) where possible.
|
jpayne@68
|
3598
|
jpayne@68
|
3599 TODO: Seal kmer rank promotion with 1D arrays.
|
jpayne@68
|
3600 TODO: Partition program, round-robin with equal number of bp or equal number of sequences per output.
|
jpayne@68
|
3601 TODO: msa.sh should accept a file instead of literal.
|
jpayne@68
|
3602 TODO: BBMap bed format (Alex C).
|
jpayne@68
|
3603 TODO: BBMap fix for crash in filterbyname on sam file - SamLine 1490, assert(start_<=stop_).
|
jpayne@68
|
3604 TODO: Reformat lhist and readlength.sh should have equivalent information. "I prefer readlength.sh info"
|
jpayne@68
|
3605 TODO: Tadpole/KCE double-lock and double-buffer with LongLists for loading.
|
jpayne@68
|
3606 TODO: xmx=auto or percentage
|
jpayne@68
|
3607 TODO: reformat: multithread?
|
jpayne@68
|
3608 TODO: (write scaffolder)
|
jpayne@68
|
3609 TODO: (write polishing/consensus tool)
|
jpayne@68
|
3610 TODO: (write breaker)
|
jpayne@68
|
3611
|
jpayne@68
|
3612
|
jpayne@68
|
3613 V34.
|
jpayne@68
|
3614 34.00
|
jpayne@68
|
3615 Fixed a bug in BandedAlignerConcrete related to width being allowed to be even.
|
jpayne@68
|
3616 34.01
|
jpayne@68
|
3617 IdentityMatrix is now much faster for high-identity sequences, and allows the 'width' flag to increase speed.
|
jpayne@68
|
3618 Updated FilterReadsByName to allow "names=<read filename>", supporting fastq, fasta, and sam. So, one file will be filtered according to the names of reads in a second file. "names=<file>" where the file is just a list of names is still supported.
|
jpayne@68
|
3619 34.02
|
jpayne@68
|
3620 Fixed a couple errors in ConcurrentReadInputStreamD.
|
jpayne@68
|
3621 Added fetching of a dummy list from "empty" for crisD, both master and slave.
|
jpayne@68
|
3622 Added A_SampleD, which uses crisD. It now works correctly for master.
|
jpayne@68
|
3623 Renamed various ConcurrentReadStreamInterface classes.
|
jpayne@68
|
3624 Added an abstract superclass for all ConcurrentReadInputStreams, which extends Thread. Now, cris can be started directly without making a new thread.
|
jpayne@68
|
3625 Changed all instances of wrapping cris in a thread to just use start directly. These are mostly commented with "//4567" to find if something was missed (like starting the cris twice).
|
jpayne@68
|
3626 Increased cris stability by removing "returnList(ListNum, boolean)" and replacing it with "returnList(long, boolean)". Lists may no longer be recycled.
|
jpayne@68
|
3627 34.03
|
jpayne@68
|
3628 Added scaffoldstats to BBQC and RQCFilter fileList logs. Requested by Bryce F.
|
jpayne@68
|
3629 Fixed a strange deadlock in Dedupe/ConcurrentCollectionReadInputStream caused by making CRIS a Thread subclass. This will still occur if CRIS goes back to being a Thread. Noted by Shoudan.
|
jpayne@68
|
3630 34.04
|
jpayne@68
|
3631 Removed hitCount tracking from Seal.
|
jpayne@68
|
3632 "qtrim=<integer>" is now allowed for all classes using Parser.parseTrim().
|
jpayne@68
|
3633 Parser.parseZip, parseInterleaved, parseQuality, parseTrim, parseFasta, and parseCommonStatic were integrated into most classes; reduced code size by almost 200kb.
|
jpayne@68
|
3634 Parser.parseTrim got some extra functionality, like maxNs.
|
jpayne@68
|
3635 Made an abstract superclass for KmerCount* classes, allowing removal of some code.
|
jpayne@68
|
3636 Removed all KmerCount.countFasta() methods; they must now use a CRIS.
|
jpayne@68
|
3637 Retired ErrorCorrectMT (superseded by KmerNormalize).
|
jpayne@68
|
3638 Fixed bug in BBDuk, Seal, and ReformatReads - when quality trimming and force-trimming, count of trimmed reads could go over 100%. Now these counts are independent. Noted by ysnapus (SeqAnswers).
|
jpayne@68
|
3639 Removed "minscaf" and "mincontig" flags from Parser.parseFasta() because they were conflated.
|
jpayne@68
|
3640 Determined cause of Kurt's error message in Dedupe - lower-case letters can trigger a failure.
|
jpayne@68
|
3641 Dedupe now defaults to "tuc=t" (all input is made upper-case).
|
jpayne@68
|
3642 Moved CRIS factory from CGRIS to CRIS.
|
jpayne@68
|
3643 Copied cc2-cc5 to /global/projectb/sandbox/gaag/TestData/SingleCell/SimMockCommunity/plate*/. These are simulated cross-contaminated single cell plates.
|
jpayne@68
|
3644 Removed conflated "qual" flag from RandomReads; "q" should be used instead to set all read quality values to a single number.
|
jpayne@68
|
3645 Fixed conflated "renamebymapping" flag in RenameReads.
|
jpayne@68
|
3646 "tbr" flag is conflated in KmerNormalize; adjusted so that it now controls both "tossBadReads" (reads with errors) and "tossBrokenReads" (reads with the wrong number of quality scores).
|
jpayne@68
|
3647 Conflated "gzip" flag in ChromArrayMaker/FastaToChromArrays changed to "gz".
|
jpayne@68
|
3648 Handled conflated "ziplevel" flag in AbstractMapper.
|
jpayne@68
|
3649 Conflated "fakequality" flag resolved by moving from BBMap to Parser and renaming "fakefastaquality"/"ffq".
|
jpayne@68
|
3650 Added hdist2 and edist2 to BBDuk. These allow independently specifying hdist/edist for full-length kmers and short kmers when using mink.
|
jpayne@68
|
3651 Added trimhdist2 to RQCFilter/BBQC.
|
jpayne@68
|
3652 *** Added path and mapref flags to RQCFilter/BBQC; they can now map to an arbitrary genome instead of just human.
|
jpayne@68
|
3653 Added Shared.USE_MPI field (parsed by Parser.parseCommonStatic; "mpi" or "usempi").
|
jpayne@68
|
3654 Added Shared.MPI_RANK field (should be set automatically).
|
jpayne@68
|
3655 Added Shared.MPI_KEEP_ALL field. This controls whether CRISD objects retain all reads, or just some of them.
|
jpayne@68
|
3656 CRIS now automatically returns a CRISD when USE_MPI is enabled, as a slave or master depending on whether rank==0.
|
jpayne@68
|
3657 ListNum is now Serializable.
|
jpayne@68
|
3658 CRISD now transmits ListNum objects rather than ArrayLists, so that the number is preserved.
|
jpayne@68
|
3659 Added maxns flag to reformat.
|
jpayne@68
|
3660 Fixed BBQC and RQCFilter's unnecessary addition of "usejni" to BBMap phase, since it is now already parsed by parseCommonStatic.
|
jpayne@68
|
3661 BBQC now defaults to normalization and ecc off; they can be enabled with the "ecc" and "norm" flags. BBQC also supports the cecc flag.
|
jpayne@68
|
3662 Added notes on compiling JNI version suggested by sdriscoll.
|
jpayne@68
|
3663 34.05
|
jpayne@68
|
3664 Commented out a reference to ErrorCorrectMT in MateReadsMT.
|
jpayne@68
|
3665 34.06
|
jpayne@68
|
3666 FindPrimers (msa.sh) now accepts multiple queries (primers) and will use the best-matching of them.
|
jpayne@68
|
3667 Added a BBMap flag to disable score penalty due to ambiguous alignments ("penalizeambiguous" or "pambig"). Requested by Matthias.
|
jpayne@68
|
3668 Fixed failure to start CRIS in A_SampleD.
|
jpayne@68
|
3669 Fixed some incorrect division in CRISD.
|
jpayne@68
|
3670 Added MPI_NUM_RANKS to Shared. This is parsed by parser via e.g. "mpi=4".
|
jpayne@68
|
3671 Added BBMap flags subfilter, insfilter, delfilter, inslenfilter, dellenfilter, indelfilter, editfilter. These function similarly to idfilter. Requested by sdriscoll.
|
jpayne@68
|
3672 34.07
|
jpayne@68
|
3673 Dedupe now automatically calls Dedupe2 if more than 2 affixes are requested.
|
jpayne@68
|
3674 Added "subset" (sst) and "subsetcount" (sstc) flags to Dedupe.
|
jpayne@68
|
3675 Added "printLengthInEdges" (ple) flag to Dedupe.
|
jpayne@68
|
3676 34.08
|
jpayne@68
|
3677 Finished Dedupe subset processing for graph file generation.
|
jpayne@68
|
3678 34.09
|
jpayne@68
|
3679 Fixed bug where 'k' was not added to filename in RQCFilter. Noted by Vasanth.
|
jpayne@68
|
3680 34.10
|
jpayne@68
|
3681 Documented "ordered" and "trd" flags for BBDuk/Seal.
|
jpayne@68
|
3682 Added crismpi flag to allow switching between crisd and crismpi.
|
jpayne@68
|
3683 Added shared.mpi package, containing MPIWrapper and ConcurrentReadInputStreamMPI.
|
jpayne@68
|
3684 34.11
|
jpayne@68
|
3685 Added detection of read length, interleaving, and quality coding to FileFormat objects, but these fields are not currently read.
|
jpayne@68
|
3686 FileFormat.main() now outputs read length, if in fastq format.
|
jpayne@68
|
3687 Reformat will now allow sam -> sam conversion; not useful in practice, but maybe useful in testing.
|
jpayne@68
|
3688 Added flag "mpikeepall", default true.
|
jpayne@68
|
3689 Fixed deadlock when mpikeepall=false. Noted by Jon Rood.
|
jpayne@68
|
3690 34.12
|
jpayne@68
|
3691 Added 'auto' option to gcbins and idbins flags. Requested by Seung-jin.
|
jpayne@68
|
3692 Added dedupe "addpairnum" flag to control whether ".1" is appended to numbered graph nodes.
|
jpayne@68
|
3693 Added real quality to qhist plot, when mhist is being generated.
|
jpayne@68
|
3694 Moved maxns and maq to AFTER quality trimming in RQCFilter and BBDuk.
|
jpayne@68
|
3695 Added "ftm" (forcetrimmodulo) flag to BBDuk/Reformat/RQCFilter/BBQC. Default 5 for RQCFilter/BBQC, 0 otherwise.
|
jpayne@68
|
3696 34.13
|
jpayne@68
|
3697 Fixed a missing "else" in RQCFilter/BBQC. Noted by Kurt LaButti.
|
jpayne@68
|
3698 34.14
|
jpayne@68
|
3699 Added .size() to ListNum.
|
jpayne@68
|
3700 CrisD gained "unicast" method. Also, unicast and listen now have a mandatory toRank parameter.
|
jpayne@68
|
3701 Made CrisD MPI methods protected rather than private, so they can be overridden.
|
jpayne@68
|
3702 Refactored RTextOutputStream3.
|
jpayne@68
|
3703 34.15
|
jpayne@68
|
3704 Added Shared.LOW_MEMORY:
|
jpayne@68
|
3705 Disables multithreaded index gen.
|
jpayne@68
|
3706 Disables multithreaded ReadWrite writeObjectInThread method.
|
jpayne@68
|
3707 Disables ByteFile2.
|
jpayne@68
|
3708 For some reason it does not really seem to reduce memory consumption...
|
jpayne@68
|
3709 Added BBMap qfin1 and qfin2 flags.
|
jpayne@68
|
3710 Updated BBMap to use more modern input stream initialization.
|
jpayne@68
|
3711 Added mapnt.sh for mapping to nt on a 120g node.
|
jpayne@68
|
3712 34.16
|
jpayne@68
|
3713 Changed RQCFilter "t" to mean "trimmed"; "k" was removed.
|
jpayne@68
|
3714 Added parser noheadersequences (nhs) flag for sam files with millions of ref sequences.
|
jpayne@68
|
3715 Documented "ambig" flag in Seal.
|
jpayne@68
|
3716 Fixed issue where Shared.READ_BUFFER_NUM_BUFFERS was not getting changed when THREADS was changed. Now both are private and get set together.
|
jpayne@68
|
3717 Verified that mapnt.sh works on 120G nodes.
|
jpayne@68
|
3718 34.17
|
jpayne@68
|
3719 RTextOutputStream3 renamed to ConcurrentReadOutputStream.
|
jpayne@68
|
3720 ReadStreamByteWriter refactored to be cleaner.
|
jpayne@68
|
3721 Merged MPI dev branch into master.
|
jpayne@68
|
3722 34.18
|
jpayne@68
|
3723 Moved Seal's maxns/maq to after trimming.
|
jpayne@68
|
3724 Added chastity filter to bbduk and reformat (reads containing " 1:Y:" or " 2:Y:"). Requested by Lynn A.
|
jpayne@68
|
3725 Dedupe outd stream now produces correctly interleaved reads. Requested by Lynn A.
|
jpayne@68
|
3726 Replaced Dedupe TextStreamWriters with ByteStreamWriters, for read output.
|
jpayne@68
|
3727 34.19
|
jpayne@68
|
3728 Added parseCommon() to BBDuk, allowing samplerate flag.
|
jpayne@68
|
3729 34.20
|
jpayne@68
|
3730 FASTA_WRAP moved to Shared.
|
jpayne@68
|
3731 Numeric qual output is now wrapped after the same number of bases as fasta output.
|
jpayne@68
|
3732 "Low quality discards:" line is now triggered by chastity filter.
|
jpayne@68
|
3733 SPLIT_READS and MIN_READ_LEN are now disabled when processing reference in BBDuk/Seal.
|
jpayne@68
|
3734 Seal gained parseCommon and parseQuality.
|
jpayne@68
|
3735 34.21
|
jpayne@68
|
3736 Fixed MIN_READ_LEN bug (set to 0; should have been set to 1)
|
jpayne@68
|
3737 34.22
|
jpayne@68
|
3738 Added qfin (qual file) flags to BBDuk/Seal.
|
jpayne@68
|
3739 Applied BBDuk restrictleft and restrictright to filtering and masking; before, it was only valid for trimming.
|
jpayne@68
|
3740 Added calcCigarBases.
|
jpayne@68
|
3741 Required includeHardClip parameter for all calls to calcCigarLength(), start(), or stop().
|
jpayne@68
|
3742 Fixed bug in pileup caused by hard-clipped reads. Noted by Casey B.
|
jpayne@68
|
3743 34.23
|
jpayne@68
|
3744 DecontaminateByNormalization was excluding contigs with length under 50bp, which caused an assertion error.
|
jpayne@68
|
3745 Fixed a crash in BBDuk2 when not using a reference. Noted by Dave O.
|
jpayne@68
|
3746 Added entropy filter to BBDuk/BBDuk2. Set "entropy=X" where X is 0 to 1 to filter reads with less entropy than X.
|
jpayne@68
|
3747 34.24
|
jpayne@68
|
3748 Added maxreads flag to readlength.sh.
|
jpayne@68
|
3749 Fixed bug in BBMap - when directly outputting coverage, secondary alignments were never being used.
|
jpayne@68
|
3750 BBMap now uses the "ambig" and "secondary" flags to determine whether to include secondary site coverage. Specifically, "ambig=all" will use secondary sites, while other modes will not unless "secondary=t". In other words, use of secondary sites in coverage will be exactly the same as use of them in a sam output file. Removed "uscov=t Include secondary alignments when calculating coverage." from shellscript.
|
jpayne@68
|
3751 Fixed minid trumping minratio when both were specified. Now, the last one specified will be used.
|
jpayne@68
|
3752 Added pileup support for reads with asterisks instead of bases, as long as they have a cigar string. Also sped up calculation of read stop position.
|
jpayne@68
|
3753 Cigar string 'M' symbols are now converted to match string 'N' symbols if there is no reference.
|
jpayne@68
|
3754 34.25
|
jpayne@68
|
3755 BBMerge initialization order bug fixed; it was preventing jni from being used with the "loose" or "vloose" flags. Noted by sarvidsson (SeqAnswers).
|
jpayne@68
|
3756 34.26
|
jpayne@68
|
3757 Fixed semiperfect mode allowing non-semiperfect rescued alignments. Noted by Dave O.
|
jpayne@68
|
3758 Fixed ReadStats columns header for qhist when mhist was also generated.
|
jpayne@68
|
3759 Fixed an inequality in BBMergeOverlapper that favored shorter overlaps with an equal number of mismatches, in some cases. Had no impact on a normal 1M read benchmark except when margin=0, where it tripled the false-positive rate.
|
jpayne@68
|
3760 34.27
|
jpayne@68
|
3761 Enabled verbose mode in BBMergeOverlapper.
|
jpayne@68
|
3762 34.28
|
jpayne@68
|
3763 Added "align2." to sam header command line of BBMap.
|
jpayne@68
|
3764 Fixed bug in BBMap that could cause "=" to be printed for "rnext" even when pairs were on different scaffolds. Noted by rkanwar (SeqAnswers).
|
jpayne@68
|
3765 34.30
|
jpayne@68
|
3766 Reformat can now produce indelhist from sam files prior to v1.4.
|
jpayne@68
|
3767 Fixed a crash bug in BBMap caused by an improper assertion. Noted by Rob Egan.
|
jpayne@68
|
3768 34.31
|
jpayne@68
|
3769 BBDuk/Seal now recognize "scafstats" flag as equivalent to "stats".
|
jpayne@68
|
3770 Seal now defaults to 5 stats columns (includes #bp).
|
jpayne@68
|
3771 Wrote BBTool_ST, an abstract superclass for singlethreaded BBTools.
|
jpayne@68
|
3772 Clarified documentation of "trimq=X" as meaning "regions with average quality under X will be trimmed".
|
jpayne@68
|
3773 Fixed major bug in RQCFilter/BBQC: "forcetrimmod" was being set to same value as "ktrim". Noted by Brian Foster.
|
jpayne@68
|
3774 34.32
|
jpayne@68
|
3775 Changed the way BBMerge handles qualities to make it 40% faster (in java mode). Reduced size of jni matrix accordingly.
|
jpayne@68
|
3776 Fixed lack of readgroup tags for unmapped reads in sam format. Noted by Rahul (SeqAnswers).
|
jpayne@68
|
3777 Ensured Read.CHANGE_QUALITY affects both lower (<0) and upper (>41) values.
|
jpayne@68
|
3778 34.33
|
jpayne@68
|
3779 Pushed BBMergeOverlapper.c to commit.
|
jpayne@68
|
3780 34.34
|
jpayne@68
|
3781 Documented trimfragadapter and removehuman in RQCFilter.
|
jpayne@68
|
3782 Added Parser flag for Shared.READ_BUFFER_LENGTH (readbufferlength).
|
jpayne@68
|
3783 Added Parser flag for Shared.READ_BUFFER_MAX_DATA (readbufferdata).
|
jpayne@68
|
3784 Added Parser flag for Shared.READ_BUFFER_NUM_BUFFERS (readbuffers).
|
jpayne@68
|
3785 RQCFilter now accepts multiple references for decontamination by mapping.
|
jpayne@68
|
3786 Added FuseSequence (the first BBTool_ST subclass) and fuse.sh, for gluing contigs together with Ns.
|
jpayne@68
|
3787 Reformatted many scripts' help info to remove echo statements.
|
jpayne@68
|
3788 Fixed bugs in stats and countgc; they were not including undefined bases when printing the length in gcformat=1 and gcformat=4.
|
jpayne@68
|
3789 Replaced all instances of .bases.length with .length(), to prevent null pointer exceptions (for example in sam lines with no bases).
|
jpayne@68
|
3790 Added cat and dog flags to rqcfilter.
|
jpayne@68
|
3791 Changed defaults of BBMask to reduce amount masked in cat and dog to ~1% of genome. This still masks all of the coincidental low-complexity hits from fungi.
|
jpayne@68
|
3792 Determined that dog is contaminated with fungus, particularly chr7 and chr13.
|
jpayne@68
|
3793 34.35
|
jpayne@68
|
3794 Fixed a bug in which data was retained from the prior index when indexing a second fasta file in nodisk mode.
|
jpayne@68
|
3795 34.36
|
jpayne@68
|
3796 Disabled an assertion in BBMerge that the input is paired; it crashes if the input file is empty.
|
jpayne@68
|
3797 34.37
|
jpayne@68
|
3798 NSLOTS is now ignored if at least 16, to account for new 20-core nodes.
|
jpayne@68
|
3799 ReadWrite.getOutputStream now creates the directory structure if it does not already exist. Problem discovered by Brian Foster.
|
jpayne@68
|
3800 BBQC and RQCFilter now strip directory names before writing temp files.
|
jpayne@68
|
3801 BBDuk now correctly reports number of reads quality-filtered.
|
jpayne@68
|
3802 Added "unmappedonly" flag to Reformat.
|
jpayne@68
|
3803 RQCFilter now defaults to using TMPDIR.
|
jpayne@68
|
3804 34.38
|
jpayne@68
|
3805 BBMap now prints reads/second correctly. Before, it actually displayed pairs/second with paired data.
|
jpayne@68
|
3806 Added maxq flag to BBMerge, which allows quality values over 41 where reads overlap. Requested by Eric J.
|
jpayne@68
|
3807 Changed CoveragePileup from TextFiles to ByteFiles; increased read speed by 3.65x.
|
jpayne@68
|
3808 Changed CoveragePileup from TextStreamWriters to ByteStreamWriters; increased write speed by 1.46x.
|
jpayne@68
|
3809 Fixed a bug in BBQC/RQCFilter: with paired input and interleaved output, the paired status was being lost. Noted by Simon P.
|
jpayne@68
|
3810 Reformat, when in "mappedonly" or "unmappedonly" mode, now excludes reads with no bases or secondary alignments.
|
jpayne@68
|
3811 34.39
|
jpayne@68
|
3812 Human contaminant removal is now optional in BBQC.
|
jpayne@68
|
3813 34.40
|
jpayne@68
|
3814 ConcurrentReadOutputStream made abstract superclass.
|
jpayne@68
|
3815 Added ConcurrentGenericReadInputStream, the default implementation.
|
jpayne@68
|
3816 Added ConcurrentReadOutputStreamD, distributed template.
|
jpayne@68
|
3817 Merged some duplicate methods in MPIWrapper/ConcurrentReadInputStreamMPI.
|
jpayne@68
|
3818 34.41
|
jpayne@68
|
3819 Added some features to CoveragePileup, FilterByCoverage, and DecontaminateByNormalization to quantify low-coverage regions on otherwise high-coverage contigs.
|
jpayne@68
|
3820 Added parser fastadump flag to toggle dumping of kmers as fasta vs 2-columns.
|
jpayne@68
|
3821 Fixed a couple bugs in RQCFilter which mixed up names of stats files for trimming and filtering.
|
jpayne@68
|
3822 RQCFilter will now map to cat, dog, and human together with BBSplit if all three are specified, and produce "refstats.txt".
|
jpayne@68
|
3823 BBDuk/Seal now support ambiguous IUPAC codes in reference sequences.
|
jpayne@68
|
3824 34.42
|
jpayne@68
|
3825 ByteFile now returns empty lines as byte[0] instead of null. This allows processing of fastq files with 0-length reads. Noted by lankage (SeqAnswers).
|
jpayne@68
|
3826 Fixed a bug in FastaToChromArrays2 - blank lines in fasta files were interpreted as breaks between sequences. Noted by Alex Spunde.
|
jpayne@68
|
3827 Fixed "unmappedonly" flag in reformat.sh - it was providing inverted output. Noted by Kristin T.
|
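The FastaToChromArrays2 fix in 34.42 amounts to treating only header lines as record boundaries. The following is a minimal Python sketch of that rule, not the BBTools code; the function name parse_fasta is hypothetical and exists only for illustration.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs; blank lines never split a record."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank line: skip it instead of starting a new sequence
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# The blank line inside chr1 does not break the sequence in two.
records = list(parse_fasta([">chr1", "ACGT", "", "TTGA", ">chr2", "GG"]))
```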
jpayne@68
|
3828 34.43
|
jpayne@68
|
3829 Improved MD tag generation. Reference Ns were not being counted, and unnecessary zeros were appearing between adjacent substitutions. Noted by Jason S.
|
jpayne@68
|
3830 expectedErrors() and averageQuality() both now require a boolean parameter, includeUndefined.
|
jpayne@68
|
3831 Fixed a bug in BBQC's output directory - primary output was going to scratch. Noted by Simon P.
|
jpayne@68
|
3832 Added path to BBSplit's help menu. Noted by Ed K.
|
jpayne@68
|
3833 34.44
|
jpayne@68
|
3834 TranslateSixFrames now can accept AA input and produce NT output.
|
jpayne@68
|
3835 Merged dev branch into master.
|
jpayne@68
|
3836 Enabled CrosMPI to be created when CRIS_MPI is set to true.
|
jpayne@68
|
3837 BBDuk and Seal now use MPI streams correctly for reading the reference (when MPI is enabled).
|
jpayne@68
|
3838 Added truseq_rna.fa.gz to resources.
|
jpayne@68
|
3839 34.45
|
jpayne@68
|
3840 Added BBDuk skipr1/skipr2 flags. Requested by Stephanie H.
|
jpayne@68
|
3841 Fixed a null pointer in ConcurrentReadOutputStreamD.
|
jpayne@68
|
3842 34.46
|
jpayne@68
|
3843 Added SamLine.parseFlagOnly(byte[]) for rapid classification of sam lines.
|
jpayne@68
|
3844 Revised SplitSamFile and added splitsam.sh to the public distribution. It's now fast (~540MB/s).
|
jpayne@68
|
3845 Added a table of contents to /resources/.
|
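The idea behind SamLine.parseFlagOnly(byte[]) in 34.46 can be sketched as reading just the second tab-delimited column of a sam line and testing its bits. This Python version is an illustration under that assumption, not the Java method; parse_flag_only and classify are hypothetical names. The bit values 0x4 (unmapped) and 0x100 (secondary) come from the SAM specification.

```python
UNMAPPED = 0x4      # SAM flag bit: segment unmapped
SECONDARY = 0x100   # SAM flag bit: secondary alignment


def parse_flag_only(line: bytes) -> int:
    """Return the FLAG field (column 2) without splitting the whole line."""
    start = line.index(b"\t") + 1
    end = line.index(b"\t", start)
    return int(line[start:end])


def classify(line: bytes) -> str:
    """Coarse classification of a sam line from its flag alone."""
    flag = parse_flag_only(line)
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    return "mapped"


example = b"read1\t256\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII"
```

Only two tab positions are located, so the cost per line is independent of the number of optional tags that follow.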
jpayne@68
|
3846 34.47
|
jpayne@68
|
3847 Fixed a bug around parens in FindTipDeletions; it was sometimes running when it should have been disabled.
|
jpayne@68
|
3848 Added swap flag to Reformat, for substituting one base for another (as in bisulfite treatment).
|
jpayne@68
|
3849 Added underscore flag to Reformat.
|
jpayne@68
|
3850 Fixed threads flag in BBDuk; it was getting parsed in 2 places and never set.
|
jpayne@68
|
3851 Added qhdist/qhdist2 flags to BBDuk/Seal for mutating query kmers. Suggested by sdriscoll (SeqAnswers).
|
jpayne@68
|
3852 Corrected mkh flag in Seal. Noted by Vasanth S.
|
jpayne@68
|
3853 FastaReadInputStream now has a mandatory amino field in constructor.
|
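The qhdist flags added in 34.47 mutate query kmers. As a rough illustration, and assuming a setting of 1 means "all kmers at Hamming distance 1", the neighbors of a kmer can be enumerated as below. The function hamming_neighbors is hypothetical and is not taken from BBDuk or Seal.

```python
def hamming_neighbors(kmer: str, alphabet: str = "ACGT") -> set:
    """Return every sequence that differs from kmer at exactly one position."""
    neighbors = set()
    for i, original in enumerate(kmer):
        for base in alphabet:
            if base != original:
                neighbors.add(kmer[:i] + base + kmer[i + 1:])
    return neighbors


# A kmer of length k has 3*k neighbors at distance 1 over a 4-letter alphabet.
variants = hamming_neighbors("ACG")
```

The 3*k growth per kmer is why mutating at higher distances quickly becomes expensive in time or memory.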
jpayne@68
|
3854 34.48
|
jpayne@68
|
3855 Bitsets and coverage arrays can now both be disabled in pileup.
|
jpayne@68
|
3856 Reorganized buffer lengths in BBIndexPacBio to reduce memory usage and support long (6000bp) reads with shorter kmers, down to 9bp.
|
jpayne@68
|
3857 Added small rna adapter path to BBQC/RQCFilter.
|
jpayne@68
|
3858 Fixed FilterReadsByName processing of sam files; bug found by Marissa Miller.
|
jpayne@68
|
3859 Accelerated and reduced memory usage of FilterReadsByName; moved name parsing over to Tools.
|
jpayne@68
|
3860 Added ReadStreamWriter.USE_ATTACHED_SAMLINE.
|
jpayne@68
|
3861 34.49
|
jpayne@68
|
3862 Fixed qin/qout flags in many classes; they were being ignored. Noted by Jason H.
|
jpayne@68
|
3863 Added Nextera LMP adapter sequences.
|
jpayne@68
|
3864 34.50
|
jpayne@68
|
3865 Added AssemblyStats format=7. Requested by Andrew Tritt.
|
jpayne@68
|
3866 34.51
|
jpayne@68
|
3867 Added physical (aka fragment) coverage flag to Pileup and BBMap.
|
jpayne@68
|
3868 Added rpkm/fpkm output to Pileup and BBMap. Requested by Vasanth.
|
jpayne@68
|
3869 Changed Seal FPKM calculation; it was dividing by number of mapped reads rather than number of mapped fragments.
|
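The FPKM change in 34.51 concerns the denominator. A short Python sketch of the standard definition (fragments per kilobase per million mapped fragments) shows the effect; the function and parameter names are illustrative, not Seal's.

```python
def fpkm(fragments_on_feature: int, feature_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments."""
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        return 0.0
    return fragments_on_feature * 1e9 / (feature_length_bp * total_mapped_fragments)


# 1,000,000 mapped pairs correspond to 2,000,000 mapped reads.
correct = fpkm(100, 2000, 1_000_000)      # denominator counts fragments
too_small = fpkm(100, 2000, 2_000_000)    # denominator counts reads
```

With fully paired data, dividing by reads instead of fragments halves the reported value.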
jpayne@68
|
3870 34.52
|
jpayne@68
|
3871 Added SmallKmerFrequency and commonkmers.sh. Requested by Bill A.
|
jpayne@68
|
3872 Fixed a bug in ReadStreamByteWriter; "attachment" mode was printing a period instead of newline.
|
jpayne@68
|
3873 34.53
|
jpayne@68
|
3874 Added graphical display of GC level to gchist. The gcplot flag works with all programs that use gchist. Requested by Kecia D.
|
jpayne@68
|
3875 Added reparse to BBTool_ST. This allows parsing of subclass fields which are otherwise overwritten by their defaults.
|
jpayne@68
|
3876 Added count output to SmallKmerFrequency.
|
jpayne@68
|
3877 34.54
|
jpayne@68
|
3878 Added cumulative column to gchist. This is also enabled by the gcplot flag. Requested by Seung-Jin.
|
jpayne@68
|
3879 Added BBMap normcov and normcovo flags. Requested by Vasanth.
|
jpayne@68
|
3880 Added support for out= to stats and statswrapper. Requested by Brian Foster.
|
jpayne@68
|
3881 Fixed a bug in which stdout was being closed by closing a PrintWriter that wrapped it.
|
jpayne@68
|
3882 Disabled a message about read pairing for sam input.
|
jpayne@68
|
3883 Finished DedupeByMapping and created dedupebymapping.sh.
|
jpayne@68
|
3884 34.55
|
jpayne@68
|
3885 Fixed a bug in BBMap's coverage flags; normcovo was called normcovOverall. Noted by Matt Nolan.
|
jpayne@68
|
3886 34.56
|
jpayne@68
|
3887 Fixed qin flag being ignored by BBMap. Noted by Adrian P.
|
jpayne@68
|
3888 Removed obsolete classes ReformatFasta and ReverseComplement (both handled by ReformatReads now).
|
jpayne@68
|
3889 34.57
|
jpayne@68
|
3890 Added BBMap timetag flag and thist output.
|
jpayne@68
|
3891 Fixed bug in AssemblyStats GC output. Noted by Jasmyn P.
|
jpayne@68
|
3892 Added format=0 to stats (no output).
|
jpayne@68
|
3893 34.58
|
jpayne@68
|
3894 Version bump.
|
jpayne@68
|
3895 34.59
|
jpayne@68
|
3896 BBSplit now supports # operator in filenames. Requested by Vicente G.
|
jpayne@68
|
3897 BBMap now prevents cross-scaffold alignments if any output file is sam or bam, not just the primary one.
|
jpayne@68
|
3898 Reformat now has a primaryonly flag to prevent output of secondary alignments.
|
jpayne@68
|
3899 Added KillSwitch class. This will kill the process after X seconds of CPU utilization below Y. It is invoked by the command-line argument "monitor" for most programs.
|
jpayne@68
|
3900 34.60
|
jpayne@68
|
3901 BBDuk can now remove reads with less than X% of any single base. Requested by Alicia C.
|
jpayne@68
|
3902 Added reformat 'filterbits' and 'requiredbits' flags.
|
jpayne@68
|
3903 Removed obsolete colorspace-specific fields Read.expectedErrors and Read.mapLength.
|
jpayne@68
|
3904 Wrote SplitNexteraLMP.java and splitnextera.sh.
|
jpayne@68
|
3905 Added Read.subRead(from, to).
|
jpayne@68
|
3906 Added BBMap call to ReadStats.checkFiles() to force a crash before running, rather than after running, if there are problems.
|
jpayne@68
|
3907 Changed nextera_LMP_linker.fa.gz to a double linker after examining real data.
|
jpayne@68
|
3908 Modified BBTool_ST for greater flexibility with additional IO streams.
|
jpayne@68
|
3909 Wrote MultiStateAligner9XFlat.java. For testing. A flatter, faster MSA.
|
jpayne@68
|
3910 34.61
|
jpayne@68
|
3911 Completely removed all support for Solid colorspace.
|
jpayne@68
|
3912 Removed ChromosomeArrayCompressed.
|
jpayne@68
|
3913 Removed MultiStateAligner9fs.
|
jpayne@68
|
3914 Removed FastaStream, QualStream, FastqReadInputStream_old.
|
jpayne@68
|
3915 Renamed FastaQualReadInputStream3 to FastaQualReadInputStream.
|
jpayne@68
|
3916 Added kmer.TableLoaderLockfree. This unifies the load portion of BBDuk and Seal for filling AbstractKmerTables.
|
jpayne@68
|
3917 Added kmer.TableReader. This makes it easy to read data from kmer tables.
|
jpayne@68
|
3918 Added error message for KmerCountExact if k<1 or k>31.
|
jpayne@68
|
3919 Added warning to BBDuk if no kmers are loaded but a kmer operation is specified.
|
jpayne@68
|
3920 Added kmask flag to BBDuk.sh help and clarified ktrim flag.
|
jpayne@68
|
3921 Timer now automatically self-starts upon creation.
|
jpayne@68
|
3922 Added NexteraLMP support to rqcfilter.
|
jpayne@68
|
3923 Added versions of Illumina contaminant files without Nextera adapter junctions.
|
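The k<1 or k>31 check added to KmerCountExact in 34.61 is consistent with packing each base into 2 bits of a signed 64-bit integer, where 31 bases occupy 62 bits. That packing is an assumption here, not something the changelog states. A Python sketch with hypothetical names:

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
MAX_K = 31  # 31 bases * 2 bits = 62 bits, which fits in a signed 64-bit long


def encode_kmer(kmer: str) -> int:
    """Pack a kmer into an integer, 2 bits per base, first base most significant."""
    k = len(kmer)
    if k < 1 or k > MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}, got {k}")
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


code = encode_kmer("ACGT")  # binary 00 01 10 11
```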
jpayne@68
|
3924 34.62
|
jpayne@68
|
3925 Fixed null pointer in BBDukF. Found by Alicia C.
|
jpayne@68
|
3926 34.63
|
jpayne@68
|
3927 Created tax package for processing NCBI taxonomy data.
|
jpayne@68
|
3928 Added IntList.getUniqueCount()
|
jpayne@68
|
3929 Added TaxNode and TaxTree, for accumulating taxa counts.
|
jpayne@68
|
3930 Added GiToNcbi, for translating gi numbers to taxa ids.
|
jpayne@68
|
3931 Added RenameGiToNcbi, for renaming sequences (e.g. nt) with their taxa id.
|
jpayne@68
|
3932 Added SortByTaxa, for sorting sequences based on taxonomy for better compression.
|
jpayne@68
|
3933 TaxTree and GiToNcbi now support serialized input; much smaller.
|
jpayne@68
|
3934 Integrated taxonomy support into Seal.
|
jpayne@68
|
3935 Seal now uses 9 ways, and uses pigz when loading the reference.
|
jpayne@68
|
3936 Added ftr2 (forcetrimright2) flag, which allows trimming a fixed number of bases on the rightmost end.
|
jpayne@68
|
3937 Added BBMap parsing logic to prevent bad values of maxsites and maxsites2.
|
jpayne@68
|
3938 Fixed BBTools failure to find primes.txt.gz if there is a space in the classpath (Matt Kearse).
|
jpayne@68
|
3939 All calls to average quality now require a max number of bases to process.
|
jpayne@68
|
3940 Added maqb flag (min average quality bases); maq calculation will be restricted to that many leading bases. (Shoudan)
|
jpayne@68
|
3941 Retired FilterReads (superseded by Reformat and BBNorm).
|
jpayne@68
|
3942 Added reformat "tossjunk", "fixjunk", and "aminoin" flags.
|
jpayne@68
|
3943 34.64
|
jpayne@68
|
3944 Fixed off-by-one error in forcetrimright2.
|
jpayne@68
|
3945 34.65
|
jpayne@68
|
3946 Made gi2taxid.sh, which calls RenameGiToNcbi.
|
jpayne@68
|
3947 RenameGiToNcbi updated to split input into valid and invalid output, where invalid gets anything with no taxid.
|
jpayne@68
|
3948 Added FileFormat.isFasta(String) method.
|
jpayne@68
|
3949 Improved SortByTaxa. Now does preorder traversal of tree, and supports dummy nodes, fusing, and promotion.
|
jpayne@68
|
3950 Added sortbytaxa.sh.
|
jpayne@68
|
3951 BBSplit ref= can now point to directories. Requested by Manuel K.
|
jpayne@68
|
3952 34.66
|
jpayne@68
|
3953 Fixed an uncaught overflow in ByteBuilder.expand().
|
jpayne@68
|
3954 Added SortByTaxa max fusion length.
|
jpayne@68
|
3955 Added barcode filtering to Reformat and BBDuk.
|
jpayne@68
|
3956 Sped up chastity filtering and allowed it to process reads with / before read number.
|
jpayne@68
|
3957 Added chastity filtering and barcode filtering to RQCFilter.
|
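Chastity filtering, mentioned in 34.66, relies on the filter field of Illumina read headers. In the Casava 1.8+ header layout the comment after the first space is read:is_filtered:control:index, and is_filtered is Y for reads that failed the filter. A minimal Python sketch, assuming that layout; fails_chastity is a hypothetical name and the header below is made up for illustration.

```python
def fails_chastity(header: str) -> bool:
    """True if the comment field marks the read as filtered (is_filtered == 'Y')."""
    parts = header.split(None, 1)
    if len(parts) < 2:
        return False  # no comment field, so nothing to test
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


failed = fails_chastity("@M01:1:FC:1:1101:1234:5678 1:Y:0:ACGTAC")
passed = fails_chastity("@M01:1:FC:1:1101:1234:5678 1:N:0:ACGTAC")
```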
jpayne@68
|
3958 34.67
|
jpayne@68
|
3959 Fixed BBDuk double-counting chastity/barcode-filtered reads.
|
jpayne@68
|
3960 Fixed BBDuk overcounting of reads that were trimmed by overlap.
|
jpayne@68
|
3961 BBMap nzo flag now affects refstats and scafstats in addition to covstats (Vasanth).
|
jpayne@68
|
3962 Added BBMap sortstats flag.
|
jpayne@68
|
3963 Added BBMap rebuild flag (Vasanth).
|
jpayne@68
|
3964 34.68
|
jpayne@68
|
3965 Added qtrim=window flag (Alicia).
|
jpayne@68
|
3966 Added slashspace flag to Reformat (disable space when adding /1 and /2 to read names).
|
jpayne@68
|
3967 Added clearzone to Seal (Vasanth).
|
jpayne@68
|
3968 Added stoptag to Reformat.
|
jpayne@68
|
3969 Added boundstag to Parser (Shoudan).
|
jpayne@68
|
3970 34.69
|
jpayne@68
|
3971 Slight fix for samline inbounds detection.
|
jpayne@68
|
3972 Fixed a corrupt Truseq RNA adapter sequence.
|
jpayne@68
|
3973 Added Truseq RNA adapters to adapters.fa.
|
jpayne@68
|
3974 34.70
|
jpayne@68
|
3975 Added BBMerge useratio mode (enabled by default in vloose mode).
|
jpayne@68
|
3976 Added BBMerge adapter processing.
|
jpayne@68
|
3977 Added BBDuk kmask=lowercase.
|
jpayne@68
|
3978 34.71
|
jpayne@68
|
3979 Added BBMerge uloose and vstrict modes.
|
jpayne@68
|
3980 Added BBMerge requireratiomatch, ratiominoverlapreduction, and ratiooffset flags.
|
jpayne@68
|
3981 34.72
|
jpayne@68
|
3982 Adjusted BBMerge uloose settings.
|
jpayne@68
|
3983 34.73
|
jpayne@68
|
3984 Added RandomReads path flag.
|
jpayne@68
|
3985 Increased BBMerge defaults to -Xmx1000m and readbufferlen=400 to improve scaling.
|
jpayne@68
|
3986 baseToComplementExtended array now maps lowercase letters to lowercase letters.
|
jpayne@68
|
3987 Changed BBMerge to use floating-point probabilities.
|
jpayne@68
|
3988 Fixed missing quote mark in bbsplit.sh. Noted by Manuel K.
|
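The baseToComplementExtended change in 34.73 keeps case through complementing, which matters when lowercase marks masked bases. The same behavior can be sketched in Python with a translation table; this is an illustration, not the Java array itself.

```python
# Uppercase maps to uppercase and lowercase to lowercase; other symbols pass through.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence while preserving the case of each base."""
    return seq.translate(COMPLEMENT)[::-1]


result = reverse_complement("ACgtN")
```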
jpayne@68
|
3989 34.74
|
jpayne@68
|
3990 Added Reformat quantize flag (Alicia Clum).
|
jpayne@68
|
3991 34.75
|
jpayne@68
|
3992 RandomReads ignored q=x in perfect mode, and perfect flag did not work without a number.
|
jpayne@68
|
3993 Added Reformat skipreads flag.
|
jpayne@68
|
3994 Added DualCris, for dual input files of unequal length.
|
jpayne@68
|
3995 Repair.sh now works if r1 and r2 files are unequal length.
|
jpayne@68
|
3996 Added flags for BBMerge normalmode and ratiomode.
|
jpayne@68
|
3997 Made BBMerge default to ratiomode-only for default and loose stringencies.
|
jpayne@68
|
3998 BBMerge efilter now enabled by default in all modes.
|
jpayne@68
|
3999 Improved BBMerge efilter (now occurs after both merge modes).
|
jpayne@68
|
4000 Updated BBMerge efilter to examine only the trailing X bases depending on mininsert setting.
|
jpayne@68
|
4001 Made normalmode and ratiomode more independent, so requireratiomatch flag is more efficient.
|
jpayne@68
|
4002 Added ratiomode settings for strict, vstrict and ustrict. Ustrict has requireratiomatch enabled.
|
jpayne@68
|
4003 Increased readlength.sh default memory and reduced number of reads in buffer.
|
jpayne@68
|
4004 Added BBMerge ordered flag; disabled by default.
|
jpayne@68
|
4005 34.76
|
jpayne@68
|
4006 Accelerated BBMerge by using a buffer for quality values translated to probabilities.
|
jpayne@68
|
4007 Added early exits before mininsert0 and minoverlap, and added mininsert0 flag.
|
jpayne@68
|
4008 Added ecc mode, for correction only and no merging.
|
jpayne@68
|
4009 Tested removal of runtime division; speed unchanged.
|
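The speedup in 34.76 from buffering quality values translated to probabilities can be pictured as a lookup table built once from the Phred relation p = 10^(-Q/10). The sketch below is Python rather than BBMerge's Java, and the table size of 94 entries is an assumption chosen to cover printable Phred+33 scores.

```python
MAX_Q = 93  # assumed upper bound; covers every printable Phred+33 quality

# Computed once, so the inner loop does an index lookup instead of a pow() call.
ERROR_PROB = [10.0 ** (-q / 10.0) for q in range(MAX_Q + 1)]


def expected_errors(quals) -> float:
    """Sum of per-base error probabilities for a list of integer quality scores."""
    return sum(ERROR_PROB[q] for q in quals)


total = expected_errors([10, 20, 30])  # 0.1 + 0.01 + 0.001
```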
jpayne@68
|
4010 34.77
|
jpayne@68
|
4011 Added BBMerge pfilter flag; discards overlaps with low probability mismatches.
|
jpayne@68
|
4012 Fixed BBMergeOverlapper.expectedMismatches() and probability(). Both were considering the wrong bases.
|
jpayne@68
|
4013 Increased mateByOverlapRatioJava speed with altBadlimt. Slightly increases false-positive rate.
|
jpayne@68
|
4014 Added reformat itn flag (convert iupac symbols to N).
|
jpayne@68
|
4015 Simplified mateByOverlap and mateByOverlapRatio - removed no-quality loop.
|
jpayne@68
|
4016 BBMerge uloose settings tweaked; no longer uses normalmode.
|
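The reformat itn flag from 34.77 converts IUPAC symbols to N. A small Python sketch of that conversion follows. Preserving case (lowercase codes become n) is a choice made here for illustration, and iupac_to_n is a hypothetical name.

```python
AMBIGUOUS = "RYSWKMBDHV"  # IUPAC nucleotide codes other than A, C, G, T, U and N

TO_N = str.maketrans(
    AMBIGUOUS + AMBIGUOUS.lower(),
    "N" * len(AMBIGUOUS) + "n" * len(AMBIGUOUS),
)


def iupac_to_n(seq: str) -> str:
    """Replace every ambiguous IUPAC code with N, leaving A, C, G, T and N alone."""
    return seq.translate(TO_N)


cleaned = iupac_to_n("ACGTRYKN")
```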
jpayne@68
|
4017 34.78
|
jpayne@68
|
4018 Fixed BBDuk compatibility with new BBMergeOverlapper float array requirement.
|
jpayne@68
|
4019 34.79
|
jpayne@68
|
4020 Integrated new BBMergeOverlapper ratiomode call into BBDuk. Uses settings similar to strict mode.
|
jpayne@68
|
4021 Adjusted BBMerge defaults some more.
|
jpayne@68
|
4022 34.80
|
jpayne@68
|
4023 Added outu flag to demuxbyname.
|
jpayne@68
|
4024 Added overlapWithoutQuality (owq) and overlapUsingQuality (ouq) flags to BBMerge; default is ouq=f owq=t.
|
jpayne@68
|
4025 Made a quality-free loop in BBMergeOverlapper now that quality is disabled by default; 20% faster.
|
jpayne@68
|
4026 Removed minoi flag and some legacy fields from Overlapper that dealt with mapping information.
|
jpayne@68
|
4027 34.81
|
jpayne@68
|
4028 Split read verification into a different function outside of constructor.
|
jpayne@68
|
4029 Added a mode for worker threads to verify reads, instead of at construction time.
|
jpayne@68
|
4030 Added input file check to BBMerge.
|
jpayne@68
|
4031 Greatly increased speed of overlapper with findBestRatio function.
|
jpayne@68
|
4032 34.82
|
jpayne@68
|
4033 Minor changes to BBMerge entropy calculations.
|
jpayne@68
|
4034 Added BBMerge static function errorCorrect().
|
jpayne@68
|
4035 Changed fast from normalmode to ratiomode.
|
jpayne@68
|
4036 Added ecc flag to BBDuk and Seal.
|
jpayne@68
|
4037 34.83
|
jpayne@68
|
4038 Fixed DualCris and SplitPairsAndSingles (repair.sh). Crash bug found by GenoMax.
|
jpayne@68
|
4039 34.84
|
jpayne@68
|
4040 Added overlap flag to BBNorm to regulate whether overlapping is used for error correction.
|
jpayne@68
|
4041 BBMerge now has static functions mergeableFraction() and makeInsertHistogram().
|
jpayne@68
|
4042 Added ecc flag to kmercountexact.
|
jpayne@68
|
4043 Added normbins flag to BBMap and fixed normcovo flag.
|
jpayne@68
|
4044 34.85
|
jpayne@68
|
4045 Finished ArrayListSet.
|
jpayne@68
|
4046 Created MultiCros and a functional reference implementation in main().
|
jpayne@68
|
4047 Upgraded Seal handling of refnames mode; now all sequences for a ref file get the same ID (in that mode).
|
jpayne@68
|
4048 Added BBSplit-style output for Seal refstats. Requested by Vasanth.
|
jpayne@68
|
4049 Removed outsingle from Seal.
|
jpayne@68
|
4050 Added multiple output streams to Seal with the pattern flag; acts like basename in BBSplit.
|
jpayne@68
|
4051 DemuxByName now supports arbitrary output streams without needing a name list.
|
jpayne@68
|
4052 Moved directory parsing from BBSplit to Tools.getFileOrFiles().
|
jpayne@68
|
4053 34.86
|
jpayne@68
|
4054 Fixed text string for output stats of FilterByName. Noted by Alex C.
|
jpayne@68
|
4055 Tiny change to BBMergeOverlapper regarding Ns.
|
jpayne@68
|
4056 Added mkf (minkmerfraction) flag to Seal. Requested by Vasanth S.
|
jpayne@68
|
4057 34.87
|
jpayne@68
|
4058 Added /ref/qual/ directory for recalibration matrices.
|
jpayne@68
|
4059 Added recalibrate() function to CalcTrueQuality.
|
jpayne@68
|
4060 recalibrate (recal) is now a parser flag and has been enabled for BBMerge, Reformat, and BBDuk.
|
jpayne@68
|
4061 Added fixheader flag to parser. Requested by Shijie.
|
jpayne@68
|
4062 Changed parseInt to parseLong for matrix loading.
|
jpayne@68
|
4063 Added qb123 matrix.
|
jpayne@68
|
4064 Added estimate by max observed error rate rather than average.
|
jpayne@68
|
4065 34.89
|
jpayne@68
|
4066 Added unicode2ascii.sh for fixing files with strange symbols.
|
jpayne@68
|
4067 34.90
|
jpayne@68
|
4068 Fixed FastaReadInputStream's ability to process extended ascii characters (128-255).
|
jpayne@68
|
4069 Fixed BBDuk, Reformat, and Seal sometimes ignoring the ftr2 flag.
|
jpayne@68
|
4070 Added adapter sequence detection to BBMerge (outa flag).
|
jpayne@68
|
4071 Made CalcTrueQuality multithreaded.
|
jpayne@68
|
4072 Fixed ROOT_QUALITY not getting set with the path flag.
|
jpayne@68
|
4073 Added path flag to BBDuk and BBMerge.
|
jpayne@68
|
4074 Added notags flag to BBMap.
|
jpayne@68
|
4075 34.91
|
jpayne@68
|
4076 CalcTrueQuality now tracks paired reads independently.
|
jpayne@68
|
4077 BBDuk and various programs no longer deadlock waiting for sam header to be read.
|
jpayne@68
|
4078 Added BBMerge iupacton (itn) flag.
|
jpayne@68
|
4079 Seal can now write sam output from sam input.
|
jpayne@68
|
4080 BBDuk can write sam output from sam input, but only for quality recalibration, not other operations.
|
jpayne@68
|
4081 Made RemapQuality to test recalibration.
|
jpayne@68
|
4082 Internal 2-pass recalibration is now working.
|
jpayne@68
|
4083 Changed $CMD to eval $CMD in all shellscripts, which allows escaping spaces in filenames using backslash. Thanks Jon!
|
jpayne@68
|
4084 Added qchist (qualityCountHistogram). Gives counts of bases with each quality score.
|
jpayne@68
|
4085 Fixed BBDuk not testing to see if qahist was set.
|
jpayne@68
|
4086 Changed CalcTrueQuality default observationcutoff to 100, because higher settings cause odd-looking graphs.
|
jpayne@68
|
4087 34.92
|
jpayne@68
|
4088 Added RenameReads prefixonly flag.
|
jpayne@68
|
4089 Added driver.SummarizeCoverage to summarize cross-contamination scafstats files.
|
jpayne@68
|
4090 Changed calcmem.sh to work correctly even if ulimit fails. Noted by Tomas B. Thanks to vladr (stackoverflow)!
|
jpayne@68
|
4091 Changed bbduk.sh/bbduk2.sh default ram to 1400m from 2000m so that they should work on 32-bit MacOS systems without setting the -Xmx flag.
|
jpayne@68
|
4092 Removed some redundancy from TextFile.
|
jpayne@68
|
4093 DecontaminateByNormalization now named CrossBlock.
|
jpayne@68
|
4094 Added Read.uToT and parser utot flag for converting uracil to thymine in reads.
|
jpayne@68
|
4095 BBMap now converts U to T when generating the reference, and all degenerate bases to N.
|
jpayne@68
|
4096 Changed SummarizeCoverage to be memory-efficient with large files (e.g. coverage vs nt).
|
jpayne@68
|
4097 Removed colorspace from ChromosomeArray.
|
jpayne@68
|
4098 Fixed a bug in which cigar strings were sometimes not printed for secondary alignments or when using filters. Noted by Jason S.
|
jpayne@68
|
4099 Added qb12 matrix to CalcTrueQuality.
|
jpayne@68
|
4100 Added support for adjusting read quality score limits beyond 2~41 with the mincalledquality and maxcalledquality flags.
|
jpayne@68
|
4101 Recalibration matrices may be extended above Q41 with the recalqmax flag, for processing consensus reads.
|
jpayne@68
|
4102 34.93
|
jpayne@68
|
4103 Fixed version flag in parser; it was being ignored if there were arguments.
|
jpayne@68
|
4104 Fixed calcmem.sh behavior on a Mac (or other system without /proc/mem) when ulimit=unlimited.
|
jpayne@68
|
4105 Fixed issue where BBTool_ST subclasses were having params overridden by defaults.
|
jpayne@68
|
4106 ReadStats now correctly tracks read1 and read2 for qhist using sam input, instead of lumping them together.
|
jpayne@68
|
4107 Fixed CalcTrueQuality's recalibration of sam input; it was applying the read1 profile to both reads. Finally works perfectly.
|
jpayne@68
|
4108 Added BBSplit force-rebuild logic.
|
jpayne@68
|
4109 Coverage histogram now goes to 1 million in 32bit mode, instead of 64k.
|
jpayne@68
|
4110 Added pjet to rqcfilter/bbqc.
|
jpayne@68
|
4111 Fixed a division-by-zero bug in ReadStats. Noted by Seung-Jin.
|
jpayne@68
|
4112 Removed confusing readme line stating that BBMap is free for noncommercial use. This is true, but it is also free for commercial use.
|
jpayne@68
|
4113 Added stdev to covstats output (as long as arrays are enabled).
|
jpayne@68
|
4114 Added driver.SummarizeSealStats and summarizeseal.sh for analyzing cross-contamination results using Seal stats output.
|
jpayne@68
|
4115 Added summarizescafstats.sh script for driver.SummarizeCoverage.
|
jpayne@68
|
4116 Added jgi.FilterReadsWithSubs to select only those reads with substitution errors for bases in a specified quality range.
|
jpayne@68
|
4117 Added phix_adapters.fa.gz to /resources and updated contents.
|
jpayne@68
|
4118 Added qahist average deviation header line.
|
jpayne@68
|
4119 34.94
|
jpayne@68
|
4120 BBMap now produces an error message if indexing fastq files rather than just crashing.
|
jpayne@68
|
4121 Added config file support to all BBTools via the config= flag.
|
jpayne@68
|
4122 Added some sample config files and a config file readme.
|
jpayne@68
|
4123 Moved bloom filter and count-min sketch data structures to bloom package.
|
jpayne@68
|
4124 34.95
|
jpayne@68
|
4125 Added trd flag to reformat.sh help. Noted missing by Esther Singer.
|
jpayne@68
|
4126 Added merge support to SplitNexteraLMP. Currently unknown which is better, merge=t or f.
|
jpayne@68
|
4127 Added better output names to RQCFilter. Requested by Bryce F.
|
jpayne@68
|
4128 Added kmer ownership to AbstractKmerTable.
|
jpayne@68
|
4129 KmerLink is now an AbstractKmerTable subclass.
|
jpayne@68
|
4130 Wrote Tadpole.
|
jpayne@68
|
4131 Fixed bug when reading empty fasta files. Noted by Matt K.
|
jpayne@68
|
4132 Added SamLine.countTrailingClip, and modified countLeadingClip. Now both have soft/hard clipping toggles.
|
jpayne@68
|
4133 Removed static SamLine.SUBTRACT_LEADING_SOFT_CLIP and replaced with required parameter.
|
jpayne@68
|
4134 Reduced default initial CoverageArray size from 16m to 500.
|
jpayne@68
|
4135 Added mincov, maxcov, and delcov flags to BBMask.
|
jpayne@68
|
4136 Updated BBMask readme.
|
jpayne@68
|
4137 Split IntList/LongList toString method into SetView and ListView.
|
jpayne@68
|
4138 Added Pileup toggle for including soft-clipped bases; default false.
|
jpayne@68
|
4139 Made Tadpole shell script.
|
jpayne@68
|
4140 Fixed Seal stats header indicating over 100% matched sequences with ambig=all. Noted by Esther Singer.
|
jpayne@68
|
4141 AbstractKmerTable classes now correctly return -1 instead of 0 if a key is not present.
|
jpayne@68
|
4142 Added AbstractKmerTable.clearOwner() to clean up ownership trails of abandoned contigs.
|
jpayne@68
|
4143 Fixed various bugs in Tadpole.
|
jpayne@68
|
4144 Added substring and case flags to FilterByName. Requested by Esther Singer.
|
jpayne@68
|
4145 Tadpole now works correctly multithreaded.
|
jpayne@68
|
4146 34.96
|
jpayne@68
|
4147 Fixed a bug in FastaReadInputStream not shutting down subprocesses when done.
|
jpayne@68
|
4148 Added LMP insert-size detection mode to Tadpole.
|
jpayne@68
|
4149 Added ByteBuilder.appendKmer(kmer, k).
|
jpayne@68
|
4150 Added Tadpole read-extension mode.
|
jpayne@68
|
4151 Tadpole now builds contigs from kmer seeds rather than contig seeds. Slower but more consistent.
|
jpayne@68
|
4152 Tadpole is now a complete assembler.
|
jpayne@68
|
4153 34.97
|
jpayne@68
|
4154 Added length, coverage, and GC to Tadpole contig names.
|
jpayne@68
|
4155 Added directional substrings for filterbyname. Requested by Esther Singer.
|
jpayne@68
|
4156 Disabled module lines in reformat.sh. Noted by Xiaoli D.
|
jpayne@68
|
4157 Renamed summarizecoverage to summarizescafstats.
|
jpayne@68
|
4158 Added ambig and kfilter flags to CrossBlock. Requested by Ken H.
|
jpayne@68
|
4159 34.98
|
jpayne@68
|
4160 BBDuk/BBDuk2 default maxrskip set to 1 (disabled), to reduce confusion.
|
jpayne@68
|
4161 Fixed Seal generating ArrayListSets even when pattern output was not specified.
|
jpayne@68
|
4162 Fixed Seal bug classifying both read1 and read2 as matched when only one matched in kpt=f mode (this IS the intended behavior in default kpt=t mode). Noted by Alex Spunde.
|
jpayne@68
|
4163 Fixed string compare bug in FilterReadsByName making substring=header and substring=name fail. Noted by Esther Singer.
|
jpayne@68
|
4164 34.99
|
jpayne@68
|
4165 Fixed BBMap not printing coverage statistics when machineout=t. Noted by Vasanth Singan.
|
jpayne@68
|
4166 Added minlen flag to filterbyname.
|
jpayne@68
|
4167 Changed Dedupe primary structure from HashMap to LinkedHashMap to (somewhat) preserve input order.
|
jpayne@68
|
4168
|
jpayne@68
|
4169
|
jpayne@68
|
4170
|
jpayne@68
|
4171 TODO: BBDuk crashes with K>31 (Alex Spunde).
|
jpayne@68
|
4172 TODO: Memory autodetection does not work on Amazon.
|
jpayne@68
|
4173 TODO: BBMap machineout to file (Vasanth).
|
jpayne@68
|
4174 TODO: chrombits and CHROMS_PER_BLOCK may be obsolete and ready to remove.
|
jpayne@68
|
4175 TODO: out=stdout.bam does not work.
|
jpayne@68
|
4176 TODO: Include deletions toggle for Pileup.
|
jpayne@68
|
4177 TODO: Soft-clipping coverage flag.
|
jpayne@68
|
4178 TODO: Add match/cigar/SamLine trimming to TrimRead.
|
jpayne@68
|
4179 TODO: Write Hollow.
|
jpayne@68
|
4180 TODO: Multithread splitnextera.
|
jpayne@68
|
4181 TODO: config flag in Parser
|
jpayne@68
|
4182 TODO: Normalize CalcTrueQuality on 50% GC by tracking GC rates (etc) observed in reads.
|
jpayne@68
|
4183 TODO: Make Recalibrate class and recalibrate.sh to automate everything.
|
jpayne@68
|
4184 TODO: Track quality-score accuracy per base location.
|
jpayne@68
|
4185 TODO: Track quality-score accuracy per base letter.
|
jpayne@68
|
4186 TODO: Tool to extract reads mapped to a specific locus.
|
jpayne@68
|
4187 TODO: Make it easy to test a decontam tool on the synth datasets.
|
jpayne@68
|
4188 TODO: Map unknowns in 48-sample-plate.
|
jpayne@68
|
4189 TODO: BBMerge return codes. -1 no solution, -2 ambig, -3 too long (short overlap), -4 too short.
|
jpayne@68
|
4190 TODO: Seal speed and mkf flags should work together.
|
jpayne@68
|
4191 TODO: Apply Seal refnames upgrade to taxonomy handling, if not already done.
|
jpayne@68
|
4192 TODO: BBNorm histout with 1pass/ecc does not seem to generate anything.
|
jpayne@68
|
4193 TODO: randomreads does not name reads by origin in fasta format.
|
jpayne@68
|
4194 TODO: Hamming distance for demuxbyname.
|
jpayne@68
|
4195 TODO: MultiCros wrapper and hash-based multi-listnum object.
|
jpayne@68
|
4196 TODO: Reformat should be able to trim mapped sam files (Aldo J).
|
jpayne@68
|
4197 TODO: Mask bases overlapping from Dedupe graph (Shoudan).
|
jpayne@68
|
4198 TODO: RQCFilter - dynamically switch between $TMPDIR or /dev/shm depending on input size and available disk space.
|
jpayne@68
|
4199 TODO: BBMerge - trim adapters for unmerged reads (?)
|
jpayne@68
|
4200 TODO: Fungal pipeline: FindErrors?
|
jpayne@68
|
4201 *TODO: BBMap calls calcCorrectness even when data is not synthetic.
|
jpayne@68
|
4202 TODO: BBMap File containing all reads/pairs that are not completely contained within a single contig. (Shoudan)
|
jpayne@68
|
4203 TODO: BBDuk/Seal - enable tracking of kmers by reference file rather than reference sequence.
|
jpayne@68
|
4204 TODO: Batch setting for BBDuk to operate on multiple files and auto-name output.
|
jpayne@68
|
4205 TODO: Get data from Chris B, count mismatched pairs, send to E.
|
jpayne@68
|
4206 TODO: Stats does not accurately estimate BBMap RAM usage for K=15.
|
jpayne@68
|
4207 TODO: Accelerate maxindel=0 mode for BBMap by banning MSA usage.
|
jpayne@68
|
4208 TODO: Redo DedupeByMapping so that it can handle sorted input using a heap.
|
jpayne@68
|
4209 TODO: MSA Flat - remove states to increase speed.
|
jpayne@68
|
4210 TODO: Dedupe does not work with sam input. (Lynn A.)
|
jpayne@68
|
4211 TODO: Change all instances of "remove bases with quality below minq" to "...trimq" in shellscripts.
|
jpayne@68
|
4212 TODO: Parse extra part of sam lines into a byte array (optionally).
|
jpayne@68
|
4213 *TODO: Dedupe crash on input in C:\temp\dd1\bad.fa (Shoudan).
|
jpayne@68
|
4214 TODO: Tile-based statistics and filtering for BBMap, BBDuk, etc.
|
jpayne@68
|
4215 TODO: Pileup could calculate ref/nonref coverage.
|
jpayne@68
|
4216 TODO: Marcel wants a program to essentially sort reads and remove duplicates that are at least X identity.
|
jpayne@68
|
4217 TODO: Move parsing of "threads" to parseCommonStatic and adjust all relevant classes.
|
jpayne@68
|
4218 TODO: Add 'remap' from Reformat flag to BBMap.
|
jpayne@68
|
4219 *TODO: BBMerge won't go below 17bp in normal mode or 26bp in loose mode, regardless of minoi flag.
|
jpayne@68
|
4220 TODO: BBMerge dynamic mode - test to determine best overlap limits.
|
jpayne@68
|
4221 TODO: Bed output of masked regions by BBMask, or regions with Ns.
|
jpayne@68
|
4222 TODO: Bed output of regions with coverage above or below X (Bob Bower).
|
jpayne@68
|
4223 TODO: Document append in shellscripts.
|
jpayne@68
|
4224 TODO: Genbank format parser (Sam D). Looks confusing.
|
jpayne@68
|
4225 TODO: Decontam should break at (or N-mask) low-coverage areas rather than discarding the whole contig.
|
jpayne@68
|
4226 TODO: BED support for pileup. And make Pileup faster by ignoring irrelevant sam fields.
|
jpayne@68
|
4227 TODO: CrossMask. Accept set of files; for each, mask using BBDuk with all others as ref.
|
jpayne@68
|
4228 TODO: Study bisulfite data on BBMap. Possibly use multiple reference copies with different transforms (C->T, A->G, both, neither).
|
jpayne@68
|
4229 TODO: Shellscripts are not able to handle paths containing spaces.
|
jpayne@68
|
4230 TODO: Add mininsert flag to BBMap. And maybe maxinsert.
|
jpayne@68
|
4231 TODO: Parse MD tag when available.
|
jpayne@68
|
4232 TODO: CC rates for all 3 platforms in one chart; ignore R1/R2 differences.
|
jpayne@68
|
4233 *TODO: Dedupe loses reads when using paired data and run multithreaded.
|
jpayne@68
|
4234 TODO: document nhs flag.
|
jpayne@68
|
4235 TODO: Filter cross-contam plates with only depth and length, test cc rates.
|
jpayne@68
|
4236 TODO: Fix dedupe crash when minclustersize=1.
|
jpayne@68
|
4237 TODO: Clarify or fix what minid does in Dedupe.
|
jpayne@68
|
4238 TODO: Add ribosomal filtering to rqc.
|
jpayne@68
|
4239 TODO: Update BandedAlignerJNI for quicker width reset.
|
jpayne@68
|
4240 TODO: Optional penalty when seq ends before ref in banded.
|
jpayne@68
|
4241 TODO: Make sure AddAdapters is adding them correctly, i.e., reverse-complemented (or not).
|
jpayne@68
|
4242 TODO: Make list of proposed higher stringency adapter trimming changes and send to Vasanth/Erika.
|
jpayne@68
|
4243 TODO: Retire ErrorCorrect, and move the functionality over to another class.
|
jpayne@68
|
4244 TODO: Implement ErrorCorrectBulk in KmerNormalize. It is used in MateReadsMT.
|
jpayne@68
|
4245 TODO: BBMerge should allow optional inline error-correction for reads that fail to merge, and revert if they still fail.
|
jpayne@68
|
4246 TODO: Retire KmerCount7MT (non-atomic version).
|
jpayne@68
|
4247 *TODO: It appears that timeslip is being correctly applied by fillLimited (etc), but not by calcDelScore() or calcAffineScore().
|
jpayne@68
|
4248 TODO: Dedupe should warn if lowercase letters are present. (Kurt)
|
jpayne@68
|
4249
|
jpayne@68
|
4250
|
jpayne@68
|
4251 v33.
|
jpayne@68
|
4252 Added "usemodulo" flag to BBMap. Allows throwing away 80% of reference kmers to save memory. Slight reduction in sensitivity. Requested by Rob Egan.
|
jpayne@68
|
4253 Moved GetReads back to jgi package and fixed shellscript.
|
jpayne@68
|
4254 Fixed rare crash when using "local" mode on paired-end data on highly-repetitive genomes (Creinhardtii). Found by Vasanth S.
|
jpayne@68
|
4255 Improved "usemodulo" mode - it was biased against minus-strand hits. Now, it keeps kmers where (kmer%5==rkmer%5). Result is virtually no reduction in sensitivity (zero in error-free reads, and less than 0.01% in reads with 8% error).
|
jpayne@68
|
4256 BBMap will now discard reads shorter than "minlen".
|
jpayne@68
|
4257 Added "idhistbins" or "idbins" flag to BBMap; allows setting the number of bins used in the idhist.
|
jpayne@68
|
4258 Rescaled BBMap's MAPQ to be lower. It is now 0 for unmapped, 1-3 for ambiguous, and roughly 4-45 otherwise, with higher values allowed for longer reads.
|
jpayne@68
|
4259 Added a much flatter MSA version, "MultiStateAligner9Flat", requested by JJ Chai.
|
jpayne@68
|
4260 Fixed SNR output formatting.
|
jpayne@68
|
4261 Added "forcesectionname" flag; fasta reads will always get an "_1" at the end, even if they are not broken into multiple pieces. (requested by Shoudan)
|
jpayne@68
|
4262 Changed "fastareadlen" suffixes to only be appended when read is > maxlen rather than >=
|
jpayne@68
|
4263 Reorganized SamLine and created SamHeader class.
|
jpayne@68
|
4264 Modified CountBarcodes to append sub distance from expected barcodes and 'valid' for valid barcodes.
|
jpayne@68
|
4265 Fixed null pointer exception related to "qhist", "aqhist", and "qahist". Noted by Harald (seqanswers).
|
jpayne@68
|
4266 Fixed issue of readlength.sh breaking up reads when processing fasta files without a fasta extension.
|
jpayne@68
|
4267 Updated BBDuk documentation.
|
jpayne@68
|
4268 Added "maxlength" and qahist support to BBDuk.
|
jpayne@68
|
4269 Added "minoverlap" and "mininsert" to BBDuk.
|
jpayne@68
|
4270 Added "maxlength" to BBMerge.
|
jpayne@68
|
4271 Created countbarcodes.sh
|
jpayne@68
|
4272 Added edit distance column to CountBarcodes output.
|
jpayne@68
|
4273 Added raw mapping score tag, YS:i:, controlled by "scoretag" flag and disabled by default.
|
jpayne@68
|
4274 Added 'cq' (changequality) flag to reformat. Default: true.
|
jpayne@68
|
4275 Fixed mhist being generated from sam files.
|
jpayne@68
|
4276 Added readgroup support; a readgroup field "xx" can be specified with the flag "rgxx=value".
|
jpayne@68
|
4277 Updated 'usemodulo' flag to use (kmer%9==0 || rkmer%9==0). Requiring the remainders to be equal unevenly affected palindromes and thus even kmer lengths.
|
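The rule quoted above, (kmer%9==0 || rkmer%9==0), can be sketched as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the 2-bit encoding (A=0, C=1, G=2, T=3) is assumed rather than taken from BBMap's source. The point it shows is that testing both the kmer and its reverse complement gives the same keep/discard answer for either strand.

```java
// Hypothetical sketch; names and encoding are assumptions, not BBMap code.
final class ModuloKmerSketch {

    /** Packs a DNA string into 2 bits per base (A=0, C=1, G=2, T=3). */
    static long encode(String kmer) {
        long value = 0;
        for (int i = 0; i < kmer.length(); i++) {
            int code = "ACGT".indexOf(Character.toUpperCase(kmer.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not A/C/G/T: " + kmer.charAt(i));
            }
            value = (value << 2) | code;
        }
        return value;
    }

    /** Reverse complement of a packed kmer of length k (k <= 31). */
    static long reverseComplement(long kmer, int k) {
        long rc = 0;
        for (int i = 0; i < k; i++) {
            rc = (rc << 2) | (3 - (kmer & 3)); // complement of the lowest base
            kmer >>>= 2;
        }
        return rc;
    }

    /** Keeps a kmer if either orientation is divisible by 9. */
    static boolean keep(long kmer, int k) {
        long rkmer = reverseComplement(kmer, k);
        return kmer % 9 == 0 || rkmer % 9 == 0;
    }
}
```

Because the condition is an OR over both orientations, keep(kmer, k) and keep(reverseComplement(kmer, k), k) always agree in this sketch.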
jpayne@68
|
4278 Updated RemoveHuman to use 'usemodulo' flag and reduced RAM allotment from 23g to 10g. Updated index location of HG19 masked.
|
jpayne@68
|
4279 Added "idfilter" to BBMap.
|
jpayne@68
|
4280 Made BandedAligner abstract superclass and created BandedAlignerConcrete for the Java implementation, and BandedAlignerJNI for the C version.
|
jpayne@68
|
4281 Made file extension detection more robust against capitalization.
|
jpayne@68
|
4282 Added outsingle to BBDuk.
|
jpayne@68
|
4283 Replaced FastaToChromArrays with ChromArrayMaker. Now, indexing can be done from fastq files instead of just fasta.
|
jpayne@68
|
4284 Fixed MAJOR bug in which reference was split up into pieces (as of 33.12).
|
jpayne@68
|
4285 Reverted to old version of reference loader (as of 33.13) as there was still a bug (skipping every other scaffold).
|
jpayne@68
|
4286 BBDuk (and BBDuk2) now better support kmer masking! Every occurrence of a kmer is individually masked.
|
jpayne@68
|
4287 Added parseQuality (qin, qout, etc) to Dedupe.
|
jpayne@68
|
4288 Changed Dedupe default cluster stats cutoff to 2 (from 10), min cluster size to 2, and by default these values are linked.
|
jpayne@68
|
4289 Added 'outbest' to Dedupe, writing the representative read per cluster (regardless of 'pbr' flag). This is mainly for 16s clustering.
|
jpayne@68
|
4290 Fixed sorting of depths in pileup.sh. Noted by Alicia Clum.
|
jpayne@68
|
4291 Fixed 'outbest' of Dedupe (was writing to wrong stream).
|
jpayne@68
|
4292 Slightly accelerated read trimming.
|
jpayne@68
|
4293 Added read/base count tracking to ConcurrentReadStreamInterface.
|
jpayne@68
|
4294 Added display of exact number of input and output bases and reads to reformat.sh (requested by Seung-Jin).
|
jpayne@68
|
4295 Fixed capital letters changing to lower-case in output filenames when using the "basename" flag with BBSplit. Noted by Shoudan Liang.
|
jpayne@68
|
4296 Added Tools.condenseStrict(array).
|
jpayne@68
|
4297 Fixed fast/slow flags with BBSplit. Noted by Shoudan Liang.
|
jpayne@68
|
4298 Added 3-frames option to TranslateSixFrames by adding the flag "frames=3". Requested by Anne M.
|
jpayne@68
|
4299 TranslateSixFrames now defaults to fasta format when the file extension is unclear.
|
jpayne@68
|
4300 Added "estherfilter.sh" for filtering blastall queries.
|
jpayne@68
|
4301 Added option of getting an input stream from a process with null file argument.
|
jpayne@68
|
4302 Wrote FastaToChromArrays2 based on ByteFile/ByteBuilder for slightly better indexing speed and lower memory use.
|
jpayne@68
|
4303 Modified ChromosomeArray to work with ByteBuilder.
|
jpayne@68
|
4304 Fixed reformat displaying wrong number of input reads when run interleaved (due to recent changes).
|
jpayne@68
|
4305 Added minratio, maxindel, minhits, and fast flags to BBQC, for controlling BBMap.
|
jpayne@68
|
4306 Fixed "assert(false)" statement accidentally left in SamPileup from testing. Noted by Brian Foster.
|
jpayne@68
|
4307 Added kfilter and local flags to BBQC.
|
jpayne@68
|
4308 Fixed "bs" (bamscript) flag with BBSplit. Previously, it did not include the per-reference output streams.
|
jpayne@68
|
4309 Added Jonathan Rood's C code and JNI class for Dedupe.
|
jpayne@68
|
4310 Modified dedupe shellscripts to allow JNI code.
|
jpayne@68
|
4311 BBSplit was not outputting any reads when reference files had uppercase letters (as a result of the recent case-sensitivity change). This has been fixed. Noted by Shoudan Liang.
|
jpayne@68
|
4312 BBMap can now output fastq files with reads renamed to indicate mapping location, using the flags "rbm" and "don" (renamebymapping and deleteoldname).
|
jpayne@68
|
4313 FastaQualInputStream replaced by FastaQualInputStream3. At least 2.5x faster, and correctly reads input in which fasta and qual lines are wrapped at different lengths. Bug noted by Kurt LaButti.
|
jpayne@68
|
4314 Added bqhist, which allows box plots of read quality-per-base-location.
|
jpayne@68
|
4315 Fixed a slowdown when making quality histograms due to recalculating probability rather than using cached value.
|
jpayne@68
|
4316 Default sam format is now 1.4.
|
jpayne@68
|
4317 RemoveHuman/BBQC/RQCFilter now default to minhits=1 because 'usemodulo' reduces the number of valid keys.
|
jpayne@68
|
4318 Programs no longer default to outputting to stdout when "out=" is not specified because it's annoying. To write to stdout set "out=stdout.fq" (for example).
|
jpayne@68
|
4319 AssemblyStats now counts IUPAC and invalid characters separately. X and N now denote gaps between contigs, but no other symbols do. The code was also cleaned somewhat. The output formatting changed slightly.
|
jpayne@68
|
4320 Preliminarily integrated Jon Rood's JNI versions of BandedAligner and MultiStateAligner into both Java code and shellscripts to test Genepool deployment.
|
jpayne@68
|
4321 C code is now in /jni/ folder, at same level as /resources/ and /docs/.
|
jpayne@68
|
4322 Clarified documentation of BBMap, BBSplit, and BBWrap to differentiate some parameters. For example, "refstats" only works with BBSplit.
|
jpayne@68
|
4323 Added LW and RW (whisker values) columns to bqhist output, set at the 2nd and 98th percentiles. Requested by Seung-Jin Sul.
|
jpayne@68
|
4324 BBQC will now compress intermediate files to level 2 instead of level 4, to save time.
|
jpayne@68
|
4325 Fixed incompatibility of dot graph output and other output in Dedupe.
|
jpayne@68
|
4326 Reverted to default "minhits=2" for RemoveHuman, because minhits=1 took 5x as long.
|
jpayne@68
|
4327 Added median, mean, and stdev to gchist. Requested by Seung-Jin.
|
jpayne@68
|
4328 Added obqhist (overall base quality histogram). Requested by Seung-Jin.
|
jpayne@68
|
4329 Fixed various places, such as BBDuk, where the "int=true" flag caused references to be loaded interleaved. Noted by Jessica Jarett.
|
jpayne@68
|
4330 Added some parser flags to allow dynamically enabling verbose mode and assertions specifically for certain classes.
|
jpayne@68
|
4331 Fixed a bug in BBMap that made secondary alignments sometimes not get cigar strings.
|
jpayne@68
|
4332 Added "addprefix" mode to rename reads, which simply prepends a prefix to the existing name.
|
jpayne@68
|
4333 Clarified documentation of different histogram outputs in shellscripts.
|
jpayne@68
|
4334 Ported BBMapThread changes over to BBMap variants.
|
jpayne@68
|
4335 Restructured SamPileup and renamed it to CoveragePileup. Now supports Read objects (instead of just SamLines).
|
jpayne@68
|
4336 Integrated CoveragePileup with BBMap and documented new flags.
|
jpayne@68
|
4337 CoveragePileup: Added a concise coverage output, stranded coverage, and read-start-only coverage.
|
jpayne@68
|
4338 Removed some obsolete Java classes and some shellscripts.
|
jpayne@68
|
4339 Increased robustness of BBDuk's detection of invalid file arguments, and clarified the error messages. Noted by Scott D.
|
jpayne@68
|
4340 Fixed a problem with interleaving not being forced on fasta input.
|
jpayne@68
|
4341 Paired output files will now force BBDuk input to be treated as interleaved.
|
jpayne@68
|
4342 BBDuk now tracks statistics on which reference sequences were trimmed or masked - previously, it just tracked what was filtered.
|
jpayne@68
|
4343 Reverse-complemented Nextera adapters and added them to official release (/resources/nextera.fa.gz).
|
jpayne@68
|
4344 Added Illumina adapter sequence legal disclaimer to /docs/Legal_Illumina.txt
|
jpayne@68
|
4345 Implemented GC calculation from index, for generating coverage stats while mapping.
|
jpayne@68
|
4346 Tracked down strangeness with BBDuk. It is possible for "rcomp=f" to slightly reduce sensitivity when "mm=t" using an even kmer length, due to asymmetry. This appears to be correct.
|
jpayne@68
|
4347 Merged in revised JNI Dedupe version that should be working correctly. Verified that it returns same answer as non-JNI version. Tests indicate roughly triple speed, when working with PacBio reads of insert.
|
jpayne@68
|
4348 BBMap JNI version now seems roughly 30% faster than Java version.
|
jpayne@68
|
4349 Added insert size quartiles to BBMap and BBMerge. Requested by Alex Copeland.
|
jpayne@68
|
4350 Fixed rare bug related to SiteScore.fixXY(), caused by aligning reads with insufficient padding, fixing the tips, but not changing the start/stop positions. Found by Brian Foster.
|
jpayne@68
|
4351 Fixed a race condition in TextStreamWriter that could randomly cause a deadlock in numerous different programs. Found by Shoudan Liang.
|
jpayne@68
|
4352 Added "maxsites2" flag to allow over 800 alignments for a given read.
|
jpayne@68
|
4353 Fixed bounds of kmer masking in BBDuk; they were off by 2 (too big).
|
jpayne@68
|
4354 Fixed unintended debug print line. Noted by Shoudan Liang.
|
jpayne@68
|
4355 Updated RandomReadInputStream to work with the newer RandomReads3 class.
|
jpayne@68
|
4356 ConcurrentGenericReadInputStream now supports RandomReadInputStream3 as a producer.
|
jpayne@68
|
4357 Fixed kmer dumping from CountKmersExact.
|
jpayne@68
|
4358 Fixed length of vector created in BBMergeOverlapper (4->5). Noted by Jon Rood.
|
jpayne@68
|
4359 Changed default kmer length in BBDuk to 27 so that the 'maskmiddle' base will be in the middle for both forward and reverse kmers.
|
jpayne@68
|
4360 "pairlen" flag accidentally deleted from BBMap; restored. Noted by HGV (seqanswers).
|
jpayne@68
|
4361 BBMerge now has a JNI version from Jonathan Rood - 60% faster than pure Java. Requires compiling the C code; details are in /jni/README.txt.
|
jpayne@68
|
4362 Wrapped BBMerge JNI initializer in a conditional, so it will not try to load unless "usejni" is specified.
|
jpayne@68
|
4363 Added "parseCommonStatic" to BBMerge and BBDuk (to allow JNI flag parsing).
|
jpayne@68
|
4364 Commented out "module load" and "module unload" statements in public version.
|
jpayne@68
|
4365 Added 'printlastbin' or 'plb' flag to countunique to produce a final bin smaller than binsize. Suggested for use in cumulative mode. Requested by Andrew Tritt.
|
jpayne@68
|
4366 Added support for bzip2 and pbzip2 compression and decompression. The programs must be installed to use bz2 format.
|
jpayne@68
|
4367 Eliminated use of "sh" when launching subprocesses. This also allows pigz compression support in Windows.
|
jpayne@68
|
4368 Files were not being closed after "testInterleaved()". Fixed.
|
jpayne@68
|
4369 Improved error messages when improper quality values are observed.
|
jpayne@68
|
4370 Updated hard-coded adapter path to include Nextera adapters. This affects BBQC and RQCFilter.
|
jpayne@68
|
4371 Improved file format detection. Now FileFormat (testformat.sh) will print a warning when the contents and extension don't match, and it can differentiate between sam and fastq. Problem noted by Vasanth Singan.
|
jpayne@68
|
4372 Fixed issue where "scafstats" output was printing inflated numbers with chimeric paired reads, or pairs with only one mapped read. Noted by HGV (seqanswers).
|
jpayne@68
|
4373 Closed stream after reading in FileFormat.
|
jpayne@68
|
4374 Unrolled, debranched, and removed assertion function calls from BBMerge inner loop.
|
jpayne@68
|
4375 Fixed a bug in which findTipDeletions was not changing the bounds of the gap array.
|
jpayne@68
|
4376 Added getters and setters for SiteScores that enforce gap correctness.
|
jpayne@68
|
4377 Improved GapTools to test for and fix non-ascending points.
|
jpayne@68
|
4378 Forced use of setters in TranslateColorspaceRead, AbstractMapThread, and BBIndex* classes; this caught some inconsistencies that should increase stability and correctness.
|
jpayne@68
|
4379 Enabled jni-mode alignment by default for BBQC and removehuman.
|
jpayne@68
|
4380 Added a BBMap output line indicating how many reads survived for use with, e.g., removehuman. Requested by Brian Foster.
|
jpayne@68
|
4381 Added messages to BBQC to indicate which phase is executing. Requested by Brian Foster.
|
jpayne@68
|
4382 SiteScore start and stop are exclusively set by methods now. Fixed a bug with local flag noted by Vasanth Singan.
|
jpayne@68
|
4383 Added MaximumSpanningTree generation to Dedupe (mst flag).
|
jpayne@68
|
4384 Merged in faster BBMerge overlapper JNI version; now 90% faster than Java with fastq and 70% faster with fasta.
|
jpayne@68
|
4385 Improved Dedupe's support for paired reads: fixed an assertion, and added "in1" and "in2".
|
jpayne@68
|
4386 Fixed an assertion involving semiperfect alignments of repetitive reads that go out of the alignment window. Found by Alicia Clum.
|
jpayne@68
|
4387 Fixed idhist mean calculation. Added mode, median, stdev, both by read count and base count.
|
jpayne@68
|
4388 Better documented ConcurrentReadStreamInterface.
|
jpayne@68
|
4389 Fixed a crash in CoveragePileup when using 32-bit mode.
|
jpayne@68
|
4390 Fixed a couple instances in which the first two arguments being unrecognized would not be noticed.
|
jpayne@68
|
4391 Fixed a bug in pileup causing coverage fraction to be reported incorrectly, if arrays were not being used. Noted by Vasanth Singan.
|
jpayne@68
|
4392 Fixed a twocolumn mode in pileup; it was generating no output.
|
jpayne@68
|
4393 Added additional parse flags to pileup, such as "stats" and "outcov".
|
jpayne@68
|
4394 Added additional output fields to coverage stats - total number of covered bases, and number of reads mapped to plus and minus strands.
|
jpayne@68
|
4395 CountKmersExact: Added preallocation (faster, less memory) and a one-pass-mode for the prefilter (faster, but nondeterministic).
|
jpayne@68
|
4396 Replaced most instances of "Long.parseLong" with "Tools.parseKMG" to support kilo, mega, and giga abbreviated suffixes.
|
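A parser of this kind might look like the sketch below. It is not the actual Tools.parseKMG: the class name is hypothetical, decimal multipliers (k=1,000, m=1,000,000, g=1,000,000,000) are an assumption, and only whole-number values are handled.

```java
// Hypothetical sketch; the real Tools.parseKMG may behave differently.
final class KmgSketch {

    /** Parses values such as "20", "2k", "3M" or "4g" into a long. */
    static long parseKMG(String s) {
        if (s == null || s.isEmpty()) {
            throw new NumberFormatException("empty value");
        }
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        long multiplier = 1;
        if (last == 'k') {
            multiplier = 1_000L;             // assumed decimal kilo
        } else if (last == 'm') {
            multiplier = 1_000_000L;         // assumed decimal mega
        } else if (last == 'g') {
            multiplier = 1_000_000_000L;     // assumed decimal giga
        }
        String digits = (multiplier == 1) ? s : s.substring(0, s.length() - 1);
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }
}
```

With a parser like this, a flag value such as "4g" can be given in place of 4000000000.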
jpayne@68
|
4397 Added jgi.PhylipToFasta and phylip2fasta.sh, for converting interleaved phylip files to fasta. Requested by Esther Singer.
|
jpayne@68
|
4398 v33.58
|
jpayne@68
|
4399 Began listing point-version numbers in this readme.
|
jpayne@68
|
4400 Added jgi.A_Sample2, a simpler template for a concurrent pipe-filter stage.
|
jpayne@68
|
4401 Added jgi.MakeChimeras, a tool for making chimeric PacBio reads from input non-chimeric reads. Also, makechimeras.sh. Requested by Esther Singer.
|
jpayne@68
|
4402 Added support for normalized binning to CoveragePileup. Requested by Vasanth Singan.
|
jpayne@68
|
4403 v33.59
|
jpayne@68
|
4404 Fixed pileup's normalized scaling when dealing with 0-coverage scaffolds.
|
jpayne@68
|
4405 v33.60
|
jpayne@68
|
4406 Added driver.FilterReadsByName.java and filterbyname.sh. Allows inclusion or exclusion of reads by name.
|
jpayne@68
|
4407 Added midpad flag to RandomReads (allows defining inter-scaffold padding).
|
jpayne@68
|
4408 v33.61
|
jpayne@68
|
4409 Added ConcurrentReadInputStreamD, prototype for MPI-version of input stream.
|
jpayne@68
|
4410 Made Read and all classes that might be attached to reads Serializable.
|
jpayne@68
|
4411 Added DemuxByName and demuxbyname.sh which allows a single file to be split into multiple files based on read names.
|
jpayne@68
|
4412 v33.62
|
jpayne@68
|
4413 Added FilterByCoverage and filterbycoverage.sh to filter assemblies based on contig coverage stats (from Pileup).
|
jpayne@68
|
4414 Added CovStatsLine, an object representation of Pileup's coverage stats.
|
jpayne@68
|
4415 Added '#' symbol to coverage stats header.
|
jpayne@68
|
4416 v33.63
|
jpayne@68
|
4417 Fixed path in filterbycoverage.sh
|
jpayne@68
|
4418 v33.64
|
jpayne@68
|
4419 Added custom scripts driver.MergeCoverageOTU and mergeOTUs.sh for Esther.
|
jpayne@68
|
4420 Added DecontaminateByNormalization, for automating SAG plate decontamination.
|
jpayne@68
|
4421 Fixed legacy code that set KmerNormalize to use 8 threads in some cases.
|
jpayne@68
|
4422 Added "fixquality" for capping quality scores at 41. Requested by Bryce Foster.
|
jpayne@68
|
4423 Added fasta output to kmercountexact. Requested by Alex Copeland.
|
jpayne@68
|
4424 Added kmer histogram to kmercountexact (2-column and 3-column). Requested by Alex Copeland.
|
jpayne@68
|
4425 Added multiple memory-related and output formatting flags to kmercountexact.
|
jpayne@68
|
4426 Made KmerNode a subclass of AbstractKmerTable.
|
jpayne@68
|
4427 Improved Data's "unloadall" to also clear scaffold-related data.
|
jpayne@68
|
4428 Removed obsolete class CoverageArray1.
|
jpayne@68
|
4429 v33.65
|
jpayne@68
|
4430 Reduced preallocated memory in kmercountexact to avoid a crash on high memory machines. Also reduced total number of threads.
|
jpayne@68
|
4431 v33.66
|
jpayne@68
|
4432 "CountKmersExact.java" renamed to "KmerCountExact.java".
|
jpayne@68
|
4433 kmercountexact now writes histogram and kmer dump simultaneously in separate threads.
|
jpayne@68
|
4434 kmercountexact.sh now specifies both -Xms and -Xmx.
|
jpayne@68
|
4435 CountKmersExact will no longer run out of memory if -Xms is not specified; instead, it will preallocate a smaller table.
|
jpayne@68
|
4436 v33.67
|
jpayne@68
|
4437 Messed with MDA amp in RandomReads a bit.
|
jpayne@68
|
4438 Added parser "ztd" ("zipthreaddivisor") flag. Defaults to 2 for removehuman.sh.
|
jpayne@68
|
4439 Added BBMerge flags "maq" (minaveragequality) and "mee" (maxexpectederrors). No merge will be attempted for reads violating these.
|
jpayne@68
|
4440 Added BBMerge "efilter" flag, to allow disabling of the efilter. Efilter bans merges of reads that have more than the expected number of errors, based on quality scores.
|
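The "expected number of errors" mentioned above is conventionally the sum of the per-base error probabilities implied by the Phred quality scores, where a score Q corresponds to a probability of 10^(-Q/10). A minimal sketch, assuming ASCII-encoded qualities and a hypothetical class name; how BBMerge applies the value internally is not described in this changelog.

```java
// Hypothetical sketch of the standard Phred expected-error sum.
final class ExpectedErrorsSketch {

    /**
     * Sums the error probabilities for an ASCII-encoded quality string.
     * offset is 33 for Sanger/Illumina 1.8+ encoding, 64 for older Illumina.
     */
    static double expectedErrors(String quals, int offset) {
        double sum = 0;
        for (int i = 0; i < quals.length(); i++) {
            int q = quals.charAt(i) - offset;   // Phred score of this base
            sum += Math.pow(10.0, -q / 10.0);   // P(error) = 10^(-Q/10)
        }
        return sum;
    }
}
```

For example, a two-base read with qualities Q10 and Q20 ("+5" in ASCII-33) has an expected error count of 0.1 + 0.01 = 0.11, which a threshold such as "mee" could then be compared against.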
jpayne@68
|
4441 Closed A_Sample2 I/O streams after completion. Noted by Jon Rood.
|
jpayne@68
|
4442 Created SynthMDA, a program to make a synthetic MDA'd single cell genome. This genome would be used as a reference for RandomReads.
|
jpayne@68
|
4443 Added Reformat "vpair" (verifypairing) flag, which allows validation of pair names. Previously, only interleaved reads were validated.
|
jpayne@68
|
4444 Pair name validation will now accept identical names, if the "ain" (allowidenticalnames) flag is set.
|
jpayne@68
|
4445 Updated reformat.sh, repair.sh, bbsplitpairs.sh with new flags.
|
jpayne@68
|
4446 Removed FastaReadInputStream_old.java.
|
jpayne@68
|
4447 Added "forcelength" flag to MakeChimeras.
|
jpayne@68
|
4448 v33.68
|
jpayne@68
|
4449 Added "ihist" flag to rqcfilter, default "ihist.txt". Unless this is set to null, BBMerge will run to generate the insert size histogram after filtering completes.
|
jpayne@68
|
4450 AbstractKmerTable preallocation is now multithreaded. Unfortunately, this did not result in a speedup.
|
jpayne@68
|
4451 Added ByteBuilder-related methods to certain Read output formats.
|
jpayne@68
|
4452 Added ByteStreamWriter. This is a threaded writer with low overhead, and is substantially faster than TextStreamWriter (perhaps 2x speed).
|
jpayne@68
|
4453 Fixed a bug in KmerNode (traversing wrong branch during dump).
|
jpayne@68
|
4454 All AbstractKmerTable subclasses now dump kmers using bsw/ByteBuilder instead of tsw/StringBuilder.
|
jpayne@68
|
4455 Added ForceTrimLeft/ForceTrimRight flags to Dedupe (requested by Bryce/Seung-Jin).
|
jpayne@68
|
4456 v33.69
|
jpayne@68
|
4457 FilterByCoverage (and thus DecontaminateByNormalization) now produces a log file indicating which contigs were removed.
|
jpayne@68
|
4458 FilterByCoverage and DecontaminateByNormalization can now optionally process coverage before and after normalization, and not remove contigs unless the coverage changes by at least some ratio (default 2). Enable with "mapraw" and optionally "minratio" flag.
|
jpayne@68
|
4459 Added ihist to file-list.txt. TODO: Verify success.
|
jpayne@68
|
4460 Reads longer than 200bp are now detected as ASCII-33 regardless of their quality values. This helps with handling PacBio CCS/ROI data.
|
jpayne@68
|
4461 Added support in FixPairsAndSingles (repair.sh) for reads with names that do not contain whitespace, but still end with "/1" and "/2".
|
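One way to compare mate names under both conventions (a suffix after whitespace, or a bare "/1" and "/2" ending) is sketched below. The class and method names are hypothetical and this is not the FixPairsAndSingles implementation; it only illustrates the idea of reducing each name to a shared base before comparing.

```java
// Hypothetical sketch; not the actual repair.sh logic.
final class PairNameSketch {

    /** Returns the name up to the first whitespace, minus a trailing "/1" or "/2". */
    static String baseName(String name) {
        int end = name.length();
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                end = i;
                break;
            }
        }
        String head = name.substring(0, end);
        if (head.endsWith("/1") || head.endsWith("/2")) {
            head = head.substring(0, head.length() - 2);
        }
        return head;
    }

    /** True if the two read names reduce to the same base name. */
    static boolean namesMatch(String name1, String name2) {
        return baseName(name1).equals(baseName(name2));
    }
}
```

This sketch does not check that the first name carries the "1" marker and the second the "2" marker; a stricter validator would.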
jpayne@68
|
4462 Added qout flag to RandomReads3.
|
jpayne@68
|
4463 Refactored TextStreamWriter to be more like ByteStreamWriter.
|
jpayne@68
|
4464 Added gcformat 0 (no base content info printed) to AssemblyStats2 (stats.sh).
|
jpayne@68
|
4465 v33.70
|
jpayne@68
|
4466 Updated RQCFilter and BBQC to bring them closer together and improve some of their defaults. RQCFilter now has more parameters such as k for filtering and trimming.
|
jpayne@68
|
4467 RQCFilter now correctly produces the insert size histogram.
|
jpayne@68
|
4468 v33.71
|
jpayne@68
|
4469 Fixed a bug in Dedupe preventing overlap detection when 'absorb match' and 'absorb containment' were both disabled. Noted by Shoudan Liang.
|
jpayne@68
|
4470 Optimized synthetic MDA procedure.
|
jpayne@68
|
4471 v33.72
|
jpayne@68
|
4472 Fixed a bug in SynthMDA.java. Further tweaked parameters.
|
jpayne@68
|
4473 Added synthmda.sh.
|
jpayne@68
|
4474 v33.73
|
jpayne@68
|
4475 Further tweaked SynthMDA defaults to better match some real data sent to me by Shoudan and Alex.
|
jpayne@68
|
4476 Fixed a bug in BBDuk's mask mode in which all bases in a masked read were assigned quality 0. Noted by luc (SeqAnswers).
|
jpayne@68
|
4477 Fixed a small error in KmerCountExact's preallocation calculation.
|
jpayne@68
|
4478 Added preallocation to BBDuk/BBDuk2. Not recommended for BBDuk2 because the tables may need unequal sizes.
|
jpayne@68
|
4479 Added "restrictleft" and "restrictright" flags to BBDuk (not BBDuk2). These allow only looking for kmer matches in the leftmost or rightmost X bases. Requested by lankage (SeqAnswers).
|
jpayne@68
|
4480 v33.74
|
jpayne@68
|
4481 Added jgi.Shuffle.java to input a read set and output it in random order. It can also sort by various things (coordinates, sequence, name, and numericID).
|
jpayne@68
|
4482 Added CallPeaks, which can call peaks from a histogram. Requested by Kurt LaButti.
|
jpayne@68
|
4483 Integrated peak calling into BBNorm and KmerCountExact.
|
jpayne@68
|
4484 BBNorm now has a "histogramcolumns" flag, so it can produce Jellyfish-compatible output.
|
jpayne@68
|
4485 Added callpeaks.sh.
|
jpayne@68
|
4486 v33.75
|
jpayne@68
|
4487 CallPeaks now calls by raw kmer count rather than unique kmer count. This better detects higher-order peaks.
|
jpayne@68
|
4488 Finished CrossContaminate.java and added crosscontaminate.sh.
|
jpayne@68
|
4489 Added "header" and "headerpound" to pileup.sh, to control header presence and whether they start with "#".
|
jpayne@68
|
4490 Added "prefix" flag to SynthMDA and RandomReads3, to better track origin of reads during cross-contamination trials.
|
jpayne@68
|
4491 RQCFilter and BBQC now parse 'usejni' flag; rqcfilter.sh and bbqc.sh default to this being enabled.
|
jpayne@68
|
4492 Added "uselowerdepth" flag to BBNorm (default true). Allows normalization by depth of higher or lower read. Set to false by DecontaminateByNormalization.
|
jpayne@68
|
4493 v33.76
|
jpayne@68
|
4494 Fixed a bug in synthmda.sh command line.
|
jpayne@68
|
4495 Fixed build number not being parsed by SynthMDA.
|
jpayne@68
|
4496 Added some error handling to CrossContaminate, so it shouldn't hang as a result of missing files.
|
jpayne@68
|
4497 v33.77
|
jpayne@68
|
4498 SynthMDA now nullifies reference in memory prior to generating reads.
|
jpayne@68
|
4499 Parser was not correctly setting the number of compression threads when exactly 1 was requested.
|
jpayne@68
|
4500 Shuffle is now multithreaded, and CrossContaminate defaults to shufflethreads=3.
|
jpayne@68
|
4501 Shuffle now removes reads as they are printed, reducing memory usage.
|
jpayne@68
|
4502 Created shellscript templates for generating and assembling full plates of synth MDA data, and ran successfully.
|
jpayne@68
|
4503 *SamLine was fixed when generating pnext from clipped reads. Still needs work; pos1 and pos2 need to be recalculated considering clipping.
|
jpayne@68
|
4504 BBDuk now tracks #contaminant bases as well as #contaminant reads per scaffold for stats. Additional flag "columns=5" enables this output.
|
jpayne@68
|
4505 BBDuk stats are now sorted by #bases, not #reads.
|
jpayne@68
|
4506 BBDuk counting arrays changed from int to long to handle potential overflow.
|
jpayne@68
|
4507 v33.78
|
jpayne@68
|
4508 Modified DemuxByName to handle affixes of variable length (though it's less efficient with multiple lengths).
|
jpayne@68
|
4509 v33.79
|
jpayne@68
|
4510 Changed the way "pos" and "pnext" are calculated for paired reads to be consistent. Bug had been noted with soft-clipped reads by Rob Egan.
|
jpayne@68
|
4511 Changed LOCAL_ALIGN_TIP_LENGTH from 8 to 1. Previously, soft-clipping would only occur if at least 8 bases would be clipped; not sure why I did that.
|
jpayne@68
|
4512 Changed the way "tlen" is calculated to compensate for clipping.
|
jpayne@68
|
4513 v33.80
|
jpayne@68
|
4514 Changed default decontaminate minratio from 2 to 0 (disabling it) because of false negatives.
|
jpayne@68
|
4515 Changed default decontaminate mincov from 4 to 5 due to a false negative.
|
jpayne@68
|
4516 Changed default decontaminate kfilter from 63 to 55 to better reflect Spades defaults.
|
jpayne@68
|
4517 Fixed a bug in filterbycoverage which was outputting contaminant contigs instead of clean contigs.
|
jpayne@68
|
4518 Added outd (outdirty) flag to FilterByCoverage.
|
jpayne@68
|
4519 v33.81
|
jpayne@68
|
4520 Changed decontaminate normalization target from 100 to 50, and minlength from 0 to 500.
|
jpayne@68
|
4521 Changed decontaminate minc and minp flags from int to float.
|
jpayne@68
|
4522 v33.82
|
jpayne@68
|
4523 Changed cross contaminate probability root from 2 to 3 (increasing amount of lower-level contamination).
|
jpayne@68
|
4524 Fixed a crash bug in sam file generation caused by the change in the way pos was calculated.
|
jpayne@68
|
4525 v33.83
|
jpayne@68
|
4526 Added aecc=f, cecc=f, minprob=0.5, depthpercentile=0.8 flags to DecontaminateByNormalization. Defaults are as listed.
|
jpayne@68
|
4527 Dropped mindepth to 3 and maxdepth to target; target default changed to 20.
|
jpayne@68
|
4528 Changed the way mindepth is handled in normalization; now it is based on the depth of the higher read.
|
jpayne@68
|
4529 v33.84
|
jpayne@68
|
4530 Added BBNorm prebits flag for setting prefilter cell size (default 2).
|
jpayne@68
|
4531 Added Decontaminate filterbits and prefilterbits flags, default 32 and 4. 4 was chosen because MDA data has high error kmer counts.
|
jpayne@68
|
4532 v33.85
|
jpayne@68
|
4533 Fixed parsing of decontaminate minc and minp (parsed as ints; should have been floats)
|
jpayne@68
|
4534 Changed default minc to 3.5.
|
jpayne@68
|
4535 Changed default ratio to 1.2.
|
jpayne@68
|
4536 v33.86
|
jpayne@68
|
4537 Changed decontaminate default dp to 0.75.
|
jpayne@68
|
4538 Changed decontaminate default prebits to 2.
|
jpayne@68
|
4539 Changed decontaminate default minr (min reads) to 20. Some tiny (~500bp) low-coverage contigs were getting through.
|
jpayne@68
|
4540 Changed decontaminate mindepth to 2.
|
jpayne@68
|
4541 Decontaminate results now print extra columns for read counts and pre-norm coverage.
|
jpayne@68
|
4542 v33.87
|
jpayne@68
|
4543 Added "covminscaf" flag to BBMap and Pileup, to suppress output of really short contigs. Default 0.
|
jpayne@68
|
4544 Changed CrossContaminate coverage distribution from cubic to geometric.
|
jpayne@68
|
4545 v33.88
|
jpayne@68
|
4546 Shuffle removing reads caused incredible slowness; it should have set reads to null. Fixed.
|
jpayne@68
|
4547 v33.89
|
jpayne@68
|
4548 Added HashArrayA, HashForestA, KmerNodeA and updated AbstractKmerTable to allow sets of values per kmer.
|
jpayne@68
|
4549 Refactored all AbstractKmerTable subclasses.
|
jpayne@68
|
4550 Added scaffold length tracking to BBDuk (for RPKM).
|
jpayne@68
|
4551 Added RPKM output to BBDuk (enable with "rpkm" flag).
|
jpayne@68
|
4552 BBDuk now unloads kmers after finishing processing reads.
|
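Note on the RPKM output above: RPKM is conventionally reads per kilobase of reference per million reads. A minimal Python sketch of that arithmetic follows. The function name and the choice of denominator (total reads processed) are assumptions for illustration; this is not BBDuk's code, and the changelog does not state which read total BBDuk uses.

```python
def rpkm(reads_mapped, scaffold_length_bp, total_reads):
    """Reads per kilobase of scaffold per million reads (hypothetical helper)."""
    # Guard against empty scaffolds or empty input rather than dividing by zero.
    if scaffold_length_bp <= 0 or total_reads <= 0:
        return 0.0
    kilobases = scaffold_length_bp / 1000.0
    millions = total_reads / 1_000_000.0
    return reads_mapped / (kilobases * millions)


# 500 reads on a 2 kbp scaffold, out of 10 million reads in total.
example = rpkm(reads_mapped=500, scaffold_length_bp=2000, total_reads=10_000_000)
```

This is why scaffold length tracking is a prerequisite for the rpkm flag: without the length, only raw read counts can be reported.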
jpayne@68
|
4553 v33.90
|
jpayne@68
|
4554 BBDuk counter arrays are now local per-thread, to prevent cache-thrashing.
|
jpayne@68
|
4555 Added IntList.toString()
|
jpayne@68
|
4556 Created Seal class, based on BBDuk with values stored in arrays.
|
jpayne@68
|
4557 Adjusted auto skip settings of BBDuk (increased size threshold for longer skips).
|
jpayne@68
|
4558 Added BBDuk skip flag (controls minskip and maxskip).
|
jpayne@68
|
4559 Fixed a bug in DemuxByName/DecontaminateByNormalization/CrossContaminate: attempt to read directories as files.
|
jpayne@68
|
4560 v33.91
|
jpayne@68
|
4561 Fixed a bug in BBDuk related to clearing data too early. Noted by Brian Foster.
|
jpayne@68
|
4562 v33.92
|
jpayne@68
|
4563 Added per-reference-file stats counting to BBDuk/Seal, and "refstats" flag.
|
jpayne@68
|
4564 Added returnList(boolean) to ConcurrentReadStreamInterface.
|
jpayne@68
|
4565 Removed an extra listen() call from ConcurrentReadInputStreamD.
|
jpayne@68
|
4566 Documented "addname" flag for stats.sh.
|
jpayne@68
|
4567 Implemented restrictleft and restrictright for BBDuk2.
|
jpayne@68
|
4568 Added "nzo" flag for BBDuk/Seal.
|
jpayne@68
|
4569 Added sdriscoll's reformatted shellscript help for BBDuk and BBMap. Thanks!
|
jpayne@68
|
4570 Added more documentation to bbmap.sh (usequality flag).
|
jpayne@68
|
4571 Added maq (minaveragequality) flag to BBMap, at request of sdriscoll.
|
jpayne@68
|
4572 Added rename flag to BBDuk/Seal - renames reads based on what sequences they matched.
|
jpayne@68
|
4573 Added userefnames flag to BBDuk/Seal - the names of reference files are used, rather than scaffold IDs.
|
jpayne@68
|
4574
|
jpayne@68
|
4575 v33.93
|
jpayne@68
|
4576 maxindel flag now allows KMG suffix.
|
jpayne@68
|
4577 Added "speed" flag to BBDuk/Seal.
|
jpayne@68
|
4578 Added read processing time to BBDuk/Seal output.
|
jpayne@68
|
4579 BBDuk "fbm" (findbestmatch) mode is now much faster, using variable rather than fixed-length counters.
|
jpayne@68
|
4580 Fixed BBDuk2 not working when using the "ref" flag rather than "filterref".
|
jpayne@68
|
4581 Changed AbstractKmerTable subclass names to *1D and *2D.
|
jpayne@68
|
4582 Made KmerNode a superclass of KmerNode1D and KmerNode2D and eliminated redundant methods.
|
jpayne@68
|
4583 Eliminated 2D version of HashForest; it now works with 1D and 2D nodes.
|
jpayne@68
|
4584 Made HashArray a superclass of HashArray1D and HashArray2D.
|
jpayne@68
|
4585 Created HashArrayHybrid.
|
jpayne@68
|
4586 Added slow debugging methods to AbstractKmerTable classes, to verify that values were present after being added.
|
jpayne@68
|
4587 Fixed bug in KmerNode1D; was never changing its value on 'set'. Probably only affected Seal. Seal 1D now appears to produce identical output for prealloc and non-prealloc.
|
jpayne@68
|
4588 Finished debugging KmerNode2D, KmerForest, HashArray2D, HashArrayHybrid, and Seal.
|
jpayne@68
|
4589 Added "fbm" and "fum" to Seal.
|
jpayne@68
|
4590 Seal now defaults to 7 ways.
|
jpayne@68
|
4591 Adjusted Seal's memory preallocation.
|
jpayne@68
|
4592 Added -Xms flag to BBMergeGapped and BBNorm shellscripts.
|
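Note on the KMG suffix mentioned above for maxindel: a value such as "16k" stands for a scaled integer. The Python sketch below shows one way such a suffix can be parsed. It assumes decimal multipliers (k=1000, m=1000000, g=1000000000); parse_kmg is a hypothetical helper, not the BBTools parser, and the changelog does not state the multipliers.

```python
def parse_kmg(text):
    """Parse an integer with an optional K/M/G suffix (decimal multipliers assumed)."""
    multipliers = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    s = text.strip().lower()
    if not s:
        raise ValueError("empty value")
    if s[-1] in multipliers:
        # Allow fractional prefixes such as "1.5m"; round to the nearest integer.
        return round(float(s[:-1]) * multipliers[s[-1]])
    return int(s)


sizes = [parse_kmg(v) for v in ("200", "16k", "1.5m")]
```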
jpayne@68
|
4593 v33.94
|
jpayne@68
|
4594 Added -Xms flag to BBDuk and Seal.
|
jpayne@68
|
4595 Added qskip flag to BBDuk and Seal (for skipping query kmers).
|
jpayne@68
|
4596 v33.95
|
jpayne@68
|
4597 Seal now defaults to HashArrayHybrid rather than HashArray2D.
|
jpayne@68
|
4598 v33.96
|
jpayne@68
|
4599 Fixed a slowdown in Seal and BBDuk caused by sorting list of ID hits.
|
jpayne@68
|
4600 v33.97
|
jpayne@68
|
4601 Wrote driver.CorrelateIdentity and matrixtocolumns.sh for identity correlations between 16S and V4.
|
jpayne@68
|
4602 Wrote jgi.IdentityMatrix and idmatrix.sh for all-to-all alignment.
|
jpayne@68
|
4603 Added BandedAligner.alignQuadruple() to check all orientations.
|
jpayne@68
|
4604 BandedAligner now does not clear the full arrays, only the used portion, which can vary depending on read length.
|
jpayne@68
|
4605 v33.98
|
jpayne@68
|
4606 No change - build failure.
|
jpayne@68
|
4607 v33.99
|
jpayne@68
|
4608 Changed BandedAligner.PenalizeOffCenter(). Indels were getting double-penalized when they led to length mismatches between query and ref.
|
jpayne@68
|
4609 Added AlignDouble(), but it looks like AlignQuadruple is the only viable method for calculating full identity when the sequences do not start or stop at the same place.
|
jpayne@68
|
4610 Added test method to ReadStats to ensure the files are safe to write (ReadStats.testFiles()).
|
jpayne@68
|
4611 Fixed a bug in bqhist output giving read 1 and read 2 the same values. Noted by Shoudan/Bryce.
|
jpayne@68
|
4612 Fixed a bug in BBDuk initialization when no kmer input supplied. Noted by Bill A.
|
jpayne@68
|
4613 Fixed a bug in BBDuk/Seal giving a spurious warning.
|
jpayne@68
|
4614 Detected race condition in ByteFile2 triggered by closing early. Not very important.
|
jpayne@68
|
4615 Added jni path flags to BBDuk shellscript command line.
|
jpayne@68
|
4616 Wrote FindPrimers and msa.sh to locate primer sites. Uses MultiStateAligner; outputs in sam format.
|
jpayne@68
|
4617 Wrote CutPrimers and cutprimers.sh to cut regions flanked by mapped primer locations from sequences, e.g. V4.
|
jpayne@68
|
4618
|
jpayne@68
|
4619 TODO: Plot correlation of V4 and 16S.
|
jpayne@68
|
4620 TODO: Add length into edges of Dedupe output. (Ted)
|
jpayne@68
|
4621 TODO: Benchmark Seal. Speed seems inconsistent.
|
jpayne@68
|
4622 TODO: Locking version of Seal.
|
jpayne@68
|
4623 TODO: HashArray resize - grow fast up to a limit, then resize to exactly the max allowable.
|
jpayne@68
|
4624 TODO: Alicia BBMap PacBio slowdown (try an older version...)
|
jpayne@68
|
4625 TODO: BBMerge rename mode with insert sizes.
|
jpayne@68
|
4626 TODO: Dump info about Seal kmer copy histogram.
|
jpayne@68
|
4627 TODO: Dedupe crash bug. (Kurt)
|
jpayne@68
|
4628 TODO: CallPeaks minwidth should be a subsumption threshold, not creation threshold.
|
jpayne@68
|
4629 TODO: CallPeaks should not subsume peaks with valleys in between that are very low.
|
jpayne@68
|
4630 *TODO: Make TextStreamWriter an abstract superclass.
|
jpayne@68
|
4631 TODO: BBDuk split mode
|
jpayne@68
|
4632 TODO: Add option for BBMap to convert U to T. (Asaf Levy)
|
jpayne@68
|
4633 TODO: Add dedupe support for graphing containments and matches.
|
jpayne@68
|
4634 TODO: Log normalization.
|
jpayne@68
|
4635 TODO: Prefilterpasses (prepasses)
|
jpayne@68
|
4636 TODO: Test forcing msa.scoreNoIndels to always run bidirectionally.
|
jpayne@68
|
4637 TODO: Message for BBNorm indicating pairing (this is nontrivial)
|
jpayne@68
|
4638 TODO: Average quality for pileup.sh
|
jpayne@68
|
4639 TODO: Fix ChromArrayMaker which may skip every other scaffold (for now I have reverted to old, correct version). ***Possibly fixed by disabling interleaving; TODO: Test.
|
jpayne@68
|
4640 TODO: Consider changing ConcurrentGenericReadInputStream to put read/base statistics into incrementGenerated(), or at least in a function.
|
jpayne@68
|
4641 TODO: BBSplit produces alignments to the wrong reference in the output for a specific reference. (Shoudan)
|
jpayne@68
|
4642 TODO: Change the way Ns are handled in cigar strings, both input and output.
|
jpayne@68
|
4643 TODO: Add #clipped reads/bases to BBMap output.
|
jpayne@68
|
4644 TODO: Add method for counting number of clipped bases in a read and unclipped length.
|
jpayne@68
|
4645 TODO: Orientation statistics for BBMap ihist.
|
jpayne@68
|
4646 TODO: Clarify documentation of 'reads' flag to note that it means reads OR pairs.
|
jpayne@68
|
4647 TODO: bs flag does not work with BBWrap (Shoudan).
|
jpayne@68
|
4648 TODO: Fasta input sometimes tries to keep reading from the file when a limited number of reads is specified. Gives an error message, but output is fine.
|
jpayne@68
|
4649 TODO: 'saa' flag sometimes does not work (Shoudan).
|
jpayne@68
|
4650 TODO: Kmer transition probabilities for binning.
|
jpayne@68
|
4651 TODO: One coverage file per scaffold; abort if over X scaffolds. (Andrew Tritt)
|
jpayne@68
|
4652 TODO: Enable JNI by default for BBMap and Dedupe on Genepool.
|
jpayne@68
|
4653 TODO: Disable cigar string generation when dumping coverage only (?). This will disable stats, though.
|
jpayne@68
|
4654 TODO: Pipethread spawned when decompressing from standard in with an external process.
|
jpayne@68
|
4655 TODO: FileFormat should test interleaving and quality individually on files rather than relying on a static field.
|
jpayne@68
|
4656 TODO: Refstats (BBSplit) still reports inflated rates for pairs that don't map to the same reference. This behavior is difficult to change because it is conflated with BBSplit's output streams.
|
jpayne@68
|
4657
|
jpayne@68
|
4658
|
jpayne@68
|
4659 v32.
|
jpayne@68
|
4660 Revised all shellscripts to better detect memory in Linux. This should massively increase reliability and ease of use.
|
jpayne@68
|
4661 Added append flag. Allows appending to output files instead of overwriting.
|
jpayne@68
|
4662 Append flag now should work with BBWrap, with sam files, and with gzipped files.
|
jpayne@68
|
4663 All statistics are now stored in longs, rather than ints.
|
jpayne@68
|
4664 Added statistics tracking of # bases as well as # reads. Updated human-readable output to show 4 columns.
|
jpayne@68
|
4665 Split bbmerge into gapped (split kmer) and ungapped (overlap only) versions. bbmerge.sh calls the ungapped version.
|
jpayne@68
|
4666 Added "qahist" to bbmap - match/sub/ins/del histogram by quality score.
|
jpayne@68
|
4667 Fixed "pairlen" flag; it was only being used if greater than the default. (Noted by Harald on seqanswers)
|
jpayne@68
|
4668 Added insert size median and standard deviation to output stats. The 'ihist=' flag must be set to enable this, otherwise the data won't be tracked. (Requested by Harald on seqanswers)
|
jpayne@68
|
4669 Fixed bug in which non-ACGTN IUPAC symbols were not being converted to N. (Noted by Leanne on seqanswers)
|
jpayne@68
|
4670 Changed shellscripts from DOS to Unix EOL encoding.
|
jpayne@68
|
4671 Added support for "-h" and "--help" in shellscripts (before it was just in java files).
|
jpayne@68
|
4672 Created Dedupe2 - faster, and supports 1-cluster-per-file output.
|
jpayne@68
|
4673 Created Dedupe3 - supports more than 2 affix tables. Uses slightly more memory.
|
jpayne@68
|
4674 BBMap now generates "sort" shellscripts even if the output is in bam format.
|
jpayne@68
|
4675 pileup.sh now prints a coverage summary to standard out.
|
jpayne@68
|
4676 Added 'split' flag to BBMask.
|
jpayne@68
|
4677 Fixed bug in randomreads allowing paired reads to come from 'nearby' scaffolds.
|
jpayne@68
|
4678 Documented randomreads.sh.
|
jpayne@68
|
4679 Added gaussian insert size distribution to randomreads.
|
jpayne@68
|
4680 Fixed a bug in calcmem.sh that prevented requesting memory that Linux considered 'cached'.
|
jpayne@68
|
4681 TODO: Penalize score of sites with errors near read tips, and long deletions.
|
jpayne@68
|
4682 Added "Median_fold" column to pileup. You need to set 'bitset=
|
jpayne@68
|
4683 Changed default quality-filtering mode to average probability rather than average quality score.
|
jpayne@68
|
4684 Default number of threads now takes the environment variable NSLOTS into consideration. However, because Mendel nodes have hyperthreading enabled, if NSLOTS>8 and (# processors)==NSLOTS*2, then # processors will be used instead. So it is still recommended that you set threads manually if you don't have exclusive access to a node.
|
jpayne@68
|
4685 Fixed bbmerge, which was crashing on fasta input.
|
jpayne@68
|
4686 Fixed gaussian insert size distribution in randomreads (it was causing a crash).
|
jpayne@68
|
4687 Enabled unpigz support in Windows (decompression only).
|
jpayne@68
|
4688 TODO: BBNorm needs in1/in2/out1/out2 support.
|
jpayne@68
|
4689 Added mingc and maxgc to reformat.
|
jpayne@68
|
4690 Added 'passes' flag to BBQC and reduced default passes to 1 if normalization is disabled.
|
jpayne@68
|
4691 Swapped FileFormat's method signature "allowFileRead" and "allowSubprocess" parms for some functions, as they were inconsistent. This may have unknown effects.
|
jpayne@68
|
4692 TODO: unclear if fasta files are currently checked for interleaving. Method added to "FASTQ".
|
jpayne@68
|
4693 TODO: FileFormat should perhaps test for quality format and interleaving.
|
jpayne@68
|
4694 Fixed reversed variables in "machineout" stats for %mapped and %unambiguous. Found by Michael Barton.
|
jpayne@68
|
4695 Added "testformat.sh".
|
jpayne@68
|
4696 Fixed dedupe "csf" output to work even when no other outputs specified.
|
jpayne@68
|
4697 Fixed dedupe erroneous assumption that "bandwidth" had not been custom-specified.
|
jpayne@68
|
4698 Changed MakeLengthHistogram (readlength.sh) default behavior to place reads in lower bins rather than closest bins. Toggle with "round" flag.
|
jpayne@68
|
4699 Added "repair" flag to SplitPairsAndSingles. Created "repair.sh".
|
jpayne@68
|
4700 Fixed a bug in which tabs were not allowed in fasta headers.
|
jpayne@68
|
4701 Improved BBMerge: default minqo 7->8, made margin a parameter, added 'strict' macro that reduces false positive rate.
|
jpayne@68
|
4702 Added "samestrand" flag to RandomReads.
|
jpayne@68
|
4703 Fixed a dedupe bug with "pto" and paired reads; read2 was not getting a UnitID.
|
jpayne@68
|
4704 Fixed a bug in which the BBMap stats for insertion rate was sometimes higher than the true value.
|
jpayne@68
|
4705 Fixed bugs in BBMerge; increased speed slightly.
|
jpayne@68
|
4706 Created grademerge.sh to grade merged reads.
|
jpayne@68
|
4707 Added 'variance' flag to randomreads; used to make qualities less uniform between reads.
|
jpayne@68
|
4708 BBDuk now has overwrite=true by default.
|
jpayne@68
|
4709 calcmem.sh now sets -Xmx and -Xms from each other if only one was specified.
|
jpayne@68
|
4710
|
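Note on the change of the default quality-filtering mode to average probability: averaging Phred scores directly and averaging the error probabilities they encode can give very different answers for the same read. The Python sketch below illustrates the difference using the standard Phred relation p = 10^(-Q/10). The function names are hypothetical and this is not the BBTools implementation.

```python
import math


def phred_to_prob(q):
    """Error probability encoded by a Phred quality score."""
    return 10 ** (-q / 10.0)


def mean_quality_linear(quals):
    """Plain arithmetic mean of the quality scores."""
    return sum(quals) / len(quals)


def mean_quality_by_probability(quals):
    """Average the error probabilities, then convert back to a Phred score."""
    mean_p = sum(phred_to_prob(q) for q in quals) / len(quals)
    return -10.0 * math.log10(mean_p)


# Three high-quality bases and one very poor base.
quals = [40, 40, 40, 2]
linear = mean_quality_linear(quals)
by_prob = mean_quality_by_probability(quals)
```

For this read the plain mean is 30.5, while the probability-based average is close to 8, because the single poor base dominates the expected error rate.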
jpayne@68
|
4711 Fixed bug with "ambig=all" and "stoptag" flags being used together. Found by WhatSoEver (seqanswers).
|
jpayne@68
|
4712 Added 'findbestmatch'/'fbm' flag to BBDuk; reports the reference sequence sharing the greatest number of kmers with the read.
|
jpayne@68
|
4713 Shellscripts no longer try to calculate memory before displaying help (noted by Kjiersten Fagnan).
|
jpayne@68
|
4714 -ea and -da are now valid parameters for all shellscripts.
|
jpayne@68
|
4715 Improved documentation of Dedupe.
|
jpayne@68
|
4716 Added "loose" and "vloose" modes to BBMerge.
|
jpayne@68
|
4717 Added novel-kmer-filtering to BBMerge - bans merged reads that create a novel kmer. Does not seem to help.
|
jpayne@68
|
4718 Added entropy-detection to BBMerge - minimum allowed overlap is determined by entropy rather than a constant. Moderate improvement.
|
jpayne@68
|
4719 Fixed bug causing "repair.sh" script to not work. Noted by SES (seqanswers).
|
jpayne@68
|
4720 Added "fast" mode to BBMerge.
|
jpayne@68
|
4721 Fixed a rounding problem in RandomReads that caused gaussian distribution to have 2x frequency of intended reads at exactly insert size of double read length.
|
jpayne@68
|
4722 Added exponential decay insert size distribution to RandomReads, for use in LMP libraries.
|
jpayne@68
|
4723 TODO: Track different paired read orientation rates (innie, outie, same direction, etc) with BBMap.
|
jpayne@68
|
4724 Added sssr (secondarysitescoreratio) and ssao (secondarysiteasambiguousonly) flags. Response to WhatSoEver (seqanswers).
|
jpayne@68
|
4725 Ambiguously-mapped reads that print a primary site now print a minimum of 1 secondary site, and all sites with the same score as the top secondary site.
|
jpayne@68
|
4726 Improved error message for paired reads with unequal number of read 1 vs read 2. Response to Salvatore (seqanswers).
|
jpayne@68
|
4727 Updated bbcountunique.sh help message.
|
jpayne@68
|
4728 Changed AddAdapters default to "arc=f" (no reverse-complement adapters). Added "addpaired" flag (adds adapter to same location of both reads).
|
jpayne@68
|
4729 Added BBDuk/BBDuk2 "tbo" (trimbyoverlap) flag. Vastly reduces false-negatives with no increase in false-positives.
|
jpayne@68
|
4730 Added "fragadapter" flag to RandomReads. Also added ability to handle multiple different adapters for both read 1 and read 2. Adapters are added to paired reads with insert size shorter than read length.
|
jpayne@68
|
4731 Added "ordered" flag to BBDuk/BBDuk2.
|
jpayne@68
|
4732 Added "tpe" (trimpairsevenly) flag to BBDuk/BBDuk2. This works in conjunction with kmer-trimming to the right. Slightly decreases false negatives and doubles false positives.
|
jpayne@68
|
4733 Updated rqcfilter and bbqc with 'tbo' and 'tpe' flags.
|
jpayne@68
|
4734 TODO: Migrate RQCFilter to BBDuk2.
|
jpayne@68
|
4735 Improved addadapters to better handle reads annotated by renamereads.
|
jpayne@68
|
4736 BBMap's fillLimited routine is now affected by 'sssr' flag, if secondary sites are enabled. This will make things slightly slower when secondary sites are enabled, if sssr uses a low value (default is 0.95).
|
jpayne@68
|
4737 statswrapper now allows comma-delimited files.
|
jpayne@68
|
4738 Added standard deviation to BBMerge (requested by Bryce F).
|
jpayne@68
|
4739 Added "tbo" (trimbyoverlap) flag to BBMerge, as an alternative to joining.
|
jpayne@68
|
4740 Updated help for 'ambig' in bbmap.sh to remove the obsolete information that 'ambig=all' did not support sam output.
|
jpayne@68
|
4741 Updated BBMapSkimmer and its shellscript to default to 'ambig=all', which is its intended mode.
|
jpayne@68
|
4742 BBDuk no longer defaults to "out=stdout.fq" because that was incredibly annoying. Now it defaults to "out=null".
|
jpayne@68
|
4743 Changed BBDuk default mink from 4 to 6.
|
jpayne@68
|
4744 Changed BBDuk, Reformat, SplitPairsAndSingles default trimq from 4 to 6.
|
jpayne@68
|
4745 Added "ftr"/"ftl" flags to BBDuk.
|
jpayne@68
|
4746 Added "bbmapskimmer" to the list of options parsed by BBWrap. (Noted by JJ Chai)
|
jpayne@68
|
4747 Corrected documentation of idtag and stoptag - both default to false, not true. (Noted by JJ Chai)
|
jpayne@68
|
4748 Added "mappedonly" flag to reformat. (Requested by Kristen T)
|
jpayne@68
|
4749 Added "rmn" (requirematchingnames) flag to Dedupe. Requested by Alex Copeland.
|
jpayne@68
|
4750 Added ehist, indelhist, idhist, gchist, lhist flags to BBMap, BBDuk, and Reformat.
|
jpayne@68
|
4751 Added removesmartbell.sh wrapper for pacbio.RemoveAdapters2.
|
jpayne@68
|
4752 Fixed instance in KmerCoverage where input stream was being started twice. Noted by Alicia Clum.
|
jpayne@68
|
4753 Added "ngn" (NumberGraphNodes) flag to dedupe; default true. Allows toggling of labelling graph nodes with read number or read name.
|
jpayne@68
|
4754 "slow" flag now disables a heuristic that skipped mapping reads containing only kmers that are highly overrepresented in the reference. Problem noted by Shoudan Liang.
|
jpayne@68
|
4755 Added MergeBarcodes and mergebarcodes.sh
|
jpayne@68
|
4756 Identity is now calculated neutrally by default.
|
jpayne@68
|
4757 Added "qin" and "qout" documentation to bbnorm shellscripts. Noted by muol (seqanswers).
|
jpayne@68
|
4758 Changed qhist to output additional columns - both linear averages and logarithmic averages.
|
jpayne@68
|
4759 Added mode to BBMerge output.
|
jpayne@68
|
4760 Added mode, min, max, median, and standard deviation to ReadLength output. The mode and std dev are affected by bin size, so will only be exactly correct when bin size is 1.
|
jpayne@68
|
4761 Added "nzo" (nonzeroonly) flag to ReadLength.
|
jpayne@68
|
4762 Created "A_Sample", a template for programs that input reads, perform some function, and output reads.
|
jpayne@68
|
4763 BBNorm now works correctly with dual input and output files. Noted by Olaf (seqanswers).
|
jpayne@68
|
4764 Added mode to BBMap insert size statistics.
|
jpayne@68
|
4765 Added CorrelateBarcodes and filterbarcodes.sh, for analyzing and filtering reads by barcode quality.
|
jpayne@68
|
4766 Added "aqhist" (average quality histogram) to ReadStats - can be used by BBMap, BBDuk, Reformat.
|
jpayne@68
|
4767
|
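Note on the histogram statistics above (insert size median and standard deviation; ReadLength mode, min, max, median, and standard deviation): these can all be derived from a binned histogram, which is also why the mode and standard deviation are only exact when the bin size is 1. A minimal Python sketch follows. The function name, the dictionary layout, and the choice of representing each bin by its lower bound are assumptions for illustration, not the BBTools code.

```python
import math


def histogram_stats(counts, bin_size=1):
    """Summary statistics from counts[i] = number of values in bin i.

    Each bin is represented by its lower bound (i * bin_size), so results
    are exact only when bin_size is 1.
    """
    total = sum(counts)
    if total == 0:
        return None
    values = [i * bin_size for i in range(len(counts))]
    mean = sum(v * c for v, c in zip(values, counts)) / total
    variance = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / total
    # Mode: the bin with the highest count (first one wins on ties).
    mode = values[max(range(len(counts)), key=lambda i: counts[i])]
    # Median: first bin at which the cumulative count reaches half the total.
    half = (total + 1) // 2
    running = 0
    median = values[-1]
    for v, c in zip(values, counts):
        running += c
        if running >= half:
            median = v
            break
    nonzero = [v for v, c in zip(values, counts) if c > 0]
    return {
        "min": nonzero[0],
        "max": nonzero[-1],
        "mean": mean,
        "median": median,
        "mode": mode,
        "stdev": math.sqrt(variance),
    }


stats = histogram_stats([0, 2, 4, 2])
```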
jpayne@68
|
4768
|
jpayne@68
|
4769 v31.
|
jpayne@68
|
4770 TODO: Change pipethreads to redirects (where possible), and hash pipethreads by process, not by filename.
|
jpayne@68
|
4771 TODO: Improve scoring function by using Gumbel distribution and/or accounting for read length.
|
jpayne@68
|
4772 TextStreamWriter was improperly testing for output format 'other'. Noted by Brian Foster.
|
jpayne@68
|
4773 Fixed bug for read stream 2 in RTextOutputStream3. Found by Brian Foster.
|
jpayne@68
|
4774 Fixed bug in MateReadsMT creating an unwanted read stream 2. Found by Brian Foster.
|
jpayne@68
|
4775 TrimRead.testOptimal() mode added, and made default when quality trimming is performed; old mode can be used with 'otf=f' flag.
|
jpayne@68
|
4776 Fixed a couple cases where output file format was set to "ordered" even though the process was singlethreaded; this had caused an out-of-memory crash noted by Bill A.
|
jpayne@68
|
4777 Changed shellscripts of MapPacBio classes to remove "interleaved=false" term.
|
jpayne@68
|
4778 Reduced Shared.READ_BUFFER_LENGTH from 500 to 200 and Shared.READ_BUFFER_MAX_DATA from 1m to 500k, to reduce ram usage of buffers.
|
jpayne@68
|
4779 Noticed small bug in trimming; somehow a read had a 'T' with quality 0, which triggered assertion error. I disabled the assertion but I'm not sure how it happened.
|
jpayne@68
|
4780 Fixed bug in which pigz was not used to decompress fasta files.
|
jpayne@68
|
4781 All program message information now defaults to stderr.
|
jpayne@68
|
4782 Added "ignorebadquality" (ibq) flag for reads with out-of-range quality.
|
jpayne@68
|
4783 TODO: mask by information content
|
jpayne@68
|
4784 Added "mtl"/"mintrimlength" flag (default 60). Reads will not be trimmed shorter than that.
|
jpayne@68
|
4785 Made 'tuc' (to uppercase) default to true for bbmap, to prevent assertion errors. Reads MUST be uppercase to match reference.
|
jpayne@68
|
4786 Added new tool, BBMask.
|
jpayne@68
|
4787 Reads and SamLines can now be created with null bases.
|
jpayne@68
|
4788 SamLines to Read is now faster, skipping colorspace check.
|
jpayne@68
|
4789 Added deprecated 'SOH' symbol support to FastaInputStream. This will be replaced with a '>'. Needed to process NCBI's NT database.
|
jpayne@68
|
4790 Added "sampad" or "sp" flag to BBMask, to allow masking beyond bounds of mapped reads.
|
jpayne@68
|
4791 TODO: %reads with ins, del, splice
|
jpayne@68
|
4792 TODO: #bases mapped/unmapped, avg read length mapped/unmapped
|
jpayne@68
|
4793 Dedupe now tracks and prints scaffolds that were duplicates with "outd=". (request by Andrew Tritt)
|
jpayne@68
|
4794 Updated all shellscripts to support the -h and --help flags. (suggested by Westerman)
|
jpayne@68
|
4795 RAM detection is now skipped if user supplies -Xmx flag, preventing a false warning. (noted by Westerman)
|
jpayne@68
|
4796 Created AddAdapters.java. Capable of adding adapter sequence to a fastq file, and grading the trimmed file for correctness.
|
jpayne@68
|
4797 Removed some debug code from FileFormat causing a crash on "stdin" with no extension. Noted by Matt Nolan.
|
jpayne@68
|
4798 Added BBWrap and bbwrap.sh. Wraps BBMap to allow multiple input/output files without reloading the reference.
|
jpayne@68
|
4799 Added support for breaking long fastq reads into shorter reads (maxlength and minlength flags). Requested by James Han.
|
jpayne@68
|
4800 Added Pileup support for residual bins smaller than binsize. Flag "ksb", "keepshortbins". Requested by Kurt LaButti.
|
jpayne@68
|
4801 Fixed support for breaking long reads; was failing on the last read in the set. Noted by James Han.
|
jpayne@68
|
4802 Improved accuracy slightly by better detecting when padding is needed.
|
jpayne@68
|
4803 Improved verbose output from MSA.
|
jpayne@68
|
4804 Created TranslateSixFrames, first step toward amino acid mapping.
|
jpayne@68
|
4805 Improved RandomReads ability to simulate PacBio error profile.
|
jpayne@68
|
4806 Fixed crash when using BBSplit in PacBio mode. (Noted by Esther Singer)
|
jpayne@68
|
4807 May have improved ability to read relatively-pathed files if "." is not in $PATH. (nope, seems not)
|
jpayne@68
|
4808 Fixed crash when using "usequality=f" flag with fasta input reads. (Noted by Esther Singer)
|
jpayne@68
|
4809 Corrected behaviour of minlength with regard to trimming; it was not always working correctly.
|
jpayne@68
|
4810 Added "bhist" (base composition histogram) flag.
|
jpayne@68
|
4811
|
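Note on breaking long reads into shorter reads (maxlength and minlength flags): the Python sketch below shows one plausible reading of that behaviour, in which each read is cut into consecutive pieces of at most maxlength bases and pieces shorter than minlength are dropped. The exact semantics, the function names, and the "_0", "_1" piece-naming scheme are assumptions for illustration; the changelog does not specify them.

```python
def break_read(bases, max_length, min_length=1):
    """Cut one sequence into consecutive pieces of at most max_length bases."""
    pieces = []
    for start in range(0, len(bases), max_length):
        piece = bases[start:start + max_length]
        # Drop a trailing remainder that is too short to be useful.
        if len(piece) >= min_length:
            pieces.append(piece)
    return pieces


def break_reads(reads, max_length, min_length=1):
    """Apply break_read to every (name, bases) pair, including the last one."""
    out = []
    for name, bases in reads:
        for i, piece in enumerate(break_read(bases, max_length, min_length)):
            out.append((f"{name}_{i}", piece))
    return out


pieces = break_reads([("r1", "ACGTAC"), ("r2", "ACGTACGTA")], max_length=4)
```

Iterating over the whole input list in one loop treats the final read like any other, which is the case the later fix in this section addresses.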
jpayne@68
|
4812 v30.
|
jpayne@68
|
4813 Disabled compression/decompression subprocesses when total system threads allowed is less than 3.
|
jpayne@68
|
4814 Fixed assertion error in calcCorrectness in which SiteScores are not necessarily sorted if AMBIGUOUS_RANDOM=true. Noted by Brian Foster.
|
jpayne@68
|
4815 Fixed bug in toLocalAlignment with respect to considering XY as insertions, not subs.
|
jpayne@68
|
4816 TODO: XY should be standardized as substitutions.
|
jpayne@68
|
4817 Added scarf input support. Requested by Alex Copeland.
|
jpayne@68
|
4818 TODO: Allow sam input with interleaved flag.
|
jpayne@68
|
4819 TODO: Make pigz a module dependency or script load.
|
jpayne@68
|
4820 Fixed bug with nodisk mode dropping the name of the first scaffold of every 500MB chunk after the first. Noted by Vasanth Singan.
|
jpayne@68
|
4821 Overhaul of I/O channel creation. Sequence files are now initialized with a FileFormat object which contains information about the format, permission to overwrite, etc.
|
jpayne@68
|
4822 Increased limit of number of index threads in Windows in nodisk mode (since disk fragmentation is no longer relevant).
|
jpayne@68
|
4823 Renamed Read.list to sites; added Read.topSite() and Read.numSites(); replaced many instances of things like "r.sites!=null && !r.sites.isEmpty()"
|
jpayne@68
|
4824 Refactored to put Read and all read-streaming I/O classes in 'stream' package.
|
jpayne@68
|
4825 Moved kmer hashing and indexing classes to kmer package.
|
jpayne@68
|
4826 Moved Variation, subclasses, and related classes to var package.
|
jpayne@68
|
4827 Moved FastaToChrom and ChromToFasta to dna package.
|
jpayne@68
|
4828 Moved pacbio error correction classes to pacbio package.
|
jpayne@68
|
4829 Removed stack, stats, primes, and other packages; prefixed all unused packages with z_.
|
jpayne@68
|
4830 TODO: Sites failing Data.isSingleScaffold() test should be clipped, not discarded.
|
jpayne@68
|
4831 RandomReads3 no longer adds /1 and /2 to paired fastq read names by default (can be enabled with 'addpairnum' flag).
|
jpayne@68
|
4832 Added "inserttag" flag; adds the insert size to sam output.
|
jpayne@68
|
4833 Fixed insert size histogram anomaly. There was a blip at insert==(read1.length+read2.length) because the algorithm used to calculate insert size was different for reads that overlap and reads that don't overlap.
|
jpayne@68
|
4834 Skimmer now defaults to cigar=true.
|
jpayne@68
|
4835 Added maxindel1 and maxindel2 (or maxindelsum) flags.
|
jpayne@68
|
4836 Removed OUTER_DIST_MULT2 because it caused assertion errors when different from OUTER_DIST_MULT; changed OUTER_DIST_MULT from 15 to 14.
|
jpayne@68
|
4837 Added shellscript for skimmer, bbmapskimmer.sh
|
jpayne@68
|
4838 TODO: Document above changes to parameters.
|
jpayne@68
|
4839
|
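Note on the insert size histogram anomaly above: a blip at insert==(read1.length+read2.length) is what one would expect when overlapping and non-overlapping pairs are measured by different formulas that disagree at the boundary. The Python sketch below uses a single formula for both cases, the span from the leftmost to the rightmost mapped base. It assumes inclusive coordinates on the same scaffold; insert_size is a hypothetical helper, not BBMap's actual calculation.

```python
def insert_size(start1, stop1, start2, stop2):
    """Span covered by a read pair, using inclusive reference coordinates."""
    return max(stop1, stop2) - min(start1, start2) + 1


# Two 100 bp reads in three configurations.
separated = insert_size(0, 99, 200, 299)   # gap between the reads
abutting = insert_size(0, 99, 100, 199)    # insert == len1 + len2
overlapping = insert_size(0, 99, 50, 149)  # reads overlap by 50 bp
```

Because the same expression is used on both sides of the abutting case, the histogram has no discontinuity at that point.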
jpayne@68
|
4840
|
jpayne@68
|
4841
|
jpayne@68
|
4842 v29.
|
jpayne@68
|
4843 New version since major refactoring.
|
jpayne@68
|
4844 Added FRACTION_GENOME_TO_EXCLUDE flag (fgte). Setting this lower increases sensitivity at expense of speed. Range is 0-1 and default is around 0.03.
|
jpayne@68
|
4845 Added setFractionGenometoExclude() to Skimmer index.
|
jpayne@68
|
4846 LMP libraries were not being paired correctly. Now "rcs=f" may be used to ignore orientation when pairing. Noted by Kurt LaButti.
|
jpayne@68
|
4847 Allocating memory to alignment score matrices caused uncaught out-of-memory error on low-memory machines, resulting in a hang. This is now caught and results in an exit. Noted by Alicia Clum.
|
jpayne@68
|
4848 GPINT machines are now detected and restricted to 4 threads max. This helps prevent out-of-memory errors with PacBio mode.
|
jpayne@68
|
4849 Fixed sam output bug in which an unmapped read would get pnext of 0 rather than 1 when its mate mapped off the beginning of a scaffold. Noted by Rob Egan.
|
jpayne@68
|
4850 Added memory test prior to allocating mapping threads. Thread count will be reduced if there is not enough memory. This is to address the issue noted by James Han, in which the PacBio versions would crash after running out of memory on low-memory nodes.
|
jpayne@68
|
4851 TODO: Detect and prevent low-memory crashes while loading the index by aborting.
|
jpayne@68
|
4852 Fixed assertion error caused by strictmaxindel mode (noted by James Han).
|
jpayne@68
|
4853 Added flag "trd" (trimreaddescriptions) which truncates read names at the first whitespace.
|
jpayne@68
|
4854 Added "usequality/uq" flag to turn on/off usage of quality information when mapping. Requested by Rob Egan.
|
jpayne@68
|
4855 Added "keepbadkeys/kbk" flag to prevent discarding of keys due to low quality. Requested by Rob Egan.
|
jpayne@68
|
4856 Fixed crash with very long reads and very small kmers due to exceeding length of various kmer array buffers.
|
jpayne@68
|
4857 Avg Initial Sites, etc. are no longer printed for read 2 data.
|
jpayne@68
|
4858 TODO: Support for selecting long-mate-pair orientation has been requested by Alex C.
|
jpayne@68
|
4859 Fixed possible bug in read trimming when the entire read was below the quality threshold.
|
jpayne@68
|
4860 Fixed trim mode bug: "trim=both" was only trimming the right side. "qtrim" is also now an alias for "trim".
|
jpayne@68
|
4861 Fixed bug in ConcurrentGenericReadInputStream causing an incorrect assertion error for input in paired files and read sampling. Found by Alex Copeland.
|
jpayne@68
|
4862 Added insert size histogram: ihist=<file>
|
jpayne@68
|
4863 Added "machineout" flag for machine-readable output stats.
|
jpayne@68
|
4864 TODO: reads_B1_100000x150bp_0S_0I_0D_0U_0N_interleaved.fq.gz (ecoli) has 0% rescued for read1 and 0.7% rescued for read 2. After swapping r1 and r2, .664% of r2 is rescued and .001% of r1 is rescued. Why are they not symmetric?
|
jpayne@68
|
4865 Added 'slow' flag to bbmap for increased accuracy. Still in progress.
|
jpayne@68
|
4866 Added MultiStateAligner11ts to MSA minIdToMinRatio().
|
jpayne@68
|
4867 Changed the way files are tested for permission to write (moved to Tools).
|
jpayne@68
|
4868 Fixed various places in which version string was parsed as an integer.
|
jpayne@68
|
4869 Added test for "help" and "version" flags.
|
jpayne@68
|
4870 Fixed bug in testing for file existence; noted by Bryce Foster.
|
jpayne@68
|
4871 Fixed issue with scaffold names not being trimmed on whitespace boundaries when 'trd=t'. Noted by Rob Egan.
|
jpayne@68
|
4872 Added pigz (parallel gzip) support, at suggestion of Rob Egan.
|
jpayne@68
|
4873 Improved support for subprocesses and pipethreads; they are now automatically killed when not needed, even if the I/O stream is not finished. This allows gunzip/unpigz when a file is being partially read.
|
jpayne@68
|
4874 Added shellscript test for the hostname 'gpint'; in that case, memory will be capped at 4G per process.
|
jpayne@68
|
4875 Changed the way cris/ros are shut down. All must now go through ReadWrite.closeStreams()
|
jpayne@68
|
4876 TODO: Force rtis and tsw to go through that too.
|
jpayne@68
|
4877 TODO: Add "Job.fname" field.
|
jpayne@68
|
4878 Made output threads kill processes also.
|
jpayne@68
|
4879 Modified TrimRead to require minlength parameter.
|
jpayne@68
|
4880 Fixed a bug with gathering statistics in BBMapPacBioSkimmer (found by Matt Scholz).
|
jpayne@68
|
4881 Fixed a bug in which reads with match string containing X/Y were not eligible to be semiperfect (Found by Brian Foster).
|
jpayne@68
|
4882 Fixed a bug related to improving the prior fix; I had inverted an == operator (Found by Brian Foster).
|
jpayne@68
|
4883 Added SiteScore.fixXY(), a fast method to fix reads that go out-of-bounds during alignment. Unfinished; score needs to be altered as a result.
|
jpayne@68
|
4884 Added "pairsonly" or "po" flag. Enabling it will treat unpaired reads as unmapped, so they will be sent to 'outu' instead of 'outm'. Suggested by James Han and Alex Copeland.
|
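The routing described for "pairsonly" can be sketched as below. This is an illustration only, not BBMap's code; the function name and the string return values are hypothetical stand-ins for the 'outm' and 'outu' streams.

```python
def destination(mapped, paired, pairsonly=False):
    """Pick the output stream for one read.

    mapped:    the read has a mapping
    paired:    the read mapped as part of a pair
    pairsonly: treat mapped-but-unpaired reads as unmapped
    """
    if mapped and pairsonly and not paired:
        return "outu"  # unpaired reads count as unmapped in this mode
    return "outm" if mapped else "outu"
```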
jpayne@68
|
4885 Added shellscript support for java -Xmx flag (Suggested by James Han).
|
jpayne@68
|
4886 Changed behavior: with 'quickmatch' enabled secondary sites will now get cigar strings (mostly, not all of them).
|
jpayne@68
|
4887 "fast" flag now enables quickmatch (50% speedup in e.coli with low-identity reads). Very minor effect on accuracy.
|
jpayne@68
|
4888 Fixed bug with overflowing gref due to GREFLIMIT2_CUSHION padding. Found by Alicia Clum.
|
jpayne@68
|
4889 Fixed bug in which writing the index would use pigz rather than native gzip, allowing reads from scaffolds.txt.gz before the (buffered) writing finished. Rare race condition. Found by Brian Foster.
|
jpayne@68
|
4890 Fixed stdout.fa.gz writing uncompressed via ReadStreamWriter.
|
jpayne@68
|
4891 Added "allowSubprocess" flag to all constructors of TextFile and TextStreamWriter, and made TextFile 'tryAllExtensions' flag the last param.
|
jpayne@68
|
4892 allowSubprocess currently defaults to true for ByteFiles and ReadInput/Output Streams.
|
jpayne@68
|
4893 TODO: TextFile and TextStreamWriter (and maybe others?) may ignore ReadWrite.killProcess().
|
jpayne@68
|
4894 TODO: RTextOutputStream3 - make allowSubprocess a parameter
|
jpayne@68
|
4895 TODO: Assert that first symbol of reference fasta is '>' to help detect corrupt fastas.
|
jpayne@68
|
4896 Improved TextStreamWriter, TextFile, and all ReadStream classes usage of ReadWrite's InputStream/OutputStream creation/destruction methods.
|
jpayne@68
|
4897 All InputStream and OutputStream creation/destruction now has an allowSubprocesses flag.
|
jpayne@68
|
4898 Added verbose output to all ReadWrite methods.
|
jpayne@68
|
4899 Fixed bug in which realigned SiteScores were not given a new perfect/semiperfect status. Noted by Brian Foster and Will Andreopoulos.
|
jpayne@68
|
4900
|
jpayne@68
|
4901
|
jpayne@68
|
4902 v28.
|
jpayne@68
|
4903 New version because the new I/O system seems to be stable now.
|
jpayne@68
|
4904 Re-enabled bam input/output (via samtools subprocess). Lowered shellscript memory from 85% to 84% to provide space for samtools.
|
jpayne@68
|
4905 Added "-l" to "#!/bin/bash" at top. This may make it less likely for the environment to be messed up. Thanks to Alex Boyd for the tip.
|
jpayne@68
|
4906 Addressed potential bug in start/stop index padding calculation for scaffolds that began or ended with non-ACGT bases.
|
jpayne@68
|
4907 Made superclass for Index.
|
jpayne@68
|
4908 Made superclass for BBMap.
|
jpayne@68
|
4909 Removed around 5000 lines of code as a result of dereplication into superclasses.
|
jpayne@68
|
4910 Added MultiStateAligner11ts, which uses arrays for affine transform instead of if blocks. Changing insertions gave a ~5% speedup; subs gave an immeasurably small speedup.
|
jpayne@68
|
4911 Found bug in calculation of insert penalties during mapping. Fixing this bug increases speed but decreases accuracy, so it was modified toward a compromise.
|
jpayne@68
|
4912
|
jpayne@68
|
4913
|
jpayne@68
|
4914 v27.
|
jpayne@68
|
4915 Added command line to sam file header.
|
jpayne@68
|
4916 Added "msa=" flag. You can specify which msa to use by entering the classname.
|
jpayne@68
|
4917 Added initial banded mode. Specify "bandwidth=X" or "bandwidthratio=X" to accelerate alignment.
|
jpayne@68
|
4918 Cleaned up argument parsing a bit.
|
jpayne@68
|
4919 Improved nodisk mode; now does not use the disk at all for indexing. BBSplitter still uses the disk.
|
jpayne@68
|
4920 Added "fast" flag, which changes some parameters to make mapping go faster, with slightly lower sensitivity.
|
jpayne@68
|
4921 Improved error handling; corrupt input files should be more likely to crash with an error message and less likely to hang. Noted by Alex Copeland.
|
jpayne@68
|
4922 Improved SAM input, particularly coordinates and cigar-string parsing; this should now be correct but requires an indexed reference. Of course this information is irrelevant for mapping so this parsing is turned off by default for bbmap.
|
jpayne@68
|
4923 Increased maximum read speed with ByteFile2, by using 2 threads per file. May be useful in input-speed limited scenarios, as when reading compressed input on a node with many cores. Also accelerates sam input.
|
jpayne@68
|
4924 TODO: Consider moving THREADS to Shared.
|
jpayne@68
|
4925 Updated match/cigar flag syntax.
|
jpayne@68
|
4926 Updated shellscript documentation.
|
jpayne@68
|
4927 Changed ByteFile2 from array lists to arrays; should reduce overhead.
|
jpayne@68
|
4928 TODO: Increase speed of sam input.
|
jpayne@68
|
4929 TODO: Increase speed of output, for all formats.
|
jpayne@68
|
4930 TODO: Finish ReadStreamWriter.addStringList(), which allows formatting to be done in the host.
|
jpayne@68
|
4931 In progress: Moving all MapThread fields to abstract class.
|
jpayne@68
|
4932 MapThread now passes reverse-complemented bases to functions to prevent replication of this array.
|
jpayne@68
|
4933 Fixed very rare bug when a non-semiperfect site becomes semiperfect after realignment, but subsequently is no longer highest-ranked.
|
jpayne@68
|
4934 strictmaxindel can now be assigned a number (e.g. strictmaxindel=5).
|
jpayne@68
|
4935 If a fasta read is broken into pieces, now all pieces will receive the _# suffix in their name. Previously, the first piece was exempt.
|
jpayne@68
|
4936 TODO: Consider changing SamLine.rname to a String and seq, qual to byte[].
|
jpayne@68
|
4937 Changed SamLine.seq, qual to byte[]. Now stored in original read order and only reversed for minus strand during I/O.
|
jpayne@68
|
4938 Added sortscaffolds flag (requested by Vasanth Singan).
|
jpayne@68
|
4939 Fixed XS tag bug; in some cases read 2 was getting opposite flag (noted by Vasanth Singan).
|
jpayne@68
|
4940 Fixed bug when reading sam files without qualities (noted by Brian Foster).
|
jpayne@68
|
4941 Fixed bug where absent cigar strings were printed as "null" instead of "*" as a result of recent changes to sam I/O (noted by Vasanth Singan).
|
jpayne@68
|
4942 Found error when a read goes off the beginning of a block. Ref padding seems to be absent, because Ns were replaced by random sequence. Cause is unknown; cannot replicate.
|
jpayne@68
|
4943 Fixed Block.getHitList(int, int).
|
jpayne@68
|
4944 Changed calcAffineScore() to require base array for information when throwing exceptions.
|
jpayne@68
|
4945 Changed generated bamscript to unload samtools module before loading samtools/0.1.19.
|
jpayne@68
|
4946 sam file idflag and stopflag are both now faster, particularly for perfect mappings. But both default to off because they are still slow nonetheless.
|
jpayne@68
|
4947 Fixed bug in BBIndex in which a site was considered perfect because all bases matched the reference, but some of the bases were N. Canonically, reads with Ns can never be perfect even if the ref has Ns in the same locations.
|
jpayne@68
|
4948 Fixed above bug again because it was not fully fixed: CHECKSITES was allowing a read to be classified as perfect even if it contained an N.
|
jpayne@68
|
4949 Increased sam read speed by ~2x; 30MB/s to 66MB/s
|
jpayne@68
|
4950 Increased sam write speed from ~18MB/s to ~32MB/s on my 4-core computer (during mapping), with mapping at peak 42MB/s with out=null. Standalone (no mapping) sam output seems to run at 51MB/s but it's hard to tell.
|
jpayne@68
|
4951 Increased fasta write from 118MB/s to 140 MB/s
|
jpayne@68
|
4952 Increased fastq write from 70MB/s to 100MB/s
|
jpayne@68
|
4953 Increased fastq read from 120MB/s (I think) to 296MB/s (663 megabytes/sec!) with 2 threads or 166MB/s with 1 thread
|
jpayne@68
|
4954 Some of these speed increases come from writing byte[] into char[] buffer held in a ThreadLocal, instead of turning them into Strings or appending them byte-by-byte.
|
jpayne@68
|
4955 All of these speed optimizations caused a few I/O bugs that temporarily affected some users between Oct 1 and Oct 4, 2013. Sorry!
|
jpayne@68
|
4956 Flipped XS tag from + to - or vice versa. I seem to have misinterpreted the Cufflinks documentation (noted by Vasanth Singan).
|
jpayne@68
|
4957 Fixed bug in which (as a result of speed optimizations) reads outside scaffold boundaries, in sam 1.3 format, were not getting clipped (Noted by Brian Foster).
|
jpayne@68
|
4958 Changed default behavior of all shellscripts to run with -Xmx4g if maximum memory cannot be detected (typically, because ulimit=infinity). Was 31. Unfortunately things will break either way.
|
jpayne@68
|
4959 Fixed off-by-1 error in sam TLEN calculation; also simplified it to give sign based on leftmost POS and always give a plus and minus even when POS is equal.
|
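The TLEN rule in the entry above can be sketched as follows. This is a minimal illustration, not BBMap's code: the function name is hypothetical, coordinates are taken to be 1-based and inclusive, and giving read 1 the positive value when both POS values are equal is an assumption (the entry only says one plus and one minus are always emitted).

```python
def tlen_pair(pos1, end1, pos2, end2):
    """Return (tlen1, tlen2) for a mapped pair.

    pos*/end* are the leftmost and rightmost reference coordinates of
    each read, 1-based and inclusive. The +1 counts both end bases,
    which is where an off-by-1 error typically creeps in.
    """
    span = max(end1, end2) - min(pos1, pos2) + 1
    # The sign depends only on which read has the leftmost POS.
    # On a tie, read 1 gets the plus sign (assumption).
    if pos1 <= pos2:
        return span, -span
    return -span, span
```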
jpayne@68
|
4960 Added sam NH tag (when ambig=all).
|
jpayne@68
|
4961 Disabled sam XM tag because the bowtie documentation and output do not make any sense.
|
jpayne@68
|
4962 Changed sam MD and NM tags to account for 'N' symbol in cigar strings.
|
jpayne@68
|
4963 Made sam SM tag score compatible with mapping score.
|
jpayne@68
|
4964 Fixed bug in SamLine when cigar=f (null pointer when parsing match string). (Found by Vasanth Singan)
|
jpayne@68
|
4965 Fixed bug in BBMapThread* when local=true and ambiguous=toss (null pointer to read.list). (Found by Alexander Spunde)
|
jpayne@68
|
4966 Changed synthetic read naming and parsing (parsecustom flag) to use " /1" and " /2" at the end of paired read names. (Requested by Kurt LaButti)
|
jpayne@68
|
4967 Increased fastq write to 200MB/s (590 megabytes/s)
|
jpayne@68
|
4968 Increased fasta write to 212MB/s (624 megabytes/s measured by fastq input)
|
jpayne@68
|
4969 Increased sam write to 167MB/s (492 megabytes/s measured by fastq input)
|
jpayne@68
|
4970 Increased bread write to 196MB/s (579 megabytes/s measured by fastq input)
|
jpayne@68
|
4971 bf2 (multithreaded input) is now enabled by default on systems with >4 cores, or in ReformatReads always.
|
jpayne@68
|
4972 Fixed RTextOutputStream3.finishedSuccessfully() returning false when output was in 2 files.
|
jpayne@68
|
4973 Changed output streams to unbuffered. No notable speed increase.
|
jpayne@68
|
4974 Fixed bug in ByteFile2 in which reads would be recycled when end of file was hit (found by Brian Foster, Bryce Foster, and Kecia Duffy).
|
jpayne@68
|
4975
|
jpayne@68
|
4976
|
jpayne@68
|
4977 v26.
|
jpayne@68
|
4978 Fixed crash from consecutive newlines in ByteFile.
|
jpayne@68
|
4979 Made SiteScore clonable/copyable.
|
jpayne@68
|
4980 Removed @RG line from headers. It implies that reads should be annotated with additional fields based on the RG line information.
|
jpayne@68
|
4981 Changed sam flags (at advice of Joel Martin). Now single-ended reads will never have flags 0x2, 0x40, or 0x80 set.
|
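A sketch of the flag rule above, using the bit values from the SAM spec (0x1 paired, 0x2 proper pair, 0x40 first in pair, 0x80 last in pair). The function name and its parameters are hypothetical and only cover the pairing-related bits.

```python
PAIRED, PROPER_PAIR, FIRST, LAST = 0x1, 0x2, 0x40, 0x80

def pairing_bits(is_paired, proper_pair=False, pair_number=0):
    """Return the pairing-related SAM flag bits for one read.

    pair_number is 0 for read 1 and 1 for read 2. Single-ended reads
    never get 0x2, 0x40, or 0x80, whatever the other arguments say.
    """
    if not is_paired:
        return 0
    flag = PAIRED
    if proper_pair:
        flag |= PROPER_PAIR
    flag |= FIRST if pair_number == 0 else LAST
    return flag
```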
jpayne@68
|
4982 Added correct insert size average to output stats, in place of old inner distance and mapping length.
|
jpayne@68
|
4983 Fixed crash when detecting length of SamLines with no cigar string. (Found by Shayna Stein)
|
jpayne@68
|
4984 Added flag "keepnames" which keeps the read names unchanged when writing in sam format. Normally, a trailing "/1", "/2", " 1", or " 2" are stripped off, and if read 2's name differs from read 1's name, read 1's name is used for both. This is to remain spec-compliant with the sam format. However, in some cases (such as grading synthetic reads tagged with the correct mapping location) it is useful to retain the original name of each read.
|
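The default naming behavior and the effect of "keepnames" might look roughly like this. The helper names are hypothetical; only the four suffixes listed in the entry are handled.

```python
_SUFFIXES = ("/1", "/2", " 1", " 2")

def strip_pair_suffix(name):
    """Remove one trailing pair suffix, if present."""
    for suffix in _SUFFIXES:
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name

def sam_names(name1, name2=None, keepnames=False):
    """Return the names to write for read 1 and read 2."""
    if keepnames:
        return name1, name2  # names are written unchanged
    qname = strip_pair_suffix(name1)
    if name2 is None:
        return qname, None
    # Read 1's name is used for both mates, so the pair shares one QNAME.
    return qname, qname
```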
jpayne@68
|
4985 Added local alignment option, "local". Translates global alignments into local alignments using the same affine transform (and soft-clips ends).
|
jpayne@68
|
4986 Changed killbadpairs default to false. Now by default improperly paired reads are allowed.
|
jpayne@68
|
4987 Merged TranslateColorspaceRead versions into a single class.
|
jpayne@68
|
4988 Added interleaved input and output for bread format. May be useful for error correction pipeline.
|
jpayne@68
|
4989 TODO: Mode where reads are mapped to multiple scaffolds, but are mapped at most one time per scaffold. I.e., remove all but top site per scaffold (and forbid self-mapping).
|
jpayne@68
|
4990 Fixed yet another instance of negative coordinates appearing in an unmapped read, which the new version of samtools can't handle.
|
jpayne@68
|
4991 Fixed bug in counting ambiguous reads; was improperly including in statistics reads that were ambiguous but had a score lower than minratio.
|
jpayne@68
|
4992 Fixed rare crash related to realignment of reads with ambiguous mappings (found by Rob Egan).
|
jpayne@68
|
4993 Unified many of the differences between the MapThread variants, and added a new self-checking function (checkTopSite) to ensure a Read is self-consistent.
|
jpayne@68
|
4994 Added some bitflag fetch functions to SamLine and fixed 'pairedOnSameChrom()' which was not handling the '=' symbol.
|
jpayne@68
|
4995 TODO: Make GENERATE_BASE_SCORES_FROM_QUALITY a parameter, default false in BBMapPacBio and true elsewhere. (I verified this should work fine)
|
jpayne@68
|
4996 TODO: Make GENERATE_KEY_SCORES_FROM_QUALITY a parameter, default true (probably even in BBMapPacBio). (I verified this should work fine)
|
jpayne@68
|
4997 Updated LongM (merged with LongM from Dedupe).
|
jpayne@68
|
4998 Fixed bug in SamLine in which clipped leading indels were not considered, causing potential negative coordinates. (Found by Brian Foster)
|
jpayne@68
|
4999 TODO: Match strings like NNNNNNDDDDDNNNNNmmmmmmmmmmmmmmmmm...mmmmmmm should never exist in the first place. Why did that happen?
|
jpayne@68
|
5000 Added "strictmaxindel" flag (default: strictmaxindel=f). Attempts to kill mappings in which there is a single indel event longer than the "maxindel" setting. Requested by James Han.
|
jpayne@68
|
5001 TODO: Ensure strictmaxindel works in all situations, including rescued paired ends and recursively regenerated padded match strings.
|
jpayne@68
|
5002 TODO: Redo msa to be strictly subtractive. Start with score=100*bases, then use e.g. 0 for match, -1 for del, -370 for sub, -100 for N, etc. No need for negative values.
|
jpayne@68
|
5003 Changed TIMEBITS in MultiStateAligner9PacBio from 10 to 9 to address a score underflow assertion error found by Alicia Clum. The underflow occurred around length 5240; new limit should be around 10480.
|
jpayne@68
|
5004 TODO: Alicia found an error of exceeding gref bounds.
|
jpayne@68
|
5005 Fixed race condition in TextStreamWriter.
|
jpayne@68
|
5006 Improved functionality of splitter. Now you can index once and map subsequently using "basename" without specifying "ref=" every single time.
|
jpayne@68
|
5007 "Reads Used" in output now displays the number of reads used. Before, for paired reads, it would display the number of pairs (half as many).
|
jpayne@68
|
5008 Added bases used to reads used at Kurt's request.
|
jpayne@68
|
5009 Improved bam script generation. Now correctly sets samtools memory based on detected memory, and warns user that crashes may be memory-related.
|
jpayne@68
|
5010 Fixed an obsolete assertion in SamLine found by Alicia.
|
jpayne@68
|
5011 Added XS tag option ("xstag=t") for Cufflinks; requested by Vasanth Singan.
|
jpayne@68
|
5012 Added 'N' cigar operation for deletions longer than X bases (intronlen=X). Also needed by Cufflinks.
|
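One way to picture the 'N' operation: rewrite deletion runs in an existing cigar string. This is a sketch, not the actual implementation; the function name is hypothetical, and reading "longer than X" as a strict comparison is an assumption.

```python
import re

def mark_introns(cigar, intronlen):
    """Replace 'D' runs longer than intronlen with 'N' runs."""
    def swap(match):
        length = int(match.group(1))
        # Strictly longer than intronlen (assumption).
        return f"{length}N" if length > intronlen else match.group(0)
    return re.sub(r"(\d+)D", swap, cigar)
```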
jpayne@68
|
5013 Secondary alignments now get "*" for bases and qualities, as recommended by the SAM spec. This saves space, but may cause problems when converting sam into other formats.
|
jpayne@68
|
5014 Fixed bug that caused interleaved=true to override in2. Now if you set in and in2, interleaved input will be disabled. (noted by Andrew Tritt).
|
jpayne@68
|
5015 Fixed some low-level bugs in I/O streams. When shutting down streams I was waiting until !Thread.isAlive() rather than Thread.getState()==Thread.State.TERMINATED, which caused a race condition (since a thread is not alive before it starts execution).
|
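The pitfall above is not specific to Java. The sketch below is a Python analog (an assumption made for illustration; the original code is Java): a thread that has not started yet also reports "not alive", so only an explicit completion signal distinguishes "not started" from "finished".

```python
import threading

def demo():
    """Show that 'not alive' is true before a thread ever runs."""
    finished = threading.Event()
    worker = threading.Thread(target=finished.set)
    # Before start(): the thread is not alive, yet no work has been done.
    looks_done_before_start = not worker.is_alive()
    really_done_before_start = finished.is_set()
    worker.start()
    worker.join()
    return looks_done_before_start, really_done_before_start, finished.is_set()
```

Calling demo() returns (True, False, True): the liveness check claims completion before the thread has run, while the explicit event does not.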
jpayne@68
|
5016 Added debugging file with random name written to /ref/ directory. This should help debugging if somewhere deep in a pipeline multiple processes try to index at the same location simultaneously. Suggested by Bryce Foster.
|
jpayne@68
|
5017 Fixed log file generation causing a crash if the /ref/ directory did not exist, found by Vasanth Singan. Also logging is now disabled by default but enabled if you set "log=t".
|
jpayne@68
|
5018 Input sequence data will now translate '.' and '-' to 'N' automatically, as some fasta databases appear to use '.' instead of 'N'. (Thanks to Kecia Duffy and James Han)
|
jpayne@68
|
5019 Added capability to convert lowercase reads to upper case (crash on lowercase noted by Vasanth Singan).
|
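The two input normalizations above ('.' and '-' to 'N', lowercase to upper case) can be sketched with a byte translation table. The function name is hypothetical, and handling both steps in one pass is a choice made here for brevity.

```python
_TO_N = bytes.maketrans(b".-", b"NN")

def normalize_bases(seq: bytes) -> bytes:
    """Uppercase the sequence and map '.' and '-' to 'N'."""
    return seq.upper().translate(_TO_N)
```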
jpayne@68
|
5020
|
jpayne@68
|
5021
|
jpayne@68
|
5022 v25.
|
jpayne@68
|
5023 Increased BBMapPacBio max read length to 6000, and BBMapPacBioSkimmer to 4000.
|
jpayne@68
|
5024 Fixed bugs in padding calculations during match string generation.
|
jpayne@68
|
5025 Improved some assertion error output.
|
jpayne@68
|
5026 Added flag "maxsites" for max alignments to print.
|
jpayne@68
|
5027 Added match field to sitescore.
|
jpayne@68
|
5028 Made untrim() affect sitescores as well.
|
jpayne@68
|
5029 Decreased read array buffer from 500 to 20 in MapPacBio.
|
jpayne@68
|
5030 TODO: stitcher for super long reads.
|
jpayne@68
|
5031 TODO: wrapper for split reference mapping and merging.
|
jpayne@68
|
5032 Improved fillAndScoreLimited to return additional information.
|
jpayne@68
|
5033 Added flag "secondary" to print secondary alignments. Does not yet ensure that all secondary alignments will get cigar strings, but most do.
|
jpayne@68
|
5034 Added flag "quickmatch" to generate match strings for SiteScores during slow align. Speeds up the overall process somewhat (at least on my PC; have not tested it on cluster).
|
jpayne@68
|
5035 Improved pruning during slow align by dynamically increasing msa limit.
|
jpayne@68
|
5036 Addressed a bug in which reads sometimes have additional sites aligned to the same coordinates as the primary site. The bug can still occur (typically during match generation or as a result of padding), but is detected and corrected during runtime.
|
jpayne@68
|
5037 Tracked down and fixed a bug relating to negative coordinates in sam output for unmapped reads paired with reads mapped off the beginning of a scaffold, with help from Rob Egan.
|
jpayne@68
|
5038 Disabled frowny-face warning message which had caused some confusion.
|
jpayne@68
|
5039 TODO: Add verification of match strings on site scores.
|
jpayne@68
|
5040 Made superclass for MSA. This will allow merging of redundant code over the various BBMap versions.
|
jpayne@68
|
5041 Fixed a crash-hang out-of-memory error caused by initialization order. Now crashes cleanly and terminates. Found by James Han.
|
jpayne@68
|
5042 Fixed bug in output related to detecting cigar string length under sam 1.4 specification (found by Rob Egan).
|
jpayne@68
|
5043 Added flag "killbadpairs"/"kbp".
|
jpayne@68
|
5044 Added flag "fakequality" for fasta.
|
jpayne@68
|
5045 Permanently fixed bugs related to unexpected short match strings caused by error messages.
|
jpayne@68
|
5046 Increased speed of dynamic program phase when dealing with lots of Ns.
|
jpayne@68
|
5047 TODO: In-line generation of short match string when printing a read, rather than mutating the read. (mutation is now temporary)
|
jpayne@68
|
5048 Added flag, "stoptag". Allows generation of SAM tag YS:i:<read stop location>
|
jpayne@68
|
5049 Added flag, "idtag". Allows generation of SAM tag YI:f:<percent identity>
|
jpayne@68
|
5050
|
jpayne@68
|
5051 v24.
|
jpayne@68
|
5052 Fixed bug that slightly reduced accuracy for reads with exactly 1 mismatch. They were always skipping slow align, sometimes preventing ambiguous reads from being detected.
|
jpayne@68
|
5053 Increased speed of MakeRocCurve (for automatic grading of sam files from synthetic reads). Had used 1 pass per quality level; now it uses only 1 pass total.
|
jpayne@68
|
5054 Increased accuracy of processing reads and contigs with ambiguous bases (in mapping phase).
|
jpayne@68
|
5055 Adjusted clearzones to use gradient functions and asymptotes rather than step functions. Reduces false positives and increases true positives, especially near the old step cutoffs.
|
jpayne@68
|
5056 Fixed trimSitesBelowCutoff assertion that failed for paired reads.
|
jpayne@68
|
5057 Added single scaffold toggle to RandomReads. Default 'singlescaffold=true' (forces reads to come from a single scaffold). This can cause non-termination if no scaffolds are long enough, and may bias against shorter scaffolds.
|
jpayne@68
|
5058 Added min scaffold overlap to RandomReads. Default 'overlap=1'; forces reads to overlap a scaffold at least this much. This can cause non-termination if no scaffolds are long enough, and may bias against shorter scaffolds.
|
jpayne@68
|
5059 Fixed setPerfect(). Previously, reads with 'N' overlapping 'N' in the reference could be considered perfect matches, but no reads containing 'N' should ever be considered a perfect mapping to anything.
|
jpayne@68
|
5060 Formalized definition of semiperfect to require read having no ambiguous bases, and fixed "isSemiperfect()" function accordingly.
|
jpayne@68
|
5061 Shortened and clarified executable names.
|
jpayne@68
|
5062 Fixed soft-clipped read start position calculation (mainly relevant to grading).
|
jpayne@68
|
5063 Prevented reads from being double-counted when grading, when a program gives multiple primary alignments for a read.
|
jpayne@68
|
5064 Fixed a bug in splitter initialization.
|
jpayne@68
|
5065 Added "ambiguous2". Reads that map to multiple references can now be written to distinct files (prefixed by "AMBIGUOUS_") or thrown away, independently of whether they are ambiguous in the normal sense (which includes ambiguous within a single reference).
|
jpayne@68
|
5066 Added statistics tracking per reference and per scaffold. Enable with "scafstats=<file>" or "refstats=<file>".
|
jpayne@68
|
5067 "ambiguous" may now be shortened to "ambig" on the command line.
|
jpayne@68
|
5068 "true" and "false" may now be shortened to t, 1, or f, 0. If omitted entirely, "true" is assumed; e.g. "overwrite" is equivalent to "overwrite=true".
|
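The shorthand rule above can be sketched as follows. The helper names are hypothetical; a real parser would apply the boolean reading only to flags known to be boolean, since "1" is also a valid number.

```python
def split_arg(arg):
    """Split 'key=value'; value is None when '=' is absent."""
    key, sep, value = arg.partition("=")
    return key, (value if sep else None)

def parse_boolean(value):
    """Interpret true/t/1 and false/f/0; a missing value means true."""
    if value is None:
        return True
    lowered = value.lower()
    if lowered in ("true", "t", "1"):
        return True
    if lowered in ("false", "f", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")
```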
jpayne@68
|
5069 Added stderr as a valid output destination specified from the command line.
|
jpayne@68
|
5070 BBSplitter now has a flag, "mapmode"; can be set to normal, accurate, pacbio, or pacbioskimmer.
|
jpayne@68
|
5071 Fixed issue where stuff was being written to stdout instead of stderr and ended up in SAM files (found by Brian Foster).
|
jpayne@68
|
5072 TODO: Add secondary alignments.
|
jpayne@68
|
5073 TODO: Unlimited length reads.
|
jpayne@68
|
5074 TODO: Protein mapping.
|
jpayne@68
|
5075 TODO: Soft clipping in both bbmap and GradeSamFile. Should universally adjust coords by soft-clip amount when reported in SAM format.
|
jpayne@68
|
5076 Fixed assertion error concerning reads containing Ns marked as perfect, when aligned to reference Ns (found by Rob Egan).
|
jpayne@68
|
5077 Fixed potential null-pointer error in "showprogress" flag.
|
jpayne@68
|
5078
|
jpayne@68
|
5079 v23.
|
jpayne@68
|
5080 Created BBSplitter wrapper for BBMap that allows merging any number of references together and splitting the output into different streams.
|
jpayne@68
|
5081 Added support for ambiguous=random with paired reads (before it was limited to unpaired).
|
jpayne@68
|
5082 TODO: Iterative anchored alignment for very long reads, with a full master gref.
|
jpayne@68
|
5083 TODO: untrim=c/m/s/n/r
|
jpayne@68
|
5084 TODO: mode=vfast/veryfast: k=14 minratio=0.8 minhits=2 maxindel=20
|
jpayne@68
|
5085 TODO: mode=slow/accurate: BBMapi
|
jpayne@68
|
5086 TODO: mode=pacbio: BBMapPacBio k=12
|
jpayne@68
|
5087 TODO: mode=rnaseq
|
jpayne@68
|
5088 TODO: Put untrim in caclStatistics section
|
jpayne@68
|
5089 TODO: Test with MEGAN.
|
jpayne@68
|
5090 Finished new random read generator. Much faster, and solves coordinate problem with multiple indels.
|
jpayne@68
|
5091 Improved error message on read parsing failures.
|
jpayne@68
|
5092 TODO: Insert size histogram
|
jpayne@68
|
5093 TODO: "outp=", output for reads that mapped paired
|
jpayne@68
|
5094 TODO: "outs=", output for reads that mapped singly
|
jpayne@68
|
5095 Corrected assertion in "isSingleScaffold()"
|
jpayne@68
|
5096 Fixed a rare bug preventing recursive realignment when ambiguous=random (found by Brian Foster)
|
jpayne@68
|
5097 Added samversion/samv flag. Set to 1.3 for cigar strings with 'M' or 1.4 for cigar strings with '=' and 'X'. Default is 1.3.
|
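The difference between the two cigar styles can be illustrated by converting a per-base match string. The symbols used here ('m' match, 'S' substitution, 'I' insertion, 'D' deletion) and the function name are assumptions made for this sketch; other symbols are not handled.

```python
from itertools import groupby

OPS_13 = {"m": "M", "S": "M", "I": "I", "D": "D"}  # sam 1.3: matches and subs share 'M'
OPS_14 = {"m": "=", "S": "X", "I": "I", "D": "D"}  # sam 1.4: '=' and 'X' are distinct

def to_cigar(match, samversion="1.3"):
    """Run-length encode a match string as a cigar string."""
    table = OPS_14 if samversion == "1.4" else OPS_13
    ops = [table[symbol] for symbol in match]
    # Translating first lets adjacent 'm' and 'S' merge into one 'M' run in 1.3.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
```

For the match string "mmmSmmDDmm" this gives "6M2D2M" under 1.3 and "3=1X2=2D2=" under 1.4.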
jpayne@68
|
5098 Added enforcement of thread limit when indexing.
|
jpayne@68
|
5099 Added internal autodetection of gpint machines. Set default threadcount for gpints at 2.
|
jpayne@68
|
5100 Improved ability to map with maxindel=0
|
jpayne@68
|
5101 Added XM:i:<N> optional SAM flag because some programs seem to demand it. Like all extra flags, this is omitted if the read is not mapped. Otherwise, it is set to 1 for unambiguously mapped reads, and 2 or more for ambiguously mapped reads. The number can range as high as the total number of equal-scoring sites, but this is not guaranteed unless the "ambiguous=random" flag is used.
|
jpayne@68
|
5102 Fixed bug in autodetection of paired ends, found by Rob Egan.
|
jpayne@68
|
5103
|
jpayne@68
|
5104
|
jpayne@68
|
5105
|
jpayne@68
|
5106 v22.
|
jpayne@68
|
5107 Added match histogram support.
|
jpayne@68
|
5108 Added quality histogram support.
|
jpayne@68
|
5109 Added interleaving support to random read generator.
|
jpayne@68
|
5110 Added ability to disable pair rescue ("rescue=false" flag), which can speed things up in some cases.
|
jpayne@68
|
5111 Disabled dynamic-programming slow alignment phase when no indels are allowed.
|
jpayne@68
|
5112 Accelerated rescue in perfect and semiperfect mode.
|
jpayne@68
|
5113 Vastly accelerated paired mapping against references with a very low expected mapping rate.
|
jpayne@68
|
5114 Fixed crash in rescue caused by reads without quality strings (e.g. paired fasta files). (found by Brian Foster)
|
jpayne@68
|
5115
|
jpayne@68
|
5116
|
jpayne@68
|
5117 v21.
|
jpayne@68
|
5118 If reference specified is same as already-processed reference, the old index will not be deleted.
|
jpayne@68
|
5119 Added BBMap memory usage estimator to assembly statistics tool: java -Xmx120m jgi.AssemblyStats2 <fasta file> k=<kmer size for BBMap>
|
jpayne@68
|
5120 Added support for multiple output read streams: all reads (set by out=), mapped reads (set by outm=), and unmapped reads (set by outu=). They can be in different formats and any combination can be used at once. You can set pair output to secondary files with out2, outm2, and outu2.
|
jpayne@68
|
5121 Changed definition of "out=". You can no longer specify split output streams implicitly by using a "#" in the filename; it must be explicit. The "#" wildcard is still allowed for input streams.
|
jpayne@68
|
5122 Fixed a bug with sam input not working. (found by Brian Foster)
|
jpayne@68
|
5123 Added additional interleaved autodetection pattern for reads named "xxxxx 1:xxxx" and "xxxxx 2:xxxx"
|
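A rough sketch of how the naming pattern above could be recognized. The regular expression and function name are hypothetical; the actual autodetection logic may check more than this.

```python
import re

# Name, one space, then "1:" or "2:" starting the comment field.
_PAIR_PATTERN = re.compile(r"^(\S+) ([12]):")

def looks_interleaved(name_a, name_b):
    """True if two consecutive read names look like mates 1 and 2."""
    a = _PAIR_PATTERN.match(name_a)
    b = _PAIR_PATTERN.match(name_b)
    return bool(a and b
                and a.group(1) == b.group(1)
                and a.group(2) == "1"
                and b.group(2) == "2")
```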
jpayne@68
|
5124 Fixed a bug with soft-clipped deletions causing an incorrect cigar length. (found by Brian Foster)
|
jpayne@68
|
5125 Fixed a bug with parsing of negative numbers in byte arrays.
|
jpayne@68
|
5126 TODO: Found a new situation in which poly-N reads preferentially map to poly-N reference (probably tip search?)
|
jpayne@68
|
5127 Fixed a bug in which paired reads occasionally are incorrectly considered non-semiperfect. (found by Brian Foster)
|
jpayne@68
|
5128 Added more assertion tests for perfection/imperfection status.
|
jpayne@68
|
5129 Added blacklist support. This allows selection of output stream based on the name of the scaffold to which a read maps.
|
jpayne@68
|
5130 Created Blacklist class, allowing creation of blacklists and whitelists.
|
jpayne@68
|
5131 Added outb (aka outblacklist) and outb2 streams, to output reads that mapped to blacklisted scaffolds.
|
jpayne@68
|
5132 Added flag "outputblacklisted=<true/false>" which controls whether blacklisted reads are printed to the "out=" stream. Default is true.
|
jpayne@68
|
5133 Added support for streaming references. e.g. "cat ref1.fa ref2.fa | java BBMap ref=stdin.fa"
|
jpayne@68
|
5134 Updated and reorganized this readme.
|
jpayne@68
|
5135 Removed a dependency on Java 7 libraries (so that the code runs in Java 6).
|
jpayne@68
|
5136 Added per-read error rate histogram. Enable with qhist=<filename>
|
jpayne@68
|
5137 TODO: generate standard deviation.
|
jpayne@68
|
5138 Added per-base-position M/S/D/I/N rate tracking. Enable with mhist=<filename>
|
jpayne@68
|
5139 Added quality trimming. Reads may be trimmed prior to mapping, and optionally untrimmed after mapping, so that no data is lost. Trimmed bases are reported as soft-clipped in this case.
|
jpayne@68
|
5140 Trimming will extend until at least 2 consecutive bases have a quality greater than trimq (default 5).
|
jpayne@68
|
5141 Added flags: trim=<left/right/both/false>, trimq=<5>, untrim=<true/false>
|
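The trimming rule above (stop once 2 consecutive bases exceed trimq) can be sketched as below. This is not the TrimRead implementation; the function name, the half-open return value, and checking the run in the direction of trimming are assumptions.

```python
def trim_bounds(quals, trimq=5, mode="both"):
    """Return (start, stop), the half-open range of bases to keep.

    quals is a list of per-base quality scores; mode is 'left',
    'right', or 'both'. Trimming from an end continues until two
    consecutive bases both have quality greater than trimq.
    """
    n = len(quals)

    def good_run(i, step):
        # True if base i and its inward neighbor both exceed trimq.
        j = i + step
        return 0 <= j < n and quals[i] > trimq and quals[j] > trimq

    start, stop = 0, n
    if mode in ("left", "both"):
        while start < n and not good_run(start, 1):
            start += 1
    if mode in ("right", "both"):
        while stop > start and not good_run(stop - 1, -1):
            stop -= 1
    return start, stop
```

A read with no such run is trimmed to length zero (start equals stop), so callers need to handle empty results.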
jpayne@68
|
5142 TODO: Correct insert size in realtime for trim length.
|
jpayne@68
|
5143 TODO: Consider adding a TrimRead pointer to reads, rather than using obj.
|
jpayne@68
|
5144 TODO: Consider extending match string as 'M' rather than 'C' as long as clipped bases match.
|
jpayne@68
|
5145 Found and made safe some instances where reads could be trimmed to less than kmer length.
|
jpayne@68
|
5146 Found and fixed instance where rescue was attempted for length-zero reads.
|
jpayne@68
|
5147 Fixed an instance where perfect reads were not marked perfect (while making match string).
|
jpayne@68
|
5148
|
jpayne@68
|
5149
|
jpayne@68
|
5150 v20.1 (not differentiated from v20 since the differences are minor)
|
jpayne@68
|
5151 Fixed a minor, longstanding bug that prevented minus-strand alignment of reads that only had a single valid key (due to low complexity or low quality).
|
jpayne@68
|
5152 Increased accuracy of perfectmode and semiperfectmode, by allowing mapping of reads with only one valid key, without loss of speed. They still don't quite match normal mode since they use fewer keys.
|
jpayne@68
|
5153 Added detection of and error messages for reads that are too long to map.
|
jpayne@68
|
5154 Improved shell script usage information.
|
jpayne@68
|
5155
|
jpayne@68
|
5156
|
jpayne@68
|
5157 v20.
|
jpayne@68
|
5158 Made all MapThreads subclasses of MapThread, eliminating duplicate code.
|
jpayne@68
|
5159 Any exception thrown by a MapThread will now be detected, allowing the process to complete normally without hanging.
|
jpayne@68
|
5160 Exceptions (e.g. OutOfMemory) when loading reference genome are now detected, typically causing a crash exit instead of a hang.
|
jpayne@68
|
5161 Exceptions (e.g. OutOfMemory) when generating index are now detected, causing a crash exit instead of a hang.
|
jpayne@68
|
5162 Exceptions in output stream (RTextOutputStream) subthreads are now detected, throwing an exception.
|
jpayne@68
|
5163 Added support for soft clipping. All reads that go off the ends of scaffolds will be soft-clipped when output to SAM format. (The necessity of this was noted by Rob Egan, as negative scaffold indices can cause software such as samtools to crash)
|
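Illustration for the v20 soft-clipping entry: a minimal sketch of how an ungapped alignment that runs off either end of a scaffold could be reported with soft clipping, so that the SAM position is never negative. The function name, its parameters, and the ungapped ('M'-only) assumption are hypothetical and are not taken from BBMap's code.

```python
def soft_clip_cigar(start, read_len, scaf_len):
    """Return (pos, cigar) for an ungapped alignment, soft-clipping any overhang.

    start    -- 0-based scaffold coordinate of the first read base (may be negative)
    read_len -- number of bases in the read
    scaf_len -- length of the scaffold
    (All names are assumptions made for this sketch.)
    """
    if read_len <= 0 or scaf_len <= 0:
        raise ValueError("lengths must be positive")
    end = start + read_len                                 # exclusive end coordinate
    left = min(read_len, max(0, -start))                   # bases before the scaffold start
    right = min(read_len - left, max(0, end - scaf_len))   # bases past the scaffold end
    matched = read_len - left - right
    if matched <= 0:
        raise ValueError("read does not overlap the scaffold")
    pos = start + left + 1        # SAM POS is 1-based and refers to the first aligned base
    parts = []
    if left:
        parts.append(f"{left}S")
    parts.append(f"{matched}M")
    if right:
        parts.append(f"{right}S")
    return pos, "".join(parts)
```

For example, a 10-base read starting 3 bases before a 100-base scaffold would be reported at position 1 with the first 3 bases clipped, rather than at a negative coordinate.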
jpayne@68
|
5164
|
jpayne@68
|
5165
|
jpayne@68
|
5166 v19.
|
jpayne@68
|
5167 Added support for leading FASTA comments (denoted by semicolon).
|
jpayne@68
|
5168 Fixed potential problem in FASTA read input stream with very long reads.
|
jpayne@68
|
5169 Recognizes additional FASTA file extensions: .seq, .fna, .ffn, .frn, .fsa, .fas
|
jpayne@68
|
5170 Disabled gzip subprocesses to circumvent a bug in UGE: Forking can cause a program to be terminated. Gzip is still supported.
|
jpayne@68
|
5171 Slightly reduced memory allocation in shellscript.
|
jpayne@68
|
5172 Ported "Analyze Index" improvement over to all versions (except v5).
|
jpayne@68
|
5173 Added flags: fastaminread, showprogress
|
jpayne@68
|
5174 Fixed problem noted by Rob Egan in which paired-end reads containing mostly 'N' could be rescued by aligning to the poly-N section off the end of a contig.
|
jpayne@68
|
5175 Fixed: Synthetic read headers were being improperly parsed by new FASTQ input stream.
|
jpayne@68
|
5176 Made a new, faster, more correct version of "isSemiperfect".
|
jpayne@68
|
5177 Added "semiperfect" test for reads changed during findDeletions.
|
jpayne@68
|
5178 Identified locations in "scoreNoIndels" where call 'N' == ref 'N' is considered a match. Does not seem to cause problems.
|
jpayne@68
|
5179 Noted that SAM flag 0x40 and 0x80 definitions differ from my usage.
|
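Illustration for the v19 note on SAM flags 0x40 and 0x80: the changelog does not say how the author's usage differed, so this sketch only shows the meanings given in the SAM specification (0x40 marks the first segment in the template, 0x80 the last). The helper name mate_role is hypothetical.

```python
FIRST_SEGMENT = 0x40  # SAM spec: the first segment in the template
LAST_SEGMENT = 0x80   # SAM spec: the last segment in the template

def mate_role(flag):
    """Classify a SAM FLAG value by its 0x40 / 0x80 bits (hypothetical helper)."""
    first = bool(flag & FIRST_SEGMENT)
    last = bool(flag & LAST_SEGMENT)
    if first and last:
        return "middle"    # part of a linear template, neither first nor last
    if first:
        return "first"
    if last:
        return "last"
    return "unknown"       # neither bit set: index in the template is unknown
```

With ordinary paired-end data, read 1 carries 0x40 and read 2 carries 0x80, so the common flag values 99 and 147 classify as "first" and "last" respectively.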
jpayne@68
|
5180
|
jpayne@68
|
5181
|
jpayne@68
|
5182 v18.
|
jpayne@68
|
5183 Fastq read input speed doubled.
|
jpayne@68
|
5184 Fasta read input speed increased 50%.
|
jpayne@68
|
5185 Increased speed of "Analyze Index" by a factor of 3+ (just for BBMap so far; have not yet ported change over to other versions).
|
jpayne@68
|
5186 Fixed an array out-of-bounds bug found by Alicia Clum.
|
jpayne@68
|
5187 Added bam output option (relies on Samtools being installed).
|
jpayne@68
|
5188 Allows gzip subprocesses, which can sometimes improve gzipping and gunzipping speed over Java's implementation (will be used automatically if gzip is installed). This can be disabled with the flags "usegzip=false" and "usegunzip=false".
|
jpayne@68
|
5189 Started a 32-bit mode which allows 4GB per block instead of 2GB, for a slight memory savings (not finished yet).
|
jpayne@68
|
5190 Added nondeterministic random read sampling option.
|
jpayne@68
|
5191 Added flags: minscaf, startpad, stoppad, samplerate, sampleseed, kfilter, usegzip, usegunzip
|
jpayne@68
|
5192
|
jpayne@68
|
5193
|
jpayne@68
|
5194 v17.
|
jpayne@68
|
5195 Changed the way error rate statistics are displayed. All now use match string length as denominator.
|
jpayne@68
|
5196 Identified error in random read generator regarding multiple insertions. It will be hard to fix but does not matter much.
|
jpayne@68
|
5197 Found out-of-bounds error when filling gref. Fixed (but maybe not everywhere...).
|
jpayne@68
|
5198 Added random mapping for ambiguous reads.
|
jpayne@68
|
5199 Changed index from 2d array to single array (saves a lot of memory).
|
jpayne@68
|
5200 Increased speed by ~10%.
|
jpayne@68
|
5201 Improved index generation and loading speed (typically more than doubled).
|
jpayne@68
|
5202 Changed chrom format to gzipped.
|
jpayne@68
|
5203 Added "nodisk" flag; index is not written to disk.
|
jpayne@68
|
5204 Fixed a rare out-of-bounds error.
|
jpayne@68
|
5205 Increased speed of perfect read mapping.
|
jpayne@68
|
5206 Fixed rare human PAR bug.
|
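Illustration for the v17 entry about changing the index from a 2D array to a single array: the changelog does not describe the actual layout, so the following is only one common way to do it, assuming a per-key list of genome sites. All hits are packed into one flat array, and a second array records where each key's hits begin. The names flatten_index and lookup are hypothetical.

```python
from array import array

def flatten_index(lists_by_key):
    """Pack per-key site lists into two flat arrays: (starts, sites).

    starts[k] .. starts[k + 1] delimits the hits for key k inside sites.
    This layout is an assumption for illustration, not BBMap's actual format.
    """
    starts = array("q", [0])
    sites = array("q")
    for hits in lists_by_key:
        sites.extend(hits)           # append this key's hits to the flat array
        starts.append(len(sites))    # record where the next key begins
    return starts, sites

def lookup(starts, sites, key):
    """Return the hits stored for one key as a slice of the flat array."""
    return sites[starts[key]:starts[key + 1]]
```

The saving in a layout like this comes from replacing one array object per key with two large arrays, which removes the per-array overhead when most keys have only a few hits.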
jpayne@68
|
5207
|
jpayne@68
|
5208
|
jpayne@68
|
5209 v16. Changes since last version:
|
jpayne@68
|
5210 Supports unlimited number of unscaffolded contigs.
|
jpayne@68
|
5211 Supports piping in and out. Set "out=stdout.sam" and "in=stdin.fq" to pipe in a fastq file and pipe out a sam file (other extensions are also supported).
|
jpayne@68
|
5212 Ambiguously named files (without proper extensions) will be autodetected as fasta or fastq (though I suggest not relying on that).
|
jpayne@68
|
5213 Added additional flags (described in parameters section): minapproxhits, padding, tipsearch, maxindel.
|
jpayne@68
|
5214 minapproxhits has a huge impact on speed. Going from 1 to 2 will typically at least double the speed (on a large genome) at some cost to accuracy.
|
jpayne@68
|
5215
|
jpayne@68
|
5216
|
jpayne@68
|
5217 v15. Changes since last version:
|
jpayne@68
|
5218 Contig names are retained for output.
|
jpayne@68
|
5219 SAM header @SQ tags fixed.
|
jpayne@68
|
5220 SAM header @PG tag added.
|
jpayne@68
|
5221 An out-of-bounds error was fixed.
|
jpayne@68
|
5222 An error related to short match strings was found and possibly handled.
|
jpayne@68
|
5223 All versions now give full statistics related to %matches, %substitutions, %deletions, and %insertions (unless match string generation is disabled).
|
jpayne@68
|
5224 Increased speed and accuracy for tiny (<20MB) genomes.
|
jpayne@68
|
5225 Added dynamic detection of scaffold sizes to better partition index, reducing memory in some cases.
|
jpayne@68
|
5226 Added command-line specification of kmer length.
|
jpayne@68
|
5227 Added more command line flags and described them in this readme.
|
jpayne@68
|
5228 Allowed overwriting of existing indices, for ease of use (only when overwrite=true). For efficiency you should still only specify "ref=" the first time you map to a particular reference, and just specify the build number subsequently.
|