Mercurial > repos > rliterman > csp2
view CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/randomreads.sh @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 17:55:14 -0400 |
parents | |
children |
line wrap: on
line source
#!/bin/bash usage(){ echo " Written by Brian Bushnell Last modified April 1, 2019 Description: Generates random synthetic reads from a reference genome. Read names indicate their genomic origin. Allows precise customization of things like insert size and synthetic mutation type, sizes, and rates. Read names generated by this program are used by MakeRocCure (samtoroc.sh) and GradeSamFile (gradesam.sh). They can also be used by BBMap (bbmap.sh) and BBMerge (bbmerge.sh) to automatically calculate true and false positive rates, if the flag 'parsecustom' is used. Usage: randomreads.sh ref=<file> out=<file> length=<number> reads=<number> Basic parameters: out=null Output file. If reads are paired and a single file name is given, output will be interleaved. For paired reads in twin files, set out1= and out2= ref=null Reference file. Not needed if the reference is already indexed. build=1 If multiple references are indexed in the same directory, each needs a unique build ID. midpad=300 Specifies space between scaffolds in packed index. reads=0 Generate this many reads (or pairs). coverage=-1 If positive, generate enough reads to hit this coverage target, based on the genome size. overwrite=t Set to false to disallow overwriting of existing files. replacenoref=f Set to true to replace Ns in the reference sequence with random letters. simplenames=f Set to true to generate read names that clearly indicate genomic origin, without BBMap internal coordinates. illuminanames=f Set to true to have matching names for paired reads, rather than naming by location. renamebyinsert=f Insert the insert size into the name. addpairnum=f Set true to add ' 1:' and ' 2:' to the end of read names. addslash=f Set true to add '/1' and '/2' to the end of read names. spaceslash=f Set true to add a space before slash read pairnum. prefix=null Generated reads will start with this prefix, rather than naming by location. seed=0 Use this to set the random number generator seed; use -1 for a random seed. Length Parameters - normally only minlength and maxlength are needed. minlength=150 Generate reads of up to this length. maxlength=150 Generate reads of at least this length. gaussianlength=f Use a gaussian length distribution (for PacBio). Otherwise, the distribution is linear. midlength=-1 Gaussian curve peaks at this point. Must be between minlength and maxlength, in Gaussian mode. readlengthsd=-1 Standard deviation of the Gaussian curve. Note that the final curve is a sum of multiple curves, but this will affect overall curve width. By default this is set to 1/4 of range. Pairing parameters: paired=f Set to true for paired reads. mininsert= Controls minimum insert length. Default depends on read length. maxinsert= Controls maximum insert length. Default depends on read length. triangle=f Make a triangular insert size distribution. flat=f Make a roughly flat insert size distribution.. superflat=f Make a perfectly flat insert size distribution. gaussian=t Make a bell-shaped insert size distribution, with standard deviation of (maxinsert-mininsert)/6. samestrand=f Generate paired reads on the same strand. Mutation parameters: snprate=0 Add snps to reads with this probability (0-1). insrate=0 Add insertions to reads with this probability (0-1). delrate=0 Add deletions to reads with this probability (0-1). subrate=0 Add contiguous substitutions to reads with this probability (0-1). nrate=0 Add nocalls to reads with this probability (0-1). Note: With a 'rate' of X, each read has an X chance of getting at least 1 mutation, X^2 chance of 2+ mutations, X^3 chance of 3+ mutations, and so forth up to the maximum allowed number of mutations of that type. maxsnps=3 Add at most this many snps per read. maxinss=2 Add at most this many deletions per read. maxdels=2 Add at most this many insertions per read. maxsubs=2 Add at most this many contiguous substitutions per read. maxns=0 Add at most this many blocks of Ns per read. maxinslen=12 Max length of insertions. maxdellen=400 Max length of deletions. maxsublen=12 Max length of contiguous substitutions. maxnlen=1 Min length of N blocks. mininslen=1 Min length of insertions. mindellen=1 Min length of deletions. minsublen=2 Min length of contiguous substitutions. minnlen=1 Min length of N blocks. Illumina quality parameters: maxq=36 Upper bound of quality values. midq=28 Approximate average of quality values. minq=20 Lower bound of quality values. q= Sets maxq, midq, and minq to the same value. adderrors=t Add substitution errors based on quality values, after mutations. qv=4 Vary the base quality of reads by up to this much to simulate tile effects. PacBio quality parameters: pacbio=f Use a PacBio error model, rather than Illumina error model, and add PacBio errors after mutations. pbmin=0.13 Minimum rate of PacBio errors for a read. pbmax=0.17 Maximum rate of PacBio errors for a read. Other Parameters: overlap=1 Require reads to overlap scaffold end by at least this much. banns=f Do not generate reads over reference Ns. metagenome=f Assign scaffolds a random exponential coverage level, to simulate a metagenomic or RNA coverage distribution. randomscaffold=f Choose random scaffolds without respect to length. amp=1 Simulate highly-amplified MDA single-cell data by setting this to a higher number like 1000. replacenoref=f Replace intra- and inter-scaffold Ns with random bases. pbadapter= Add adapter sequence to some reads using this literal string. fragadapter= Add this sequence to paired reads with insert size shorter than read length. fragadapter2= Use this sequence for read 2. Java Parameters: -Xmx This will set Java's memory usage, overriding the program's automatic memory detection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. -eoom This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+. -da Disable assertions. " } #This block allows symlinked shellscripts to correctly set classpath. pushd . > /dev/null DIR="${BASH_SOURCE[0]}" while [ -h "$DIR" ]; do cd "$(dirname "$DIR")" DIR="$(readlink "$(basename "$DIR")")" done cd "$(dirname "$DIR")" DIR="$(pwd)/" popd > /dev/null #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/" CP="$DIR""current/" z="-Xmx1g" z2="-Xms1g" set=0 if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then usage exit fi calcXmx () { source "$DIR""/calcmem.sh" setEnvironment parseXmx "$@" if [[ $set == 1 ]]; then return fi freeRam 3200m 84 z="-Xmx${RAM}m" z2="-Xms${RAM}m" } calcXmx "$@" randomreads() { local CMD="java $EA $EOOM $z -cp $CP align2.RandomReads3 build=1 $@" echo $CMD >&2 eval $CMD } randomreads "$@"