jpayne@69: #!/bin/bash jpayne@69: jpayne@69: usage(){ jpayne@69: echo " jpayne@69: Written by Brian Bushnell jpayne@69: Last modified April 1, 2019 jpayne@69: jpayne@69: Description: Generates random synthetic reads from a reference genome. Read names indicate their genomic origin. jpayne@69: Allows precise customization of things like insert size and synthetic mutation type, sizes, and rates. jpayne@69: Read names generated by this program are used by MakeRocCure (samtoroc.sh) and GradeSamFile (gradesam.sh). jpayne@69: They can also be used by BBMap (bbmap.sh) and BBMerge (bbmerge.sh) to automatically calculate jpayne@69: true and false positive rates, if the flag 'parsecustom' is used. jpayne@69: jpayne@69: Usage: randomreads.sh ref= out= length= reads= jpayne@69: jpayne@69: Basic parameters: jpayne@69: out=null Output file. If reads are paired and a single file name is jpayne@69: given, output will be interleaved. For paired reads in twin jpayne@69: files, set out1= and out2= jpayne@69: ref=null Reference file. Not needed if the reference is already indexed. jpayne@69: build=1 If multiple references are indexed in the same directory, jpayne@69: each needs a unique build ID. jpayne@69: midpad=300 Specifies space between scaffolds in packed index. jpayne@69: reads=0 Generate this many reads (or pairs). jpayne@69: coverage=-1 If positive, generate enough reads to hit this coverage jpayne@69: target, based on the genome size. jpayne@69: overwrite=t Set to false to disallow overwriting of existing files. jpayne@69: replacenoref=f Set to true to replace Ns in the reference sequence jpayne@69: with random letters. jpayne@69: simplenames=f Set to true to generate read names that clearly indicate jpayne@69: genomic origin, without BBMap internal coordinates. jpayne@69: illuminanames=f Set to true to have matching names for paired reads, jpayne@69: rather than naming by location. jpayne@69: renamebyinsert=f Insert the insert size into the name. jpayne@69: addpairnum=f Set true to add ' 1:' and ' 2:' to the end of read names. jpayne@69: addslash=f Set true to add '/1' and '/2' to the end of read names. jpayne@69: spaceslash=f Set true to add a space before slash read pairnum. jpayne@69: prefix=null Generated reads will start with this prefix, jpayne@69: rather than naming by location. jpayne@69: seed=0 Use this to set the random number generator seed; jpayne@69: use -1 for a random seed. jpayne@69: jpayne@69: Length Parameters - normally only minlength and maxlength are needed. jpayne@69: minlength=150 Generate reads of up to this length. jpayne@69: maxlength=150 Generate reads of at least this length. jpayne@69: gaussianlength=f Use a gaussian length distribution (for PacBio). jpayne@69: Otherwise, the distribution is linear. jpayne@69: midlength=-1 Gaussian curve peaks at this point. Must be between jpayne@69: minlength and maxlength, in Gaussian mode. jpayne@69: readlengthsd=-1 Standard deviation of the Gaussian curve. Note that the jpayne@69: final curve is a sum of multiple curves, but this will affect jpayne@69: overall curve width. By default this is set to 1/4 of range. jpayne@69: jpayne@69: Pairing parameters: jpayne@69: paired=f Set to true for paired reads. jpayne@69: mininsert= Controls minimum insert length. Default depends on read length. jpayne@69: maxinsert= Controls maximum insert length. Default depends on read length. jpayne@69: triangle=f Make a triangular insert size distribution. jpayne@69: flat=f Make a roughly flat insert size distribution.. jpayne@69: superflat=f Make a perfectly flat insert size distribution. jpayne@69: gaussian=t Make a bell-shaped insert size distribution, with jpayne@69: standard deviation of (maxinsert-mininsert)/6. jpayne@69: samestrand=f Generate paired reads on the same strand. jpayne@69: jpayne@69: Mutation parameters: jpayne@69: snprate=0 Add snps to reads with this probability (0-1). jpayne@69: insrate=0 Add insertions to reads with this probability (0-1). jpayne@69: delrate=0 Add deletions to reads with this probability (0-1). jpayne@69: subrate=0 Add contiguous substitutions to reads with this probability (0-1). jpayne@69: nrate=0 Add nocalls to reads with this probability (0-1). jpayne@69: jpayne@69: Note: With a 'rate' of X, each read has an X chance of getting at least jpayne@69: 1 mutation, X^2 chance of 2+ mutations, X^3 chance of 3+ mutations, jpayne@69: and so forth up to the maximum allowed number of mutations of that type. jpayne@69: jpayne@69: maxsnps=3 Add at most this many snps per read. jpayne@69: maxinss=2 Add at most this many deletions per read. jpayne@69: maxdels=2 Add at most this many insertions per read. jpayne@69: maxsubs=2 Add at most this many contiguous substitutions per read. jpayne@69: maxns=0 Add at most this many blocks of Ns per read. jpayne@69: jpayne@69: maxinslen=12 Max length of insertions. jpayne@69: maxdellen=400 Max length of deletions. jpayne@69: maxsublen=12 Max length of contiguous substitutions. jpayne@69: maxnlen=1 Min length of N blocks. jpayne@69: jpayne@69: mininslen=1 Min length of insertions. jpayne@69: mindellen=1 Min length of deletions. jpayne@69: minsublen=2 Min length of contiguous substitutions. jpayne@69: minnlen=1 Min length of N blocks. jpayne@69: jpayne@69: Illumina quality parameters: jpayne@69: maxq=36 Upper bound of quality values. jpayne@69: midq=28 Approximate average of quality values. jpayne@69: minq=20 Lower bound of quality values. jpayne@69: q= Sets maxq, midq, and minq to the same value. jpayne@69: adderrors=t Add substitution errors based on quality values, jpayne@69: after mutations. jpayne@69: qv=4 Vary the base quality of reads by up to this much jpayne@69: to simulate tile effects. jpayne@69: jpayne@69: PacBio quality parameters: jpayne@69: pacbio=f Use a PacBio error model, rather than Illumina jpayne@69: error model, and add PacBio errors after mutations. jpayne@69: pbmin=0.13 Minimum rate of PacBio errors for a read. jpayne@69: pbmax=0.17 Maximum rate of PacBio errors for a read. jpayne@69: jpayne@69: Other Parameters: jpayne@69: overlap=1 Require reads to overlap scaffold end by at least this much. jpayne@69: banns=f Do not generate reads over reference Ns. jpayne@69: metagenome=f Assign scaffolds a random exponential coverage level, jpayne@69: to simulate a metagenomic or RNA coverage distribution. jpayne@69: randomscaffold=f Choose random scaffolds without respect to length. jpayne@69: amp=1 Simulate highly-amplified MDA single-cell data by jpayne@69: setting this to a higher number like 1000. jpayne@69: replacenoref=f Replace intra- and inter-scaffold Ns with random bases. jpayne@69: pbadapter= Add adapter sequence to some reads using this literal string. jpayne@69: fragadapter= Add this sequence to paired reads with insert size jpayne@69: shorter than read length. jpayne@69: fragadapter2= Use this sequence for read 2. jpayne@69: jpayne@69: Java Parameters: jpayne@69: -Xmx This will set Java's memory usage, overriding the jpayne@69: program's automatic memory detection. jpayne@69: -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify jpayne@69: 200 megs. jpayne@69: The max is typically 85% of physical memory. jpayne@69: -eoom This flag will cause the process to exit if an out-of-memory jpayne@69: exception occurs. Requires Java 8u92+. jpayne@69: -da Disable assertions. jpayne@69: " jpayne@69: } jpayne@69: jpayne@69: #This block allows symlinked shellscripts to correctly set classpath. jpayne@69: pushd . > /dev/null jpayne@69: DIR="${BASH_SOURCE[0]}" jpayne@69: while [ -h "$DIR" ]; do jpayne@69: cd "$(dirname "$DIR")" jpayne@69: DIR="$(readlink "$(basename "$DIR")")" jpayne@69: done jpayne@69: cd "$(dirname "$DIR")" jpayne@69: DIR="$(pwd)/" jpayne@69: popd > /dev/null jpayne@69: jpayne@69: #DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/" jpayne@69: CP="$DIR""current/" jpayne@69: jpayne@69: z="-Xmx1g" jpayne@69: z2="-Xms1g" jpayne@69: set=0 jpayne@69: jpayne@69: if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then jpayne@69: usage jpayne@69: exit jpayne@69: fi jpayne@69: jpayne@69: calcXmx () { jpayne@69: source "$DIR""/calcmem.sh" jpayne@69: setEnvironment jpayne@69: parseXmx "$@" jpayne@69: if [[ $set == 1 ]]; then jpayne@69: return jpayne@69: fi jpayne@69: freeRam 3200m 84 jpayne@69: z="-Xmx${RAM}m" jpayne@69: z2="-Xms${RAM}m" jpayne@69: } jpayne@69: calcXmx "$@" jpayne@69: jpayne@69: randomreads() { jpayne@69: local CMD="java $EA $EOOM $z -cp $CP align2.RandomReads3 build=1 $@" jpayne@69: echo $CMD >&2 jpayne@69: eval $CMD jpayne@69: } jpayne@69: jpayne@69: randomreads "$@"