diff CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/calctruequality.sh @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 17:55:14 -0400 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/bbmap-39.01-1/calctruequality.sh Tue Mar 18 17:55:14 2025 -0400
@@ -0,0 +1,152 @@
+#!/bin/bash
+
+usage(){
+echo "
+Written by Brian Bushnell
+Last modified March 21, 2019
+
+Description: Calculates observed quality scores from mapped sam/bam files.
+Generates matrices for use in recalibrating quality scores. By default,
+the matrices are written to /ref/qual/ in the current directory.
+
+If you have multiple sam/bam files demultiplexed from a single sequencing run,
+it is recommended to use all of them as input for increased statistical power.
+Once the matrices are generated, recalibration can be done on mapped or
+unmapped reads; you may get better results by recalibrating the fastq and
+remapping the calibrated reads.
+
+Note! Diploid organisms with a high heterozygousity rate will induce
+inaccurate recalibration at the high end of the quality scale unless SNP
+locations are masked or variations are called. For example, recalibrating
+human reads mapped to an unmasked human reference would generate an
+expected maximal Q-score of roughly 30 due to the human 1/1000 SNP rate.
+Variations can be ignored by using the callvars flag or providing
+a file of variations.
+
+Usage:
+
+Step 1. Generate matrices (from mapped sam or bam files):
+calctruequality.sh in=<file,file,...file> path=<directory>
+
+Step 2. Recalibrate reads (any kind of files):
+bbduk.sh in=<file> out=<file> recalibrate
+
+
+Parameters (and their defaults)
+
+Input parameters:
+in=<file,file>  Sam file or comma-delimited list of files. Alignments
+                must use = and X cigar symbols, or have MD tags, or
+                ref must be specified.
+reads=-1        Stop after processing this many reads (if positive).
+samstreamer=t   (ss) Load reads multithreaded to increase speed.
+unpigz=t        Use pigz to decompress.
+
+Output parameters:
+overwrite=t     (ow) Set to true to allow overwriting of existing files.
+path=.          Directory to write quality matrices (within /ref subdir).
+write=t         Write matrices.
+showstats=t     Print a summary.
+pigz=f          Use pigz to compress.
+
+Other parameters:
+t=auto          Number of worker threads.
+passes=2        Recalibration passes, 1 or 2. 2 is slower but gives more
+                accurate quality scores.
+recalqmax=42    Adjust max quality scores tracked. The actual highest
+                quality score allowed is recalqmax-1.
+trackall=f      Track all available quality metrics and produce all
+                matrices, including the ones that are not selected for
+                quality adjustment. Reduces speed, but allows testing the
+                effects of different recalibration matrices.
+indels=t        Include indels in quality calculations.
+
+Variation calling:
+varfile=<file>  Use the variants in this var file, instead of calling
+                variants. The format can be produced by CallVariants.
+vcf=<file>      Use the variants in this VCF file, instead of
+                calling variants.
+callvars=f      Call SNPs, and do not count them as errors.
+ploidy=1        Set the organism's ploidy.
+ref=            Required for variation-calling.
+
+*** 'Variant-Calling Cutoffs' flags in callvariants.sh are also supported ***
+
+Selecting matrices:
+loadq102=       For each recalibration matrix, enable or disable that matrix with t/f.
+                You can specify pass1 or pass2 like this: loadq102_p1=f loadq102_p2=t.
+                The default is loadqbp_p1=t loadqbp_p2=t loadqb123_p=t.
+clearmatrices=f If true, clear all the existing matrix selections. For example:
+                'clearmatrices loadqbp_p1'
+                This would ignore defaults and select only qbp for the first pass.
+
+Available matrices:
+q102            Quality, leading quality, trailing quality.
+qap             Quality, average quality, position.
+qbp             Quality, current base, position.
+q10             Quality, leading quality.
+q12             Quality, trailing quality.
+qb12            Quality, leading base, current base.
+qb012           Quality, two leading bases, current base.
+qb123           Quality, leading base, current base, trailing base.
+qb234           Quality, current base, two trailing bases.
+q12b12          Quality, trailing quality, leading base, current base.
+qp              Quality, position.
+q               Current quality score only.
+
+
+Java Parameters:
+-Xmx            This will set Java's memory usage, overriding autodetection.
+                -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs.
+                The max is typically 85% of physical memory.
+-eoom           This flag will cause the process to exit if an
+                out-of-memory exception occurs. Requires Java 8u92+.
+-da             Disable assertions.
+
+Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
+"
+}
+
+#This block allows symlinked shellscripts to correctly set classpath.
+pushd . > /dev/null
+DIR="${BASH_SOURCE[0]}"
+while [ -h "$DIR" ]; do
+    cd "$(dirname "$DIR")"
+    DIR="$(readlink "$(basename "$DIR")")"
+done
+cd "$(dirname "$DIR")"
+DIR="$(pwd)/"
+popd > /dev/null
+
+#DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/"
+CP="$DIR""current/"
+
+z="-Xmx2g"
+z2="-Xms2g"
+set=0
+
+if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then
+    usage
+    exit
+fi
+
+calcXmx () {
+    source "$DIR""/calcmem.sh"
+    setEnvironment
+    parseXmx "$@"
+    if [[ $set == 1 ]]; then
+        return
+    fi
+    freeRam 3200m 84
+    z="-Xmx${RAM}m"
+    z2="-Xms${RAM}m"
+}
+calcXmx "$@"
+
+calctruequality() {
+    local CMD="java $EA $EOOM $z $z2 -cp $CP jgi.CalcTrueQuality $@"
+    echo $CMD >&2
+    eval $CMD
+}
+
+calctruequality "$@"
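
The note in the usage text about a maximal Q-score of roughly 30 follows from the Phred definition, Q = -10 * log10(p), if every SNP against an unmasked reference is counted as a sequencing error at a rate of about 1/1000. A minimal sketch of that arithmetic, assuming `awk` is available; `phred_q` is a hypothetical helper name used only for illustration and is not part of BBTools:

```shell
#!/bin/bash
# phred_q P: print the Phred-scaled quality for an apparent error probability P.
# Hypothetical helper for illustration; uses awk's natural log to get log10.
phred_q() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", -10 * log(p) / log(10) }'
}

# Apparent error rate contributed by a 1/1000 SNP rate.
phred_q 0.001
```

With SNPs masked or called (for example via the `callvars` flag described in the usage text), those mismatches are no longer counted as errors, so this ceiling would not apply.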
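
The block commented as allowing symlinked shell scripts to set the classpath walks a chain of symlinks until it reaches the real file, then records that file's directory. The same idea can be sketched as a standalone function. `resolve_script_dir` is a hypothetical name, and the subshell is an assumption of this sketch that stands in for the script's `pushd`/`popd` pair:

```shell
#!/bin/bash
# resolve_script_dir PATH: print the directory (with trailing slash) that holds
# the real file behind PATH, following any chain of symlinks.
# Hypothetical helper mirroring the loop in the script above.
resolve_script_dir() {
    local target="$1"
    (
        # The subshell keeps the caller's working directory unchanged.
        while [ -h "$target" ]; do
            # Move next to the link so a relative link target resolves correctly.
            cd "$(dirname "$target")" || exit 1
            target="$(readlink "$(basename "$target")")"
        done
        cd "$(dirname "$target")" || exit 1
        printf '%s/\n' "$(pwd)"
    )
}
```

A wrapper script could then set its own location with `DIR="$(resolve_script_dir "${BASH_SOURCE[0]}")"` before building a classpath from it.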
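
The usage text recommends passing every sam/bam file from one sequencing run as a single comma-delimited `in=` list. A dry-run sketch of the two steps under that recommendation: the file names (`lane1.sam`, `reads.fq`, and so on) and the helper `join_by_comma` are hypothetical, and the commands are only printed so the sketch works without BBTools installed:

```shell
#!/bin/bash
# join_by_comma FILE...: print the arguments as one comma-delimited string.
# Hypothetical helper for illustration.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Hypothetical inputs: three sam files demultiplexed from a single run.
inputs="$(join_by_comma lane1.sam lane2.sam lane3.sam)"

# Step 1: generate matrices. path=. is the default shown in the usage text.
echo "calctruequality.sh in=${inputs} path=."

# Step 2: recalibrate reads with the generated matrices.
echo "bbduk.sh in=reads.fq out=recalibrated.fq recalibrate"
```

Removing the `echo` wrappers would run the real tools, which requires the mapped files to meet the alignment requirements listed under the input parameters.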