#!/bin/bash

usage(){
echo "
Written by Brian Bushnell
Last modified March 3, 2020

Description:  Generates basic assembly statistics such as scaffold count,
N50, L50, GC content, gap percent, etc.  For multiple files, please use
statswrapper.sh.  Works with fasta and fastq only (gzipped is fine).
Please read bbmap/docs/guides/StatsGuide.txt for more information.

Usage:        stats.sh in=<file>

Parameters:
in=file         Specify the input fasta file, or stdin.
out=stdout      Destination of primary output; may be directed to a file.
gc=file         Writes ACGTN content per scaffold to a file.
gchist=file     Filename to output scaffold gc content histogram.
shist=file      Filename to output cumulative scaffold length histogram.
gcbins=200      Number of bins for gc histogram.
n=10            Number of contiguous Ns to signify a break between contigs.
k=13            Estimate memory usage of BBMap with this kmer length.
minscaf=0       Ignore scaffolds shorter than this.
phs=f           (printheaderstats) Set to true to print total size of headers.
n90=t           (printn90) Print the N/L90 metrics.
extended=f      Print additional metrics such as L90, logsum, and score.
pdl=f           (printduplicatelines) Set to true to print lines in the
                scaffold size table where the counts did not change.
n_=t            This flag will prefix the terms 'contigs' and 'scaffolds'
                with 'n_' in formats 3-6.
addname=f       Adds a column for input file name, for formats 3-6.

Logsum and Powsum:
logoffset=1000  Minimum length for calculating log sum.
logbase=2       Log base for calculating log sum.
logpower=1      Raise the log to a power to increase the weight
                of longer scaffolds for log sum.
powsum=0.25     Use this power of the length to increase weight
                of longer scaffolds for power sum.

Assembly Score Metric:
score=f         Print assembly score.
aligned=0.0     Set the fraction of aligned reads (0-1).
assemblyscoreminlen=2000   Minimum length of scaffolds to include in
                           assembly score calculation.
assemblyscoremaxlen=50000  Maximum length of scaffolds to get bonus points
                           for being long.


format=<0-8>    Format of the stats information; default 1.
    format=0 prints no assembly stats.
    format=1 uses variable units like MB and KB, and is designed for compatibility with existing tools.
    format=2 uses only whole numbers of bases, with no commas in numbers, and is designed for machine parsing.
    format=3 outputs stats in 2 rows of tab-delimited columns: a header row and a data row.
    format=4 is like 3 but with scaffold data only.
    format=5 is like 3 but with contig data only.
    format=6 is like 3 but the header starts with a #.
    format=7 is like 1 but only prints contig info.
    format=8 is like 3 but in JSON.  You can also just use the 'json' flag.

gcformat=<0-5>  Select GC output format; default 1.
    gcformat=0: (no base content info printed)
    gcformat=1: name length A C G T N GC
    gcformat=2: name GC
    gcformat=4: name length GC
    gcformat=5: name length GC logsum powsum
    Note that in gcformat 1, A+C+G+T=1 even when N is nonzero.

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
"
}

#This block allows symlinked shellscripts to correctly set classpath.
pushd . > /dev/null
DIR="${BASH_SOURCE[0]}"
while [ -h "$DIR" ]; do
  cd "$(dirname "$DIR")"
  DIR="$(readlink "$(basename "$DIR")")"
done
cd "$(dirname "$DIR")"
DIR="$(pwd)/"
popd > /dev/null

#DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/"
CP="$DIR""current/"

#Default JVM heap size; parseXmx (from calcmem.sh) is given the arguments and may adjust it.
z="-Xmx120m"
set=0

#Print the help text and exit when run with no arguments or a help flag.
if [ -z "$1" ] || [[ $1 == -h ]] || [[ $1 == --help ]]; then
	usage
	exit
fi

calcXmx () {
	source "$DIR""/calcmem.sh"
	setEnvironment
	parseXmx "$@"
}
calcXmx "$@"

#Assemble the java command line and run jgi.AssemblyStats2 with the user's arguments.
stats() {
	local CMD="java $EA $EOOM $z -cp $CP jgi.AssemblyStats2 $@"
#	echo $CMD >&2
	eval $CMD
}

stats "$@"