Mercurial > repos > rliterman > csp2

diff CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/samtools.1 @ 68:5028fdace37b
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author: jpayne
date: Tue, 18 Mar 2025 16:23:26 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/samtools.1	Tue Mar 18 16:23:26 2025 -0400
@@ -0,0 +1,2192 @@
+'\" t
+.TH samtools 1 "21 June 2017" "samtools-1.5" "Bioinformatics tools"
+.SH NAME
+samtools \- Utilities for the Sequence Alignment/Map (SAM) format
+.\"
+.\" Copyright (C) 2008-2011, 2013-2017 Genome Research Ltd.
+.\" Portions copyright (C) 2010, 2011 Broad Institute.
+.\"
+.\" Author: Heng Li <lh3@sanger.ac.uk>
+.\" Author: Joshua C. Randall <jcrandall@alum.mit.edu>
+.\"
+.\" Permission is hereby granted, free of charge, to any person obtaining a
+.\" copy of this software and associated documentation files (the "Software"),
+.\" to deal in the Software without restriction, including without limitation
+.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
+.\" and/or sell copies of the Software, and to permit persons to whom the
+.\" Software is furnished to do so, subject to the following conditions:
+.\"
+.\" The above copyright notice and this permission notice shall be included in
+.\" all copies or substantial portions of the Software.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+.\" DEALINGS IN THE SOFTWARE.
+.
+.\" For code blocks and examples (cf groff's Ultrix-specific man macros)
+.de EX
+
+.  in +\\$1
+.  nf
+.  ft CR
+..
+.de EE
+.  ft
+.  fi
+.  in
+
+..
+.
+.SH SYNOPSIS
+.PP
+samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
+.PP
+samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam
+.PP
+samtools index aln.sorted.bam
+.PP
+samtools idxstats aln.sorted.bam
+.PP
+samtools flagstat aln.sorted.bam
+.PP
+samtools stats aln.sorted.bam
+.PP
+samtools bedcov aln.sorted.bam
+.PP
+samtools depth aln.sorted.bam
+.PP
+samtools view aln.sorted.bam chr2:20,100,000-20,200,000
+.PP
+samtools merge out.bam in1.bam in2.bam in3.bam
+.PP
+samtools faidx ref.fasta
+.PP
+samtools tview aln.sorted.bam ref.fasta
+.PP
+samtools split merged.bam
+.PP
+samtools quickcheck in1.bam in2.cram
+.PP
+samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta
+.PP
+samtools fixmate in.namesorted.sam out.bam
+.PP
+samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam
+.PP
+samtools flags PAIRED,UNMAP,MUNMAP
+.PP
+samtools fastq input.bam > output.fastq
+.PP
+samtools fasta input.bam > output.fasta
+.PP
+samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o output.bam input.bam
+.PP
+samtools collate aln.sorted.bam aln.name_collated.bam
+.PP
+samtools depad input.bam
+
+.SH DESCRIPTION
+.PP
+Samtools is a set of utilities that manipulate alignments in the BAM
+format. It imports from and exports to the SAM (Sequence Alignment/Map)
+format, does sorting, merging and indexing, and allows to retrieve reads
+in any regions swiftly.
+
+Samtools is designed to work on a stream. It regards an input file `-'
+as the standard input (stdin) and an output file `-' as the standard
+output (stdout). Several commands can thus be combined with Unix
+pipes. Samtools always output warning and error messages to the standard
+error output (stderr).
+
+Samtools is also able to open a BAM (not SAM) file on a remote FTP or
+HTTP server if the BAM file name starts with `ftp://' or `http://'.
+Samtools checks the current working directory for the index file and
+will download the index upon absence. Samtools does not retrieve the
+entire alignment file unless it is asked to do so.
+
+.SH COMMANDS AND OPTIONS
+
+.TP 10 \"-------- view
+.B view
+samtools view
+.RI [ options ]
+.IR in.sam | in.bam | in.cram
+.RI [ region ...]
+
+With no options or regions specified, prints all alignments in the specified
+input alignment file (in SAM, BAM, or CRAM format) to standard output
+in SAM format (with no header).
+
+You may specify one or more space-separated region specifications after the
+input filename to restrict output to only those alignments which overlap the
+specified region(s). Use of region specifications requires a coordinate-sorted
+and indexed input file (in BAM or CRAM format).
+
+The
+.BR -b ,
+.BR -C ,
+.BR -1 ,
+.BR -u ,
+.BR -h ,
+.BR -H ,
+and
+.B -c
+options change the output format from the default of headerless SAM, and the
+.B -o
+and
+.B -U
+options set the output file name(s).
+
+The
+.B -t
+and
+.B -T
+options provide additional reference data. One of these two options is required
+when SAM input does not contain @SQ headers, and the
+.B -T
+option is required whenever writing CRAM output.
+
+The
+.BR -L ,
+.BR -r ,
+.BR -R ,
+.BR -s ,
+.BR -q ,
+.BR -l ,
+.BR -m ,
+.BR -f ,
+.BR -F ,
+and
+.B -G
+options filter the alignments that will be included in the output to only those
+alignments that match certain criteria.
+
+The
+.B -x
+and
+.B -B
+options modify the data which is contained in each alignment.
+
+Finally, the
+.B -@
+option can be used to allocate additional threads to be used for compression, and the
+.B -?
+option requests a long help message.
+
+.TP
+.B REGIONS:
+.RS
+Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position
+coordinates are 1-based.
+
+Important note: when multiple regions are given, some alignments may be output
+multiple times if they overlap more than one of the specified regions.
+
+Examples of region specifications:
+.TP 10
+.B chr1
+Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ SN:chr1).
+.TP
+.B chr2:1000000
+The region on chr2 beginning at base position 1,000,000 and ending at the
+end of the chromosome.
+.TP
+.B chr3:1000-2000
+The 1001bp region on chr3 beginning at base position 1,000 and ending at base
+position 2,000 (including both end positions).
+.TP
+.B '*'
+Output the unmapped reads at the end of the file.
+(This does not include any unmapped reads placed on a reference sequence
+alongside their mapped mates.)
+.TP
+.B .
+Output all alignments.
+(Mostly unnecessary as not specifying a region at all has the same effect.)
+.RE
+
+.B OPTIONS:
+.RS
+.TP 10
+.B -b
+Output in the BAM format.
+.TP
+.B -C
+Output in the CRAM format (requires -T).
+.TP
+.B -1
+Enable fast BAM compression (implies -b).
+.TP
+.B -u
+Output uncompressed BAM. This option saves time spent on
+compression/decompression and is thus preferred when the output is piped
+to another samtools command.
+.TP
+.B -h
+Include the header in the output.
+.TP
+.B -H
+Output the header only.
+.TP
+.B -c
+Instead of printing the alignments, only count them and print the
+total number. All filter options, such as
+.BR -f ,
+.BR -F ,
+and
+.BR -q ,
+are taken into account.
+.TP
+.B -?
+Output long help and exit immediately.
+.TP
+.BI "-o " FILE
+Output to
+.I FILE [stdout].
+.TP
+.BI "-U " FILE
+Write alignments that are
+.I not
+selected by the various filter options to
+.IR FILE .
+When this option is used, all alignments (or all alignments intersecting the
+.I regions
+specified) are written to either the output file or this file, but never both.
+.TP
+.BI "-t " FILE
+A tab-delimited
+.IR FILE .
+Each line must contain the reference name in the first column and the length of
+the reference in the second column, with one line for each distinct reference.
+Any additional fields beyond the second column are ignored. This file also
+defines the order of the reference sequences in sorting. If you run:
+`samtools faidx <ref.fa>', the resulting index file
+.I <ref.fa>.fai
+can be used as this
+.IR FILE .
+.TP
+.BI "-T " FILE
+A FASTA format reference
+.IR FILE ,
+optionally compressed by
+.B bgzip
+and ideally indexed by
+.B samtools
+.BR faidx .
+If an index is not present, one will be generated for you.
+.TP
+.BI "-L " FILE
+Only output alignments overlapping the input BED
+.I FILE
+[null].
+.TP
+.BI "-r " STR
+Only output alignments in read group
+.I STR
+[null].
+.TP
+.BI "-R " FILE
+Output alignments in read groups listed in
+.I FILE
+[null].
+.TP
+.BI "-q " INT
+Skip alignments with MAPQ smaller than
+.I INT
+[0].
+.TP
+.BI "-l " STR
+Only output alignments in library
+.I STR
+[null].
+.TP
+.BI "-m " INT
+Only output alignments with number of CIGAR bases consuming query
+sequence \(>=
+.I INT
+[0]
+.TP
+.BI "-f " INT
+Only output alignments with all bits set in
+.I INT
+present in the FLAG field.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP
+.BI "-F " INT
+Do not output alignments with any bits set in
+.I INT
+present in the FLAG field.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP
+.BI "-G " INT
+Do not output alignments with all bits set in
+.I INT
+present in the FLAG field.  This is the opposite of \fI-f\fR such
+that \fI-f12 -G12\fR is the same as no filtering at all.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP
+.BI "-x " STR
+Read tag to exclude from output (repeatable) [null]
+.TP
+.B -B
+Collapse the backward CIGAR operation.
+.TP
+.BI "-s " FLOAT
+Output only a proportion of the input alignments.
+This subsampling acts in the same way on all of the alignment records in
+the same template or read pair, so it never keeps a read but not its mate.
+.IP
+The integer and fractional parts of the
+.BI "-s " INT . FRAC
+option are used separately: the part after the
+decimal point sets the fraction of templates/pairs to be kept,
+while the integer part is used as a seed that influences
+.I which
+subset of reads is kept.
+.IP
+.\" Reads are retained based on a score computed by hashing their QNAME
+.\" field and the seed value.
+When subsampling data that has previously been subsampled, be sure to use
+a different seed value from those used previously; otherwise more reads
+will be retained than expected.
+.TP
+.BI "-@ " INT
+Number of BAM compression threads to use in addition to main thread [0].
+.TP
+.B -S
+Ignored for compatibility with previous samtools versions.
+Previously this option was required if input was in SAM format, but now the
+correct format is automatically detected by examining the first few characters
+of input.
+.RE
+
+.TP \"-------- sort
+.B sort
+.na
+samtools sort
+.RB [ -l
+.IR level ]
+.RB [ -m
+.IR maxMem ]
+.RB [ -o
+.IR out.bam ]
+.RB [ -O
+.IR format ]
+.RB [ -n ]
+.RB [ -t
+.IR tag ]
+.RB [ -T
+.IR tmpprefix ]
+.RB [ -@
+.IR threads "] [" in.sam | in.bam | in.cram ]
+.ad
+
+Sort alignments by leftmost coordinates, or by read name when
+.B -n
+is used.
+An appropriate
+.B @HD-SO
+sort order header tag will be added or an existing one updated if necessary.
+
+The sorted output is written to standard output by default, or to the
+specified file
+.RI ( out.bam )
+when
+.B -o
+is used.
+This command will also create temporary files
+.IB tmpprefix . %d .bam
+as needed when the entire alignment data cannot fit into memory
+(as controlled via the
+.B -m
+option).
+
+.B Options:
+.RS
+.TP 11
+.BI "-l " INT
+Set the desired compression level for the final output file, ranging from 0
+(uncompressed) or 1 (fastest but minimal compression) to 9 (best compression
+but slowest to write), similarly to
+.BR gzip (1)'s
+compression level setting.
+.IP
+If
+.B -l
+is not used, the default compression level will apply.
+.TP
+.BI "-m " INT
+Approximately the maximum required memory per thread, specified either in bytes
+or with a
+.BR K ", " M ", or " G
+suffix.
+[768 MiB]
+.IP
+To prevent sort from creating a huge number of temporary files, it enforces a
+minimum value of 1M for this setting.
+.TP
+.B -n
+Sort by read names (i.e., the
+.B QNAME
+field) rather than by chromosomal coordinates.
+.TP
+.BI "-t " TAG
+Sort first by the value in the alignment tag TAG, then by position or name (if
+also using \fB-n\fP).
+.BI "-o " FILE
+Write the final sorted output to
+.IR FILE ,
+rather than to standard output.
+.TP
+.BI "-O " FORMAT
+Write the final output as
+.BR sam ", " bam ", or " cram .
+
+By default, samtools tries to select a format based on the
+.B -o
+filename extension; if output is to standard output or no format can be
+deduced,
+.B bam
+is selected.
+.TP
+.BI "-T " PREFIX
+Write temporary files to
+.IB PREFIX . nnnn .bam,
+or if the specified
+.I PREFIX
+is an existing directory, to
+.IB PREFIX /samtools. mmm . mmm .tmp. nnnn .bam,
+where
+.I mmm
+is unique to this invocation of the
+.B sort
+command.
+.IP
+By default, any temporary files are written alongside the output file, as
+.IB out.bam .tmp. nnnn .bam,
+or if output is to standard output, in the current directory as
+.BI samtools. mmm . mmm .tmp. nnnn .bam.
+.TP
+.BI "-@ " INT
+Set number of sorting and compression threads.
+By default, operation is single-threaded.
+.PP
+.B Ordering Rules
+
+The following rules are used for ordering records.
+
+If option \fB-t\fP is in use, records are first sorted by the value of
+the given alignment tag, and then by position or name (if using \fB-n\fP).
+For example, \*(lq-t RG\*(rq will make read group the primary sort key.  The
+rules for ordering by tag are:
+
+.IP \(bu 4
+Records that do not have the tag are sorted before ones that do.
+.IP \(bu 4
+If the types of the tags are different, they will be sorted so
+that single character tags (type A) come before array tags (type B), then
+string tags (types H and Z), then numeric tags (types f and i).
+.IP \(bu 4
+Numeric tags (types f and i) are compared by value.  Note that comparisons
+of floating-point values are subject to issues of rounding and precision.
+.IP \(bu 4
+String tags (types H and Z) are compared based on the binary
+contents of the tag using the C
+.BR strcmp (3)
+function.
+.IP \(bu 4
+Character tags (type A) are compared by binary character value.
+.IP \(bu 4
+No attempt is made to compare tags of other types \(em notably type B
+array values will not be compared.
+.PP
+When the \fB-n\fP option is present, records are sorted by name.  Names are
+compared so as to give a \*(lqnatural\*(rq ordering \(em i.e. sections
+consisting of digits are compared numerically while all other sections are
+compared based on their binary representation.  This means \*(lqa1\*(rq will
+come before \*(lqb1\*(rq and \*(lqa9\*(rq will come before \*(lqa10\*(rq.
+Records with the same name will be ordered according to the values of
+the READ1 and READ2 flags (see
+.BR flags ).
+
+When the \fB-n\fP option is
+.B not
+present, reads are sorted by reference (according to the order of the @SQ
+header records), then by position in the reference, and then by the REVERSE
+flag.
+
+.B Note
+
+.PP
+Historically
+.B samtools sort
+also accepted a less flexible way of specifying the final and
+temporary output filenames:
+.IP
+samtools sort
+.RB [ -f "] [" -o ]
+.I in.bam out.prefix
+.PP
+This has now been removed.
+The previous \fIout.prefix\fP argument (and \fB-f\fP option, if any)
+should be changed to an appropriate combination of \fB-T\fP \fIPREFIX\fP
+and \fB-o\fP \fIFILE\fP.  The previous \fB-o\fP option should be removed,
+as output defaults to standard output.
+.RE
+
+.TP \"-------- index
+.B index
+samtools index
+.RB [ -bc ]
+.RB [ -m
+.IR INT ]
+.IR aln.bam | aln.cram
+.RI [ out.index ]
+
+Index a coordinate-sorted BAM or CRAM file for fast random access.
+(Note that this does not work with SAM files even if they are bgzip
+compressed \(em to index such files, use tabix(1) instead.)
+
+This index is needed when
+.I region
+arguments are used to limit
+.B samtools view
+and similar commands to particular regions of interest.
+
+If an output filename is given, the index file will be written to
+.IR out.index .
+Otherwise, for a CRAM file
+.IR aln.cram ,
+index file
+.IB aln.cram .crai
+will be created; for a BAM file
+.IR aln.bam ,
+either
+.IB aln.bam .bai
+or
+.IB aln.bam .csi
+will be created, depending on the index format selected.
+
+.B Options:
+.RS
+.TP 8
+.B -b
+Create a BAI index.
+This is currently the default when no format options are used.
+.TP
+.B -c
+Create a CSI index.
+By default, the minimum interval size for the index is 2^14, which is the same
+as the fixed value used by the BAI format.
+.TP
+.BI "-m " INT
+Create a CSI index, with a minimum interval size of 2^INT.
+.RE
+
+.TP \"-------- idxstats
+.B idxstats
+samtools idxstats
+.IR in.sam | in.bam | in.cram
+
+Retrieve and print stats in the index file corresponding to the input file.
+Before calling idxstats, the input BAM file must be indexed by samtools index.
+
+The output is TAB-delimited with each line consisting of reference sequence
+name, sequence length, # mapped reads and # unmapped reads. It is written to
+stdout.
+
+.TP \"-------- flagstat
+.B flagstat
+samtools flagstat
+.IR in.sam | in.bam | in.cram
+
+Does a full pass through the input file to calculate and print statistics
+to stdout.
+
+Provides counts for each of 13 categories based primarily on bit flags in
+the FLAG field. Each category in the output is broken down into QC pass and
+QC fail, which is presented as "#PASS + #FAIL" followed by a description of
+the category.
+
+The first row of output gives the total number of reads that are QC pass and
+fail (according to flag bit 0x200). For example:
+
+  122 + 28 in total (QC-passed reads + QC-failed reads)
+
+Which would indicate that there are a total of 150 reads in the input file,
+122 of which are marked as QC pass and 28 of which are marked as "not passing
+quality controls"
+
+Following this, additional categories are given for reads which are:
+
+.RS 18
+.TP
+secondary
+0x100 bit set
+.TP
+supplementary
+0x800 bit set
+.TP
+duplicates
+0x400 bit set
+.TP
+mapped
+0x4 bit not set
+.TP
+paired in sequencing
+0x1 bit set
+.TP
+read1
+both 0x1 and 0x40 bits set
+.TP
+read2
+both 0x1 and 0x80 bits set
+.TP
+properly paired
+both 0x1 and 0x2 bits set and 0x4 bit not set
+.TP
+with itself and mate mapped
+0x1 bit set and neither 0x4 nor 0x8 bits set
+.TP
+singletons
+both 0x1 and 0x8 bits set and bit 0x4 not set
+.RE
+
+.RS 10
+And finally, two rows are given that additionally filter on the reference
+name (RNAME), mate reference name (MRNM), and mapping quality (MAPQ) fields:
+.RE
+
+.RS 18
+.TP
+with mate mapped to a different chr
+0x1 bit set and neither 0x4 nor 0x8 bits set and MRNM not equal to RNAME
+.TP
+with mate mapped to a different chr (mapQ>=5)
+0x1 bit set and neither 0x4 nor 0x8 bits set
+and MRNM not equal to RNAME and MAPQ >= 5
+.RE
+
+.TP \"-------- stats
+.B stats
+samtools stats
+.RI [ options ]
+.IR in.sam | in.bam | in.cram
+.RI [ region ...]
+
+samtools stats collects statistics from BAM files and outputs in a text format.
+The output can be visualized graphically using plot-bamstats.
+
+.B Options:
+.RS
+.TP 8
+.BI "-c, --coverage " MIN , MAX , STEP
+Set coverage distribution to the specified range (MIN, MAX, STEP all given as integers)
+[1,1000,1]
+.TP
+.B -d, --remove-dups
+Exclude from statistics reads marked as duplicates
+.TP
+.BI "-f, --required-flag "  STR "|" INT
+Required flag, 0 for unset. See also `samtools flags`
+[0]
+.TP
+.BI "-F, --filtering-flag " STR "|" INT
+Filtering flag, 0 for unset. See also `samtools flags`
+[0]
+.TP
+.BI "--GC-depth " FLOAT
+the size of GC-depth bins (decreasing bin size increases memory requirement)
+[2e4]
+.TP
+.B -h, --help
+This help message
+.TP
+.BI "-i, --insert-size " INT
+Maximum insert size
+[8000]
+.TP
+.BI "-I, --id " STR
+Include only listed read group or sample name
+[]
+.TP
+.BI "-l, --read-length " INT
+Include in the statistics only reads with the given read length
+[]
+.TP
+.BI "-m, --most-inserts " FLOAT
+Report only the main part of inserts
+[0.99]
+.TP
+.BI "-P, --split-prefix " STR
+A path or string prefix to prepend to filenames output when creating
+categorised statistics files with
+.BR -S / --split .
+[input filename]
+.TP
+.BI "-q, --trim-quality " INT
+The BWA trimming parameter
+[0]
+.TP
+.BI "-r, --ref-seq " FILE
+Reference sequence (required for GC-depth and mismatches-per-cycle calculation).
+[]
+.TP
+.BI "-S, --split " TAG
+In addition to the complete statistics, also output categorised statistics
+based on the tagged field
+.I TAG
+(e.g., use
+.B --split RG
+to split into read groups).
+
+Categorised statistics are written to files named
+.RI < prefix >_< value >.bamstat,
+where
+.I prefix
+is as given by
+.B --split-prefix
+(or the input filename by default) and
+.I value
+has been encountered as the specified tagged field's value in one or more
+alignment records.
+.TP
+.BI "-t, --target-regions " FILE
+Do stats in these regions only. Tab-delimited file chr,from,to, 1-based, inclusive.
+[]
+.TP
+.B "-x, --sparse"
+Suppress outputting IS rows where there are no insertions.
+.RE
+
+.TP \"-------- bedcov
+.B bedcov
+samtools bedcov
+.RI [ options ]
+.IR region.bed " " in1.sam | in1.bam | in1.cram "[...]"
+
+Reports the total read base count (i.e. the sum of per base read depths)
+for each genomic region specified in the supplied BED file.
+Counts for each alignment file supplied are reported in separate columns.
+
+.B Options:
+.RS
+.TP
+.BI "-Q " INT
+.RI "Only count reads with mapping quality greater than " INT
+.RE
+
+.TP \"-------- depth
+.B depth
+samtools depth
+.RI [ options ]
+.RI "[" in1.sam | in1.bam | in1.cram " [" in2.sam | in2.bam | in2.cram "] [...]]"
+
+Computes the depth at each position or region.
+
+.B Options:
+.RS
+.TP 8
+.B -a
+Output all positions (including those with zero depth)
+.TP
+.B -a -a, -aa
+Output absolutely all positions, including unused reference sequences.
+Note that when used in conjunction with a BED file the -a option may
+sometimes operate as if -aa was specified if the reference sequence
+has coverage outside of the region specified in the BED file.
+.TP
+.BI "-b "  FILE
+.RI "Compute depth at list of positions or regions in specified BED " FILE.
+[]
+.TP
+.BI "-f " FILE
+.RI "Use the BAM files specified in the " FILE
+(a file of filenames, one file per line)
+[]
+.TP
+.BI "-l " INT
+.RI "Ignore reads shorter than " INT
+.TP
+.BI "-m, -d " INT
+.RI "Truncate reported depth at a maximum of " INT " reads."
+[8000]
+.TP
+.BI "-q " INT
+.RI "Only count reads with base quality greater than " INT
+.TP
+.BI "-Q " INT
+.RI "Only count reads with mapping quality greater than " INT
+.TP
+.BI "-r " CHR ":" FROM "-" TO
+Only report depth in specified region.
+.RE
+
+.TP \"-------- merge
+.B merge
+samtools merge [-nur1f] [-h inh.sam] [-R reg] [-b <list>] <out.bam> <in1.bam> [<in2.bam> <in3.bam> ... <inN.bam>]
+
+Merge multiple sorted alignment files, producing a single sorted output file
+that contains all the input records and maintains the existing sort order.
+
+If
+.BR -h
+is specified the @SQ headers of input files will be merged into the specified header, otherwise they will be merged
+into a composite header created from the input headers.  If in the process of merging @SQ lines for coordinate sorted
+input files, a conflict arises as to the order (for example input1.bam has @SQ for a,b,c and input2.bam has b,a,c)
+then the resulting output file will need to be re-sorted back into coordinate order.
+
+Unless the
+.BR -c
+or
+.BR -p
+flags are specified then when merging @RG and @PG records into the output header then any IDs found to be duplicates
+of existing IDs in the output header will have a suffix appended to them to differentiate them from similar header
+records from other files and the read records will be updated to reflect this.
+
+The ordering of the records in the input files must match the usage of the
+\fB-n\fP and \fB-t\fP command-line options.  If they do not, the output
+order will be undefined.  See
+.B sort
+for information about record ordering.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -1
+Use zlib compression level 1 to compress the output.
+.TP
+.BI -b \ FILE
+List of input BAM files, one file per line.
+.TP
+.B -f
+Force to overwrite the output file if present.
+.TP 8
+.BI -h \ FILE
+Use the lines of
+.I FILE
+as `@' headers to be copied to
+.IR out.bam ,
+replacing any header lines that would otherwise be copied from
+.IR in1.bam .
+.RI ( FILE
+is actually in SAM format, though any alignment records it may contain
+are ignored.)
+.TP
+.B -n
+The input alignments are sorted by read names rather than by chromosomal
+coordinates
+.TP
+.B -t TAG
+The input alignments have been sorted by the value of TAG, then by either
+position or name (if \fB-n\fP is given).
+.TP
+.BI -R \ STR
+Merge files in the specified region indicated by
+.I STR
+[null]
+.TP
+.B -r
+Attach an RG tag to each alignment. The tag value is inferred from file names.
+.TP
+.B -u
+Uncompressed BAM output
+.TP
+.B -c
+When several input files contain @RG headers with the same ID, emit only one
+of them (namely, the header line from the first file we find that ID in) to
+the merged output file.
+Combining these similar headers is usually the right thing to do when the
+files being merged originated from the same file.
+
+Without \fB-c\fP, all @RG headers appear in the output file, with random
+suffixes added to their IDs where necessary to differentiate them.
+.TP
+.B -p
+Similarly, for each @PG ID in the set of files to merge, use the @PG line
+of the first file we find that ID in rather than adding a suffix to
+differentiate similar IDs.
+.RE
+
+.TP \"-------- faidx
+.B faidx
+samtools faidx <ref.fasta> [region1 [...]]
+
+Index reference sequence in the FASTA format or extract subsequence from
+indexed reference sequence. If no region is specified,
+.B faidx
+will index the file and create
+.I <ref.fasta>.fai
+on the disk. If regions are specified, the subsequences will be
+retrieved and printed to stdout in the FASTA format.
+
+The input file can be compressed in the
+.B BGZF
+format.
+
+The sequences in the input file should all have different names.
+If they do not, indexing will emit a warning about duplicate sequences and
+retrieval will only produce subsequences from the first sequence with the
+duplicated name.
+
+.TP \"-------- tview
+.B tview
+samtools tview
+.RB [ -p
+.IR chr:pos ]
+.RB [ -s
+.IR STR ]
+.RB [ -d
+.IR display ]
+.RI <in.sorted.bam>
+.RI [ref.fasta]
+
+Text alignment viewer (based on the ncurses library). In the viewer,
+press `?' for help and press `g' to check the alignment start from a
+region in the format like `chr10:10,000,000' or `=10,000,000' when
+viewing the same reference sequence.
+
+.B Options:
+.RS
+.TP 14
+.BI -d \ display
+Output as (H)tml or (C)urses or (T)ext
+.TP
+.BI -p \ chr:pos
+Go directly to this position
+.TP
+.BI -s \ STR
+Display only alignments from this sample or read group
+.RE
+
+.TP \"-------- split
+.B split
+samtools split
+.RI [ options ]
+.IR merged.sam | merged.bam | merged.cram
+
+Splits a file by read group.
+
+.B Options:
+.RS
+.TP 14
+.BI "-u " FILE1
+.RI "Put reads with no RG tag or an unrecognised RG tag into " FILE1
+.TP
+.BI "-u " FILE1 ":" FILE2
+.RI "As above, but assigns an RG tag as given in the header of " FILE2
+.TP
+.BI "-f " STRING
+Output filename format string (see below)
+["%*_%#.%."]
+.TP
+.B -v
+Verbose output
+.PP
+Format string expansions:
+.TS
+center;
+lb l .
+%%	%
+%*	basename
+%#	@RG index
+%!	@RG ID
+%.	output format filename extension
+.TE
+.RE
+
+.TP \"-------- quickcheck
+.B quickcheck
+samtools quickcheck
+.RI [ options ]
+.IR in.sam | in.bam | in.cram
+[ ... ]
+
+Quickly check that input files appear to be intact. Checks that beginning of the
+file contains a valid header (all formats) containing at least one target
+sequence and then seeks to the end of the file and checks that an end-of-file
+(EOF) is present and intact (BAM only).
+
+Data in the middle of the file is not read since that would be much more time
+consuming, so please note that this command will not detect internal corruption,
+but is useful for testing that files are not truncated before performing more
+intensive tasks on them.
+
+This command will exit with a non-zero exit code if any input files don't have a
+valid header or are missing an EOF block. Otherwise it will exit successfully
+(with a zero exit code).
+
+.B Options:
+.RS
+.TP 8
+.B -v
+Verbose output: will additionally print the names of all input files that don't
+pass the check to stdout. Multiple -v options will cause additional messages
+regarding check results to be printed to stderr.
+.RE
+
+.TP \"-------- dict
+.B dict
+samtools dict <ref.fasta|ref.fasta.gz>
+
+Create a sequence dictionary file from a fasta file.
+
+.B OPTIONS:
+.RS
+.TP 11
+.BI -a,\ --assembly \ STR
+Specify the assembly for the AS tag.
+.TP
+.B -H,\ --no-header
+Do not print the @HD header line.
+.TP
+.BI -o,\ --output \ FILE
+Output to
+.I FILE
+[stdout].
+.TP
+.BI -s,\ --species \ STR
+Specify the species for the SP tag.
+.TP
+.BI -u,\ --uri \ STR
+Specify the URI for the UR tag. Defaults to
+the absolute path of
+.I ref.fasta
+unless reading from stdin.
+.RE
+
+.TP \"-------- fixmate
+.B fixmate
+.na
+samtools fixmate
+.RB [ -rpc ]
+.RB [ -O
+.IR format ]
+.I in.nameSrt.bam out.bam
+.ad
+
+Fill in mate coordinates, ISIZE and mate related flags from a
+name-sorted alignment.
+
+.B OPTIONS:
+.RS
+.TP 11
+.B -r
+Remove secondary and unmapped reads.
+.TP
+.B -p
+Disable FR proper pair check.
+.TP
+.B -c
+Add template cigar ct tag.
+.TP
+.BI "-O " FORMAT
+Write the final output as
+.BR sam ", " bam ", or " cram .
+
+By default, samtools tries to select a format based on the output
+filename extension; if output is to standard output or no format can be
+deduced,
+.B bam
+is selected.
+.RE
+
+.TP \"-------- mpileup
+.B mpileup
+samtools mpileup
+.RB [ -EBugp ]
+.RB [ -C
+.IR capQcoef ]
+.RB [ -r
+.IR reg ]
+.RB [ -f
+.IR in.fa ]
+.RB [ -l
+.IR list ]
+.RB [ -Q
+.IR minBaseQ ]
+.RB [ -q
+.IR minMapQ ]
+.I in.bam
+.RI [ in2.bam
+.RI [ ... ]]
+
+Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records
+are grouped by sample (SM) identifiers in @RG header lines. If sample
+identifiers are absent, each input file is regarded as one sample.
+
+In the pileup format (without
+.BR -u \ or \ -g ),
+each
+line represents a genomic position, consisting of chromosome name,
+1-based coordinate, reference base, the number of reads covering the site,
+read bases, base qualities and alignment
+mapping qualities. Information on match, mismatch, indel, strand,
+mapping quality and start and end of a read are all encoded at the read
+base column. At this column, a dot stands for a match to the reference
+base on the forward strand, a comma for a match on the reverse strand,
+a '>' or '<' for a reference skip, `ACGTN' for a mismatch on the forward
+strand and `acgtn' for a mismatch on the reverse strand. A pattern
+`\\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between this
+reference position and the next reference position. The length of the
+insertion is given by the integer in the pattern, followed by the
+inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+'
+represents a deletion from the reference. The deleted bases will be
+presented as `*' in the following lines. Also at the read base column, a
+symbol `^' marks the start of a read. The ASCII of the character
+following `^' minus 33 gives the mapping quality. A symbol `$' marks the
+end of a read segment.
+
+Note that there are two orthogonal ways to specify locations in the
+input file; via \fB-r\fR \fIregion\fR and \fB-l\fR \fIfile\fR.  The
+former uses (and requires) an index to do random access while the
+latter streams through the file contents filtering out the specified
+regions, requiring no index.  The two may be used in conjunction.  For
+example a BED file containing locations of genes in chromosome 20
+could be specified using \fB-r 20 -l chr20.bed\fR, meaning that the
+index is used to find chromosome 20 and then it is filtered for the
+regions listed in the bed file.
+
+.B Input Options:
+.RS
+.TP 10
+.B -6, --illumina1.3+
+Assume the quality is in the Illumina 1.3+ encoding.
+.TP
+.B -A, --count-orphans
+Do not skip anomalous read pairs in variant calling.
+.TP
+.BI -b,\ --bam-list \ FILE
+List of input BAM files, one file per line [null]
+.TP
+.B -B, --no-BAQ
+Disable probabilistic realignment for the computation of base alignment
+quality (BAQ). BAQ is the Phred-scaled probability of a read base being
+misaligned. Applying this option greatly helps to reduce false SNPs
+caused by misalignments.
+.TP
+.BI -C,\ --adjust-MQ \ INT
+Coefficient for downgrading mapping quality for reads containing
+excessive mismatches. Given a read with a phred-scaled probability q of
+being generated from the mapped position, the new mapping quality is
+about sqrt((INT-q)/INT)*INT. A zero value disables this
+functionality; if enabled, the recommended value for BWA is 50. [0]
+.TP
+.BI -d,\ --max-depth \ INT
+At a position, read maximally
+.I INT
+reads per input file. Note that samtools has a minimum value of
+.I 8000/n
+where
+.I n
+is the number of input files given to mpileup.  This means the default
+is highly likely to be increased.  Once above the cross-sample minimum of
+8000 the -d parameter will have an effect. [250]
+.TP
+.B -E, --redo-BAQ
+Recalculate BAQ on the fly, ignore existing BQ tags
+.TP
+.BI -f,\ --fasta-ref \ FILE
+The
+.BR faidx -indexed
+reference file in the FASTA format. The file can be optionally compressed by
+.BR bgzip .
+[null]
+.TP
+.BI -G,\ --exclude-RG \ FILE
+Exclude reads from readgroups listed in FILE (one @RG-ID per line)
+.TP
+.BI -l,\ --positions \ FILE
+BED or position list file containing a list of regions or sites where
+pileup or BCF should be generated. Position list files contain two
+columns (chromosome and position) and start counting from 1.  BED
+files contain at least 3 columns (chromosome, start and end position)
+and are 0-based half-open.
+.br
+While it is possible to mix both position-list and BED coordinates in
+the same file, this is strongly ill advised due to the differing
+coordinate systems. [null]
+.TP
+.BI -q,\ -min-MQ \ INT
+Minimum mapping quality for an alignment to be used [0]
+.TP
+.BI -Q,\ --min-BQ \ INT
+Minimum base quality for a base to be considered [13]
+.TP
+.BI -r,\ --region \ STR
+Only generate pileup in region. Requires the BAM files to be indexed.
+If used in conjunction with -l then considers the intersection of the
+two requests.
+.I STR
+[all sites]
+.TP
+.B -R,\ --ignore-RG
+Ignore RG tags. Treat all reads in one BAM as one sample.
+.TP
+.BI --rf,\ --incl-flags \ STR|INT
+Required flags: skip reads with mask bits unset [null]
+.TP
+.BI --ff,\ --excl-flags \ STR|INT
+Filter flags: skip reads with mask bits set
+[UNMAP,SECONDARY,QCFAIL,DUP]
+.TP
+.B -x,\ --ignore-overlaps
+Disable read-pair overlap detection.
+.PP
+.B Output Options:
+.TP 10
+.BI "-o, --output " FILE
+Write pileup or VCF/BCF output to
+.IR FILE ,
+rather than the default of standard output.
+
+(The same short option is used for both
+.B --open-prob
+and
+.BR --output .
+If
+.BR -o 's
+argument contains any non-digit characters other than a leading + or - sign,
+it is interpreted as
+.BR --output .
+Usually the filename extension will take care of this, but to write to an
+entirely numeric filename use
+.B -o ./123
+or
+.BR "--output 123" .)
+.TP
+.B -g,\ --BCF
+Compute genotype likelihoods and output them in the binary call format (BCF).
+As of v1.0, this is BCF2 which is incompatible with the BCF1 format produced
+by previous (0.1.x) versions of samtools.
+.TP
+.B -v,\ --VCF
+Compute genotype likelihoods and output them in the variant call format (VCF).
+Output is bgzip-compressed VCF unless
+.B -u
+option is set.
+.PP
+.B Output Options for mpileup format (without -g or -v):
+.TP 10
+.B -O, --output-BP
+Output base positions on reads.
+.TP
+.B -s, --output-MQ
+Output mapping quality.
+.TP
+.B -a
+Output all positions, including those with zero depth.
+.TP
+.B -a -a, -aa
+Output absolutely all positions, including unused reference sequences.
+Note that when used in conjunction with a BED file the -a option may
+sometimes operate as if -aa was specified if the reference sequence
+has coverage outside of the region specified in the BED file.
+.PP
+.B Output Options for VCF/BCF format (with -g or -v):
+.TP 10
+.B -D
+Output per-sample read depth [DEPRECATED - use
+.B -t DP
+instead]
+.TP
+.B -S
+Output per-sample Phred-scaled strand bias P-value [DEPRECATED - use
+.B -t SP
+instead]
+.TP
+.BI -t,\ --output-tags \ LIST
+Comma-separated list of FORMAT and INFO tags to output (case-insensitive):
+.B AD
+(Allelic depth, FORMAT),
+.B INFO/AD
+(Total allelic depth, INFO),
+.B ADF
+(Allelic depths on the forward strand, FORMAT),
+.B INFO/ADF
+(Total allelic depths on the forward strand, INFO),
+.B ADR
+(Allelic depths on the reverse strand, FORMAT),
+.B INFO/ADR
+(Total allelic depths on the reverse strand, INFO),
+.B DP
+(Number of high-quality bases, FORMAT),
+.B DV
+(Deprecated in favor of AD; Number of high-quality non-reference bases, FORMAT),
+.B DPR
+(Deprecated in favor of AD; Number of high-quality bases for each observed allele, FORMAT),
+.B INFO/DPR
+(Number of high-quality bases for each observed allele, INFO),
+.B DP4
+(Deprecated in favor of ADF and ADR; Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases, FORMAT),
+.B SP
+(Phred-scaled strand bias P-value, FORMAT)
+[null]
+.TP
+.B -u,\ --uncompressed
+Generate uncompressed VCF/BCF output, which is preferred for piping.
+.TP
+.B -V
+Output per-sample number of non-reference reads [DEPRECATED - use
+.B -t DV
+instead]
+.PP
+.B Options for SNP/INDEL Genotype Likelihood Computation (for -g or -v):
+.TP 10
+.BI -e,\ --ext-prob \ INT
+Phred-scaled gap extension sequencing error probability. Reducing
+.I INT
+leads to longer indels. [20]
+.TP
+.BI -F,\ --gap-frac \ FLOAT
+Minimum fraction of gapped reads [0.002]
+.TP
+.BI -h,\ --tandem-qual \ INT
+Coefficient for modeling homopolymer errors. Given an
+.IR l -long
+homopolymer
+run, the sequencing error of an indel of size
+.I s
+is modeled as
+.IR INT * s / l .
+[100]
+.TP
+.B -I, --skip-indels
+Do not perform INDEL calling
+.TP
+.BI -L,\ --max-idepth \ INT
+Skip INDEL calling if the average per-input-file depth is above
+.IR INT .
+[250]
+.TP
+.BI -m,\ --min-ireads \ INT
+Minimum number gapped reads for indel candidates
+.IR INT .
+[1]
+.TP
+.BI -o,\ --open-prob \ INT
+Phred-scaled gap open sequencing error probability. Reducing
+.I INT
+leads to more indel calls. [40]
+
+(The same short option is used for both
+.B --open-prob
+and
+.BR --output .
+When
+.BR -o 's
+argument contains only an optional + or - sign followed by the digits 0 to 9,
+it is interpreted as
+.BR --open-prob .)
+.TP
+.B -p, --per-sample-mF
+Apply
+.B -m
+and
+.B -F
+thresholds per sample to increase sensitivity of calling.
+By default both options are applied to reads pooled from all samples.
+.TP
+.BI -P,\ --platforms \ STR
+Comma-delimited list of platforms (determined by
+.BR @RG-PL )
+from which indel candidates are obtained. It is recommended to collect
+indel candidates from sequencing technologies that have low indel error
+rate such as ILLUMINA. [all]
+.RE
+
+.TP \"-------- flags
+.B flags
+samtools flags INT|STR[,...]
+
+Convert between textual and numeric flag representation.
+
+.B FLAGS:
+.TS
+rb l l .
+0x1	PAIRED	paired-end (or multiple-segment) sequencing technology
+0x2	PROPER_PAIR	each segment properly aligned according to the aligner
+0x4	UNMAP	segment unmapped
+0x8	MUNMAP	next segment in the template unmapped
+0x10	REVERSE	SEQ is reverse complemented
+0x20	MREVERSE	SEQ of the next segment in the template is reverse complemented
+0x40	READ1	the first segment in the template
+0x80	READ2	the last segment in the template
+0x100	SECONDARY	secondary alignment
+0x200	QCFAIL	not passing quality controls
+0x400	DUP	PCR or optical duplicate
+0x800	SUPPLEMENTARY	supplementary alignment
+.TE
+
+.TP \"-------- fastq fasta
+.B fastq/a
+samtools fastq
+.RI [ options ]
+.I in.bam
+.br
+samtools fasta
+.RI [ options ]
+.I in.bam
+
+Converts a BAM or CRAM into either FASTQ or FASTA format depending on the
+command invoked. The FASTQ files will be automatically compressed if the 
+filenames have a .gz or .bgzf extention.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -n
+By default, either '/1' or '/2' is added to the end of read names
+where the corresponding BAM_READ1 or BAM_READ2 flag is set.
+Using
+.B -n
+causes read names to be left as they are.
+.TP 8
+.B -N
+Always add either '/1' or '/2' to the end of read names
+even when put into different files.
+.TP 8
+.B -O
+Use quality values from OQ tags in preference to standard quality string
+if available.
+.TP 8
+.B -s FILE
+Write singleton reads in FASTQ format to FILE instead of outputting them.
+.TP 8
+.B -t
+Copy RG, BC and QT tags to the FASTQ header line, if they exist.
+.TP 8
+.B -T TAGLIST
+Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist.
+.TP 8
+.B -1 FILE
+Write reads with the BAM_READ1 flag set to FILE instead of outputting them.
+.TP 8
+.B -2 FILE
+Write reads with the BAM_READ2 flag set to FILE instead of outputting them.
+.TP 8
+.B -0 FILE
+Write reads with both or neither of the BAM_READ1 and BAM_READ2 flags set
+to FILE instead of outputting them.
+.TP 8
+.BI "-f " INT
+Only output alignments with all bits set in
+.I INT
+present in the FLAG field.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP 8
+.BI "-F " INT
+Do not output alignments with any bits set in
+.I INT
+present in the FLAG field.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP 8
+.BI "-G " INT
+Only EXCLUDE reads with all of the bits set in
+.I INT
+present in the FLAG field.
+.I INT
+can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
+or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
+.TP 8
+.B -i
+add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
+.TP 8
+.B -c [0..9]
+set compression level when writing gz or bgzf fastq files.
+.TP 8
+.B --i1 FILE
+write first index reads to FILE
+.TP 8
+.B --i2 FILE
+write second index reads to FILE
+.TP 8
+.B --barcode-tag TAG
+aux tag to find index reads in [default: BC]
+.TP 8
+.B --quality-tag TAG
+aux tag to find index quality in [default: QT]
+.TP 8
+.B --index-format STR
+string to describe how to parse the barcode and quality tags. For example:
+
+.RS
+.TP 8
+.B i14i8
+the first 14 characters are index 1, the next 8 characters are index 2
+.TP 8
+.B n8i14
+ignore the first 8 characters, and use the next 14 characters for index 1
+
+If the tag contains a separator, then the numeric part can be replaced with '*' to
+mean 'read until the separator or end of tag', for example:
+.TP 8
+.B n*i*
+ignore the left part of the tag until the separator, then use the second part
+.RE
+.RE
+
+.TP \"-------- collate
+.B collate
+samtools collate
+.RI [ options ]
+.IR in.sam | in.bam | in.cram " [" out.prefix "]"
+
+Shuffles and groups reads together by their names.
+A faster alternative to a full query name sort,
+.B collate
+ensures that reads of the same name are grouped together in contiguous groups,
+but doesn't make any guarantees about the order of read names between groups.
+
+The output from this command should be suitable for any operation that
+requires all reads from the same template to be grouped together.
+
+.B Options:
+.RS
+.TP 8
+.B -O
+Output to stdout rather than to files starting with out.prefix
+.TP
+.B -u
+Write uncompressed BAM output
+.TP
+.BI "-l "  INT
+Compression level.
+[1]
+.TP
+.BI "-n " INT
+Number of temporary files to use.
+[64]
+.RE
+
+.TP \"-------- reheader
+.B reheader
+samtools reheader
+.RB [ -iP ]
+.I in.header.sam in.bam
+
+Replace the header in
+.I in.bam
+with the header in
+.IR in.header.sam .
+This command is much faster than replacing the header with a
+BAM\(->SAM\(->BAM conversion.
+
+By default this command outputs the BAM or CRAM file to standard
+output (stdout), but for CRAM format files it has the option to
+perform an in-place edit, both reading and writing to the same file.
+No validity checking is performed on the header, nor that it is suitable
+to use with the sequence data itself.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -P, --no-PG
+Do not generate an @PG header line.
+.TP 8
+.B -i, --in-place
+Perform the header edit in-place, if possible.  This only works on CRAM
+files and only if there is sufficient room to store the new header.
+The amount of space available will differ for each CRAM file.
+.RE
+
+.TP \"-------- cat
+.B cat
+samtools cat [-b list] [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]
+
+Concatenate BAMs or CRAMs. Although this works on either BAM or CRAM,
+all input files must be the same format as each other. The sequence
+dictionary of each input file must be identical, although this command
+does not check this. This command uses a similar trick to 
+.B reheader
+which enables fast BAM concatenation.
+
+.B OPTIONS:
+.RS
+.TP 8
+.BI "-b " FOFN
+Read the list of input BAM or CRAM files from \fIFOFN\fR.  These are
+concatenated prior to any files specified on the command line.
+Multiple \fB-b\fR \fIFOFN\fR options may be specified to concatenate
+multiple lists of BAM/CRAM files.
+.TP 8
+.BI "-h " FILE
+Uses the SAM header from \fIFILE\fR.  By default the header is taken
+from the first file to be concatenated.
+.TP 8
+.BI "-o " FILE
+Write the concatenated output to \fIFILE\fR.  By default this is sent
+to stdout.
+.RE
+
+.TP \"-------- rmdup
+.B rmdup
+samtools rmdup [-sS] <input.srt.bam> <out.bam>
+
+Remove potential PCR duplicates: if multiple read pairs have identical
+external coordinates, only retain the pair with highest mapping quality.
+In the paired-end mode, this command
+.B ONLY
+works with FR orientation and requires ISIZE is correctly set. It does
+not work for unpaired reads (e.g. two ends mapped to different
+chromosomes or orphan reads).
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -s
+Remove duplicates for single-end reads. By default, the command works for
+paired-end reads only.
+.TP 8
+.B -S
+Treat paired-end reads and single-end reads.
+.RE
+
+.TP \"-------- addreplacerg
+.B addreplacerg
+samtools addreplacerg [-r rg line | -R rg ID] [-m mode] [-l level] [-o out.bam]
+<input.bam>
+
+Adds or replaces read group tags in a file.
+
+.B OPTIONS:
+.RS
+.TP 8
+.BI "-r " STRING
+Allows you to specify a read group line to append to the header and applies it
+to the reads specified by the -m option. If repeated it automatically adds in
+tabs between invocations.
+.TP 8
+.BI "-R " STRING
+Allows you to specify the read group ID of an existing @RG line and applies it
+to the reads specified.
+.TP 8
+.BI "-m " MODE
+If you choose orphan_only then existing RG tags are not overwritten, if you choose
+overwrite_all, existing RG tags are overwritten. The default is overwrite_all.
+.TP 8
+.BI "-o " STRING
+Write the final output to STRING. The default is to write to stdout.
+
+By default, samtools tries to select a format based on the output
+filename extension; if output is to standard output or no format can be
+deduced,
+.B bam
+is selected.
+.RE
+
+.TP \"-------- calmd
+.B calmd
+samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>
+
+Generate the MD tag. If the MD tag is already present, this command will
+give a warning if the MD tag generated is different from the existing
+tag. Output SAM by default.
+
+Calmd can also read and write CRAM files although in most cases it is
+pointless as CRAM recalculates MD and NM tags on the fly.  The one
+exception to this case is where both input and output CRAM files
+have been / are being created with the \fIno_ref\fR option.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -A
+When used jointly with
+.B -r
+this option overwrites the original base quality.
+.TP 8
+.B -e
+Convert a the read base to = if it is identical to the aligned reference
+base. Indel caller does not support the = bases at the moment.
+.TP
+.B -u
+Output uncompressed BAM
+.TP
+.B -b
+Output compressed BAM
+.TP
+.BI -C \ INT
+Coefficient to cap mapping quality of poorly mapped reads. See the
+.B pileup
+command for details. [0]
+.TP
+.B -r
+Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).
+.TP
+.B -E
+Extended BAQ calculation. This option trades specificity for sensitivity, though the
+effect is minor.
+.RE
+
+.TP \"-------- targetcut
+.B targetcut
+samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>
+
+This command identifies target regions by examining the continuity of read depth, computes
+haploid consensus sequences of targets and outputs a SAM with each sequence corresponding
+to a target. When option
+.B -f
+is in use, BAQ will be applied. This command is
+.B only
+designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].
+
+.TP \"-------- phase
+.B phase
+samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>
+
+Call and phase heterozygous SNPs.
+
+.B OPTIONS:
+.RS
+.TP 8
+.B -A
+Drop reads with ambiguous phase.
+.TP 8
+.BI -b \ STR
+Prefix of BAM output. When this option is in use, phase-0 reads will be saved in file
+.BR STR .0.bam
+and phase-1 reads in
+.BR STR .1.bam.
+Phase unknown reads will be randomly allocated to one of the two files. Chimeric reads
+with switch errors will be saved in
+.BR STR .chimeric.bam.
+[null]
+.TP
+.B -F
+Do not attempt to fix chimeric reads.
+.TP
+.BI -k \ INT
+Maximum length for local phasing. [13]
+.TP
+.BI -q \ INT
+Minimum Phred-scaled LOD to call a heterozygote. [40]
+.TP
+.BI -Q \ INT
+Minimum base quality to be used in het calling. [13]
+.RE
+
+.TP \"-------- depad
+.B depad
+samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>
+
+Converts a BAM aligned against a padded reference to a BAM aligned
+against the depadded reference.  The padded reference may contain
+verbatim "*" bases in it, but "*" bases are also counted in the
+reference numbering.  This means that a sequence base-call aligned
+against a reference "*" is considered to be a cigar match ("M" or "X")
+operator (if the base-call is "A", "C", "G" or "T").  After depadding
+the reference "*" bases are deleted and such aligned sequence
+base-calls become insertions.  Similarly transformations apply for
+deletions and padding cigar operations.
+
+.B OPTIONS:
+.RS
+.TP
+.B -S
+Ignored for compatibility with previous samtools versions.
+Previously this option was required if input was in SAM format, but now the
+correct format is automatically detected by examining the first few characters
+of input.
+.TP
+.B -s
+Output in SAM format.  The default is BAM.
+.TP
+.B -C
+Output in CRAM format.  The default is BAM.
+.TP
+.B -u
+Do not compress the output.  Applies to either BAM or CRAM output
+format.
+.TP
+.B -1
+Enable fastest compression level.  Only works for BAM or CRAM output.
+.TP
+.BI "-T " FILE
+Provides the padded reference file.  Note that without this the @SQ
+line lengths will be incorrect, so for most use cases this option will
+be considered as mandatory.
+.TP
+.BI "-o " FILE
+Specifies the output filename.  By default output is sent to stdout.
+.RE
+
+.TP \"-------- help etc
+.BR help ,\  --help
+Display a brief usage message listing the samtools commands available.
+If the name of a command is also given, e.g.,
+.BR samtools\ help\ view ,
+the detailed usage message for that particular command is displayed.
+
+.TP
+.B --version
+Display the version numbers and copyright information for samtools and
+the important libraries used by samtools.
+
+.TP
+.B --version-only
+Display the full samtools version number in a machine-readable format.
+.PP
+.SH GLOBAL OPTIONS
+.PP
+Several long-options are shared between multiple samtools subcommands:
+\fB--input-fmt\fR, \fB--input-fmt-options\fR, \fB--output-fmt\fR,
+\fB--output-fmt-options\fR, and \fB--reference\fR.
+The input format is typically auto-detected so specifying the format
+is usually unnecessary and the option is included for completeness.
+Note that not all subcommands have all options.  Consult the subcommand
+help for more details.
+.PP
+Format strings recognised are "sam", "bam" and "cram".  They may be
+followed by a comma separated list of options as \fIkey\fR or
+\fIkey\fR=\fIvalue\fR. See below for examples.
+.PP
+The \fBfmt-options\fR arguments accept either a single \fIoption\fR or
+\fIoption\fR=\fIvalue\fR.  Note that some options only work on some
+file formats and only on read or write streams.  If value is
+unspecified for a boolean option, the value is assumed to be 1.  The
+valid options are as follows.
+.RS 0
+.\" General purpose
+.TP 4
+.BI nthreads= INT
+Specifies the number of threads to use during encoding and/or
+decoding.  For BAM this will be encoding only.  In CRAM the threads
+are dynamically shared between encoder and decoder.
+.\" CRAM specific
+.TP
+.BI reference= fasta_file
+Specifies a FASTA reference file for use in CRAM encoding or decoding.
+It usually is not required for decoding except in the situation of the
+MD5 not being obtainable via the REF_PATH or REF_CACHE environment variables.
+.TP
+.BI decode_md= 0|1
+CRAM input only; defaults to 1 (on).  CRAM does not typically store
+MD and NM tags, preferring to generate them on the fly.  This option
+controls this behaviour.
+.TP
+.BI ignore_md5= 0|1
+CRAM input only; defaults to 0 (off).  When enabled, md5 checksum
+errors on the reference sequence and block checksum errors within CRAM
+are ignored.  Use of this option is strongly discouraged.
+.TP
+.BI required_fields= bit-field
+CRAM input only; specifies which SAM columns need to be populated.
+By default all fields are used.  Limiting the decode to specific
+columns can have significant performance gains.  The bit-field is a
+numerical value constructed from the following table.
+.TS
+center;
+rb l .
+0x1	SAM_QNAME
+0x2	SAM_FLAG
+0x4	SAM_RNAME
+0x8	SAM_POS
+0x10	SAM_MAPQ
+0x20	SAM_CIGAR
+0x40	SAM_RNEXT
+0x80	SAM_PNEXT
+0x100	SAM_TLEN
+0x200	SAM_SEQ
+0x400	SAM_QUAL
+0x800	SAM_AUX
+0x1000	SAM_RGAUX
+.TE
+.TP
+.BI name_prefix= string
+CRAM input only; defaults to output filename.  Any sequences with
+auto-generated read names will use \fIstring\fR as the name prefix.
+.TP
+.BI multi_seq_per_slice= 0|1
+CRAM output only; defaults to 0 (off).  By default CRAM generates one
+container per reference sequence, except in the case of many small
+references (such as a fragmented assembly).
+.TP
+.BI version= major.minor
+CRAM output only.  Specifies the CRAM version number.  Acceptable
+values are "2.1" and "3.0".
+.TP
+.BI seqs_per_slice= INT
+CRAM output only; defaults to 10000.
+.TP
+.BI slices_per_container= INT
+CRAM output only; defaults to 1.  The effect of having multiple slices
+per container is to share the compression header block between
+multiple slices.  This is unlikely to have any significant impact
+unless the number of sequences per slice is reduced.  (Together these
+two options control the granularity of random access.)
+.TP
+.BI embed_ref= 0|1
+CRAM output only; defaults to 0 (off).  If 1, this will store portions
+of the reference sequence in each slice, permitting decode without
+having requiring an external copy of the reference sequence.
+.TP
+.BI no_ref= 0|1
+CRAM output only; defaults to 0 (off).  If 1, sequences will be stored
+verbatim with no reference encoding.  This can be useful if no
+reference is available for the file.
+.TP
+.BI use_bzip2= 0|1
+CRAM output only; defaults to 0 (off).  Permits use of bzip2 in CRAM
+block compression.
+.TP
+.BI use_lzma= 0|1
+CRAM output only; defaults to 0 (off).  Permits use of lzma in CRAM
+block compression.
+.TP
+.BI lossy_names= 0|1
+CRAM output only; defaults to 0 (off).  If 1, templates with all
+members within the same CRAM slice will have their read names
+removed.  New names will be automatically generated during decoding.
+Also see the \fBname_prefix\fR option.
+.RE
+.PP
+For example:
+.EX 4
+samtools view --input-fmt-option decode_md=0
+    --output-fmt cram,version=3.0 --output-fmt-option embed_ref
+    --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam
+.EE
+.PP
+.SH REFERENCE SEQUENCES
+.PP
+The CRAM format requires use of a reference sequence for both reading
+and writing.
+.PP
+When reading a CRAM the \fB@SQ\fR headers are interrogated to identify
+the reference sequence MD5sum (\fBM5:\fR tag) and the local reference
+sequence filename (\fBUR:\fR tag).  Note that \fIhttp://\fR and
+\fIftp://\fR based URLs in the UR: field are not used, but local fasta
+filenames (with or without \fIfile://\fR) can be used.
+.PP
+To create a CRAM the \fB@SQ\fR headers will also be read to identify
+the reference sequences, but M5: and UR: tags may not be present. In
+this case the \fB-T\fR and \fB-t\fR options of samtools view may be
+used to specify the fasta or fasta.fai filenames respectively
+(provided the .fasta.fai file is also backed up by a .fasta file).
+.PP
+The search order to obtain a reference is:
+.IP 1. 3
+Use any local file specified by the command line options (eg -T).
+.IP 2. 3
+Look for MD5 via REF_CACHE environment variable.
+.IP 3. 3
+Look for MD5 in each element of the REF_PATH environment variable.
+.IP 4. 3
+Look for a local file listed in the UR: header tag.
+.PP
+.SH ENVIRONMENT VARIABLES
+.PP
+.TP
+.B HTS_PATH
+A colon-separated list of directories in which to search for HTSlib plugins.
+If $HTS_PATH starts or ends with a colon or contains a double colon (\fB::\fP),
+the built-in list of directories is searched at that point in the search.
+
+If no HTS_PATH variable is defined, the built-in list of directories
+specified when HTSlib was built is used, which typically includes
+\fB/usr/local/libexec/htslib\fP and similar directories.
+
+.TP
+.B REF_PATH
+A colon separated (semi-colon on Windows) list of locations in which
+to look for sequences identified by their MD5sums.  This can be either
+a list of directories or URLs. Note that if a URL is included then the
+colon in http:// and ftp:// and the optional port number will be
+treated as part of the URL and not a PATH field separator.
+For URLs, the text \fB%s\fR will be replaced by the MD5sum being
+read.
+
+If no REF_PATH has been specified it will default to
+\fBhttp://www.ebi.ac.uk/ena/cram/md5/%s\fR and if REF_CACHE is also unset,
+it will be set to \fB$XDG_CACHE_HOME/hts-ref/%2s/%2s/%s\fR.
+If \fB$XDG_CACHE_HOME\fR is unset, \fB$HOME/.cache\fR (or a local system
+temporary directory if no home directory is found) will be used similarly.
+
+.TP
+.B REF_CACHE
+This can be defined to a single directory housing a local cache of
+references.  Upon downloading a reference it will be stored in the
+location pointed to by REF_CACHE.  When reading a reference it will be
+looked for in this directory before searching REF_PATH.  To avoid many
+files being stored in the same directory, a pathname may be
+constructed using %\fInum\fRs and %s notation, consuming \fInum\fR
+characters of the MD5sum.  For example
+\fB/local/ref_cache/%2s/%2s/%s\fR will create 2 nested subdirectories
+with the filenames in the deepest directory being the last 28
+characters of the md5sum.
+
+The REF_CACHE directory will be searched for before attempting to load
+via the REF_PATH search list.  If no REF_PATH is defined, both
+REF_PATH and REF_CACHE will be automatically set (see above), but if
+REF_PATH is defined and REF_CACHE not then no local cache is used.
+
+To aid population of the REF_CACHE directory a script
+\fBmisc/seq_cache_populate.pl\fR is provided in the Samtools
+distribution. This takes a fasta file or a directory of fasta files
+and generates the MD5sum named files.
+.PP
+.SH EXAMPLES
+.IP o 2
+Import SAM to BAM when
+.B @SQ
+lines are present in the header:
+.EX 2
+samtools view -bS aln.sam > aln.bam
+.EE
+If
+.B @SQ
+lines are absent:
+.EX 2
+samtools faidx ref.fa
+samtools view -bt ref.fa.fai aln.sam > aln.bam
+.EE
+where
+.I ref.fa.fai
+is generated automatically by the
+.B faidx
+command.
+
+.IP o 2
+Convert a BAM file to a CRAM file using a local reference sequence.
+.EX 2
+samtools view -C -T ref.fa aln.bam > aln.cram
+.EE
+.IP o 2
+Attach the
+.B RG
+tag while merging sorted alignments:
+.EX 2
+perl -e 'print "@RG\\tID:ga\\tSM:hs\\tLB:ga\\tPL:Illumina\\n@RG\\tID:454\\tSM:hs\\tLB:454\\tPL:454\\n"' > rg.txt
+samtools merge -rh rg.txt merged.bam ga.bam 454.bam
+.EE
+The value in a
+.B RG
+tag is determined by the file name the read is coming from. In this
+example, in the
+.IR merged.bam ,
+reads from
+.I ga.bam
+will be attached
+.IR RG:Z:ga ,
+while reads from
+.I 454.bam
+will be attached
+.IR RG:Z:454 .
+
+.IP o 2
+Call SNPs and short INDELs:
+.EX 2
+samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
+bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf
+.EE
+The
+.B bcftools filter
+command marks low quality sites and sites with the read depth exceeding
+a limit, which should be adjusted to about twice the average read depth
+(bigger read depths usually indicate problematic regions which are
+often enriched for artefacts).  One may consider to add
+.B -C50
+to
+.B mpileup
+if mapping quality is overestimated for reads containing excessive
+mismatches. Applying this option usually helps
+.B BWA-short
+but may not other mappers.
+
+Individuals are identified from the
+.B SM
+tags in the
+.B @RG
+header lines. Individuals can be pooled in one alignment file; one
+individual can also be separated into multiple files. The
+.B -P
+option specifies that indel candidates should be collected only from
+read groups with the
+.B @RG-PL
+tag set to
+.IR ILLUMINA .
+Collecting indel candidates from reads sequenced by an indel-prone
+technology may affect the performance of indel calling.
+
+.IP o 2
+Generate the consensus sequence for one diploid individual:
+.EX 2
+samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq
+.EE
+.IP o 2
+Phase one individual:
+.EX 2
+samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out
+.EE
+The
+.B calmd
+command is used to reduce false heterozygotes around INDELs.
+
+
+.IP o 2
+Dump BAQ applied alignment for other SNP callers:
+.EX 2
+samtools calmd -bAr aln.bam > aln.baq.bam
+.EE
+It adds and corrects the
+.B NM
+and
+.B MD
+tags at the same time. The
+.B calmd
+command also comes with the
+.B -C
+option, the same as the one in
+.B pileup
+and
+.BR mpileup .
+Apply if it helps.
+
+.SH LIMITATIONS
+.PP
+.IP o 2
+Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
+.IP o 2
+Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan
+reads or ends mapped to different chromosomes). If this is a concern,
+please use Picard's MarkDuplicates which correctly handles these cases,
+although a little slower.
+
+.SH AUTHOR
+.PP
+Heng Li from the Sanger Institute wrote the original C version of samtools.
+Bob Handsaker from the Broad Institute implemented the BGZF library.
+James Bonfield from the Sanger Institute developed the CRAM implementation.
+John Marshall and Petr Danecek contribute to the source code and various
+people from the 1000 Genomes Project have contributed to the SAM format
+specification.
+
+.SH SEE ALSO
+.IR bcftools (1),
+.IR sam (5),
+.IR tabix (1)
+.PP
+Samtools website: <http://www.htslib.org/>
+.br
+File format specification of SAM/BAM,CRAM,VCF/BCF: <http://samtools.github.io/hts-specs>
+.br
+Samtools latest source: <https://github.com/samtools/samtools>
+.br
+HTSlib latest source: <https://github.com/samtools/htslib>
+.br
+Bcftools website: <http://samtools.github.io/bcftools>
author	jpayne
date	Tue, 18 Mar 2025 16:23:26 -0400
parents
children