'\" t
.TH samtools 1 "21 June 2017" "samtools-1.5" "Bioinformatics tools"
.SH NAME
samtools \- Utilities for the Sequence Alignment/Map (SAM) format
.\"
.\" Copyright (C) 2008-2011, 2013-2017 Genome Research Ltd.
.\" Portions copyright (C) 2010, 2011 Broad Institute.
.\"
.\" Author: Heng Li <lh3@sanger.ac.uk>
.\" Author: Joshua C. Randall <jcrandall@alum.mit.edu>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.
.\" For code blocks and examples (cf groff's Ultrix-specific man macros)
.de EX

. in +\\$1
. nf
. ft CR
..
.de EE
. ft
. fi
. in

..
.
.SH SYNOPSIS
.PP
samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
.PP
samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam
.PP
samtools index aln.sorted.bam
.PP
samtools idxstats aln.sorted.bam
.PP
samtools flagstat aln.sorted.bam
.PP
samtools stats aln.sorted.bam
.PP
samtools bedcov region.bed aln.sorted.bam
.PP
samtools depth aln.sorted.bam
.PP
samtools view aln.sorted.bam chr2:20,100,000-20,200,000
.PP
samtools merge out.bam in1.bam in2.bam in3.bam
.PP
samtools faidx ref.fasta
.PP
samtools tview aln.sorted.bam ref.fasta
.PP
samtools split merged.bam
.PP
samtools quickcheck in1.bam in2.cram
.PP
samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta
.PP
samtools fixmate in.namesorted.sam out.bam
.PP
samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam
.PP
samtools flags PAIRED,UNMAP,MUNMAP
.PP
samtools fastq input.bam > output.fastq
.PP
samtools fasta input.bam > output.fasta
.PP
samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o output.bam input.bam
.PP
samtools collate aln.sorted.bam aln.name_collated.bam
.PP
samtools depad input.bam

.SH DESCRIPTION
.PP
Samtools is a set of utilities that manipulate alignments in the BAM
format. It imports from and exports to the SAM (Sequence Alignment/Map)
format, does sorting, merging and indexing, and allows reads in any
region to be retrieved swiftly.

Samtools is designed to work on a stream. It regards an input file `-'
as the standard input (stdin) and an output file `-' as the standard
output (stdout). Several commands can thus be combined with Unix
pipes. Samtools always outputs warning and error messages to the standard
error output (stderr).

Samtools is also able to open a BAM (not SAM) file on a remote FTP or
HTTP server if the BAM file name starts with `ftp://' or `http://'.
Samtools checks the current working directory for the index file and
will download the index if it is not present there. Samtools does not
retrieve the entire alignment file unless it is asked to do so.

.SH COMMANDS AND OPTIONS

.TP 10 \"-------- view
.B view
samtools view
.RI [ options ]
.IR in.sam | in.bam | in.cram
.RI [ region ...]

With no options or regions specified, prints all alignments in the specified
input alignment file (in SAM, BAM, or CRAM format) to standard output
in SAM format (with no header).

You may specify one or more space-separated region specifications after the
input filename to restrict output to only those alignments which overlap the
specified region(s). Use of region specifications requires a coordinate-sorted
and indexed input file (in BAM or CRAM format).

The
.BR -b ,
.BR -C ,
.BR -1 ,
.BR -u ,
.BR -h ,
.BR -H ,
and
.B -c
options change the output format from the default of headerless SAM, and the
.B -o
and
.B -U
options set the output file name(s).

The
.B -t
and
.B -T
options provide additional reference data. One of these two options is required
when SAM input does not contain @SQ headers, and the
.B -T
option is required whenever writing CRAM output.

The
.BR -L ,
.BR -r ,
.BR -R ,
.BR -s ,
.BR -q ,
.BR -l ,
.BR -m ,
.BR -f ,
.BR -F ,
and
.B -G
options filter the alignments that will be included in the output to only those
alignments that match certain criteria.

The
.B -x
and
.B -B
options modify the data which is contained in each alignment.

Finally, the
.B -@
option can be used to allocate additional threads to be used for compression, and the
.B -?
option requests a long help message.

.TP
.B REGIONS:
.RS
Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position
coordinates are 1-based.

Important note: when multiple regions are given, some alignments may be output
multiple times if they overlap more than one of the specified regions.

Examples of region specifications:
.TP 10
.B chr1
Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ SN:chr1).
.TP
.B chr2:1000000
The region on chr2 beginning at base position 1,000,000 and ending at the
end of the chromosome.
.TP
.B chr3:1000-2000
The 1001bp region on chr3 beginning at base position 1,000 and ending at base
position 2,000 (including both end positions).
.TP
.B '*'
Output the unmapped reads at the end of the file.
(This does not include any unmapped reads placed on a reference sequence
alongside their mapped mates.)
.TP
.B .
Output all alignments.
(Mostly unnecessary as not specifying a region at all has the same effect.)
.RE
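
The region syntax above is illustrated by the minimal Python sketch below. It is
not samtools code. The function names are hypothetical, commas in numbers are
simply stripped, and the sketch does not handle reference names containing `:'
or the special regions `*' and `.'.
.EX 2
def parse_region(spec):
    # RNAME[:STARTPOS[-ENDPOS]]; positions are 1-based and inclusive.
    name, sep, rng = spec.partition(":")
    if not sep:
        return name, 1, None              # the whole reference sequence
    start_s, dash, end_s = rng.replace(",", "").partition("-")
    start = int(start_s)
    end = int(end_s) if dash else None    # None: run to the end of the sequence
    return name, start, end


def region_length(start, end):
    # Both end positions are included, hence the + 1.
    return end - start + 1
.EE
With this sketch, parse_region("chr3:1000-2000") gives ("chr3", 1000, 2000),
and region_length(1000, 2000) is 1001, which matches the 1001bp region above.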

.B OPTIONS:
.RS
.TP 10
.B -b
Output in the BAM format.
.TP
.B -C
Output in the CRAM format (requires -T).
.TP
.B -1
Enable fast BAM compression (implies -b).
.TP
.B -u
Output uncompressed BAM. This option saves time spent on
compression/decompression and is thus preferred when the output is piped
to another samtools command.
.TP
.B -h
Include the header in the output.
.TP
.B -H
Output the header only.
.TP
.B -c
Instead of printing the alignments, only count them and print the
total number. All filter options, such as
.BR -f ,
.BR -F ,
and
.BR -q ,
are taken into account.
.TP
.B -?
Output long help and exit immediately.
.TP
.BI "-o " FILE
Output to
.IR FILE " [stdout]."
.TP
.BI "-U " FILE
Write alignments that are
.I not
selected by the various filter options to
.IR FILE .
When this option is used, all alignments (or all alignments intersecting the
.I regions
specified) are written to either the output file or this file, but never both.
.TP
.BI "-t " FILE
A tab-delimited
.IR FILE .
Each line must contain the reference name in the first column and the length of
the reference in the second column, with one line for each distinct reference.
Any additional fields beyond the second column are ignored. This file also
defines the order of the reference sequences in sorting. If you run
`samtools faidx <ref.fa>', the resulting index file
.I <ref.fa>.fai
can be used as this
.IR FILE .
.TP
.BI "-T " FILE
A FASTA format reference
.IR FILE ,
optionally compressed by
.B bgzip
and ideally indexed by
.B samtools
.BR faidx .
If an index is not present, one will be generated for you.
.TP
.BI "-L " FILE
Only output alignments overlapping the input BED
.I FILE
[null].
.TP
.BI "-r " STR
Only output alignments in read group
.I STR
[null].
.TP
.BI "-R " FILE
Output alignments in read groups listed in
.I FILE
[null].
.TP
.BI "-q " INT
Skip alignments with MAPQ smaller than
.I INT
[0].
.TP
.BI "-l " STR
Only output alignments in library
.I STR
[null].
.TP
.BI "-m " INT
Only output alignments with number of CIGAR bases consuming query
sequence \(>=
.I INT
[0].
.TP
.BI "-f " INT
Only output alignments with all bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP
.BI "-F " INT
Do not output alignments with any bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP
.BI "-G " INT
Do not output alignments with all bits set in
.I INT
present in the FLAG field. This is the opposite of \fI-f\fR:
\fI-G12\fR outputs exactly those alignments that \fI-f12\fR omits.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP
.BI "-x " STR
Read tag to exclude from output (repeatable) [null].
.TP
.B -B
Collapse the backward CIGAR operation.
.TP
.BI "-s " FLOAT
Output only a proportion of the input alignments.
This subsampling acts in the same way on all of the alignment records in
the same template or read pair, so it never keeps a read but not its mate.
.IP
The integer and fractional parts of the
.BI "-s " INT . FRAC
option are used separately: the part after the
decimal point sets the fraction of templates/pairs to be kept,
while the integer part is used as a seed that influences
.I which
subset of reads is kept.
.IP
.\" Reads are retained based on a score computed by hashing their QNAME
.\" field and the seed value.
When subsampling data that has previously been subsampled, be sure to use
a different seed value from those used previously; otherwise more reads
will be retained than expected.
.TP
.BI "-@ " INT
Number of BAM compression threads to use in addition to the main thread [0].
.TP
.B -S
Ignored for compatibility with previous samtools versions.
Previously this option was required if input was in SAM format, but now the
correct format is automatically detected by examining the first few characters
of input.
.RE
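
The flag filters and the subsampling rule above are sketched in Python below.
The sketch is an illustration under stated assumptions, not the samtools
implementation. The function names are hypothetical, and SHA-1 stands in for
whatever hash samtools actually computes from QNAME and the seed.
.EX 2
import hashlib


def parse_flag_int(text):
    # Decimal, hex with a leading 0x, or octal with a leading 0.
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def passes_flags(flag, f=0, F=0, G=0):
    if (flag & f) != f:          # -f: all of these bits must be set
        return False
    if flag & F:                 # -F: reject if any of these bits is set
        return False
    if G and (flag & G) == G:    # -G: reject only if all of these bits are set
        return False
    return True


def keep_template(qname, s):
    # -s INT.FRAC: INT is the seed, 0.FRAC is the fraction of templates kept.
    seed, _, frac = s.partition(".")
    fraction = float("0." + frac)
    # The score depends only on QNAME and seed, so mates share one decision.
    digest = hashlib.sha1((seed + ":" + qname).encode()).digest()
    score = int.from_bytes(digest[:8], "big") / 2.0 ** 64
    return score < fraction
.EE
The sketch scores a read from its QNAME and seed alone, so a second pass with
the same seed over its own output reuses the same scores. That pass therefore
removes fewer reads than an independent subsample would, which is why a
different seed is recommended.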

.TP \"-------- sort
.B sort
.na
samtools sort
.RB [ -l
.IR level ]
.RB [ -m
.IR maxMem ]
.RB [ -o
.IR out.bam ]
.RB [ -O
.IR format ]
.RB [ -n ]
.RB [ -t
.IR tag ]
.RB [ -T
.IR tmpprefix ]
.RB [ -@
.IR threads "] [" in.sam | in.bam | in.cram ]
.ad

Sort alignments by leftmost coordinates, or by read name when
.B -n
is used.
An appropriate
.B @HD-SO
sort order header tag will be added or an existing one updated if necessary.

The sorted output is written to standard output by default, or to the
specified file
.RI ( out.bam )
when
.B -o
is used.
This command will also create temporary files
.IB tmpprefix . %d .bam
as needed when the entire alignment data cannot fit into memory
(as controlled via the
.B -m
option).

.B Options:
.RS
.TP 11
.BI "-l " INT
Set the desired compression level for the final output file, ranging from 0
(uncompressed) or 1 (fastest but minimal compression) to 9 (best compression
but slowest to write), similarly to
.BR gzip (1)'s
compression level setting.
.IP
If
.B -l
is not used, the default compression level will apply.
.TP
.BI "-m " INT
Approximately the maximum required memory per thread, specified either in bytes
or with a
.BR K ", " M ", or " G
suffix.
[768 MiB]
.IP
To prevent sort from creating a huge number of temporary files, it enforces a
minimum value of 1M for this setting.
.TP
.B -n
Sort by read names (i.e., the
.B QNAME
field) rather than by chromosomal coordinates.
.TP
.BI "-t " TAG
Sort first by the value in the alignment tag TAG, then by position or name (if
also using \fB-n\fP).
.TP
.BI "-o " FILE
Write the final sorted output to
.IR FILE ,
rather than to standard output.
.TP
.BI "-O " FORMAT
Write the final output as
.BR sam ", " bam ", or " cram .

By default, samtools tries to select a format based on the
.B -o
filename extension; if output is to standard output or no format can be
deduced,
.B bam
is selected.
.TP
.BI "-T " PREFIX
Write temporary files to
.IB PREFIX . nnnn .bam,
or if the specified
.I PREFIX
is an existing directory, to
.IB PREFIX /samtools. mmm . mmm .tmp. nnnn .bam,
where
.I mmm
is unique to this invocation of the
.B sort
command.
.IP
By default, any temporary files are written alongside the output file, as
.IB out.bam .tmp. nnnn .bam,
or if output is to standard output, in the current directory as
.BI samtools. mmm . mmm .tmp. nnnn .bam.
.TP
.BI "-@ " INT
Set the number of sorting and compression threads.
By default, operation is single-threaded.
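.PP
The size accepted by
.B -m
is modelled by the Python sketch below. The sketch assumes binary multiples,
so 768M means 768 times 2^20 bytes. It models the 1M minimum by raising
smaller values to 1M. The function name is hypothetical, and this is not the
samtools option parser.
.EX 2
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
MINIMUM = 1 << 20              # the documented 1M floor


def parse_mem(text):
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        value = int(text[:-1]) * SUFFIXES[text[-1]]
    else:
        value = int(text)      # plain byte count
    return max(value, MINIMUM)
.EE
Under these assumptions, parse_mem("768M") returns 805306368 bytes, which is
the 768 MiB default.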
.PP
.B Ordering Rules

The following rules are used for ordering records.

If option \fB-t\fP is in use, records are first sorted by the value of
the given alignment tag, and then by position or name (if using \fB-n\fP).
For example, \*(lq-t RG\*(rq will make read group the primary sort key. The
rules for ordering by tag are:

.IP \(bu 4
Records that do not have the tag are sorted before ones that do.
.IP \(bu 4
If the types of the tags are different, they will be sorted so
that single character tags (type A) come before array tags (type B), then
string tags (types H and Z), then numeric tags (types f and i).
.IP \(bu 4
Numeric tags (types f and i) are compared by value. Note that comparisons
of floating-point values are subject to issues of rounding and precision.
.IP \(bu 4
String tags (types H and Z) are compared based on the binary
contents of the tag using the C
.BR strcmp (3)
function.
.IP \(bu 4
Character tags (type A) are compared by binary character value.
.IP \(bu 4
No attempt is made to compare tags of other types \(em notably type B
array values will not be compared.
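.PP
The tag rules above can be expressed as a Python sort key. The sketch assumes a
hypothetical record layout of (position, tags), where tags maps a two-letter
tag name to a (type, value) pair. It illustrates the listed rules and is not
the samtools comparator.
.EX 2
TYPE_RANK = {"A": 0, "B": 1, "H": 2, "Z": 2, "f": 3, "i": 3}


def tag_key(tags, tag):
    if tag not in tags:
        return (0,)                        # records without the tag come first
    typ, value = tags[tag]
    rank = TYPE_RANK[typ]
    if typ == "A":
        return (1, rank, ord(value))       # binary character value
    if typ == "B":
        return (1, rank, 0)                # arrays are not compared
    if typ in ("H", "Z"):
        return (1, rank, value.encode())   # byte-wise, like strcmp(3)
    return (1, rank, float(value))         # numeric types compared by value


def sort_by_tag(records, tag):
    # Tag value first, then coordinate as the secondary key.
    return sorted(records, key=lambda r: (tag_key(r[1], tag), r[0]))
.EE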
.PP
When the \fB-n\fP option is present, records are sorted by name. Names are
compared so as to give a \*(lqnatural\*(rq ordering \(em i.e. sections
consisting of digits are compared numerically while all other sections are
compared based on their binary representation. This means \*(lqa1\*(rq will
come before \*(lqb1\*(rq and \*(lqa9\*(rq will come before \*(lqa10\*(rq.
Records with the same name will be ordered according to the values of
the READ1 and READ2 flags (see
.BR flags ).
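.PP
A simplified Python sketch of this name ordering follows. It splits names into
runs of digits and non-digits, whereas samtools compares names character by
character. Some unusual names, such as those with punctuation next to digits,
may therefore be ordered differently. The READ1/READ2 tie-break is modelled, as
an assumption, by comparing the 0x40 and 0x80 FLAG bits, which puts READ1 first.
.EX 2
import re
from functools import cmp_to_key

RUNS = re.compile("[0-9]+|[^0-9]+")


def natural_cmp(a, b):
    ra, rb = RUNS.findall(a), RUNS.findall(b)
    for x, y in zip(ra, rb):
        if x[0] in "0123456789" and y[0] in "0123456789":
            if int(x) != int(y):             # digit runs: numeric comparison
                return -1 if int(x) < int(y) else 1
        elif x != y:                         # other runs: binary comparison
            return -1 if x.encode() < y.encode() else 1
    return (len(ra) > len(rb)) - (len(ra) < len(rb))


def sort_by_name(records):
    # records: hypothetical (qname, flag) pairs
    def cmp(r, s):
        return natural_cmp(r[0], s[0]) or (r[1] & 0xC0) - (s[1] & 0xC0)
    return sorted(records, key=cmp_to_key(cmp))
.EE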

When the \fB-n\fP option is
.B not
present, reads are sorted by reference (according to the order of the @SQ
header records), then by position in the reference, and then by the REVERSE
flag.

.B Note

.PP
Historically
.B samtools sort
also accepted a less flexible way of specifying the final and
temporary output filenames:
.IP
samtools sort
.RB [ -f "] [" -o ]
.I in.bam out.prefix
.PP
This has now been removed.
The previous \fIout.prefix\fP argument (and \fB-f\fP option, if any)
should be changed to an appropriate combination of \fB-T\fP \fIPREFIX\fP
and \fB-o\fP \fIFILE\fP. The previous \fB-o\fP option should be removed,
as output defaults to standard output.
.RE

.TP \"-------- index
.B index
samtools index
.RB [ -bc ]
.RB [ -m
.IR INT ]
.IR aln.bam | aln.cram
.RI [ out.index ]

Index a coordinate-sorted BAM or CRAM file for fast random access.
(Note that this does not work with SAM files even if they are bgzip
compressed \(em to index such files, use tabix(1) instead.)

This index is needed when
.I region
arguments are used to limit
.B samtools view
and similar commands to particular regions of interest.

If an output filename is given, the index file will be written to
.IR out.index .
Otherwise, for a CRAM file
.IR aln.cram ,
index file
.IB aln.cram .crai
will be created; for a BAM file
.IR aln.bam ,
either
.IB aln.bam .bai
or
.IB aln.bam .csi
will be created, depending on the index format selected.

.B Options:
.RS
.TP 8
.B -b
Create a BAI index.
This is currently the default when no format options are used.
.TP
.B -c
Create a CSI index.
By default, the minimum interval size for the index is 2^14, which is the same
as the fixed value used by the BAI format.
.TP
.BI "-m " INT
Create a CSI index, with a minimum interval size of 2^INT.
.RE
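
Because the minimum interval size is a power of two, finding the interval that
contains a position is a bit shift. The Python sketch below shows only that
arithmetic. The function names are hypothetical, positions are assumed to be
0-based, and no index is built.
.EX 2
def interval_of(pos, min_shift=14):
    # 0-based position -> index of the 2^min_shift-base interval containing it
    return pos >> min_shift


def intervals_needed(ref_len, min_shift=14):
    # Number of minimum-size intervals spanning a reference of ref_len bases
    size = 1 << min_shift
    return (ref_len + size - 1) // size
.EE
With the default of 2^14 (16384 bases), positions 0 to 16383 fall in interval
0 and position 16384 starts interval 1. Using `-m 12' gives 4096-base intervals.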

.TP \"-------- idxstats
.B idxstats
samtools idxstats
.IR in.sam | in.bam | in.cram

Retrieve and print statistics from the index file corresponding to the input file.
Before calling idxstats, the input BAM file must be indexed by samtools index.

The output is TAB-delimited, with each line consisting of the reference sequence
name, sequence length, number of mapped reads and number of unmapped reads.
It is written to stdout.
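
The output is straightforward to post-process. The Python sketch below parses
captured idxstats output and totals the mapped reads. The function names are
hypothetical. Splitting on whitespace relies on reference names containing no
spaces, which the SAM format does not allow.
.EX 2
def parse_idxstats(text):
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # name, sequence length, mapped reads, unmapped reads
        name, length, mapped, unmapped = line.split()
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


def total_mapped(rows):
    return sum(mapped for _, _, mapped, _ in rows)
.EE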

.TP \"-------- flagstat
.B flagstat
samtools flagstat
.IR in.sam | in.bam | in.cram

Does a full pass through the input file to calculate and print statistics
to stdout.

Provides counts for each of 13 categories based primarily on bit flags in
the FLAG field. Each category in the output is broken down into QC pass and
QC fail, which is presented as "#PASS + #FAIL" followed by a description of
the category.

The first row of output gives the total number of reads that are QC pass and
fail (according to flag bit 0x200). For example:

122 + 28 in total (QC-passed reads + QC-failed reads)

This would indicate that there are a total of 150 reads in the input file,
122 of which are marked as QC pass and 28 of which are marked as "not passing
quality controls".

Following this, additional categories are given for reads which are:

.RS 18
.TP
secondary
0x100 bit set
.TP
supplementary
0x800 bit set
.TP
duplicates
0x400 bit set
.TP
mapped
0x4 bit not set
.TP
paired in sequencing
0x1 bit set
.TP
read1
both 0x1 and 0x40 bits set
.TP
read2
both 0x1 and 0x80 bits set
.TP
properly paired
both 0x1 and 0x2 bits set and 0x4 bit not set
.TP
with itself and mate mapped
0x1 bit set and neither 0x4 nor 0x8 bits set
.TP
singletons
both 0x1 and 0x8 bits set and 0x4 bit not set
.RE

.RS 10
And finally, two rows are given that additionally filter on the reference
name (RNAME), mate reference name (MRNM), and mapping quality (MAPQ) fields:
.RE

.RS 18
.TP
with mate mapped to a different chr
0x1 bit set and neither 0x4 nor 0x8 bits set and MRNM not equal to RNAME
.TP
with mate mapped to a different chr (mapQ>=5)
0x1 bit set and neither 0x4 nor 0x8 bits set
and MRNM not equal to RNAME and MAPQ >= 5
.RE
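
The Python sketch below applies the bit tests above to individual records. It
takes the listed definitions literally, although samtools itself may count
secondary and supplementary records differently. As the SAM format specifies,
it treats an MRNM of `=' as equal to RNAME. Records use a hypothetical
(flag, rname, mrnm, mapq) layout.
.EX 2
def flag_categories(flag, rname, mrnm, mapq):
    if mrnm == "=":
        mrnm = rname                   # "=" means the same reference as RNAME
    paired = bool(flag & 0x1)
    unmapped = bool(flag & 0x4)
    mate_unmapped = bool(flag & 0x8)
    both = paired and not unmapped and not mate_unmapped
    diff_chr = both and mrnm != rname
    tests = {
        "secondary": bool(flag & 0x100),
        "supplementary": bool(flag & 0x800),
        "duplicates": bool(flag & 0x400),
        "mapped": not unmapped,
        "paired in sequencing": paired,
        "read1": paired and bool(flag & 0x40),
        "read2": paired and bool(flag & 0x80),
        "properly paired": paired and bool(flag & 0x2) and not unmapped,
        "with itself and mate mapped": both,
        "singletons": paired and mate_unmapped and not unmapped,
        "with mate mapped to a different chr": diff_chr,
        "with mate mapped to a different chr (mapQ>=5)": diff_chr and mapq >= 5,
    }
    return [name for name, hit in tests.items() if hit]


def tally(records):
    # counts[category] = [QC-passed, QC-failed], split on bit 0x200
    counts = {"in total": [0, 0]}
    for flag, rname, mrnm, mapq in records:
        col = 1 if flag & 0x200 else 0
        counts["in total"][col] += 1
        for name in flag_categories(flag, rname, mrnm, mapq):
            counts.setdefault(name, [0, 0])[col] += 1
    return counts
.EE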

.TP \"-------- stats
.B stats
samtools stats
.RI [ options ]
.IR in.sam | in.bam | in.cram
.RI [ region ...]

samtools stats collects statistics from BAM files and outputs them in a text format.
The output can be visualized graphically using plot-bamstats.

.B Options:
.RS
.TP 8
.BI "-c, --coverage " MIN , MAX , STEP
Set coverage distribution to the specified range (MIN, MAX, STEP all given as integers)
[1,1000,1]
.TP
.B -d, --remove-dups
Exclude reads marked as duplicates from the statistics.
.TP
.BI "-f, --required-flag " STR "|" INT
Required flag, 0 for unset. See also `samtools flags'.
[0]
.TP
.BI "-F, --filtering-flag " STR "|" INT
Filtering flag, 0 for unset. See also `samtools flags'.
[0]
.TP
.BI "--GC-depth " FLOAT
The size of GC-depth bins (decreasing bin size increases memory requirement)
[2e4]
.TP
.B -h, --help
Display a help message.
.TP
.BI "-i, --insert-size " INT
Maximum insert size
[8000]
.TP
.BI "-I, --id " STR
Include only listed read group or sample name
[]
.TP
.BI "-l, --read-length " INT
Include in the statistics only reads with the given read length
[]
.TP
.BI "-m, --most-inserts " FLOAT
Report only the main part of inserts
[0.99]
.TP
.BI "-P, --split-prefix " STR
A path or string prefix to prepend to filenames output when creating
categorised statistics files with
.BR -S / --split .
[input filename]
.TP
.BI "-q, --trim-quality " INT
The BWA trimming parameter
[0]
.TP
.BI "-r, --ref-seq " FILE
Reference sequence (required for GC-depth and mismatches-per-cycle calculation).
[]
.TP
.BI "-S, --split " TAG
In addition to the complete statistics, also output categorised statistics
based on the tagged field
.I TAG
(e.g., use
.B --split RG
to split into read groups).

Categorised statistics are written to files named
.RI < prefix >_< value >.bamstat,
where
.I prefix
is as given by
.B --split-prefix
(or the input filename by default) and
.I value
has been encountered as the specified tagged field's value in one or more
alignment records.
.TP
.BI "-t, --target-regions " FILE
Compute statistics only within these regions. The file is tab-delimited with
columns chr, from and to; coordinates are 1-based and inclusive.
[]
.TP
.B "-x, --sparse"
Suppress outputting IS rows where there are no insertions.
.RE
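
The coordinate convention of
.B --target-regions
(1-based, with both ends included) is sketched in Python below. The function
names are hypothetical, and a linear scan is used for clarity.
.EX 2
def load_targets(text):
    # One region per line: chr, from, to (1-based, inclusive).
    targets = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            targets.append((fields[0], int(fields[1]), int(fields[2])))
    return targets


def in_targets(targets, chrom, pos):
    # pos is 1-based; both ends of each region are included.
    return any(c == chrom and lo <= pos <= hi for c, lo, hi in targets)
.EE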

.TP \"-------- bedcov
.B bedcov
samtools bedcov
.RI [ options ]
.IR region.bed " " in1.sam | in1.bam | in1.cram "[...]"

Reports the total read base count (i.e. the sum of per base read depths)
for each genomic region specified in the supplied BED file.
Counts for each alignment file supplied are reported in separate columns.

.B Options:
.RS
.TP
.BI "-Q " INT
.RI "Only count reads with mapping quality greater than " INT
.RE
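
Because the count is a sum of per-base depths, each read contributes the number
of its bases that fall inside the region. The Python sketch below shows this
arithmetic under simplifying assumptions. Reads are hypothetical (start, end)
spans in 1-based inclusive coordinates with no insertions or deletions. The
region follows the BED convention of a 0-based start and an exclusive end.
This is not the samtools implementation.
.EX 2
def bedcov(reads, bed_start, bed_end):
    # BED region: 0-based start, exclusive end, so 1-based bases
    # bed_start + 1 through bed_end are covered.
    lo, hi = bed_start + 1, bed_end
    total = 0
    for start, end in reads:
        overlap = min(end, hi) - max(start, lo) + 1
        if overlap > 0:
            total += overlap            # bases of this read inside the region
    return total
.EE
For example, reads spanning bases 1-10 and 5-14 each cover bases 5-8 of a BED
region with start 4 and end 8, giving a total of 8. This equals summing a depth
of 2 over those four positions.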

.TP \"-------- depth
.B depth
samtools depth
.RI [ options ]
.RI "[" in1.sam | in1.bam | in1.cram " [" in2.sam | in2.bam | in2.cram "] [...]]"

Computes the depth at each position or region.

.B Options:
.RS
.TP 8
.B -a
Output all positions (including those with zero depth).
.TP
.B -a -a, -aa
Output absolutely all positions, including unused reference sequences.
Note that when used in conjunction with a BED file the -a option may
sometimes operate as if -aa was specified if the reference sequence
has coverage outside of the region specified in the BED file.
.TP
.BI "-b " FILE
.RI "Compute depth at a list of positions or regions in the specified BED " FILE .
[]
.TP
.BI "-f " FILE
.RI "Use the BAM files specified in the " FILE
(a file of filenames, one file per line)
[]
.TP
.BI "-l " INT
.RI "Ignore reads shorter than " INT
.TP
.BI "-m, -d " INT
.RI "Truncate reported depth at a maximum of " INT " reads."
[8000]
.TP
.BI "-q " INT
.RI "Only count reads with base quality greater than " INT
.TP
.BI "-Q " INT
.RI "Only count reads with mapping quality greater than " INT
.TP
.BI "-r " CHR ":" FROM "-" TO
Only report depth in the specified region.
.RE
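
Per-position depth is sketched in Python below using a difference array over
read spans, which also shows what \fB-a\fP adds. Reads are hypothetical
(start, end) spans, 1-based and inclusive, on a single reference. Quality
filters, deletions and the \fB-m\fP cap are not modelled, so this is not the
samtools algorithm.
.EX 2
def depth(reads, ref_len, all_positions=False):
    # reads: hypothetical (start, end) 1-based inclusive spans on one reference
    delta = [0] * (ref_len + 2)
    for start, end in reads:
        delta[start] += 1               # coverage begins at start
        delta[end + 1] -= 1             # and stops after end
    rows, running = [], 0
    for pos in range(1, ref_len + 1):
        running += delta[pos]
        if running or all_positions:    # -a also reports zero-depth positions
            rows.append((pos, running))
    return rows
.EE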

.TP \"-------- merge
.B merge
samtools merge [-nur1f] [-h inh.sam] [-R reg] [-b <list>] <out.bam> <in1.bam> [<in2.bam> <in3.bam> ... <inN.bam>]

Merge multiple sorted alignment files, producing a single sorted output file
that contains all the input records and maintains the existing sort order.

If
.B -h
is specified, the @SQ headers of the input files will be merged into the specified
header; otherwise they will be merged into a composite header created from the
input headers. Merging the @SQ lines of coordinate-sorted input files can produce
a conflict over their order (for example, input1.bam has @SQ lines for a, b, c and
input2.bam has b, a, c). In that case, the resulting output file will need to be
re-sorted back into coordinate order.

Unless the
.B -c
or
.B -p
flags are specified, any @RG or @PG IDs that duplicate IDs already present in the
output header will have a suffix appended to them when those records are merged
into the output header. The suffix differentiates them from similar header records
from other files, and the read records will be updated to reflect the new IDs.

The ordering of the records in the input files must match the usage of the
\fB-n\fP and \fB-t\fP command-line options. If they do not, the output
order will be undefined. See
.B sort
for information about record ordering.
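
Merging sorted inputs is a k-way merge that only ever compares the next record
from each file. The Python sketch below uses the standard library's
heapq.merge on hypothetical (reference index, position, name) tuples. It
illustrates the ordering requirement and is not how samtools processes BAM
records.
.EX 2
import heapq


def merge_sorted(*inputs):
    # Each input must already be sorted by (reference index, position);
    # heapq.merge streams through them, emitting the smallest head each time.
    return list(heapq.merge(*inputs, key=lambda rec: (rec[0], rec[1])))
.EE
If an input is not sorted, heapq.merge still runs, but its output is not sorted
either. This corresponds to the undefined order described above.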
862 | |
863 .B OPTIONS: | |
864 .RS | |
865 .TP 8 | |
866 .B -1 | |
867 Use zlib compression level 1 to compress the output. | |
868 .TP | |
869 .BI -b \ FILE | |
870 List of input BAM files, one file per line. | |
871 .TP | |
872 .B -f | |
873 Force to overwrite the output file if present. | |
874 .TP 8 | |
875 .BI -h \ FILE | |
876 Use the lines of | |
877 .I FILE | |
878 as `@' headers to be copied to | |
879 .IR out.bam , | |
880 replacing any header lines that would otherwise be copied from | |
881 .IR in1.bam . | |
882 .RI ( FILE | |
883 is actually in SAM format, though any alignment records it may contain | |
884 are ignored.) | |
885 .TP | |
886 .B -n | |
887 The input alignments are sorted by read names rather than by chromosomal | |
888 coordinates | |
889 .TP | |
890 .B -t TAG | |
891 The input alignments have been sorted by the value of TAG, then by either | |
892 position or name (if \fB-n\fP is given). | |
893 .TP | |
894 .BI -R \ STR | |
895 Merge files in the specified region indicated by | |
896 .I STR | |
897 [null] | |
898 .TP | |
899 .B -r | |
900 Attach an RG tag to each alignment. The tag value is inferred from file names. | |
901 .TP | |
902 .B -u | |
903 Uncompressed BAM output | |
904 .TP | |
905 .B -c | |
906 When several input files contain @RG headers with the same ID, emit only one | |
907 of them (namely, the header line from the first file we find that ID in) to | |
908 the merged output file. | |
909 Combining these similar headers is usually the right thing to do when the | |
910 files being merged originated from the same file. | |
911 | |
912 Without \fB-c\fP, all @RG headers appear in the output file, with random | |
913 suffixes added to their IDs where necessary to differentiate them. | |
914 .TP | |
915 .B -p | |
916 Similarly, for each @PG ID in the set of files to merge, use the @PG line | |
917 of the first file we find that ID in rather than adding a suffix to | |
918 differentiate similar IDs. | |
919 .RE | |
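As an illustration of the header combining done by -c (and, for @PG lines,
by -p), the Python sketch below keeps the first @RG line seen for each ID
while scanning the input files in order. It is a hypothetical helper, not
samtools code: combine_rg and the (ID, line) pair layout are assumptions,
header fields are shown space-separated rather than tab-separated, and the
random-suffix renaming used without -c is not modelled.

```python
from typing import Dict, List, Tuple


def combine_rg(files: List[List[Tuple[str, str]]]) -> List[str]:
    # Keep only the first @RG line seen for each ID, scanning the files
    # in command-line order (the behaviour described for -c).
    seen: Dict[str, str] = {}
    for headers in files:
        for rg_id, line in headers:
            if rg_id not in seen:
                seen[rg_id] = line
    # Dicts preserve insertion order, so output follows first appearance.
    return list(seen.values())


a = [("grp1", "@RG ID:grp1 SM:x"), ("grp2", "@RG ID:grp2 SM:y")]
b = [("grp1", "@RG ID:grp1 SM:x-copy")]
print(combine_rg([a, b]))  # ['@RG ID:grp1 SM:x', '@RG ID:grp2 SM:y']
```

Here the second file's grp1 line is dropped because grp1 was already seen
in the first file.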

.TP \"-------- faidx
.B faidx
samtools faidx <ref.fasta> [region1 [...]]

Index a reference sequence in FASTA format, or extract subsequences from an
indexed reference sequence. If no region is specified,
.B faidx
will index the file and create
.I <ref.fasta>.fai
on the disk. If regions are specified, the subsequences will be
retrieved and printed to stdout in FASTA format.

The input file can be compressed in the
.B BGZF
format.

The sequences in the input file should all have different names.
If they do not, indexing will emit a warning about duplicate sequences and
retrieval will only produce subsequences from the first sequence with the
duplicated name.

.TP \"-------- tview
.B tview
samtools tview
.RB [ -p
.IR chr:pos ]
.RB [ -s
.IR STR ]
.RB [ -d
.IR display ]
.RI <in.sorted.bam>
.RI [ref.fasta]

Text alignment viewer (based on the ncurses library). In the viewer,
press `?' for help, and press `g' to jump to a region given in a form
such as `chr10:10,000,000', or `=10,000,000' to stay on the reference
sequence currently being viewed.
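The two goto forms accepted by `g' can be parsed as sketched below. This is
a hypothetical Python helper (parse_goto is not part of samtools). It
assumes commas are only digit grouping, and splitting on the last colon is
an assumption of this sketch.

```python
from typing import Tuple


def parse_goto(text: str, current_chrom: str) -> Tuple[str, int]:
    # Commas are digit grouping only ("10,000,000"), so drop them first.
    text = text.replace(",", "")
    if text.startswith("="):
        # "=POS" stays on the reference sequence currently being viewed.
        return current_chrom, int(text[1:])
    # "CHROM:POS"; splitting on the last colon is an assumption here.
    chrom, _, pos = text.rpartition(":")
    return chrom, int(pos)


print(parse_goto("chr10:10,000,000", "chr1"))  # ('chr10', 10000000)
print(parse_goto("=10,000,000", "chr2"))       # ('chr2', 10000000)
```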

.B Options:
.RS
.TP 14
.BI -d \ display
Output as (H)tml, (C)urses or (T)ext.
.TP
.BI -p \ chr:pos
Go directly to this position.
.TP
.BI -s \ STR
Display only alignments from this sample or read group.
.RE

.TP \"-------- split
.B split
samtools split
.RI [ options ]
.IR merged.sam | merged.bam | merged.cram

Splits a file by read group.

.B Options:
.RS
.TP 14
.BI "-u " FILE1
.RI "Put reads with no RG tag or an unrecognised RG tag into " FILE1
.TP
.BI "-u " FILE1 ":" FILE2
.RI "As above, but assigns an RG tag as given in the header of " FILE2
.TP
.BI "-f " STRING
Output filename format string (see below)
["%*_%#.%."]
.TP
.B -v
Verbose output
.PP
Format string expansions:
.TS
center tab(:);
lb l .
%%:%
%*:basename
%#:@RG index
%!:@RG ID
%.:output format filename extension
.TE
.RE
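The following sketch shows how the -f format string expands, based on the
table above. expand_format is a hypothetical Python helper, and the caller
supplies the @RG index, ID and extension. Keeping a % that is not followed
by one of the listed characters unchanged is an assumption of this sketch,
not documented behaviour.

```python
def expand_format(fmt: str, basename: str, rg_index: int,
                  rg_id: str, ext: str) -> str:
    # Substitutions taken from the format-string table above.
    subs = {"%": "%", "*": basename, "#": str(rg_index), "!": rg_id, ".": ext}
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in subs:
            out.append(subs[fmt[i + 1]])
            i += 2
        else:
            # Ordinary characters (and unrecognised % sequences) are copied.
            out.append(fmt[i])
            i += 1
    return "".join(out)


print(expand_format("%*_%#.%.", "merged", 0, "grpA", "bam"))  # merged_0.bam
print(expand_format("%!.%.", "merged", 0, "grpA", "cram"))    # grpA.cram
```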

.TP \"-------- quickcheck
.B quickcheck
samtools quickcheck
.RI [ options ]
.IR in.sam | in.bam | in.cram
[ ... ]

Quickly check that input files appear to be intact. Checks that the
beginning of the file contains a valid header (all formats) containing at
least one target sequence, then seeks to the end of the file and checks
that an end-of-file (EOF) block is present and intact (BAM only).

Data in the middle of the file is not read, since that would be much more
time-consuming. This command will therefore not detect internal corruption,
but it is useful for testing that files are not truncated before performing
more intensive tasks on them.

This command will exit with a non-zero exit code if any input files don't have a
valid header or are missing an EOF block. Otherwise it will exit successfully
(with a zero exit code).
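The BAM end-of-file test can be reproduced by comparing the last 28 bytes of
a file with the empty BGZF block that the SAM/BAM specification defines as
the EOF marker. This Python sketch covers only that BAM check, not the header
check, and its helper names are hypothetical.

```python
import os

# Empty BGZF block used as the BAM end-of-file marker (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def ends_with_bgzf_eof(data: bytes) -> bool:
    return data.endswith(BGZF_EOF)


def file_has_bgzf_eof(path: str) -> bool:
    # Seek straight to the tail instead of reading the whole file,
    # in the same spirit as quickcheck.
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(size - len(BGZF_EOF))
        return ends_with_bgzf_eof(fh.read())
```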

.B Options:
.RS
.TP 8
.B -v
Verbose output: will additionally print the names of all input files that don't
pass the check to stdout. Multiple -v options will cause additional messages
regarding check results to be printed to stderr.
.RE

.TP \"-------- dict
.B dict
samtools dict <ref.fasta|ref.fasta.gz>

Create a sequence dictionary file from a FASTA file.

.B OPTIONS:
.RS
.TP 11
.BI -a,\ --assembly \ STR
Specify the assembly for the AS tag.
.TP
.B -H,\ --no-header
Do not print the @HD header line.
.TP
.BI -o,\ --output \ FILE
Output to
.I FILE
[stdout].
.TP
.BI -s,\ --species \ STR
Specify the species for the SP tag.
.TP
.BI -u,\ --uri \ STR
Specify the URI for the UR tag. Defaults to
the absolute path of
.I ref.fasta
unless reading from stdin.
.RE

.TP \"-------- fixmate
.B fixmate
.na
samtools fixmate
.RB [ -rpc ]
.RB [ -O
.IR format ]
.I in.nameSrt.bam out.bam
.ad

Fill in mate coordinates, ISIZE and mate-related flags from a
name-sorted alignment.

.B OPTIONS:
.RS
.TP 11
.B -r
Remove secondary and unmapped reads.
.TP
.B -p
Disable FR proper pair check.
.TP
.B -c
Add template cigar ct tag.
.TP
.BI "-O " FORMAT
Write the final output as
.BR sam ", " bam ", or " cram .

By default, samtools tries to select a format based on the output
filename extension; if output is to standard output or no format can be
deduced,
.B bam
is selected.
.RE

.TP \"-------- mpileup
.B mpileup
samtools mpileup
.RB [ -EBugp ]
.RB [ -C
.IR capQcoef ]
.RB [ -r
.IR reg ]
.RB [ -f
.IR in.fa ]
.RB [ -l
.IR list ]
.RB [ -Q
.IR minBaseQ ]
.RB [ -q
.IR minMapQ ]
.I in.bam
.RI [ in2.bam
.RI [ ... ]]

Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records
are grouped by sample (SM) identifiers in @RG header lines. If sample
identifiers are absent, each input file is regarded as one sample.

In the pileup format (without
.BR -u \ or \ -g ),
each
line represents a genomic position, consisting of chromosome name,
1-based coordinate, reference base, the number of reads covering the site,
read bases, base qualities and alignment
mapping qualities. Information on match, mismatch, indel, strand,
mapping quality and start and end of a read is all encoded in the read
base column. In this column, a dot stands for a match to the reference
base on the forward strand, a comma for a match on the reverse strand,
a '>' or '<' for a reference skip, `ACGTN' for a mismatch on the forward
strand and `acgtn' for a mismatch on the reverse strand. A pattern
`\\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between this
reference position and the next reference position. The length of the
insertion is given by the integer in the pattern, followed by the
inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+'
represents a deletion from the reference. The deleted bases will be
presented as `*' in the following lines. Also in the read base column, a
symbol `^' marks the start of a read. The ASCII code of the character
following `^' minus 33 gives the mapping quality. A symbol `$' marks the
end of a read segment.
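The read-base encoding described above can be decoded with a small state
machine. The Python sketch below counts each kind of event in one read-base
column. tally_bases is a hypothetical helper and its category names are
invented for illustration. It skips the mapping-quality character after the
start marker rather than decoding it.

```python
from collections import Counter


def tally_bases(column: str) -> Counter:
    counts: Counter = Counter()
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":
            # Read start; the next character is the mapping quality + 33.
            counts["read_start"] += 1
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, decimal length, then exactly that many bases.
            j = i + 1
            while j < len(column) and column[j].isdigit():
                j += 1
            length = int(column[i + 1:j])
            counts["insertion" if ch == "+" else "deletion"] += 1
            i = j + length
            continue
        if ch == "$":
            counts["read_end"] += 1
        elif ch in ".,":
            counts["ref_match"] += 1
        elif ch in "ACGTNacgtn":
            counts["mismatch"] += 1
        elif ch == "*":
            counts["deleted_base"] += 1
        elif ch in "<>":
            counts["ref_skip"] += 1
        i += 1
    return counts


c = tally_bases("^I.,+2AG$aC*-1T")
print(c["ref_match"], c["mismatch"], c["insertion"], c["deletion"])  # 2 2 1 1
```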

Note that there are two orthogonal ways to specify locations in the
input file: via \fB-r\fR \fIregion\fR and \fB-l\fR \fIfile\fR. The
former uses (and requires) an index to do random access, while the
latter streams through the file contents and selects the specified
regions, requiring no index. The two may be used in conjunction. For
example, a BED file containing locations of genes in chromosome 20
could be specified using \fB-r 20 -l chr20.bed\fR, meaning that the
index is used to find chromosome 20 and then it is filtered for the
regions listed in the BED file.
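The coordinate conventions of -l and its intersection with -r can be
sketched as follows. This Python sketch is illustrative only. It assumes
whitespace-separated columns, treats a two-column line as a position-list
entry and anything longer as BED, and converts both to 1-based inclusive
intervals before intersecting them with a region. The helper names are
hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, first, last), 1-based inclusive


def parse_positions_line(line: str) -> Interval:
    fields = line.split()
    if len(fields) == 2:
        # Position list: chromosome and a single 1-based position.
        pos = int(fields[1])
        return fields[0], pos, pos
    # BED: 0-based half-open [start, end) is 1-based [start + 1, end].
    return fields[0], int(fields[1]) + 1, int(fields[2])


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    # Overlap of two 1-based inclusive intervals, or None if disjoint.
    if a[0] != b[0]:
        return None
    first, last = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], first, last) if first <= last else None


region = ("20", 1, 150)  # a 1-based inclusive region on chromosome 20
print(intersect(region, parse_positions_line("20 99 200")))  # ('20', 100, 150)
```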

.B Input Options:
.RS
.TP 10
.B -6, --illumina1.3+
Assume the quality is in the Illumina 1.3+ encoding.
.TP
.B -A, --count-orphans
Do not skip anomalous read pairs in variant calling.
.TP
.BI -b,\ --bam-list \ FILE
List of input BAM files, one file per line [null]
.TP
.B -B, --no-BAQ
Disable probabilistic realignment for the computation of base alignment
quality (BAQ). BAQ is the Phred-scaled probability of a read base being
misaligned. BAQ computation, which is enabled by default, greatly helps to
reduce false SNPs caused by misalignments.
.TP
.BI -C,\ --adjust-MQ \ INT
Coefficient for downgrading mapping quality for reads containing
excessive mismatches. Given a read with a Phred-scaled probability q of
being generated from the mapped position, the new mapping quality is
about sqrt((INT-q)/INT)*INT. A zero value disables this
functionality; if enabled, the recommended value for BWA is 50. [0]
.TP
.BI -d,\ --max-depth \ INT
At a position, read at most
.I INT
reads per input file. Note that samtools enforces a minimum value of
.I 8000/n
where
.I n
is the number of input files given to mpileup. This means the default
is highly likely to be increased: the -d parameter only has an effect
once it exceeds this cross-sample minimum, i.e. once INT*n is above 8000. [250]
.TP
.B -E, --redo-BAQ
Recalculate BAQ on the fly and ignore existing BQ tags.
.TP
.BI -f,\ --fasta-ref \ FILE
The
.BR faidx -indexed
reference file in the FASTA format. The file can be optionally compressed by
.BR bgzip .
[null]
.TP
.BI -G,\ --exclude-RG \ FILE
Exclude reads from read groups listed in FILE (one @RG-ID per line).
.TP
.BI -l,\ --positions \ FILE
BED or position list file containing a list of regions or sites where
pileup or BCF should be generated. Position list files contain two
columns (chromosome and position) and start counting from 1. BED
files contain at least 3 columns (chromosome, start and end position)
and are 0-based half-open.
.br
While it is possible to mix both position-list and BED coordinates in
the same file, this is strongly discouraged due to the differing
coordinate systems. [null]
.TP
.BI -q,\ --min-MQ \ INT
Minimum mapping quality for an alignment to be used [0]
.TP
.BI -Q,\ --min-BQ \ INT
Minimum base quality for a base to be considered [13]
.TP
.BI -r,\ --region \ STR
Only generate pileup in region
.IR STR .
Requires the BAM files to be indexed.
If used in conjunction with -l, the intersection of the
two requests is considered.
[all sites]
.TP
.B -R,\ --ignore-RG
Ignore RG tags. Treat all reads in one BAM as one sample.
.TP
.BI --rf,\ --incl-flags \ STR|INT
Required flags: skip reads with mask bits unset [null]
.TP
.BI --ff,\ --excl-flags \ STR|INT
Filter flags: skip reads with mask bits set
[UNMAP,SECONDARY,QCFAIL,DUP]
.TP
.B -x,\ --ignore-overlaps
Disable read-pair overlap detection.
.PP
.B Output Options:
.TP 10
.BI "-o, --output " FILE
Write pileup or VCF/BCF output to
.IR FILE ,
rather than the default of standard output.

(The same short option is used for both
.B --open-prob
and
.BR --output .
If
.BR -o 's
argument contains any non-digit characters other than a leading + or - sign,
it is interpreted as
.BR --output .
Usually the filename extension will take care of this, but to write to an
entirely numeric filename use
.B -o ./123
or
.BR "--output 123" .)
.TP
.B -g,\ --BCF
Compute genotype likelihoods and output them in the binary call format (BCF).
As of v1.0, this is BCF2, which is incompatible with the BCF1 format produced
by previous (0.1.x) versions of samtools.
.TP
.B -v,\ --VCF
Compute genotype likelihoods and output them in the variant call format (VCF).
Output is bgzip-compressed VCF unless the
.B -u
option is set.
.PP
.B Output Options for mpileup format (without -g or -v):
.TP 10
.B -O, --output-BP
Output base positions on reads.
.TP
.B -s, --output-MQ
Output mapping quality.
.TP
.B -a
Output all positions, including those with zero depth.
.TP
.B -a -a, -aa
Output absolutely all positions, including unused reference sequences.
Note that when used in conjunction with a BED file, the -a option may
sometimes operate as if -aa was specified if the reference sequence
has coverage outside of the region specified in the BED file.
.PP
.B Output Options for VCF/BCF format (with -g or -v):
.TP 10
.B -D
Output per-sample read depth [DEPRECATED - use
.B -t DP
instead]
.TP
.B -S
Output per-sample Phred-scaled strand bias P-value [DEPRECATED - use
.B -t SP
instead]
.TP
.BI -t,\ --output-tags \ LIST
Comma-separated list of FORMAT and INFO tags to output (case-insensitive):
.B AD
(Allelic depth, FORMAT),
.B INFO/AD
(Total allelic depth, INFO),
.B ADF
(Allelic depths on the forward strand, FORMAT),
.B INFO/ADF
(Total allelic depths on the forward strand, INFO),
.B ADR
(Allelic depths on the reverse strand, FORMAT),
.B INFO/ADR
(Total allelic depths on the reverse strand, INFO),
.B DP
(Number of high-quality bases, FORMAT),
.B DV
(Deprecated in favor of AD; Number of high-quality non-reference bases, FORMAT),
.B DPR
(Deprecated in favor of AD; Number of high-quality bases for each observed allele, FORMAT),
.B INFO/DPR
(Number of high-quality bases for each observed allele, INFO),
.B DP4
(Deprecated in favor of ADF and ADR; Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases, FORMAT),
.B SP
(Phred-scaled strand bias P-value, FORMAT)
[null]
.TP
.B -u,\ --uncompressed
Generate uncompressed VCF/BCF output, which is preferred for piping.
.TP
.B -V
Output per-sample number of non-reference reads [DEPRECATED - use
.B -t DV
instead]
.PP
.B Options for SNP/INDEL Genotype Likelihood Computation (for -g or -v):
.TP 10
.BI -e,\ --ext-prob \ INT
Phred-scaled gap extension sequencing error probability. Reducing
.I INT
leads to longer indels. [20]
.TP
.BI -F,\ --gap-frac \ FLOAT
Minimum fraction of gapped reads [0.002]
.TP
.BI -h,\ --tandem-qual \ INT
Coefficient for modeling homopolymer errors. Given an
.IR l -long
homopolymer
run, the sequencing error of an indel of size
.I s
is modeled as
.IR INT * s / l .
[100]
.TP
.B -I, --skip-indels
Do not perform INDEL calling.
.TP
.BI -L,\ --max-idepth \ INT
Skip INDEL calling if the average per-input-file depth is above
.IR INT .
[250]
.TP
.BI -m,\ --min-ireads \ INT
Minimum number of gapped reads for indel candidates. [1]
.TP
.BI -o,\ --open-prob \ INT
Phred-scaled gap open sequencing error probability. Reducing
.I INT
leads to more indel calls. [40]

(The same short option is used for both
.B --open-prob
and
.BR --output .
When
.BR -o 's
argument contains only an optional + or - sign followed by the digits 0 to 9,
it is interpreted as
.BR --open-prob .)
.TP
.B -p, --per-sample-mF
Apply
.B -m
and
.B -F
thresholds per sample to increase sensitivity of calling.
By default both options are applied to reads pooled from all samples.
.TP
.BI -P,\ --platforms \ STR
Comma-delimited list of platforms (determined by
.BR @RG-PL )
from which indel candidates are obtained. It is recommended to collect
indel candidates from sequencing technologies that have low indel error
rates, such as ILLUMINA. [all]
.RE
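Several of the option descriptions above reduce to simple arithmetic or a
pattern test. The Python sketch below restates four of them: the -C
mapping-quality formula, the per-file depth floor of 8000/n for -d (integer
division is assumed), the -h homopolymer error model, and the rule that
decides whether -o means --open-prob or --output. The function names are
hypothetical, and samtools' own code may round or clamp differently.

```python
import math
import re


def capped_mapq(q: float, coef: int) -> float:
    # -C: new mapping quality is about sqrt((INT - q) / INT) * INT.
    # This sketch only handles coef > 0 (0 disables -C) and 0 <= q <= coef.
    if coef <= 0 or not 0 <= q <= coef:
        raise ValueError("outside the range handled by this sketch")
    return math.sqrt((coef - q) / coef) * coef


def effective_max_depth(max_depth: int, n_files: int) -> int:
    # -d: the per-file limit is raised to at least 8000/n
    # (integer division is an assumption of this sketch).
    return max(max_depth, 8000 // n_files)


def homopolymer_indel_error(coef: int, indel_size: int, run_length: int) -> float:
    # -h: indel error in an l-long homopolymer run is modelled as INT * s / l.
    return coef * indel_size / run_length


def o_means_open_prob(arg: str) -> bool:
    # -o: an optional sign followed only by digits selects --open-prob;
    # anything else is treated as an --output filename.
    return re.fullmatch(r"[+-]?[0-9]+", arg) is not None


print(round(capped_mapq(25, 50), 2))                              # 35.36
print(effective_max_depth(250, 1), effective_max_depth(250, 40))  # 8000 250
print(o_means_open_prob("40"), o_means_open_prob("./123"))        # True False
```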

.TP \"-------- flags
.B flags
samtools flags INT|STR[,...]

Convert between textual and numeric flag representation.

.B FLAGS:
.TS
tab(:);
rb l l .
0x1:PAIRED:paired-end (or multiple-segment) sequencing technology
0x2:PROPER_PAIR:each segment properly aligned according to the aligner
0x4:UNMAP:segment unmapped
0x8:MUNMAP:next segment in the template unmapped
0x10:REVERSE:SEQ is reverse complemented
0x20:MREVERSE:SEQ of the next segment in the template is reverse complemented
0x40:READ1:the first segment in the template
0x80:READ2:the last segment in the template
0x100:SECONDARY:secondary alignment
0x200:QCFAIL:not passing quality controls
0x400:DUP:PCR or optical duplicate
0x800:SUPPLEMENTARY:supplementary alignment
.TE
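The conversion performed by samtools flags, and the default --ff filter of
mpileup, can be sketched using the table above. This Python sketch is
illustrative: the helper names are hypothetical, and treating --ff as "skip
the read if any listed bit is set" is one reading of its description.

```python
from typing import List

FLAG_NAMES = {
    0x1: "PAIRED", 0x2: "PROPER_PAIR", 0x4: "UNMAP", 0x8: "MUNMAP",
    0x10: "REVERSE", 0x20: "MREVERSE", 0x40: "READ1", 0x80: "READ2",
    0x100: "SECONDARY", 0x200: "QCFAIL", 0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}
NAME_TO_BIT = {name: bit for bit, name in FLAG_NAMES.items()}


def flag_to_names(flag: int) -> List[str]:
    # Numeric to textual: list every named bit that is set.
    return [name for bit, name in FLAG_NAMES.items() if flag & bit]


def names_to_flag(text: str) -> int:
    # Textual to numeric: OR together the bits of a comma-separated list.
    value = 0
    for name in text.split(","):
        value |= NAME_TO_BIT[name.strip().upper()]
    return value


# Default mpileup --ff mask, read here as "skip if any of these bits is set".
DEFAULT_FF = names_to_flag("UNMAP,SECONDARY,QCFAIL,DUP")


def passes_ff(flag: int, ff: int = DEFAULT_FF) -> bool:
    return (flag & ff) == 0


print(flag_to_names(99))   # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
print(hex(DEFAULT_FF))     # 0x704
```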

.TP \"-------- fastq fasta
.B fastq/a
samtools fastq
.RI [ options ]
.I in.bam
.br
samtools fasta
.RI [ options ]
.I in.bam

Converts a BAM or CRAM into either FASTQ or FASTA format depending on the
command invoked. The FASTQ files will be automatically compressed if the
filenames have a .gz or .bgzf extension.

.B OPTIONS:
.RS
.TP 8
.B -n
By default, either '/1' or '/2' is added to the end of read names
where the corresponding BAM_READ1 or BAM_READ2 flag is set.
Using
.B -n
causes read names to be left as they are.
.TP 8
.B -N
Always add either '/1' or '/2' to the end of read names
even when put into different files.
.TP 8
.B -O
Use quality values from OQ tags in preference to standard quality string
if available.
.TP 8
.BI "-s " FILE
Write singleton reads in FASTQ format to FILE instead of outputting them.
.TP 8
.B -t
Copy RG, BC and QT tags to the FASTQ header line, if they exist.
.TP 8
.BI "-T " TAGLIST
Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist.
.TP 8
.BI "-1 " FILE
Write reads with the BAM_READ1 flag set to FILE instead of outputting them.
.TP 8
.BI "-2 " FILE
Write reads with the BAM_READ2 flag set to FILE instead of outputting them.
.TP 8
.BI "-0 " FILE
Write reads with both or neither of the BAM_READ1 and BAM_READ2 flags set
to FILE instead of outputting them.
.TP 8
.BI "-f " INT
Only output alignments with all bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP 8
.BI "-F " INT
Do not output alignments with any bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP 8
.BI "-G " INT
Only EXCLUDE reads with all of the bits set in
.I INT
present in the FLAG field.
.I INT
can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/)
or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
.TP 8
.B -i
Add an Illumina Casava 1.8 format entry to the header (e.g. 1:N:0:ATCACG).
.TP 8
.B -c [0..9]
Set the compression level when writing gz or bgzf FASTQ files.
.TP 8
.BI "--i1 " FILE
Write first index reads to FILE.
.TP 8
.BI "--i2 " FILE
Write second index reads to FILE.
.TP 8
.BI "--barcode-tag " TAG
Auxiliary tag in which to find index reads [default: BC]
.TP 8
.BI "--quality-tag " TAG
Auxiliary tag in which to find index qualities [default: QT]
.TP 8
.BI "--index-format " STR
String describing how to parse the barcode and quality tags. For example:

.RS
.TP 8
.B i14i8
The first 14 characters are index 1, the next 8 characters are index 2.
.TP 8
.B n8i14
Ignore the first 8 characters, and use the next 14 characters for index 1.

If the tag contains a separator, then the numeric part can be replaced with '*' to
mean 'read until the separator or end of tag', for example:
.TP 8
.B n*i*
Ignore the left part of the tag until the separator, then use the second part.
.RE
.RE
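The Python sketch below shows how the -f, -F and -G options combine. It also
parses INT values in the style of C's strtol with base 0, which matches the
rules stated above: a leading 0x means hexadecimal and any other leading 0
means octal. Both helpers are hypothetical illustrations rather than
samtools code.

```python
def parse_flag_int(text: str) -> int:
    # Like C strtol with base 0: "0x..." is hex, any other leading "0" is octal.
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text)


def keep_read(flag: int, f: int = 0, F: int = 0, G: int = 0) -> bool:
    if (flag & f) != f:          # -f: every bit in f must be set
        return False
    if flag & F:                 # -F: any bit in F excludes the read
        return False
    if G and (flag & G) == G:    # -G: excluded only when all bits in G are set
        return False
    return True


print(parse_flag_int("0x10"), parse_flag_int("020"), parse_flag_int("16"))  # 16 16 16
print(keep_read(0x41, f=0x40), keep_read(0x3, G=0x3), keep_read(0x1, G=0x3))  # True False True
```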

.TP \"-------- collate
.B collate
samtools collate
.RI [ options ]
.IR in.sam | in.bam | in.cram " [" out.prefix "]"

Shuffles and groups reads together by their names.
A faster alternative to a full query name sort,
.B collate
ensures that reads of the same name are grouped together in contiguous groups,
but doesn't make any guarantees about the order of read names between groups.

The output from this command should be suitable for any operation that
requires all reads from the same template to be grouped together.
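The grouping guarantee can be met without a full sort. Read names are
hashed into a fixed number of buckets, in the spirit of the temporary files
controlled by -n, and names are then grouped within each bucket. The Python
sketch below uses in-memory lists in place of temporary files and a
hypothetical (name, data) record tuple. It illustrates the general approach,
not samtools' exact implementation.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # hypothetical (read name, rest of record)


def collate(records: Iterable[Record], n_buckets: int = 64) -> List[Record]:
    # Pass 1: scatter by a hash of the read name, so every record sharing a
    # name lands in the same bucket (buckets stand in for temporary files).
    buckets: List[List[Record]] = [[] for _ in range(n_buckets)]
    for rec in records:
        buckets[zlib.crc32(rec[0].encode()) % n_buckets].append(rec)
    # Pass 2: group by name inside each bucket. Names end up contiguous,
    # but the order of names across groups is arbitrary.
    out: List[Record] = []
    for bucket in buckets:
        groups: Dict[str, List[Record]] = defaultdict(list)
        for rec in bucket:
            groups[rec[0]].append(rec)
        for group in groups.values():
            out.extend(group)
    return out
```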

.B Options:
.RS
.TP 8
.B -O
Output to stdout rather than to files starting with out.prefix.
.TP
.B -u
Write uncompressed BAM output.
.TP
.BI "-l " INT
Compression level.
[1]
.TP
.BI "-n " INT
Number of temporary files to use.
[64]
.RE

.TP \"-------- reheader
.B reheader
samtools reheader
.RB [ -iP ]
.I in.header.sam in.bam

Replace the header in
.I in.bam
with the header in
.IR in.header.sam .
This command is much faster than replacing the header with a
BAM\(->SAM\(->BAM conversion.

By default this command outputs the BAM or CRAM file to standard
output (stdout), but for CRAM format files it has the option to
perform an in-place edit, both reading and writing to the same file.
No validity checking is performed on the header, nor is it checked for
suitability with the sequence data itself.

.B OPTIONS:
.RS
.TP 8
.B -P, --no-PG
Do not generate an @PG header line.
.TP 8
.B -i, --in-place
Perform the header edit in-place, if possible. This only works on CRAM
files and only if there is sufficient room to store the new header.
The amount of space available will differ for each CRAM file.
.RE

.TP \"-------- cat
.B cat
samtools cat [-b list] [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]

Concatenate BAMs or CRAMs. Although this works on either BAM or CRAM,
all input files must be the same format as each other. The sequence
dictionary of each input file must be identical, although this command
does not check this. This command uses a similar trick to
.B reheader
which enables fast BAM concatenation.

.B OPTIONS:
.RS
.TP 8
.BI "-b " FOFN
Read the list of input BAM or CRAM files from \fIFOFN\fR. These are
concatenated prior to any files specified on the command line.
Multiple \fB-b\fR \fIFOFN\fR options may be specified to concatenate
multiple lists of BAM/CRAM files.
.TP 8
.BI "-h " FILE
Uses the SAM header from \fIFILE\fR. By default the header is taken
from the first file to be concatenated.
.TP 8
.BI "-o " FILE
Write the concatenated output to \fIFILE\fR. By default this is sent
to stdout.
.RE

.TP \"-------- rmdup
.B rmdup
samtools rmdup [-sS] <input.srt.bam> <out.bam>

Remove potential PCR duplicates: if multiple read pairs have identical
external coordinates, only retain the pair with the highest mapping quality.
In paired-end mode, this command
.B ONLY
works with FR orientation and requires ISIZE to be correctly set. It does
not work for unpaired reads (e.g. two ends mapped to different
chromosomes or orphan reads).
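The selection rule (among pairs sharing the same external coordinates, keep
the one with the highest mapping quality) can be sketched in Python as
follows. Pair is a hypothetical record holding a pair's outer coordinates.
This sketch does not model how samtools derives those coordinates from
ISIZE, or how it breaks ties, and simply keeps the first pair seen on a tie.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pair:
    name: str
    ref: str
    outer_start: int  # leftmost coordinate covered by the pair
    outer_end: int    # rightmost coordinate covered by the pair
    mapq: int


def remove_duplicates(pairs: List[Pair]) -> List[Pair]:
    best: Dict[Tuple[str, int, int], Pair] = {}
    for p in pairs:
        key = (p.ref, p.outer_start, p.outer_end)
        # Keep the highest mapping quality per key; ties keep the first seen.
        if key not in best or p.mapq > best[key].mapq:
            best[key] = p
    return list(best.values())


pairs = [Pair("a", "chr1", 100, 400, 30), Pair("b", "chr1", 100, 400, 60),
         Pair("c", "chr1", 100, 450, 20)]
print([p.name for p in remove_duplicates(pairs)])  # ['b', 'c']
```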

.B OPTIONS:
.RS
.TP 8
.B -s
Remove duplicates for single-end reads. By default, the command works for
paired-end reads only.
.TP 8
.B -S
Treat paired-end reads as single-end reads.
.RE

.TP \"-------- addreplacerg
.B addreplacerg
samtools addreplacerg [-r rg line | -R rg ID] [-m mode] [-l level] [-o out.bam]
<input.bam>

Adds or replaces read group tags in a file.

.B OPTIONS:
.RS
.TP 8
.BI "-r " STRING
Specify a read group line to append to the header, and apply it to the
reads selected according to the -m option. If this option is repeated,
the arguments are automatically joined with tabs.
.TP 8
.BI "-R " STRING
Specify the read group ID of an existing @RG line, and apply it
to the selected reads.
.TP 8
.BI "-m " MODE
If you choose orphan_only, existing RG tags are not overwritten; if you choose
overwrite_all, existing RG tags are overwritten. The default is overwrite_all.
.TP 8
.BI "-o " STRING
Write the final output to STRING. The default is to write to stdout.

By default, samtools tries to select a format based on the output
filename extension; if output is to standard output or no format can be
deduced,
.B bam
is selected.
.RE
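The two -m modes, and the tab-joining of repeated -r arguments, are
sketched below in Python. new_rg_id and join_rg_args are hypothetical helper
names. Each alignment's current RG value is passed as None when the tag is
absent.

```python
from typing import List, Optional

TAB = chr(9)  # header fields are tab-separated


def new_rg_id(existing: Optional[str], rg_id: str,
              mode: str = "overwrite_all") -> str:
    # Returns the RG value the alignment should carry after the edit.
    if mode == "overwrite_all":
        return rg_id
    if mode == "orphan_only":
        # Keep an existing RG tag; only reads without one get rg_id.
        return existing if existing is not None else rg_id
    raise ValueError("unknown mode: " + mode)


def join_rg_args(args: List[str]) -> str:
    # Repeated -r arguments are joined with tabs into one @RG line.
    return TAB.join(args)


print(new_rg_id("old", "grp1", "orphan_only"), new_rg_id(None, "grp1", "orphan_only"))  # old grp1
```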

.TP \"-------- calmd
.B calmd
samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

Generate the MD tag. If the MD tag is already present, this command will
give a warning if the MD tag generated is different from the existing
tag. Output is SAM by default.

Calmd can also read and write CRAM files, although in most cases this is
pointless as CRAM recalculates MD and NM tags on the fly. The one
exception to this case is where both input and output CRAM files
were created, or are being created, with the \fIno_ref\fR option.

.B OPTIONS:
.RS
.TP 8
.B -A
When used jointly with
.B -r
this option overwrites the original base quality.
.TP 8
.B -e
Convert the read base to = if it is identical to the aligned reference
base. The indel caller does not support the = bases at the moment.
.TP
.B -u
Output uncompressed BAM.
.TP
.B -b
Output compressed BAM.
.TP
.BI -C \ INT
Coefficient to cap mapping quality of poorly mapped reads. See the
.B mpileup
command for details. [0]
.TP
.B -r
Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).
.TP
.B -E
Extended BAQ calculation. This option trades specificity for sensitivity, though the
effect is minor.
.RE
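For an ungapped alignment (a CIGAR of only M operations), both the MD tag
value and the -e substitution can be derived by walking the read and the
reference together, as in this Python sketch. md_and_equals is a
hypothetical helper. Real MD values also encode deletions with a ^ prefix,
and N bases and soft clips are not handled here.

```python
from typing import Tuple


def md_and_equals(read: str, ref: str) -> Tuple[str, str]:
    # Assumes an ungapped alignment: read[i] is aligned to ref[i].
    md_parts = []
    run = 0
    converted = []
    for r, g in zip(read, ref):
        if r.upper() == g.upper():
            run += 1
            converted.append("=")  # what -e writes for a matching base
        else:
            # MD: length of the preceding match run, then the reference base.
            md_parts.append(str(run) + g.upper())
            run = 0
            converted.append(r)
    md_parts.append(str(run))
    return "".join(md_parts), "".join(converted)


print(md_and_equals("ACGTTC", "ACCTAC"))  # ('2C1A1', '==G=T=')
```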

.TP \"-------- targetcut
.B targetcut
samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>

This command identifies target regions by examining the continuity of read depth, computes
haploid consensus sequences of targets and outputs a SAM with each sequence corresponding
to a target. When option
.B -f
is in use, BAQ will be applied. This command is
.B only
designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].

.TP \"-------- phase
.B phase
samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>

Call and phase heterozygous SNPs.

.B OPTIONS:
.RS
.TP 8
.B -A
Drop reads with ambiguous phase.
.TP 8
.BI -b \ STR
Prefix of BAM output. When this option is in use, phase-0 reads will be saved in file
.BR STR .0.bam
and phase-1 reads in
.BR STR .1.bam.
Phase-unknown reads will be randomly allocated to one of the two files. Chimeric reads
with switch errors will be saved in
.BR STR .chimeric.bam.
[null]
.TP
.B -F
Do not attempt to fix chimeric reads.
.TP
.BI -k \ INT
Maximum length for local phasing. [13]
.TP
.BI -q \ INT
Minimum Phred-scaled LOD to call a heterozygote. [40]
.TP
.BI -Q \ INT
Minimum base quality to be used in het calling. [13]
.RE

.TP \"-------- depad
.B depad
samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>

Converts a BAM aligned against a padded reference to a BAM aligned
against the depadded reference. The padded reference may contain
verbatim "*" bases in it, but "*" bases are also counted in the
reference numbering. This means that a sequence base-call aligned
against a reference "*" is represented by a CIGAR match ("M" or "X")
operator (if the base-call is "A", "C", "G" or "T"). After depadding,
the reference "*" bases are deleted and such aligned sequence
base-calls become insertions. Similar transformations apply for
deletions and padding CIGAR operations.
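The coordinate change performed by depadding can be illustrated by mapping
each padded reference position to its depadded position. Positions holding
"*" have no depadded counterpart, which is why read bases aligned there
become insertions. This is a Python sketch with a hypothetical helper name.

```python
from typing import List, Optional


def padded_to_unpadded(padded_ref: str) -> List[Optional[int]]:
    # Element i describes padded position i + 1 (1-based): it holds the
    # 1-based depadded position, or None where the reference has a "*".
    mapping: List[Optional[int]] = []
    unpadded = 0
    for base in padded_ref:
        if base == "*":
            mapping.append(None)
        else:
            unpadded += 1
            mapping.append(unpadded)
    return mapping


print(padded_to_unpadded("AC*GT"))  # [1, 2, None, 3, 4]
```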

.B OPTIONS:
.RS
.TP
.B -S
Ignored for compatibility with previous samtools versions.
Previously this option was required if input was in SAM format, but now the
correct format is automatically detected by examining the first few characters
of the input.
.TP
.B -s
Output in SAM format. The default is BAM.
.TP
.B -C
Output in CRAM format. The default is BAM.
.TP
.B -u
Do not compress the output. Applies to either BAM or CRAM output
format.
.TP
.B -1
Use the fastest compression level. Only works for BAM or CRAM output.
.TP
.BI "-T " FILE
Provides the padded reference file. Note that without this the @SQ
line lengths will be incorrect, so for most use cases this option should
be considered mandatory.
.TP
.BI "-o " FILE
Specifies the output filename. By default output is sent to stdout.
.RE

.TP \"-------- help etc
.BR help ,\ --help
Display a brief usage message listing the samtools commands available.
If the name of a command is also given, e.g.,
.BR samtools\ help\ view ,
the detailed usage message for that particular command is displayed.

.TP
.B --version
Display the version numbers and copyright information for samtools and
the important libraries used by samtools.

.TP
.B --version-only
Display the full samtools version number in a machine-readable format.
.PP
.SH GLOBAL OPTIONS
.PP
Several long options are shared between multiple samtools subcommands:
\fB--input-fmt\fR, \fB--input-fmt-option\fR, \fB--output-fmt\fR,
\fB--output-fmt-option\fR, and \fB--reference\fR.
The input format is typically auto-detected, so specifying the format
is usually unnecessary; the option is included for completeness.
Note that not all subcommands have all options. Consult the subcommand
help for more details.
.PP
Format strings recognised are "sam", "bam" and "cram". They may be
followed by a comma-separated list of options as \fIkey\fR or
\fIkey\fR=\fIvalue\fR. See below for examples.
.PP
The \fB--input-fmt-option\fR and \fB--output-fmt-option\fR arguments
accept either a single \fIoption\fR or
\fIoption\fR=\fIvalue\fR. Note that some options only work on some
file formats and only on read or write streams. If the value is
unspecified for a boolean option, the value is assumed to be 1. The
valid options are as follows.
.RS 0
.\" General purpose
.TP 4
.BI nthreads= INT
Specifies the number of threads to use during encoding and/or
decoding. For BAM this will be encoding only. In CRAM the threads
are dynamically shared between encoder and decoder.
.\" CRAM specific
.TP
.BI reference= fasta_file
Specifies a FASTA reference file for use in CRAM encoding or decoding.
It is usually not required for decoding, except when the reference
cannot be obtained by its MD5 via the REF_PATH or REF_CACHE environment
variables.
.TP
.BI decode_md= 0|1
CRAM input only; defaults to 1 (on). CRAM does not typically store
MD and NM tags, preferring to generate them on the fly. This option
controls whether they are generated during decoding.
.TP
.BI ignore_md5= 0|1
CRAM input only; defaults to 0 (off). When enabled, md5 checksum
errors on the reference sequence and block checksum errors within CRAM
are ignored. Use of this option is strongly discouraged.
.TP
.BI required_fields= bit-field
CRAM input only; specifies which SAM columns need to be populated.
By default all fields are used. Limiting the decode to specific
columns can give significant performance gains. The bit-field is a
numerical value constructed from the following table.
.TS
center;
rb l .
0x1	SAM_QNAME
0x2	SAM_FLAG
0x4	SAM_RNAME
0x8	SAM_POS
0x10	SAM_MAPQ
0x20	SAM_CIGAR
0x40	SAM_RNEXT
0x80	SAM_PNEXT
0x100	SAM_TLEN
0x200	SAM_SEQ
0x400	SAM_QUAL
0x800	SAM_AUX
0x1000	SAM_RGAUX
.TE
.TP
.BI name_prefix= string
CRAM input only; defaults to output filename. Any sequences with
auto-generated read names will use \fIstring\fR as the name prefix.
.TP
.BI multi_seq_per_slice= 0|1
CRAM output only; defaults to 0 (off). By default CRAM generates one
container per reference sequence, except in the case of many small
references (such as a fragmented assembly). If 1, a single slice may
contain alignments to more than one reference sequence.
.TP
.BI version= major.minor
CRAM output only. Specifies the CRAM version number. Acceptable
values are "2.1" and "3.0".
.TP
.BI seqs_per_slice= INT
CRAM output only; defaults to 10000. Specifies the maximum number of
sequences stored in each slice.
.TP
.BI slices_per_container= INT
CRAM output only; defaults to 1. The effect of having multiple slices
per container is to share the compression header block between
multiple slices. This is unlikely to have any significant impact
unless the number of sequences per slice is reduced. (Together these
two options control the granularity of random access.)
.TP
.BI embed_ref= 0|1
CRAM output only; defaults to 0 (off). If 1, this will store portions
of the reference sequence in each slice, permitting decode without
requiring an external copy of the reference sequence.
.TP
.BI no_ref= 0|1
CRAM output only; defaults to 0 (off). If 1, sequences will be stored
verbatim with no reference encoding. This can be useful if no
reference is available for the file.
.TP
.BI use_bzip2= 0|1
CRAM output only; defaults to 0 (off). Permits use of bzip2 in CRAM
block compression.
.TP
.BI use_lzma= 0|1
CRAM output only; defaults to 0 (off). Permits use of lzma in CRAM
block compression.
.TP
.BI lossy_names= 0|1
CRAM output only; defaults to 0 (off). If 1, templates with all
members within the same CRAM slice will have their read names
removed. New names will be automatically generated during decoding.
Also see the \fBname_prefix\fR option.
.RE
.PP
For example:
.EX 4
samtools view --input-fmt-option decode_md=0 \\
    --output-fmt cram,version=3.0 --output-fmt-option embed_ref \\
    --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam
.EE
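.PP
The format-string syntax used in this example ("cram,version=3.0") can
be illustrated with a short Python sketch. The helper is hypothetical:
it does not reproduce HTSlib's own parser or validate option names, and
it treats every bare \fIkey\fR as \fIkey\fR=1, which matches the
documented default only for boolean options.
.EX 2
```python
def parse_fmt(spec):
    """Split a format string such as "cram,version=3.0,embed_ref"."""
    fmt, *items = spec.split(",")
    if fmt not in ("sam", "bam", "cram"):
        raise ValueError("unrecognised format: " + fmt)
    opts = {}
    for item in items:
        key, sep, value = item.partition("=")
        # A bare key is read as a boolean option set to 1.
        opts[key] = value if sep else "1"
    return fmt, opts
```
.EE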
.PP
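The \fBrequired_fields\fR bit-field is the bitwise OR of the values in
the required_fields table above. A minimal Python sketch of that
arithmetic follows; the dictionary and function names are hypothetical.
It writes the result in decimal, which avoids relying on whether a
hexadecimal value is accepted.
.EX 2
```python
# Bit values copied from the required_fields table.
SAM_FIELDS = {
    "QNAME": 0x1, "FLAG": 0x2, "RNAME": 0x4, "POS": 0x8,
    "MAPQ": 0x10, "CIGAR": 0x20, "RNEXT": 0x40, "PNEXT": 0x80,
    "TLEN": 0x100, "SEQ": 0x200, "QUAL": 0x400, "AUX": 0x800,
    "RGAUX": 0x1000,
}


def required_fields(*names):
    """Combine the bits for the named SAM columns into one value."""
    mask = 0
    for name in names:
        mask |= SAM_FIELDS[name]
    return mask


def required_fields_option(*names):
    """Format the combined value as a decimal option string."""
    return "required_fields=%d" % required_fields(*names)
```
.EE
For example, decoding only QNAME, FLAG, RNAME and POS gives
0x1 + 0x2 + 0x4 + 0x8 = 0xf, i.e. required_fields=15.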
.SH REFERENCE SEQUENCES
.PP
The CRAM format requires use of a reference sequence for both reading
and writing.
.PP
When reading a CRAM, the \fB@SQ\fR headers are interrogated to identify
the reference sequence MD5sum (\fBM5:\fR tag) and the local reference
sequence filename (\fBUR:\fR tag). Note that \fIhttp://\fR and
\fIftp://\fR based URLs in the UR: field are not used, but local fasta
filenames (with or without \fIfile://\fR) can be used.
.PP
To create a CRAM, the \fB@SQ\fR headers will also be read to identify
the reference sequences, but M5: and UR: tags may not be present. In
this case the \fB-T\fR and \fB-t\fR options of samtools view may be
used to specify the fasta or fasta.fai filenames respectively
(provided the .fasta.fai file is accompanied by the corresponding
\&.fasta file).
.PP
The search order to obtain a reference is:
.IP 1. 3
Use any local file specified by the command line options (e.g. -T).
.IP 2. 3
Look up the MD5 via the REF_CACHE environment variable.
.IP 3. 3
Look up the MD5 in each element of the REF_PATH environment variable.
.IP 4. 3
Look for a local file listed in the UR: header tag.
.PP
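The search order above can be sketched in Python as a list of
candidate paths tried in turn. This is a simplified, hypothetical
helper rather than HTSlib's code: REF_CACHE and REF_PATH entries are
treated as plain directories holding files named by MD5sum (the
%\fInum\fRs templates and URL entries described under ENVIRONMENT
VARIABLES are not handled), and the existence test is passed in so the
logic can be exercised without real files.
.EX 2
```python
import os


def find_reference(md5, cli_fasta=None, ref_cache=None, ref_path=(),
                   ur_file=None, exists=os.path.isfile):
    """Return the first existing candidate reference, or None."""
    candidates = []
    if cli_fasta:
        # 1. A local file given on the command line (e.g. -T).
        candidates.append(cli_fasta)
    if ref_cache:
        # 2. The REF_CACHE directory.
        candidates.append(os.path.join(ref_cache, md5))
    for directory in ref_path:
        # 3. Each REF_PATH element, in order.
        candidates.append(os.path.join(directory, md5))
    if ur_file:
        # 4. A local file from the UR: tag; file:// is optional, while
        # http:// and ftp:// values are not used.
        if ur_file.startswith("file://"):
            ur_file = ur_file[len("file://"):]
        if "://" not in ur_file:
            candidates.append(ur_file)
    for path in candidates:
        if exists(path):
            return path
    return None
```
.EE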
.SH ENVIRONMENT VARIABLES
.PP
.TP
.B HTS_PATH
A colon-separated list of directories in which to search for HTSlib plugins.
If $HTS_PATH starts or ends with a colon or contains a double colon (\fB::\fP),
the built-in list of directories is searched at that point in the search.

If no HTS_PATH variable is defined, the built-in list of directories
specified when HTSlib was built is used, which typically includes
\fB/usr/local/libexec/htslib\fP and similar directories.

.TP
.B REF_PATH
A colon-separated (semicolon-separated on Windows) list of locations in
which to look for sequences identified by their MD5sums. The entries
may be directories or URLs. Note that if a URL is included then the
colon in http:// and ftp:// and the optional port number will be
treated as part of the URL and not a PATH field separator.
For URLs, the text \fB%s\fR will be replaced by the MD5sum being
read.

If no REF_PATH has been specified, it will default to
\fBhttp://www.ebi.ac.uk/ena/cram/md5/%s\fR, and if REF_CACHE is also unset,
REF_CACHE will be set to \fB$XDG_CACHE_HOME/hts-ref/%2s/%2s/%s\fR.
If \fB$XDG_CACHE_HOME\fR is unset, \fB$HOME/.cache\fR (or a local system
temporary directory if no home directory is found) will be used similarly.

.TP
.B REF_CACHE
This can be set to a single directory housing a local cache of
references. Upon downloading a reference it will be stored in the
location pointed to by REF_CACHE. When reading a reference it will be
looked for in this directory before searching REF_PATH. To avoid many
files being stored in the same directory, a pathname may be
constructed using %\fInum\fRs and %s notation: each %\fInum\fRs consumes
\fInum\fR characters of the MD5sum and %s consumes the remaining
characters. For example,
\fB/local/ref_cache/%2s/%2s/%s\fR will create 2 nested subdirectories
with the filenames in the deepest directory being the last 28
characters of the md5sum.

If no REF_PATH is defined, both REF_PATH and REF_CACHE will be
automatically set (see above), but if REF_PATH is defined and
REF_CACHE is not, then no local cache is used.

To aid population of the REF_CACHE directory, a script
\fBmisc/seq_cache_populate.pl\fR is provided in the Samtools
distribution. This takes a fasta file or a directory of fasta files
and generates the MD5sum-named files.
.PP
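The HTS_PATH and MD5 template rules above can be sketched in Python.
The helper names are hypothetical and the code is illustrative rather
than HTSlib's implementation; in particular it does not split REF_PATH
itself, so the special treatment of colons inside URLs is not modelled.
.EX 2
```python
def expand_hts_path(hts_path, builtin):
    """Expand HTS_PATH, splicing in the built-in directories at each
    empty element (a leading or trailing colon, or a double colon)."""
    if hts_path is None:
        # No HTS_PATH defined: use the built-in list only.
        return list(builtin)
    dirs = []
    for element in hts_path.split(":"):
        if element:
            dirs.append(element)
        else:
            dirs.extend(builtin)
    return dirs


def expand_md5_template(template, md5):
    """Expand %Ns and %s in a REF_CACHE or REF_PATH template.

    Each %Ns consumes the next N characters of the MD5sum and a bare %s
    consumes whatever remains.
    """
    out = []
    used = 0
    i = 0
    while i < len(template):
        if template[i] == "%":
            j = i + 1
            while j < len(template) and template[j].isdigit():
                j += 1
            if j < len(template) and template[j] == "s":
                digits = template[i + 1:j]
                n = int(digits) if digits else len(md5) - used
                out.append(md5[used:used + n])
                used += n
                i = j + 1
                continue
        # Any other character is copied through unchanged.
        out.append(template[i])
        i += 1
    return "".join(out)
```
.EE
With the example template \fB/local/ref_cache/%2s/%2s/%s\fR, an MD5sum
beginning 0123 is stored under /local/ref_cache/01/23/ with the
remaining 28 characters as the filename.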
.SH EXAMPLES
.IP o 2
Import SAM to BAM when
.B @SQ
lines are present in the header:
.EX 2
samtools view -bS aln.sam > aln.bam
.EE
If
.B @SQ
lines are absent:
.EX 2
samtools faidx ref.fa
samtools view -bt ref.fa.fai aln.sam > aln.bam
.EE
where
.I ref.fa.fai
is generated automatically by the
.B faidx
command.

.IP o 2
Convert a BAM file to a CRAM file using a local reference sequence:
.EX 2
samtools view -C -T ref.fa aln.bam > aln.cram
.EE
.IP o 2
Attach the
.B RG
tag while merging sorted alignments:
.EX 2
perl -e 'print "\\@RG\\tID:ga\\tSM:hs\\tLB:ga\\tPL:ILLUMINA\\n\\@RG\\tID:454\\tSM:hs\\tLB:454\\tPL:LS454\\n"' > rg.txt
samtools merge -rh rg.txt merged.bam ga.bam 454.bam
.EE
The value of the
.B RG
tag is determined by the name of the file the read comes from. In this
example, in
.IR merged.bam ,
reads from
.I ga.bam
will be tagged
.IR RG:Z:ga ,
while reads from
.I 454.bam
will be tagged
.IR RG:Z:454 .

.IP o 2
Call SNPs and short INDELs:
.EX 2
samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf > var.flt.vcf
.EE
The
.B bcftools filter
command marks low-quality sites and sites with the read depth exceeding
a limit, which should be adjusted to about twice the average read depth
(bigger read depths usually indicate problematic regions which are
often enriched for artefacts). One may consider adding
.B -C50
to
.B mpileup
if mapping quality is overestimated for reads containing excessive
mismatches. Applying this option usually helps
.B BWA-short
but may not help other mappers.

Individuals are identified from the
.B SM
tags in the
.B @RG
header lines. Individuals can be pooled in one alignment file; one
individual can also be separated into multiple files. Adding
.B -P ILLUMINA
to
.B mpileup
specifies that indel candidates should be collected only from
read groups with the
.B @RG-PL
tag set to
.IR ILLUMINA .
Collecting indel candidates from reads sequenced by an indel-prone
technology may affect the performance of indel calling.

.IP o 2
Generate the consensus sequence for one diploid individual:
.EX 2
samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq
.EE
.IP o 2
Phase one individual:
.EX 2
samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out
.EE
The
.B calmd
command is used to reduce false heterozygotes around INDELs.

.IP o 2
Dump BAQ-applied alignments for other SNP callers:
.EX 2
samtools calmd -bAr aln.bam ref.fa > aln.baq.bam
.EE
It adds and corrects the
.B NM
and
.B MD
tags at the same time. The
.B calmd
command also comes with the
.B -C
option, the same as the one in
.B pileup
and
.BR mpileup .
Apply it if it helps.

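.PP
The depth limit in the SNP-calling example above (DP>100) should be
about twice the average read depth. One way to estimate it is sketched
below in Python; the function names are hypothetical. The sketch
assumes input in the chromosome, position, depth column layout printed by
.B samtools depth
for a single input file. Because
.B samtools depth
omits zero-depth positions by default, the mean here is taken over
covered positions only.
.EX 2
```python
def depth_limit(depth_lines, factor=2):
    """Return about factor times the mean depth from depth-style lines."""
    total = 0
    count = 0
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        count += 1
    if count == 0:
        raise ValueError("no depth records")
    return round(factor * total / count)


def filter_expression(limit):
    """Build a bcftools filter -e expression like the one in the example."""
    return "%QUAL<20 || DP>" + str(limit)
```
.EE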
.SH LIMITATIONS
.PP
.IP o 2
Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
.IP o 2
Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan
reads or ends mapped to different chromosomes). If this is a concern,
please use Picard's MarkDuplicates, which handles these cases correctly,
although it is a little slower.

.SH AUTHOR
.PP
Heng Li from the Sanger Institute wrote the original C version of samtools.
Bob Handsaker from the Broad Institute implemented the BGZF library.
James Bonfield from the Sanger Institute developed the CRAM implementation.
John Marshall and Petr Danecek contribute to the source code, and various
people from the 1000 Genomes Project have contributed to the SAM format
specification.

.SH SEE ALSO
.IR bcftools (1),
.IR sam (5),
.IR tabix (1)
.PP
Samtools website: <http://www.htslib.org/>
.br
File format specifications of SAM/BAM, CRAM, VCF/BCF: <http://samtools.github.io/hts-specs>
.br
Samtools latest source: <https://github.com/samtools/samtools>
.br
HTSlib latest source: <https://github.com/samtools/htslib>
.br
Bcftools website: <http://samtools.github.io/bcftools>