comparison SeqSero2/README.md @ 17:03f7b358d57f

planemo upload
author jpayne
date Tue, 25 Mar 2025 23:22:38 -0400
parents 87c7eebc6797
children
16:3b6d5b60968f 17:03f7b358d57f
1 # SeqSero2 v1.0.0 1 # SeqSero2
2 Salmonella serotype prediction from genome sequencing data 2 Salmonella serotype prediction from genome sequencing data.
3
4 Online version: http://www.denglab.info/SeqSero2
3 5
4 # Introduction 6 # Introduction
5 SeqSero2 is a pipeline for Salmonella serotype prediction from raw sequencing reads or genome assemblies 7 SeqSero2 is a pipeline for Salmonella serotype prediction from raw sequencing reads or genome assemblies
6 8
7 # Dependencies 9 # Dependencies
8 SeqSero has three workflows: 10 SeqSero2 has three workflows:
9 11
10 (A) Allele micro-assembly (default). This workflow takes raw reads as input and performs targeted assembly of serotype determinant alleles. Assembled alleles are used to predict serotype and flag potential inter-serotype contamination in sequencing data (i.e., presence of reads from multiple serotypes due to, for example, cross or carryover contamination during sequencing). 12 (A) Allele micro-assembly (default). This workflow takes raw reads as input and performs targeted assembly of serotype determinant alleles. Assembled alleles are used to predict serotype and flag potential inter-serotype contamination in sequencing data (i.e., presence of reads from multiple serotypes due to, for example, cross or carryover contamination during sequencing).
11 13
12 Allele micro-assembly workflow depends on: 14 Allele micro-assembly workflow depends on:
13 15
14 1. Python 3; 16 1. Python 3;
15 17
16 2. [Burrows-Wheeler Aligner v0.7.12](http://sourceforge.net/projects/bio-bwa/files/); 18 2. Biopython 1.73;
17 19
18 3. [Samtools v1.8](http://sourceforge.net/projects/samtools/files/samtools/); 20 3. [Burrows-Wheeler Aligner v0.7.12](http://sourceforge.net/projects/bio-bwa/files/);
19 21
20 4. [NCBI BLAST v2.2.28+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download); 22 4. [Samtools v1.8](http://sourceforge.net/projects/samtools/files/samtools/);
21 23
22 5. [SRA Toolkit v2.8.0](http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software); 24 5. [NCBI BLAST v2.2.28+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download);
23 25
24 6. [SPAdes v3.9.0](http://bioinf.spbau.ru/spades); 26 6. [SRA Toolkit v2.8.0](http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software);
25 27
26 7. [Bedtools v2.17.0](http://bedtools.readthedocs.io/en/latest/); 28 7. [SPAdes v3.15.5](http://bioinf.spbau.ru/spades);
27 29
28 8. [SalmID v0.11](https://github.com/hcdenbakker/SalmID). 30 8. [Bedtools v2.17.0](http://bedtools.readthedocs.io/en/latest/);
31
32 9. [SalmID v0.11](https://github.com/hcdenbakker/SalmID).
29 33
30 34
31 (B) Raw reads k-mer. This workflow takes raw reads as input and performs rapid serotype prediction based on unique k-mers of serotype determinants. 35 (B) Raw reads k-mer. This workflow takes raw reads as input and performs rapid serotype prediction based on unique k-mers of serotype determinants.
32 36
33 Raw reads k-mer workflow (originally SeqSeroK) depends on: 37 Raw reads k-mer workflow (originally SeqSeroK) depends on:
36 2. [SRA Toolkit](http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software) (optional, just used to fastq-dump sra files); 40 2. [SRA Toolkit](http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software) (optional, just used to fastq-dump sra files);
37 41
38 42
39 (C) Genome assembly k-mer. This workflow takes genome assemblies as input and the rest of the workflow largely overlaps with the raw reads k-mer workflow 43 (C) Genome assembly k-mer. This workflow takes genome assemblies as input and the rest of the workflow largely overlaps with the raw reads k-mer workflow
40 44
45 # Installation
46 ### Conda
47 To install the latest SeqSero2 Conda package (recommended):
48 ```
49 conda install -c bioconda seqsero2=1.3.1
50 ```
51 ### Git
52 To install the SeqSero2 git repository locally:
53 ```
54 git clone https://github.com/denglab/SeqSero2.git
55 cd SeqSero2
56 python3 -m pip install --user .
57 ```
58 ### Other options
59 Third party SeqSero2 installations (may not be the latest version of SeqSero2): \
60 https://github.com/B-UMMI/docker-images/tree/master/seqsero2 \
61 https://github.com/denglab/SeqSero2/issues/13
62
41 63
42 # Executing the code 64 # Executing the code
43 Make sure all SeqSero2 and its dependency executables are added to your path (e.g. to ~/.bashrc). Then type SeqSero2_package.py to get detailed instructions. 65 Make sure all SeqSero2 and its dependency executables are added to your path (e.g. to ~/.bashrc). Then type SeqSero2_package.py to get detailed instructions.
44 66
45 Usage: SeqSero2_package.py 67 Usage: SeqSero2_package.py
46 68
47 -m <string> (which workflow to apply, 'a'(raw reads allele micro-assembly), 'k'(raw reads and genome assembly k-mer), default=a) 69 -m <string> (which workflow to apply, 'a'(raw reads allele micro-assembly), 'k'(raw reads and genome assembly k-mer), default=a)
48 70
49 -t <string> (input data type, '1' for interleaved paired-end reads, '2' for separated paired-end reads, '3' for single reads, '4' for genome assembly, '5' for nanopore fasta, '6'for nanopore fastq) 71 -t <string> (input data type, '1' for interleaved paired-end reads, '2' for separated paired-end reads, '3' for single reads, '4' for genome assembly, '5' for nanopore reads (fasta/fastq))
50 72
51 -i <file> (/path/to/input/file) 73 -i <file> (/path/to/input/file)
52 74
53 -p <int> (number of threads for allele mode, if p >4, only 4 threads will be used for assembly since the amount of extracted reads is small, default=1) 75 -p <int> (number of threads for allele mode, if p >4, only 4 threads will be used for assembly since the amount of extracted reads is small, default=1)
54 76
55 -b <string> (algorithms for bwa mapping for allele mode; 'mem' for mem, 'sam' for samse/sampe; default=mem; optional; for now we only optimized for default "mem" mode) 77 -b <string> (algorithms for bwa mapping for allele mode; 'mem' for mem, 'sam' for samse/sampe; default=mem; optional; for now we only optimized for default "mem" mode)
56 78
57 -d <string> (output directory name, if not set, the output directory would be 'SeqSero_result_'+time stamp+one random number) 79 -d <string> (output directory name, if not set, the output directory would be 'SeqSero_result_'+time stamp+one random number)
58 80
59 -c <flag> (if '-c' was flagged, SeqSero2 will only output serotype prediction without the directory containing log files) 81 -c <flag> (if '-c' was flagged, SeqSero2 will only output serotype prediction without the directory containing log files)
82
83 -n <string> (optional, to specify a sample name in the report output)
84
85 -s <flag> (if '-s' was flagged, SeqSero2 will not output header in SeqSero_result.tsv)
86
87 --check <flag> (use '--check' flag to check the required dependencies)
88
89 -v, --version (show program's version number and exit)
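The option list above can be turned into a small command builder. The following is only a sketch in Python: `build_seqsero2_command` is a hypothetical helper, not part of SeqSero2, and it uses only flags that appear in the list (`-m`, `-t`, `-i`, `-p`, `-d`, `-n`). It handles a single input path; how two separate read files are passed for `-t 2` is not shown in this excerpt, so that case is left out.

```python
import shlex

# Input-type codes as listed for the -t option in the newer revision above.
INPUT_TYPES = {
    "1": "interleaved paired-end reads",
    "2": "separated paired-end reads",
    "3": "single reads",
    "4": "genome assembly",
    "5": "nanopore reads (fasta/fastq)",
}


def build_seqsero2_command(input_path, workflow, input_type,
                           threads=None, out_dir=None, sample_name=None):
    """Hypothetical helper: build an argument list for SeqSero2_package.py."""
    if workflow not in ("a", "k"):
        raise ValueError("workflow must be 'a' (allele) or 'k' (k-mer)")
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unknown input type: {input_type!r}")
    # The README states that genome assemblies are only predicted
    # by the k-mer workflow.
    if input_type == "4" and workflow != "k":
        raise ValueError("genome assemblies ('-t 4') require '-m k'")

    cmd = ["SeqSero2_package.py", "-m", workflow, "-t", input_type,
           "-i", str(input_path)]
    # -p is documented as the thread count for allele mode only.
    if threads is not None and workflow == "a":
        cmd += ["-p", str(threads)]
    if out_dir is not None:
        cmd += ["-d", str(out_dir)]
    if sample_name is not None:
        cmd += ["-n", str(sample_name)]
    return cmd


cmd = build_seqsero2_command("assembly.fasta", workflow="k", input_type="4")
print(shlex.join(cmd))  # SeqSero2_package.py -m k -t 4 -i assembly.fasta
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once SeqSero2 and its dependencies are on the path.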
60 90
61 91
62 # Examples 92 # Examples
63 Allele mode: 93 Allele mode:
64 94
72 102
73 # Genome assembly k-mer ("-t 4", genome assemblies only predicted by the k-mer workflow, "-m k") 103 # Genome assembly k-mer ("-t 4", genome assemblies only predicted by the k-mer workflow, "-m k")
74 SeqSero2_package.py -m k -t 4 -i assembly.fasta 104 SeqSero2_package.py -m k -t 4 -i assembly.fasta
75 105
76 # Output 106 # Output
77 Upon executing the command, a directory named 'SeqSero_result_Time_your_run' will be created. Your result will be stored in 'Seqsero_result.txt' in that directory. And the assembled alleles can also be found in the directory if using "-m a" (allele mode). 107 Upon executing the command, a directory named 'SeqSero_result_Time_your_run' will be created. Your result will be stored in 'SeqSero_result.txt' in that directory. And the assembled alleles can also be found in the directory if using "-m a" (allele mode).
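The option list mentions a `SeqSero_result.tsv` report that carries a header row unless `-s` is given. A minimal Python sketch for loading such a report follows. It assumes the `.tsv` file sits in the result directory and is tab-separated with a header; `read_seqsero2_report` is a hypothetical helper, and no column names are assumed.

```python
import csv
from pathlib import Path


def read_seqsero2_report(result_dir, filename="SeqSero_result.tsv"):
    """Hypothetical helper: load a tab-separated report into a list of dicts.

    Assumes the first row is a header (i.e. SeqSero2 was run without '-s').
    """
    report_path = Path(result_dir) / filename
    with report_path.open(newline="") as handle:
        # Each data row becomes a dict keyed by the header fields.
        return list(csv.DictReader(handle, delimiter="\t"))
```

Each returned row maps the header names found in the file to that row's values, so the sketch does not depend on a particular report layout.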
78 108
79 109
80 # Citation 110 # Citation
111 Zhang S, Den-Bakker HC, Li S, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X.
112 SeqSero2: rapid and improved Salmonella serotype determination using whole genome sequencing data.
113 **Appl Environ Microbiology. 2019 Sep; 85(23):e01746-19.** [PMID: 31540993](https://aem.asm.org/content/early/2019/09/17/AEM.01746-19.long)
114
81 Zhang S, Yin Y, Jones MB, Zhang Z, Deatherage Kaiser BL, Dinsmore BA, Fitzgerald C, Fields PI, Deng X. 115 Zhang S, Yin Y, Jones MB, Zhang Z, Deatherage Kaiser BL, Dinsmore BA, Fitzgerald C, Fields PI, Deng X.
82 Salmonella serotype determination utilizing high-throughput genome sequencing data. 116 Salmonella serotype determination utilizing high-throughput genome sequencing data.
83 **J Clin Microbiol.** 2015 May;53(5):1685-92.[PMID:25762776](http://jcm.asm.org/content/early/2015/03/05/JCM.00323-15) 117 **J Clin Microbiol. 2015 May;53(5):1685-92.** [PMID: 25762776](http://jcm.asm.org/content/early/2015/03/05/JCM.00323-15)