annotate 0.2.0/readme/cronology.md @ 0:9e8b1c747a6a draft default tip

planemo upload
author galaxytrakr
date Fri, 29 May 2026 13:32:17 +0000
# cronology

`cronology` is an automated workflow for **_Cronobacter_** whole genome sequence assembly, subtyping and traceback based on the [NCBI Pathogen Detection](https://www.ncbi.nlm.nih.gov/pathogens) Project for [Cronobacter](https://www.ncbi.nlm.nih.gov/pathogens/isolates/#taxgroup_name:%22Cronobacter%22). It uses `fastp` for read quality control, `shovill` and `polypolish` for **_de novo_** assembly and genome polishing, `prokka` for gene prediction and annotation, and `quast.py` for assembly quality metrics. Users can choose a gold standard reference genome as a model during the gene prediction step with `prokka`. By default, `GCF_003516125` (**_Cronobacter sakazakii_**) is used.

In parallel, for each isolate, a whole-genome (genome distance) traceback analysis is performed using `mash` and `mashtree`, and the results are saved as a phylogenetic tree in `newick` format. The accompanying metadata can be uploaded to [iTOL](https://itol.embl.de/) for tree visualization.

Users can also run pangenome analysis using `pirate`, but this will considerably increase the run time of the pipeline if the input has more than ~50 samples.

\
&nbsp;

<!-- TOC -->

- [Minimum Requirements](#minimum-requirements)
- [CFSAN GalaxyTrakr](#cfsan-galaxytrakr)
- [Usage and Examples](#usage-and-examples)
  - [Database](#database)
  - [Input](#input)
  - [Output](#output)
  - [Sample clustering](#sample-clustering)
  - [Computational resources](#computational-resources)
  - [Runtime profiles](#runtime-profiles)
  - [your_institution.config](#your_institutionconfig)
  - [Cloud computing](#cloud-computing)
  - [Example data](#example-data)
- [cronology CLI Help](#cronology-cli-help)

<!-- /TOC -->

\
&nbsp;

## Minimum Requirements

1. [Nextflow version 23.04.3](https://github.com/nextflow-io/nextflow/releases/download/v23.04.3/nextflow).
    - Make the `nextflow` binary executable (`chmod 755 nextflow`) and make sure that it is available in your `$PATH`.
    - If your existing `JAVA` install does not support the newest **Nextflow** version, you can try **Amazon**'s `JAVA` (OpenJDK): [Corretto](https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.tar.gz).
2. One of `micromamba` (version `1.0.0`), `docker` or `singularity`, installed and available in your `$PATH`.
    - Running the workflow via `micromamba` software provisioning is **preferred**, as it does not require any `sudo` or `admin` privileges or any other configuration with respect to the various container providers.
    - To install `micromamba` for your system type, please follow these [installation steps](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html#linux-and-macos) and make sure that the `micromamba` binary is available in your `$PATH`.
        - As far as running the workflows is concerned, the `curl` step alone is sufficient to download the binary.
    - Once you have finished the installation, **it is important that you downgrade `micromamba` to version `1.0.0`**.

    ```bash
    micromamba self-update --version 1.0.0
    ```

3. A minimum of 10 CPU cores and about 60 GB of memory for the main workflow steps. More memory may be required if your **FASTQ** files are big.

\
&nbsp;

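The software requirements listed above can be checked before a first run. The following is only a minimal sketch: the helper names `have_tool` and `check_requirements` are hypothetical and not part of `cpipes`, and the script only tests whether the tools are found on `$PATH`, not their versions.

```bash
# Succeeds if the named command can be found on $PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one status line per tool. Returns non-zero if nextflow is missing
# or if none of micromamba, docker or singularity is available.
# The body runs in a subshell so its variables do not leak into the caller.
check_requirements() (
    status=0
    provisioner=missing
    if have_tool nextflow; then
        echo "nextflow: found"
    else
        echo "nextflow: MISSING"
        status=1
    fi
    for tool in micromamba docker singularity; do
        if have_tool "$tool"; then
            echo "$tool: found"
            provisioner=found
        else
            echo "$tool: not found"
        fi
    done
    [ "$provisioner" = found ] || status=1
    exit "$status"
)
```

Calling `check_requirements` prints the status lines, and its exit status can gate a wrapper script, for example `check_requirements || echo "please install the missing tools"`.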
## CFSAN GalaxyTrakr

The `cronology` pipeline is also available for use on the [Galaxy instance supported by CFSAN, FDA](https://galaxytrakr.org/). If you wish to run the analysis using **Galaxy**, please register for an account, after which you can run the workflow by selecting `cronology` under the [`Metagenomics:CPIPES`](../assets/cronology_on_galaxytrakr.PNG) tool section.

Please note that, due to testing prioritization, the pipeline on [CFSAN GalaxyTrakr](https://galaxytrakr.org) may in most cases be a version older than the one on **GitHub**.

\
&nbsp;

## Usage and Examples

Clone or download this repository and then call `cpipes`.

```bash
cpipes --pipeline cronology [options]
```

Alternatively, you can use `nextflow` to directly pull and run the pipeline.

```bash
nextflow pull CFSAN-Biostatistics/cronology
nextflow list
nextflow info CFSAN-Biostatistics/cronology
nextflow run CFSAN-Biostatistics/cronology --pipeline cronology_db --help
nextflow run CFSAN-Biostatistics/cronology --pipeline cronology --help
```

\
&nbsp;

**Example**: Run the default `cronology` pipeline in single-end mode.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
    --pipeline cronology \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --cronology_root_dbdir /data/Kranti_Konganti/cronology_db/PDG000000043.213 \
    --fq_single_end true
```

\
&nbsp;

**Example**: Run the `cronology` pipeline in paired-end mode.

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
    --pipeline cronology \
    --input /path/to/illumina/fastq/dir \
    --output /path/to/output \
    --cronology_root_dbdir /data/Kranti_Konganti/cronology_db/PDG000000043.213 \
    --fq_single_end false
```

\
&nbsp;

### Database

---

Although users can choose to run the `cronology_db` pipeline, it requires access to an HPC cluster or a similar cloud setting. The `GUNC` and `CheckM2` tools, which are used to filter out low-quality assemblies, require their own databases, so the run time is longer than usual. Therefore, pre-formatted databases are provided for download.

- Download the `PDG000000043.213` version of the **NCBI Pathogens release** for **_Cronobacter_**: <https://research.foodsafetyrisk.org/cronology/PDG000000043.213.tar.bz2>.

\
&nbsp;

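Fetching and unpacking the pre-formatted database from the Database section could look like the sketch below. The helper names `db_name_from_url` and `fetch_db` are hypothetical, `curl` and a `tar` with bzip2 support are assumed to be installed, and it is an unverified assumption that the archive unpacks into a folder named after the release.

```bash
# Hypothetical helper: derive the release name from the tarball URL.
db_name_from_url() {
    basename "$1" .tar.bz2
}

# Hypothetical helper: download the tarball into a parent folder and unpack it there.
# The body runs in a subshell so its variables do not leak into the caller.
fetch_db() (
    url="$1"
    dest="$2"
    archive="$dest/$(basename "$url")"
    mkdir -p "$dest" || exit 1
    curl -fL -o "$archive" "$url" || exit 1
    tar -xjf "$archive" -C "$dest"
)

# Usage (not run here because it needs network access):
#   url="https://research.foodsafetyrisk.org/cronology/PDG000000043.213.tar.bz2"
#   fetch_db "$url" /path/to/cronology_db
#   echo "/path/to/cronology_db/$(db_name_from_url "$url")"
```

Under that assumption, the path printed by the last usage line is what would be passed to `--cronology_root_dbdir`.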
### Input

---

The input to the workflow is a folder containing compressed (`.gz`) FASTQ files. Please note that sample grouping happens automatically, based on the file name of each FASTQ file. If, for example, a single sample is sequenced across multiple sequencing lanes, you can choose to group those FASTQ files into one sample by using the `--fq_filename_delim` and `--fq_filename_delim_idx` options. By default, `--fq_filename_delim` is set to `_` (underscore) and `--fq_filename_delim_idx` is set to 1.

For example, if the directory contains FASTQ files as shown below:

- KB-01_apple_L001_R1.fastq.gz
- KB-01_apple_L001_R2.fastq.gz
- KB-01_apple_L002_R1.fastq.gz
- KB-01_apple_L002_R2.fastq.gz
- KB-02_mango_L001_R1.fastq.gz
- KB-02_mango_L001_R2.fastq.gz
- KB-02_mango_L002_R1.fastq.gz
- KB-02_mango_L002_R2.fastq.gz

Then, to create 2 sample groups, `apple` and `mango`, we split the file name by the delimiter (underscore in this case, which is the default) and group by the first 2 words (`--fq_filename_delim_idx 2`).

It goes without saying that all the FASTQ files should follow a uniform naming pattern, so that the `--fq_filename_delim` and `--fq_filename_delim_idx` options do not have any adverse effect on collecting the files and creating the sample metadata sheet.

\
&nbsp;

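The grouping rule from the Input section can be illustrated with a small shell sketch. This is not how `cpipes` implements it internally (that is not shown here); the hypothetical `sample_group` helper only mimics "split on the delimiter and keep the first N words" so that you can preview which files would end up in the same group.

```bash
# Hypothetical helper: keep the first <idx> delimiter-separated words of a file name.
#   $1 = FASTQ file name, $2 = delimiter, $3 = number of leading words to keep
sample_group() {
    printf '%s\n' "$1" | cut -d "$2" -f "1-$3"
}

# Preview the distinct group keys for a few of the example file names.
for fq in KB-01_apple_L001_R1.fastq.gz KB-01_apple_L002_R1.fastq.gz \
          KB-02_mango_L001_R1.fastq.gz KB-02_mango_L002_R1.fastq.gz; do
    sample_group "$fq" _ 2
done | sort -u
```

With a delimiter of `_` and an index of 2, the four file names collapse to two keys, one per sample, which corresponds to the `apple` and `mango` groups described above.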
### Output

---

All the outputs for each step are stored inside the folder given with the `--output` option. The `multiqc_report.html` file inside the `cronology-multiqc` folder contains a brief consolidated report and can be opened in any browser on your local workstation. The tree metadata, which can be uploaded to [iTOL](https://itol.embl.de/) for visualization, is located in the `cat_unique` folder.

\
&nbsp;

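To locate the consolidated report after a run without assuming exactly where the `cronology-multiqc` folder sits below the output folder, a search by file name is enough. The helper name `find_multiqc_report` below is hypothetical.

```bash
# Hypothetical helper: print the path of every MultiQC report below an output folder.
#   $1 = the folder that was passed to --output
find_multiqc_report() {
    find "$1" -type f -name multiqc_report.html
}

# Usage:
#   find_multiqc_report /path/to/output
```

The printed path can then be opened in a browser or copied to your local workstation.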
### Sample clustering

---

Since `v0.2.0`, `cronology` can automatically upload the `mashtree` generated output to [microreact.org](https://microreact.org). For this to work, create an account, [obtain your API access token from microreact.org](https://docs.microreact.org/api/access-tokens#obtain-your-api-access-token), put it in a file named `microreact_api.key` and save that file inside the [assets](../assets/) folder. If you do not wish to automatically upload the tree to [microreact.org](https://microreact.org), you can turn the upload off with the `--upload_microreact false` CLI option.

The generated tree URL is stored inside the `upload_microreact` output folder.

Example: [https://microreact.org/project/c9GcC9pJ622FeX27f2LFRT-cronologyruntree](https://microreact.org/project/c9GcC9pJ622FeX27f2LFRT-cronologyruntree)

\
&nbsp;

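Saving the access token described in the Sample clustering section can be done by hand or with a small helper such as the one sketched below. The helper name `save_microreact_key` is hypothetical, and it is an assumption that the key file should hold nothing but the token on a single line. The restrictive `umask` is a precaution of this sketch and not a documented requirement.

```bash
# Hypothetical helper: write the token to <assets_dir>/microreact_api.key.
#   $1 = API access token, $2 = path to the assets folder of the repository
# The body runs in a subshell so the umask change does not affect the caller.
save_microreact_key() (
    token="$1"
    assets_dir="$2"
    umask 077    # a newly created key file is readable by its owner only
    printf '%s\n' "$token" > "$assets_dir/microreact_api.key"
)

# Usage:
#   save_microreact_key "paste-your-token-here" /path/to/cronology/assets
```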
### Computational resources

---

The `cronology` workflow requires a minimum of 60 GB of memory to finish successfully. By default, `cronology` uses 10 CPU cores where possible. You can change this behavior and adjust the number of CPU cores with the `--max_cpus` option.

\
&nbsp;

Example:

```bash
cpipes \
    --pipeline cronology \
    --input /path/to/cronology_sim_reads \
    --output /path/to/cronology_sim_reads_output \
    --cronology_root_dbdir /path/to/PDG000000043.213 \
    --max_cpus 5 \
    -profile stdkondagac \
    -resume
```

\
&nbsp;

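If you want to derive a value for `--max_cpus` from the machine you are on, one possible approach is sketched below. The helper names are hypothetical, and capping at 10 simply mirrors the default mentioned above; it is a choice of this sketch, not a limit stated for the option.

```bash
# Hypothetical helper: number of online CPU cores, falling back to 1 if unknown.
detect_cpus() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Hypothetical helper: cap a core count at the workflow default of 10.
cap_cpus() {
    if [ "$1" -lt 10 ]; then
        echo "$1"
    else
        echo 10
    fi
}

# Print the option as it could be appended to a cpipes call.
echo "--max_cpus $(cap_cpus "$(detect_cpus)")"
```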
### Runtime profiles

---

You can use different run time profiles that suit your specific compute environment, i.e., you can run the workflow locally on your machine or in a grid computing infrastructure.

\
&nbsp;

Example:

```bash
cd /data/scratch/$USER
mkdir nf-cpipes
cd nf-cpipes
cpipes \
    --pipeline cronology \
    --input /path/to/fastq_pass_dir \
    --output /path/to/where/output/should/go \
    -profile your_institution
```

The above command runs the pipeline and stores the output at the location given by the `--output` flag. The **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-cronology` would hold all the **NEXTFLOW** related logs, reports and trace files.

\
&nbsp;

### `your_institution.config`

---

In the above example, the run time profile is given as `your_institution`. For this to work, add the following lines at the end of the [`computeinfra.config`](../conf/computeinfra.config) file, which should be located inside the `conf` folder. For example, if your institution uses **SGE** or **UNIVA** for grid computing instead of **SLURM** and has a job queue named `normal.q`, then add these lines:

\
&nbsp;

```groovy
your_institution {
    process.executor = 'sge'
    process.queue = 'normal.q'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = false
    params.enable_conda = true
    conda.enabled = true
    conda.useMicromamba = true
    params.enable_module = false
}
```

In the above example, all the software provisioning choices are disabled except `conda`. You can also choose to remove the `process.queue` line altogether, and the `cronology` workflow will then request the appropriate memory and number of CPU cores automatically. These requests range from 1 CPU core, 1 GB and 1 hour for job completion up to 10 CPU cores, 1 TB and 120 hours for job completion.

\
&nbsp;

### Cloud computing

---

You can run the workflow in the cloud (this works only with a proper setup of AWS resources). Add new run time profiles with the required parameters per the [Nextflow docs](https://www.nextflow.io/docs/latest/executor.html):

\
&nbsp;

Example:

```groovy
my_aws_batch {
    // Executor and queue are process-scope settings, as in the profile above.
    process.executor = 'awsbatch'
    process.queue = 'my-batch-queue'
    aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
    // The AWS region is set in the `aws` scope.
    aws.region = 'us-east-1'
    singularity.enabled = false
    singularity.autoMounts = true
    docker.enabled = true
    params.enable_conda = false
    params.enable_module = false
}
```

\
&nbsp;

### Example data

---

`cronology` was tested on multiple internal sequencing runs and also on publicly available WGS run data. Please make sure that you have all the [minimum requirements](#minimum-requirements) to run the workflow.

- Download public SRA data for **_Cronobacter_**: [SRR List](../assets/runs_public_cronobacter.txt). You can download a minimized set of sequencing runs for testing purposes.
- Download the pre-formatted full database for the **NCBI Pathogens release**: [PDG000000043.213](https://research.foodsafetyrisk.org/cronology/PDG000000043.213.tar.bz2) (~500 MB).
- After a successful run of the workflow, your **MultiQC** report should look something like [this](https://research.foodsafetyrisk.org/cronology/627_crono_multiqc_report.html).
- It is always best practice to use absolute UNIX paths and the real destinations of symbolic links during pipeline execution. For example, find out the real path(s) of your absolute UNIX path(s) and use them for the `--input` and `--output` options of the pipeline.

```bash
realpath /hpc/scratch/user/input
```

9e8b1c747a6a planemo upload
galaxytrakr
parents:
diff changeset
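To make the symbolic link advice concrete, here is a minimal sketch that builds a throwaway directory with a symbolic link and resolves it. The directory names (`real_input`, `input_link`) and the variable names are hypothetical and exist only for illustration; they are not part of `cronology`.

```bash
# Illustrative only: the directory names below are hypothetical.
workdir="$(mktemp -d)"
mkdir -p "${workdir}/real_input"
ln -s "${workdir}/real_input" "${workdir}/input_link"

# realpath follows the symbolic link and prints the real destination.
input_dir="$(realpath "${workdir}/input_link")"
echo "Resolved --input path: ${input_dir}"
```

The resolved value could then be passed to the pipeline as `--input "${input_dir}"`, and the same approach applies to the `--output` path.
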
Now, run the workflow:

\
&nbsp;

```bash
cpipes \
    --pipeline cronology \
    --input /path/to/sra_reads \
    --output /path/to/sra_reads_output \
    --cronology_root_dbdir /path/to/PDG000000043.213 \
    --fq_single_end false \
    --fq_suffix '_1.fastq.gz' --fq2_suffix '_2.fastq.gz' \
    -profile stdkondagac \
    -resume
```

Please note that the runtime profile `stdkondagac` runs jobs locally using `micromamba` for software provisioning. The first time you run the command, a new folder called `kondagac_cache` is created, and subsequent runs should reuse this `conda` cache.

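Since the run is configured for paired-end reads via `--fq_suffix` and `--fq2_suffix`, it may help to confirm beforehand that every forward read file has a matching reverse read file. The sketch below is one possible pre-flight check; the function name `check_pairs` is hypothetical and is not part of `cpipes` or `cronology`.

```bash
# Illustrative pre-flight check (not part of cronology): count forward
# read files that have no matching reverse read file in a directory.
check_pairs() {
    local input_dir="$1" fq1_suffix="$2" fq2_suffix="$3"
    local missing=0 fq1 sample
    for fq1 in "${input_dir}"/*"${fq1_suffix}"; do
        # Skip the unexpanded glob pattern when no forward reads exist.
        [ -e "${fq1}" ] || continue
        # Strip the forward suffix to get the per-sample path prefix.
        sample="${fq1%"${fq1_suffix}"}"
        if [ ! -e "${sample}${fq2_suffix}" ]; then
            echo "Missing mate for: ${fq1}" >&2
            missing=$((missing + 1))
        fi
    done
    # Print the number of unpaired forward read files.
    echo "${missing}"
}
```

For example, `check_pairs /path/to/sra_reads '_1.fastq.gz' '_2.fastq.gz'` prints the number of forward read files without a mate and lists each of them on standard error.
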
\
&nbsp;

## `cronology` CLI Help

```text
[Kranti_Konganti@my-unix-box ]$ cpipes --pipeline cronology --help
N E X T F L O W ~ version 23.04.3
Launching `./cronology/cpipes` [jovial_colden] DSL2 - revision: 79ea031fad
================================================================================
             (o)
  ___  _ __   _  _ __    ___  ___
 / __|| '_ \ | || '_ \  / _ \/ __|
| (__ | |_) || || |_) ||  __/\__ \
 \___|| .__/ |_|| .__/  \___||___/
      | |       | |
      |_|       |_|
--------------------------------------------------------------------------------
A collection of modular pipelines at CFSAN, FDA.
--------------------------------------------------------------------------------
Name                            : CPIPES
Author                          : Kranti.Konganti@fda.hhs.gov
Version                         : 0.7.0
Center                          : CFSAN, FDA.
================================================================================


--------------------------------------------------------------------------------
Show configurable CLI options for each tool within cronology
--------------------------------------------------------------------------------
Ex: cpipes --pipeline cronology --help
Ex: cpipes --pipeline cronology --help fastp
Ex: cpipes --pipeline cronology --help fastp,polypolish
--------------------------------------------------------------------------------
--help dpubmlstpy               : Show dl_pubmlst_profiles_and_schemes.py CLI
                                  options CLI options
--help fastp                    : Show fastp CLI options
--help spades                   : Show spades CLI options
--help shovill                  : Show shovill CLI options
--help polypolish               : Show polypolish CLI options
--help quast                    : Show quast.py CLI options
--help prodigal                 : Show prodigal CLI options
--help prokka                   : Show prokka CLI options
--help pirate                   : Show priate CLI options
--help mlst                     : Show mlst CLI options
--help mash                     : Show mash `screen` CLI options
--help tree                     : Show mashtree CLI options
--help abricate                 : Show abricate CLI options

```