Mercurial > repos > kkonganti > hfp_nowayout
changeset 0:97cd2f532efe
planemo upload
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/LICENSE.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,98 @@ +# CPIPES (CFSAN PIPELINES) + +## The modular pipeline repository at CFSAN, FDA + +**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**, +mostly for bioinformatics data analysis at **CFSAN, FDA.** + +--- + +### **LICENSES** + +\ + + +**CPIPES** is licensed under: + +```text +MIT License + +In the U.S.A. Public Domain; elsewhere Copyright (c) 2022 U.S. Food and Drug Administration + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +``` + +\ + + +Portions of **CPIPES** are built on modified versions of many tools, scripts and libraries from [nf-core/modules](https://github.com/nf-core/modules) and [nf-core/rnaseq](https://github.com/nf-core/rnaseq), which are originally licensed under: + +```text +MIT License + +Copyright (c) Philip Ewels +Copyright (c) Phil Ewels, Rickard Hammarén + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +``` + +\ + + +The **MultiQC** report, in addition, uses [DataTables](https://datatables.net), which is licensed under: + +```text +MIT License + +Copyright (C) 2008-2022, SpryMedia Ltd.
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +```
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,48 @@ +# CPIPES (CFSAN PIPELINES) + +## The modular pipeline repository at CFSAN, FDA + +**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**, +mostly for bioinformatics data analysis at **CFSAN, FDA.** + +--- + +### **Pipelines** + +--- +**CPIPES**: + + 1. `centriflaken` : [README](./readme/centriflaken.md). + 2. `centriflaken_hy` : [README](./readme/centriflaken_hy.md). + +#### Workflow Usage + +The following is an example of how to run the `centriflaken` pipeline on the **CFSAN** raven cluster. + +```bash +module load cpipes/0.4.0 + +cpipes --pipeline centriflaken [options] +``` + +Example: + +```bash +cd /hpc/scratch/$USER +mkdir nf-cpipes +cd nf-cpipes +cpipes \ + --pipeline centriflaken \ + --input /path/to/fastq_pass_dir \ + --output /path/to/where/output/should/go \ + --user_email First.Last@fda.hhs.gov \ + -profile raven +``` + +The above command would run the pipeline and store the output wherever the workflow author configured it to go; the **NEXTFLOW** reports are always stored in the current working directory from which `cpipes` is run. For example, for the above command, a directory called `CPIPES-centriflaken` would hold all the **NEXTFLOW** +related logs, reports and trace files. + +### **BETA** + +--- +The development of the modular structure and flow is an ongoing effort and may change depending on the assessment of various computational topics and other considerations.
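As a minimal follow-up sketch of where to look after a run: the README above states that the **NEXTFLOW** logs, reports and trace files land in a `CPIPES-centriflaken` directory under the working directory from which `cpipes` was launched; the exact file names inside it are not specified in the README and will vary by **NEXTFLOW** version and configuration.

```bash
# Hedged example: after the run in the README example finishes, inspect the
# NEXTFLOW logs, reports and trace files created in the launch directory
# (/hpc/scratch/$USER/nf-cpipes in the example above).
cd /hpc/scratch/$USER/nf-cpipes
ls -lh CPIPES-centriflaken/
```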
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/assets/adaptors.fa Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,1194 @@ +>gnl|uv|NGB00360.1 Illumina PCR Primer +AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00362.1 Illumina Paired End PCR Primer 2.0 +CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT +>gnl|uv|NGB00363.1 Illumina Multiplexing PCR Primer 2.0 +GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00364.1 Illumina Multiplexing PCR Primer Index 1 +CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTC +>gnl|uv|NGB00365.1 Illumina Multiplexing PCR Primer Index 2 +CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTC +>gnl|uv|NGB00366.1 Illumina Multiplexing PCR Primer Index 3 +CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTC +>gnl|uv|NGB00367.1 Illumina Multiplexing PCR Primer Index 4 +CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTC +>gnl|uv|NGB00368.1 Illumina Multiplexing PCR Primer Index 5 +CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTC +>gnl|uv|NGB00369.1 Illumina Multiplexing PCR Primer Index 6 +CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTC +>gnl|uv|NGB00370.1 Illumina Multiplexing PCR Primer Index 7 +CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTC +>gnl|uv|NGB00371.1 Illumina Multiplexing PCR Primer Index 8 +CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTC +>gnl|uv|NGB00372.1 Illumina Multiplexing PCR Primer Index 9 +CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTC +>gnl|uv|NGB00373.1 Illumina Multiplexing PCR Primer Index 10 +CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTC +>gnl|uv|NGB00374.1 Illumina Multiplexing PCR Primer Index 11 +CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTC +>gnl|uv|NGB00375.1 Illumina Multiplexing PCR Primer Index 12 +CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTC +>gnl|uv|NGB00376.1 Illumina Gex PCR Primer 2 +AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA +>gnl|uv|NGB00377.1 Illumina DpnII Gex Sequencing Primer +CGACAGGTTCAGAGTTCTACAGTCCGACGATC +>gnl|uv|NGB00378.1 Illumina NlaIII Gex Sequencing Primer +CCGACAGGTTCAGAGTTCTACAGTCCGACATG +>gnl|uv|NGB00379.1 Illumina 3' RNA Adapter +TCGTATGCCGTCTTCTGCTTGTT +>gnl|uv|NGB00380.1 Illumina Small RNA 3' Adapter +AATCTCGTATGCCGTCTTCTGCTTGC +>gnl|uv|NGB00385.1 454 FLX linker +GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC +>gnl|uv|NGB00414.1 454 Life Sciences GS FLX Titanium Primer A-key +CGTATCGCCTCCCTCGCGCCATCAG +>gnl|uv|NGB00415.1 454 Life Sciences GS FLX Titanium Primer B-key +CTATGCGCCTTGCCAGCCCGCTCAG +>gnl|uv|NGB00416.1 454 Life Sciences GS FLX Titanium MID Adaptor B +CCTATCCCCTGTGTGCCTTGGCAGTCTCAG +>gnl|uv|NGB00417.1 454 Life Sciences GS FLX Titanium MID-1 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGAGTGCGT +>gnl|uv|NGB00418.1 454 Life Sciences GS FLX Titanium MID-2 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCTCGACA +>gnl|uv|NGB00419.1 454 Life Sciences GS FLX Titanium MID-3 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACGCACTC +>gnl|uv|NGB00420.1 454 Life Sciences GS FLX Titanium MID-4 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCACTGTAG +>gnl|uv|NGB00421.1 454 Life Sciences GS FLX Titanium MID-5 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCAGACACG +>gnl|uv|NGB00422.1 454 Life Sciences GS FLX Titanium MID-6 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATATCGCGAG +>gnl|uv|NGB00423.1 454 Life Sciences GS FLX Titanium MID-7 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTGTCTCTA +>gnl|uv|NGB00424.1 454 Life Sciences GS FLX Titanium MID-8 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTCGCGTGTC +>gnl|uv|NGB00425.1 454 Life Sciences GS FLX Titanium MID-10 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTCTATGCG +>gnl|uv|NGB00426.1 454 Life Sciences GS 
FLX Titanium MID-11 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGATACGTCT +>gnl|uv|NGB00427.1 454 Life Sciences GS FLX Titanium MID-13 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCATAGTAGTG +>gnl|uv|NGB00428.1 454 Life Sciences GS FLX Titanium MID-14 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAGAGATAC +>gnl|uv|NGB00429.1 454 Life Sciences GS FLX Titanium MID-15 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATACGACGTA +>gnl|uv|NGB00430.1 454 Life Sciences GS FLX Titanium MID-16 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACGTACTA +>gnl|uv|NGB00431.1 454 Life Sciences GS FLX Titanium MID-17 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTCTAGTAC +>gnl|uv|NGB00432.1 454 Life Sciences GS FLX Titanium MID-18 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTACGTAGC +>gnl|uv|NGB00433.1 454 Life Sciences GS FLX Titanium MID-19 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTACTACTC +>gnl|uv|NGB00434.1 454 Life Sciences GS FLX Titanium MID-20 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGACTACAG +>gnl|uv|NGB00435.1 454 Life Sciences GS FLX Titanium MID-21 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTAGACTAG +>gnl|uv|NGB00436.1 454 Life Sciences GS FLX Titanium MID-22 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACGAGTATG +>gnl|uv|NGB00437.1 454 Life Sciences GS FLX Titanium MID-23 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACTCTCGTG +>gnl|uv|NGB00438.1 454 Life Sciences GS FLX Titanium MID-24 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGAGACGAG +>gnl|uv|NGB00439.1 454 Life Sciences GS FLX Titanium MID-25 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGTCGCTCG +>gnl|uv|NGB00440.1 454 Life Sciences GS FLX Titanium MID-26 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACATACGCGT +>gnl|uv|NGB00441.1 454 Life Sciences GS FLX Titanium MID-27 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCGAGTAT +>gnl|uv|NGB00442.1 454 Life Sciences GS FLX Titanium MID-28 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTACTATGT +>gnl|uv|NGB00443.1 454 Life Sciences GS FLX Titanium MID-29 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTGTACAGT +>gnl|uv|NGB00444.1 454 Life Sciences GS FLX Titanium MID-30 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACTATACT +>gnl|uv|NGB00445.1 454 Life Sciences GS FLX Titanium MID-31 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCGTCGTCT +>gnl|uv|NGB00446.1 454 Life Sciences GS FLX Titanium MID-32 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTACGCTAT +>gnl|uv|NGB00447.1 454 Life Sciences GS FLX Titanium MID-33 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATAGAGTACT +>gnl|uv|NGB00448.1 454 Life Sciences GS FLX Titanium MID-34 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCACGCTACGT +>gnl|uv|NGB00449.1 454 Life Sciences GS FLX Titanium MID-35 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGTAGACGT +>gnl|uv|NGB00450.1 454 Life Sciences GS FLX Titanium MID-36 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGACGTGACT +>gnl|uv|NGB00451.1 454 Life Sciences GS FLX Titanium MID-37 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACACACACT +>gnl|uv|NGB00452.1 454 Life Sciences GS FLX Titanium MID-38 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACACGTGAT +>gnl|uv|NGB00453.1 454 Life Sciences GS FLX Titanium MID-39 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACAGATCGT +>gnl|uv|NGB00454.1 454 Life Sciences GS FLX Titanium MID-40 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACGCTGTCT +>gnl|uv|NGB00455.1 454 Life Sciences GS FLX Titanium MID-41 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGTGTAGAT +>gnl|uv|NGB00456.1 454 Life Sciences GS FLX Titanium MID-42 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGATCACGT +>gnl|uv|NGB00457.1 454 Life Sciences GS FLX Titanium MID-43 Adaptor A 
+CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGCACTAGT +>gnl|uv|NGB00458.1 454 Life Sciences GS FLX Titanium MID-44 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTAGCGACT +>gnl|uv|NGB00459.1 454 Life Sciences GS FLX Titanium MID-45 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTATACTAT +>gnl|uv|NGB00460.1 454 Life Sciences GS FLX Titanium MID-46 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGACGTATGT +>gnl|uv|NGB00461.1 454 Life Sciences GS FLX Titanium MID-47 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTGAGTAGT +>gnl|uv|NGB00462.1 454 Life Sciences GS FLX Titanium MID-48 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACAGTATATA +>gnl|uv|NGB00463.1 454 Life Sciences GS FLX Titanium MID-49 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGCGATCGA +>gnl|uv|NGB00464.1 454 Life Sciences GS FLX Titanium MID-50 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTAGCAGTA +>gnl|uv|NGB00465.1 454 Life Sciences GS FLX Titanium MID-51 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCTCACGTA +>gnl|uv|NGB00466.1 454 Life Sciences GS FLX Titanium MID-52 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTATACATA +>gnl|uv|NGB00467.1 454 Life Sciences GS FLX Titanium MID-53 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTCGAGAGA +>gnl|uv|NGB00468.1 454 Life Sciences GS FLX Titanium MID-54 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTGCTACGA +>gnl|uv|NGB00469.1 454 Life Sciences GS FLX Titanium MID-55 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGATCGTATA +>gnl|uv|NGB00470.1 454 Life Sciences GS FLX Titanium MID-56 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGCAGTACGA +>gnl|uv|NGB00471.1 454 Life Sciences GS FLX Titanium MID-57 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGCGTATACA +>gnl|uv|NGB00472.1 454 Life Sciences GS FLX Titanium MID-58 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTACAGTCA +>gnl|uv|NGB00473.1 454 Life Sciences GS FLX Titanium MID-59 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTACTCAGA +>gnl|uv|NGB00474.1 454 Life Sciences GS FLX Titanium MID-60 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTACGCTCTA +>gnl|uv|NGB00475.1 454 Life Sciences GS FLX Titanium MID-61 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTATAGCGTA +>gnl|uv|NGB00476.1 454 Life Sciences GS FLX Titanium MID-62 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACGTCATCA +>gnl|uv|NGB00477.1 454 Life Sciences GS FLX Titanium MID-63 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGTCGCATA +>gnl|uv|NGB00478.1 454 Life Sciences GS FLX Titanium MID-64 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATATATACA +>gnl|uv|NGB00479.1 454 Life Sciences GS FLX Titanium MID-65 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATGCTAGTA +>gnl|uv|NGB00480.1 454 Life Sciences GS FLX Titanium MID-66 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACGCGAGA +>gnl|uv|NGB00481.1 454 Life Sciences GS FLX Titanium MID-67 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGATAGTGA +>gnl|uv|NGB00482.1 454 Life Sciences GS FLX Titanium MID-68 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGCTGCGTA +>gnl|uv|NGB00483.1 454 Life Sciences GS FLX Titanium MID-69 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGACGTCA +>gnl|uv|NGB00484.1 454 Life Sciences GS FLX Titanium MID-70 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGTCAGTA +>gnl|uv|NGB00485.1 454 Life Sciences GS FLX Titanium MID-71 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTAGTGTGA +>gnl|uv|NGB00486.1 454 Life Sciences GS FLX Titanium MID-72 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTCACACGA +>gnl|uv|NGB00487.1 454 Life Sciences GS FLX Titanium MID-73 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTCGTCGCA +>gnl|uv|NGB00488.1 454 Life Sciences GS FLX Titanium MID-74 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACACATACGC 
+>gnl|uv|NGB00489.1 454 Life Sciences GS FLX Titanium MID-75 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACAGTCGTGC +>gnl|uv|NGB00490.1 454 Life Sciences GS FLX Titanium MID-76 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACATGACGAC +>gnl|uv|NGB00491.1 454 Life Sciences GS FLX Titanium MID-77 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGACAGCTC +>gnl|uv|NGB00492.1 454 Life Sciences GS FLX Titanium MID-78 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGTCTCATC +>gnl|uv|NGB00493.1 454 Life Sciences GS FLX Titanium MID-79 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTCATCTAC +>gnl|uv|NGB00494.1 454 Life Sciences GS FLX Titanium MID-80 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTCGCGCAC +>gnl|uv|NGB00495.1 454 Life Sciences GS FLX Titanium MID-81 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGAGCGTCAC +>gnl|uv|NGB00496.1 454 Life Sciences GS FLX Titanium MID-82 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCGACTAGC +>gnl|uv|NGB00497.1 454 Life Sciences GS FLX Titanium MID-83 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTAGTGATC +>gnl|uv|NGB00498.1 454 Life Sciences GS FLX Titanium MID-84 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTGACACAC +>gnl|uv|NGB00499.1 454 Life Sciences GS FLX Titanium MID-85 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTGTATGTC +>gnl|uv|NGB00500.1 454 Life Sciences GS FLX Titanium MID-86 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATAGATAGAC +>gnl|uv|NGB00501.1 454 Life Sciences GS FLX Titanium MID-87 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATATAGTCGC +>gnl|uv|NGB00502.1 454 Life Sciences GS FLX Titanium MID-88 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCTACTGAC +>gnl|uv|NGB00503.1 454 Life Sciences GS FLX Titanium MID-89 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCACGTAGATC +>gnl|uv|NGB00504.1 454 Life Sciences GS FLX Titanium MID-90 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCACGTGTCGC +>gnl|uv|NGB00505.1 454 Life Sciences GS FLX Titanium MID-91 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCATACTCTAC +>gnl|uv|NGB00506.1 454 Life Sciences GS FLX Titanium MID-92 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGACACTATC +>gnl|uv|NGB00507.1 454 Life Sciences GS FLX Titanium MID-93 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAGACGCGC +>gnl|uv|NGB00508.1 454 Life Sciences GS FLX Titanium MID-94 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTATGCGAC +>gnl|uv|NGB00509.1 454 Life Sciences GS FLX Titanium MID-95 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTCGATCTC +>gnl|uv|NGB00510.1 454 Life Sciences GS FLX Titanium MID-96 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTACGACTGC +>gnl|uv|NGB00511.1 454 Life Sciences GS FLX Titanium MID-97 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAGTCACTC +>gnl|uv|NGB00512.1 454 Life Sciences GS FLX Titanium MID-98 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTCTACGCTC +>gnl|uv|NGB00513.1 454 Life Sciences GS FLX Titanium MID-99 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGTACATAC +>gnl|uv|NGB00514.1 454 Life Sciences GS FLX Titanium MID-100 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGACTGCAC +>gnl|uv|NGB00515.1 454 Life Sciences GS FLX Titanium MID-101 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGCGCGCGC +>gnl|uv|NGB00516.1 454 Life Sciences GS FLX Titanium MID-102 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGCTCTATC +>gnl|uv|NGB00517.1 454 Life Sciences GS FLX Titanium MID-103 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATAGACATC +>gnl|uv|NGB00518.1 454 Life Sciences GS FLX Titanium MID-104 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATGATACGC +>gnl|uv|NGB00519.1 454 Life Sciences GS FLX Titanium MID-105 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACTCATAC +>gnl|uv|NGB00520.1 454 Life Sciences GS FLX 
Titanium MID-106 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCATCGAGTC +>gnl|uv|NGB00521.1 454 Life Sciences GS FLX Titanium MID-107 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGAGCTCTC +>gnl|uv|NGB00522.1 454 Life Sciences GS FLX Titanium MID-108 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGCAGACAC +>gnl|uv|NGB00523.1 454 Life Sciences GS FLX Titanium MID-109 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGTCTCGC +>gnl|uv|NGB00524.1 454 Life Sciences GS FLX Titanium MID-110 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGTGACGC +>gnl|uv|NGB00525.1 454 Life Sciences GS FLX Titanium MID-111 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGATGTGTAC +>gnl|uv|NGB00526.1 454 Life Sciences GS FLX Titanium MID-112 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCTATAGAC +>gnl|uv|NGB00527.1 454 Life Sciences GS FLX Titanium MID-113 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCTCGCTAC +>gnl|uv|NGB00528.1 454 Life Sciences GS FLX Titanium MID-114 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACGTGCAGCG +>gnl|uv|NGB00529.1 454 Life Sciences GS FLX Titanium MID-115 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGACTCACAGAG +>gnl|uv|NGB00530.1 454 Life Sciences GS FLX Titanium MID-116 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGACTCAGCG +>gnl|uv|NGB00531.1 454 Life Sciences GS FLX Titanium MID-117 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGAGAGTGTG +>gnl|uv|NGB00532.1 454 Life Sciences GS FLX Titanium MID-118 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCTATCGCG +>gnl|uv|NGB00533.1 454 Life Sciences GS FLX Titanium MID-119 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTCTGACTG +>gnl|uv|NGB00534.1 454 Life Sciences GS FLX Titanium MID-120 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGTGAGCTCG +>gnl|uv|NGB00535.1 454 Life Sciences GS FLX Titanium MID-121 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATAGCTCTCG +>gnl|uv|NGB00536.1 454 Life Sciences GS FLX Titanium MID-122 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCACGTGCG +>gnl|uv|NGB00537.1 454 Life Sciences GS FLX Titanium MID-123 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCGTAGCAG +>gnl|uv|NGB00538.1 454 Life Sciences GS FLX Titanium MID-124 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCGTCTGTG +>gnl|uv|NGB00539.1 454 Life Sciences GS FLX Titanium MID-125 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATGTACGATG +>gnl|uv|NGB00540.1 454 Life Sciences GS FLX Titanium MID-126 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGATGTGTCTAG +>gnl|uv|NGB00541.1 454 Life Sciences GS FLX Titanium MID-127 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCACACGATAG +>gnl|uv|NGB00542.1 454 Life Sciences GS FLX Titanium MID-128 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCACTCGCACG +>gnl|uv|NGB00543.1 454 Life Sciences GS FLX Titanium MID-129 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGACGTCTG +>gnl|uv|NGB00544.1 454 Life Sciences GS FLX Titanium MID-130 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGTACTGCG +>gnl|uv|NGB00545.1 454 Life Sciences GS FLX Titanium MID-131 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGACAGCGAG +>gnl|uv|NGB00546.1 454 Life Sciences GS FLX Titanium MID-132 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGATCTGTCG +>gnl|uv|NGB00547.1 454 Life Sciences GS FLX Titanium MID-133 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGCGTGCTAG +>gnl|uv|NGB00548.1 454 Life Sciences GS FLX Titanium MID-134 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGCTCGAGTG +>gnl|uv|NGB00549.1 454 Life Sciences GS FLX Titanium MID-135 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGTGATGACG +>gnl|uv|NGB00550.1 454 Life Sciences GS FLX Titanium MID-136 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTATGTACAG +>gnl|uv|NGB00551.1 454 Life Sciences GS FLX Titanium MID-137 
Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTCGATATAG +>gnl|uv|NGB00552.1 454 Life Sciences GS FLX Titanium MID-138 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTCGCACGCG +>gnl|uv|NGB00553.1 454 Life Sciences GS FLX Titanium MID-139 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGCGTCACG +>gnl|uv|NGB00554.1 454 Life Sciences GS FLX Titanium MID-140 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGTGCGTCG +>gnl|uv|NGB00555.1 454 Life Sciences GS FLX Titanium MID-141 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGCATACTG +>gnl|uv|NGB00556.1 454 Life Sciences GS FLX Titanium MID-142 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATACATGTG +>gnl|uv|NGB00557.1 454 Life Sciences GS FLX Titanium MID-143 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATCACTCAG +>gnl|uv|NGB00558.1 454 Life Sciences GS FLX Titanium MID-144 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTATCTGATAG +>gnl|uv|NGB00559.1 454 Life Sciences GS FLX Titanium MID-145 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGTGACATG +>gnl|uv|NGB00560.1 454 Life Sciences GS FLX Titanium MID-146 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGATCGAG +>gnl|uv|NGB00561.1 454 Life Sciences GS FLX Titanium MID-147 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGACATCTCG +>gnl|uv|NGB00562.1 454 Life Sciences GS FLX Titanium MID-148 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGCTAGAG +>gnl|uv|NGB00563.1 454 Life Sciences GS FLX Titanium MID-149 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGATAGAGCG +>gnl|uv|NGB00564.1 454 Life Sciences GS FLX Titanium MID-150 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCGTGTGCG +>gnl|uv|NGB00565.1 454 Life Sciences GS FLX Titanium MID-151 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCTAGTCAG +>gnl|uv|NGB00566.1 454 Life Sciences GS FLX Titanium MID-152 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTATCACAG +>gnl|uv|NGB00567.1 454 Life Sciences GS FLX Titanium MID-153 Adaptor A +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTGCGCGTG +>gnl|uv|NGB00568.1 454 GS FLX Titanium Rapid Library Adaptor A universal segment +CCATCTCATCCCTGCGTGTCTCCGACGACT +>gnl|uv|NGB00569.1 454 GS FLX Titanium Rapid Library Adaptor B universal segment +NGTCGNCGTCTCTCAAGGCACACAGGGGATAGG +>gnl|uv|NGB00099.1 CLONTECH GenomeWalker Adaptor +GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT +>gnl|uv|NGB00361.2 Illumina PCR Primer (Oligonucleotide sequence copyright 2007-2009 Illumina, Inc. All rights reserved.) 
+CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT +>gnl|uv|NGB00623.1 ABI SOLiD P1 Adaptor +AACCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT +>gnl|uv|NGB00624.1 ABI SOLiD P2 Adaptor +AGAGAATGAGGAACCCGGGGCAGTT +>gnl|uv|NGB00625.1 ABI SOLiD P2-T Adaptor +AGAGAATGAGGAACCCGGGGCAGCC +>gnl|uv|NGB00626.1 ABI SOLiD Internal Adaptor +CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG +>gnl|uv|NGB00627.1 ABI SOLiD P1-T Adaptor +GGCCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT +>gnl|uv|NGB00628.1 ABI SOLiD Barcode Adaptor T-001 +CTGCCCCGGGTTCCTCATTCTCTGTGTAAGAGGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00629.1 ABI SOLiD Barcode Adaptor T-002 +CTGCCCCGGGTTCCTCATTCTCTAGGGAGTGGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00630.1 ABI SOLiD Barcode Adaptor T-003 +CTGCCCCGGGTTCCTCATTCTCTATAGGTTATACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00631.1 ABI SOLiD Barcode Adaptor T-004 +CTGCCCCGGGTTCCTCATTCTCTGGATGCGGTCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00632.1 ABI SOLiD Barcode Adaptor T-005 +CTGCCCCGGGTTCCTCATTCTCTGTGGTGTAAGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00633.1 ABI SOLiD Barcode Adaptor T-006 +CTGCCCCGGGTTCCTCATTCTCTGCGAGGGACACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00634.1 ABI SOLiD Barcode Adaptor T-007 +CTGCCCCGGGTTCCTCATTCTCTGGGTTATGCCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00635.1 ABI SOLiD Barcode Adaptor T-008 +CTGCCCCGGGTTCCTCATTCTCTGAGCGAGGATCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00636.1 ABI SOLiD Barcode Adaptor T-009 +CTGCCCCGGGTTCCTCATTCTCTAGGTTGCGACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00637.1 ABI SOLiD Barcode Adaptor T-010 +CTGCCCCGGGTTCCTCATTCTCTGCGGTAAGCTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00638.1 ABI SOLiD Barcode Adaptor T-011 +CTGCCCCGGGTTCCTCATTCTCTGTGCGACACGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00639.1 ABI SOLiD Barcode Adaptor T-012 +CTGCCCCGGGTTCCTCATTCTCTAAGAGGAAAACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00640.1 ABI SOLiD Barcode Adaptor T-013 +CTGCCCCGGGTTCCTCATTCTCTGCGGTAAGGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00641.1 ABI SOLiD Barcode Adaptor T-014 +CTGCCCCGGGTTCCTCATTCTCTGTGCGGCAGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00642.1 ABI SOLiD Barcode Adaptor T-015 +CTGCCCCGGGTTCCTCATTCTCTGAGTTGAATGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00643.1 ABI SOLiD Barcode Adaptor T-016 +CTGCCCCGGGTTCCTCATTCTCTGGGAGACGTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00644.1 ABI SOLiD Barcode Adaptor T-017 +CTGCCCCGGGTTCCTCATTCTCTGGCTCACCGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00645.1 ABI SOLiD Barcode Adaptor T-018 +CTGCCCCGGGTTCCTCATTCTCTAGGCGGATGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00646.1 ABI SOLiD Barcode Adaptor T-019 +CTGCCCCGGGTTCCTCATTCTCTATGGTAACTGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00647.1 ABI SOLiD Barcode Adaptor T-020 +CTGCCCCGGGTTCCTCATTCTCTGTCAAGCTTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00648.1 ABI SOLiD Barcode Adaptor T-021 +CTGCCCCGGGTTCCTCATTCTCTGTGCGGTTCCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00649.1 ABI SOLiD Barcode Adaptor T-022 +CTGCCCCGGGTTCCTCATTCTCTGAGAAGATGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00650.1 ABI SOLiD Barcode Adaptor T-023 +CTGCCCCGGGTTCCTCATTCTCTGCGGTGCTTGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00651.1 ABI SOLiD Barcode Adaptor T-024 +CTGCCCCGGGTTCCTCATTCTCTGGGTCGGTATCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00652.1 ABI SOLiD Barcode Adaptor T-025 +CTGCCCCGGGTTCCTCATTCTCTAACATGATGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00653.1 ABI SOLiD Barcode Adaptor T-026 +CTGCCCCGGGTTCCTCATTCTCTCGGGAGCCCGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00654.1 ABI SOLiD Barcode Adaptor T-027 +CTGCCCCGGGTTCCTCATTCTCTCAGCAAACTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00655.1 ABI SOLiD Barcode Adaptor T-028 +CTGCCCCGGGTTCCTCATTCTCTAGCTTACTACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00656.1 ABI SOLiD Barcode Adaptor T-029 +CTGCCCCGGGTTCCTCATTCTCTGAATCTAGGGCTGCTGTACGGCCAAGGCGT 
+>gnl|uv|NGB00657.1 ABI SOLiD Barcode Adaptor T-030 +CTGCCCCGGGTTCCTCATTCTCTGTAGCGAAGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00658.1 ABI SOLiD Barcode Adaptor T-031 +CTGCCCCGGGTTCCTCATTCTCTGCTGGTGCGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00659.1 ABI SOLiD Barcode Adaptor T-032 +CTGCCCCGGGTTCCTCATTCTCTGGTTGGGTGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00660.1 ABI SOLiD Barcode Adaptor T-033 +CTGCCCCGGGTTCCTCATTCTCTCGTTGGATACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00661.1 ABI SOLiD Barcode Adaptor T-034 +CTGCCCCGGGTTCCTCATTCTCTTCGTTAAAGGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00662.1 ABI SOLiD Barcode Adaptor T-035 +CTGCCCCGGGTTCCTCATTCTCTAAGCGTAGGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00663.1 ABI SOLiD Barcode Adaptor T-036 +CTGCCCCGGGTTCCTCATTCTCTGTTCTCACATCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00664.1 ABI SOLiD Barcode Adaptor T-037 +CTGCCCCGGGTTCCTCATTCTCTCTGTTATACCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00665.1 ABI SOLiD Barcode Adaptor T-038 +CTGCCCCGGGTTCCTCATTCTCTGTCGTCTTAGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00666.1 ABI SOLiD Barcode Adaptor T-039 +CTGCCCCGGGTTCCTCATTCTCTTATCGTGAGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00667.1 ABI SOLiD Barcode Adaptor T-040 +CTGCCCCGGGTTCCTCATTCTCTAAAAGGGTTACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00668.1 ABI SOLiD Barcode Adaptor T-041 +CTGCCCCGGGTTCCTCATTCTCTTGTGGGATTGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00669.1 ABI SOLiD Barcode Adaptor T-042 +CTGCCCCGGGTTCCTCATTCTCTGAATGTACTACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00670.1 ABI SOLiD Barcode Adaptor T-043 +CTGCCCCGGGTTCCTCATTCTCTCGCTAGGGTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00671.1 ABI SOLiD Barcode Adaptor T-044 +CTGCCCCGGGTTCCTCATTCTCTAAGGATGATCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00672.1 ABI SOLiD Barcode Adaptor T-045 +CTGCCCCGGGTTCCTCATTCTCTGTACTTGGCTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00673.1 ABI SOLiD Barcode Adaptor T-046 +CTGCCCCGGGTTCCTCATTCTCTGGTCGTCGAACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00674.1 ABI SOLiD Barcode Adaptor T-047 +CTGCCCCGGGTTCCTCATTCTCTGAGGGATGGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00675.1 ABI SOLiD Barcode Adaptor T-048 +CTGCCCCGGGTTCCTCATTCTCTGCCGTAAGTGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00676.1 ABI SOLiD Barcode Adaptor T-049 +CTGCCCCGGGTTCCTCATTCTCTATGTCATAAGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00677.1 ABI SOLiD Barcode Adaptor T-050 +CTGCCCCGGGTTCCTCATTCTCTGAAGGCTTGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00678.1 ABI SOLiD Barcode Adaptor T-051 +CTGCCCCGGGTTCCTCATTCTCTAAGCAGGAGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00679.1 ABI SOLiD Barcode Adaptor T-052 +CTGCCCCGGGTTCCTCATTCTCTGTAATTGTAACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00680.1 ABI SOLiD Barcode Adaptor T-053 +CTGCCCCGGGTTCCTCATTCTCTGTCATCAAGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00681.1 ABI SOLiD Barcode Adaptor T-054 +CTGCCCCGGGTTCCTCATTCTCTAAAAGGCGGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00682.1 ABI SOLiD Barcode Adaptor T-055 +CTGCCCCGGGTTCCTCATTCTCTAGCTTAAGCGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00683.1 ABI SOLiD Barcode Adaptor T-056 +CTGCCCCGGGTTCCTCATTCTCTGCATGTCACCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00684.1 ABI SOLiD Barcode Adaptor T-057 +CTGCCCCGGGTTCCTCATTCTCTCTAGTAAGAACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00685.1 ABI SOLiD Barcode Adaptor T-058 +CTGCCCCGGGTTCCTCATTCTCTTAAAGTGGCGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00686.1 ABI SOLiD Barcode Adaptor T-059 +CTGCCCCGGGTTCCTCATTCTCTAAGTAATGTCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00687.1 ABI SOLiD Barcode Adaptor T-060 +CTGCCCCGGGTTCCTCATTCTCTGTGCCTCGGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00688.1 ABI SOLiD Barcode Adaptor T-061 +CTGCCCCGGGTTCCTCATTCTCTAAGATTATCGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00689.1 ABI SOLiD Barcode Adaptor T-062 +CTGCCCCGGGTTCCTCATTCTCTAGGTGAGGGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00690.1 ABI 
SOLiD Barcode Adaptor T-063 +CTGCCCCGGGTTCCTCATTCTCTGCGGGTTCGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00691.1 ABI SOLiD Barcode Adaptor T-064 +CTGCCCCGGGTTCCTCATTCTCTGTGCTACACCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00692.1 ABI SOLiD Barcode Adaptor T-065 +CTGCCCCGGGTTCCTCATTCTCTGGGATCAAGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00693.1 ABI SOLiD Barcode Adaptor T-066 +CTGCCCCGGGTTCCTCATTCTCTGATGTAATGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00694.1 ABI SOLiD Barcode Adaptor T-067 +CTGCCCCGGGTTCCTCATTCTCTGTCCTTAGGGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00695.1 ABI SOLiD Barcode Adaptor T-068 +CTGCCCCGGGTTCCTCATTCTCTGCATTGACGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00696.1 ABI SOLiD Barcode Adaptor T-069 +CTGCCCCGGGTTCCTCATTCTCTGATATGCTTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00697.1 ABI SOLiD Barcode Adaptor T-070 +CTGCCCCGGGTTCCTCATTCTCTGCCCTACAGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00698.1 ABI SOLiD Barcode Adaptor T-071 +CTGCCCCGGGTTCCTCATTCTCTACAGGGAACGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00699.1 ABI SOLiD Barcode Adaptor T-072 +CTGCCCCGGGTTCCTCATTCTCTAAGTGAATACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00700.1 ABI SOLiD Barcode Adaptor T-073 +CTGCCCCGGGTTCCTCATTCTCTGCAATGACGTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00701.1 ABI SOLiD Barcode Adaptor T-074 +CTGCCCCGGGTTCCTCATTCTCTAGGACGCTGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00702.1 ABI SOLiD Barcode Adaptor T-075 +CTGCCCCGGGTTCCTCATTCTCTGTATCTGGGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00703.1 ABI SOLiD Barcode Adaptor T-076 +CTGCCCCGGGTTCCTCATTCTCTAAGTTTTAGGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00704.1 ABI SOLiD Barcode Adaptor T-077 +CTGCCCCGGGTTCCTCATTCTCTATCTGGTCTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00705.1 ABI SOLiD Barcode Adaptor T-078 +CTGCCCCGGGTTCCTCATTCTCTGGCAATCATCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00706.1 ABI SOLiD Barcode Adaptor T-079 +CTGCCCCGGGTTCCTCATTCTCTAGTAGAATTACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00707.1 ABI SOLiD Barcode Adaptor T-080 +CTGCCCCGGGTTCCTCATTCTCTGTTTACGGTGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00708.1 ABI SOLiD Barcode Adaptor T-081 +CTGCCCCGGGTTCCTCATTCTCTGAACGTCATTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00709.1 ABI SOLiD Barcode Adaptor T-082 +CTGCCCCGGGTTCCTCATTCTCTGTGAAGGGAGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00710.1 ABI SOLiD Barcode Adaptor T-083 +CTGCCCCGGGTTCCTCATTCTCTGGATGGCGTACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00711.1 ABI SOLiD Barcode Adaptor T-084 +CTGCCCCGGGTTCCTCATTCTCTGCGGATGAACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00712.1 ABI SOLiD Barcode Adaptor T-085 +CTGCCCCGGGTTCCTCATTCTCTGGAAAGCGTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00713.1 ABI SOLiD Barcode Adaptor T-086 +CTGCCCCGGGTTCCTCATTCTCTAGTACCAGGACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00714.1 ABI SOLiD Barcode Adaptor T-087 +CTGCCCCGGGTTCCTCATTCTCTATAGCAAAGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00715.1 ABI SOLiD Barcode Adaptor T-088 +CTGCCCCGGGTTCCTCATTCTCTGTTGATCATGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00716.1 ABI SOLiD Barcode Adaptor T-089 +CTGCCCCGGGTTCCTCATTCTCTAGGCTGTCTACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00717.1 ABI SOLiD Barcode Adaptor T-090 +CTGCCCCGGGTTCCTCATTCTCTGTGACCTACTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00718.1 ABI SOLiD Barcode Adaptor T-091 +CTGCCCCGGGTTCCTCATTCTCTGCGTATTGGGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00719.1 ABI SOLiD Barcode Adaptor T-092 +CTGCCCCGGGTTCCTCATTCTCTAAGGGATTACCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00720.1 ABI SOLiD Barcode Adaptor T-093 +CTGCCCCGGGTTCCTCATTCTCTGTTACGATGCCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00721.1 ABI SOLiD Barcode Adaptor T-094 +CTGCCCCGGGTTCCTCATTCTCTATGGGTGTTTCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00722.1 ABI SOLiD Barcode Adaptor T-095 +CTGCCCCGGGTTCCTCATTCTCTGAGTCCGGCACTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00723.1 ABI SOLiD Barcode Adaptor 
T-096 +CTGCCCCGGGTTCCTCATTCTCTAATCGAAGAGCTGCTGTACGGCCAAGGCGT +>gnl|uv|NGB00724.1 ABI SOLiD Barcode Adaptor A +GCTGTACGGCCAAGGCGCAGCAGCATG +>gnl|uv|NGB00727.1 Illumina Nextera PCR primer i5 index N501 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGGCAGCGTC +>gnl|uv|NGB00728.1 Illumina Nextera PCR primer i5 index N502 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACCTCTCTATTCGTCGGCAGCGTC +>gnl|uv|NGB00729.1 Illumina Nextera PCR primer i5 index N503 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACTATCCTCTTCGTCGGCAGCGTC +>gnl|uv|NGB00730.1 Illumina Nextera PCR primer i5 index N504 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACAGAGTAGATCGTCGGCAGCGTC +>gnl|uv|NGB00731.1 Illumina Nextera PCR primer i5 index N505 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACGTAAGGAGTCGTCGGCAGCGTC +>gnl|uv|NGB00732.1 Illumina Nextera PCR primer i5 index N506 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACACTGCATATCGTCGGCAGCGTC +>gnl|uv|NGB00733.1 Illumina Nextera PCR primer i5 index N507 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACAAGGAGTATCGTCGGCAGCGTC +>gnl|uv|NGB00734.1 Illumina Nextera PCR primer i5 index N508 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACCTAAGCCTTCGTCGGCAGCGTC +>gnl|uv|NGB00735.1 Illumina Nextera PCR primer i7 index N701 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGG +>gnl|uv|NGB00736.1 Illumina Nextera PCR primer i7 index N702 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCTAGTACGGTCTCGTGGGCTCGG +>gnl|uv|NGB00737.1 Illumina Nextera PCR primer i7 index N703 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTTCTGCCTGTCTCGTGGGCTCGG +>gnl|uv|NGB00738.1 Illumina Nextera PCR primer i7 index N704 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCTCAGGAGTCTCGTGGGCTCGG +>gnl|uv|NGB00739.1 Illumina Nextera PCR primer i7 index N705 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATAGGAGTCCGTCTCGTGGGCTCGG +>gnl|uv|NGB00740.1 Illumina Nextera PCR primer i7 index N706 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCATGCCTAGTCTCGTGGGCTCGG +>gnl|uv|NGB00741.1 Illumina Nextera PCR primer i7 index N707 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGTAGAGAGGTCTCGTGGGCTCGG +>gnl|uv|NGB00742.1 Illumina Nextera PCR primer i7 index N708 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCCTCTCTGGTCTCGTGGGCTCGG +>gnl|uv|NGB00743.1 Illumina Nextera PCR primer i7 index N709 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+CAAGCAGAAGACGGCATACGAGATAGCGTAGCGTCTCGTGGGCTCGG +>gnl|uv|NGB00744.1 Illumina Nextera PCR primer i7 index N710 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCAGCCTCGGTCTCGTGGGCTCGG +>gnl|uv|NGB00745.1 Illumina Nextera PCR primer i7 index N711 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGCCTCTTGTCTCGTGGGCTCGG +>gnl|uv|NGB00746.1 Illumina Nextera PCR primer i7 index N712 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTCCTCTACGTCTCGTGGGCTCGG +>gnl|uv|NGB00747.1 Illumina TruSeq DNA HT and RNA HT i5 index D501 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACTATAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00748.1 Illumina TruSeq DNA HT and RNA HT i5 index D502 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACATAGAGGCACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00749.1 Illumina TruSeq DNA HT and RNA HT i5 index D503 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACCCTATCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00750.1 Illumina TruSeq DNA HT and RNA HT i5 index D504 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACGGCTCTGAACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00751.1 Illumina TruSeq DNA HT and RNA HT i5 index D505 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACAGGCGAAGACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00752.1 Illumina TruSeq DNA HT and RNA HT i5 index D506 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACTAATCTTAACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00753.1 Illumina TruSeq DNA HT and RNA HT i5 index D507 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACCAGGACGTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00754.1 Illumina TruSeq DNA HT and RNA HT i5 index D508 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACGTACTGACACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00755.1 Illumina TruSeq DNA HT and RNA HT i7 index D701 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00756.1 Illumina TruSeq DNA HT and RNA HT i7 index D702 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00757.1 Illumina TruSeq DNA HT and RNA HT i7 index D703 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGCTCATTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00758.1 Illumina TruSeq DNA HT and RNA HT i7 index D704 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGATTCCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00759.1 Illumina TruSeq DNA HT and RNA HT i7 index D705 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTCAGAAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00760.1 Illumina TruSeq DNA HT and RNA HT i7 index D706 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAATTCGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00761.1 Illumina TruSeq DNA HT and RNA HT i7 index D707 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTGAAGCTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00762.1 Illumina TruSeq DNA HT and RNA HT i7 index D708 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGCGCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00763.1 Illumina TruSeq DNA HT and RNA HT i7 index D709 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGGCTATGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00764.1 Illumina TruSeq DNA HT and RNA HT i7 index D710 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00765.1 Illumina TruSeq DNA HT and RNA HT i7 index D711 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCTCGCGCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00766.1 Illumina TruSeq DNA HT and RNA HT i7 index D712 adapter (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGATAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00767.1 Illumina TruSeq Adapter Index 1 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00768.1 Illumina TruSeq Adapter Index 2 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00769.1 Illumina TruSeq Adapter Index 3 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00770.1 Illumina TruSeq Adapter Index 4 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00771.1 Illumina TruSeq Adapter Index 5 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00772.1 Illumina TruSeq Adapter Index 6 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00773.1 Illumina TruSeq Adapter Index 7 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00774.1 Illumina TruSeq Adapter Index 8 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00775.1 Illumina TruSeq Adapter Index 9 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00776.1 Illumina TruSeq Adapter Index 10 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00777.1 Illumina TruSeq Adapter Index 11 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00778.1 Illumina TruSeq Adapter Index 12 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00779.1 Illumina TruSeq Adapter Index 13 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00780.1 Illumina TruSeq Adapter Index 14 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00781.1 Illumina TruSeq Adapter Index 15 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACATGTCAGAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00782.1 Illumina TruSeq Adapter Index 16 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00783.1 Illumina TruSeq Adapter Index 18 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00784.1 Illumina TruSeq Adapter Index 19 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAACGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00785.1 Illumina TruSeq Adapter Index 20 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGGCCTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00786.1 Illumina TruSeq Adapter Index 21 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTTTCGGAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00787.1 Illumina TruSeq Adapter Index 22 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00788.1 Illumina TruSeq Adapter Index 23 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00789.1 Illumina TruSeq Adapter Index 25 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTGATATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00790.1 Illumina TruSeq Adapter Index 27 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTCCTTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB00791.1 Illumina TruSeq Small RNA Sample Prep Kit Stop Oligo (STP) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +GAAUUCCACCACGUUCCCGUGG +>gnl|uv|NGB00792.1 Illumina TruSeq Small RNA Sample Prep Kit RNA RT Primer (RTP) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+GCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00793.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer (RP1) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA +>gnl|uv|NGB00794.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 1 (RPI1) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00795.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 2 (RPI2) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00796.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 3 (RPI3) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00797.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 4 (RPI4) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00798.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 5 (RPI5) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00799.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 6 (RPI6) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00800.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 7 (RPI7) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00801.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 8 (RPI8) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00802.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 9 (RPI9) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00803.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 10 (RPI10) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00804.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 11 (RPI11) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00805.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 12 (RPI12) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00806.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 13 (RPI13) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTTGACTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00807.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 14 (RPI14) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+CAAGCAGAAGACGGCATACGAGATGGAACTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00808.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 15 (RPI15) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGACATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00809.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 16 (RPI16) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGGACGGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00810.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 17 (RPI17) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00811.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 18 (RPI18) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00812.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 19 (RPI19) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTTTCACGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00813.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 20 (RPI20) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGGCCACGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00814.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 21 (RPI21) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCGAAACGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00815.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 22 (RPI22) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCGTACGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00816.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 23 (RPI23) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCCACTCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00817.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 24 (RPI24) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCTACCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00818.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 25 (RPI25) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATATCAGTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00819.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 26 (RPI26) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCTCATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00820.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 27 (RPI27) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATAGGAATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00821.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 28 (RPI28) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+CAAGCAGAAGACGGCATACGAGATCTTTTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00822.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 29 (RPI29) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTAGTTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00823.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 30 (RPI30) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCCGGTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00824.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 31 (RPI31) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATATCGTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00825.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 32 (RPI32) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGAGTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00826.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 33 (RPI33) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCGCCTGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00827.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 34 (RPI34) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCCATGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00828.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 35 (RPI35) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATAAAATGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00829.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 36 (RPI36) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGTTGGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00830.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 37 (RPI37) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATATTCCGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00831.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 38 (RPI38) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATAGCTAGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00832.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 39 (RPI39) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGTATAGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00833.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 40 (RPI40) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTCTGAGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00834.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 41 (RPI41) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGTCGTCGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00835.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 42 (RPI42) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+CAAGCAGAAGACGGCATACGAGATCGATTAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00836.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 43 (RPI43) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGCTGTAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00837.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 44 (RPI44) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATATTATAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00838.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 45 (RPI45) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATGAATGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00839.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 46 (RPI46) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTCGGGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00840.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 47 (RPI47) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00841.1 Illumina TruSeq Small RNA Sample Prep Kit RNA PCR Primer Index 48 (RPI48) (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +CAAGCAGAAGACGGCATACGAGATTGCCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA +>gnl|uv|NGB00844.1 Epicentre BiotechnologiesNextera DNA Sample Prep Kit Adaptor (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) +AATGATACGGCGACCACCGAGATCTACACGCCTCCCTCGCGCCATCAG +>gnl|uv|NGB00845.1 Epicentre Biotechnologies Nextera DNA Sample Prep Kit Adaptor, following the barcode (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+CGGTCTGCCTTGCCAGCCCGCTCAG +>gnl|uv|NGB00846.1 NEBNext Adaptor for Illumina +GATCGGAAGAGCACACGTCTGAACTCCAGTCTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB00847.1 NEBNext Index 1 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00848.1 NEBNext Index 2 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00849.1 NEBNext Index 3 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00850.1 NEBNext Index 4 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00851.1 NEBNext Index 5 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00852.1 NEBNext Index 6 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00853.1 NEBNext Index 7 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00854.1 NEBNext Index 8 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00855.1 NEBNext Index 9 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00856.1 NEBNext Index 10 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00857.1 NEBNext Index 11 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00858.1 NEBNext Index 12 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00859.1 NEBNext Index 13 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTGTTGACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00860.1 NEBNext Index 14 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATACGGAACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00861.1 NEBNext Index 15 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTCTGACATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00862.1 NEBNext Index 16 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATCGGGACGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00863.1 NEBNext Index 18 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATGTGCGGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00864.1 NEBNext Index 19 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATCGTTTCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00865.1 NEBNext Index 20 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATAAGGCCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00866.1 NEBNext Index 21 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTCCGAAACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00867.1 NEBNext Index 22 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATTACGTACGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00868.1 NEBNext Index 23 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATATCCACTCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00869.1 NEBNext Index 25 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATATATCAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00870.1 NEBNext Index 27 Primer for Illumina +CAAGCAGAAGACGGCATACGAGATAAAGGAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +>gnl|uv|NGB00871.1 Ion Xpress A Adapter +AACCATCTCATCCCTGCGTGTCTCCGACTCAG +>gnl|uv|NGB00872.1 Ion Xpress P1 Adapter +AACCACTACGCCTCCGCTTTCCTCTCTATGGGCAGTCGGTGAT +>gnl|uv|NGB00873.1 Ion Xpress Barcode 1 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAAGGTAACGAT +>gnl|uv|NGB00874.1 Ion Xpress Barcode 2 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAAGGAGAACGAT +>gnl|uv|NGB00875.1 Ion Xpress Barcode 3 A Adapter 
+CCATCTCATCCCTGCGTGTCTCCGACTCAGAAGAGGATTCGAT +>gnl|uv|NGB00876.1 Ion Xpress Barcode 4 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTACCAAGATCGAT +>gnl|uv|NGB00877.1 Ion Xpress Barcode 5 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGAAGGAACGAT +>gnl|uv|NGB00878.1 Ion Xpress Barcode 6 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGCAAGTTCGAT +>gnl|uv|NGB00879.1 Ion Xpress Barcode 7 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGTGATTCGAT +>gnl|uv|NGB00880.1 Ion Xpress Barcode 8 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCCGATAACGAT +>gnl|uv|NGB00881.1 Ion Xpress Barcode 9 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGCGGAACGAT +>gnl|uv|NGB00882.1 Ion Xpress Barcode 10 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGACCGAACGAT +>gnl|uv|NGB00883.1 Ion Xpress Barcode 11 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTCGAATCGAT +>gnl|uv|NGB00884.1 Ion Xpress Barcode 12 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAGGTGGTTCGAT +>gnl|uv|NGB00885.1 Ion Xpress Barcode 13 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTAACGGACGAT +>gnl|uv|NGB00886.1 Ion Xpress Barcode 14 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTGGAGTGTCGAT +>gnl|uv|NGB00887.1 Ion Xpress Barcode 15 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTAGAGGTCGAT +>gnl|uv|NGB00888.1 Ion Xpress Barcode 16 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGGATGACGAT +>gnl|uv|NGB00889.1 Ion Xpress Barcode 17 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTATTCGTCGAT +>gnl|uv|NGB00890.1 Ion Xpress Barcode 18 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGGCAATTGCGAT +>gnl|uv|NGB00891.1 Ion Xpress Barcode 19 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTAGTCGGACGAT +>gnl|uv|NGB00892.1 Ion Xpress Barcode 20 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGATCCATCGAT +>gnl|uv|NGB00893.1 Ion Xpress Barcode 21 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGCAATTACGAT +>gnl|uv|NGB00894.1 Ion Xpress Barcode 22 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGAGACGCGAT +>gnl|uv|NGB00895.1 Ion Xpress Barcode 23 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCCACGAACGAT +>gnl|uv|NGB00896.1 Ion Xpress Barcode 24 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGAACCTCATTCGAT +>gnl|uv|NGB00897.1 Ion Xpress Barcode 25 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCTGAGATACGAT +>gnl|uv|NGB00898.1 Ion Xpress Barcode 26 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTACAACCTCGAT +>gnl|uv|NGB00899.1 Ion Xpress Barcode 27 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGAACCATCCGCGAT +>gnl|uv|NGB00900.1 Ion Xpress Barcode 28 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGATCCGGAATCGAT +>gnl|uv|NGB00901.1 Ion Xpress Barcode 29 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGACCACTCGAT +>gnl|uv|NGB00902.1 Ion Xpress Barcode 30 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAGGTTATCGAT +>gnl|uv|NGB00903.1 Ion Xpress Barcode 31 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCAAGCTGCGAT +>gnl|uv|NGB00904.1 Ion Xpress Barcode 32 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTTACACACGAT +>gnl|uv|NGB00905.1 Ion Xpress Barcode 33 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCTCATTGAACGAT +>gnl|uv|NGB00906.1 Ion Xpress Barcode 34 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGCATCGTTCGAT +>gnl|uv|NGB00907.1 Ion Xpress Barcode 35 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAAGCCATTGTCGAT +>gnl|uv|NGB00908.1 Ion Xpress Barcode 36 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGAAGGAATCGTCGAT +>gnl|uv|NGB00909.1 Ion Xpress Barcode 37 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTTGAGAATGTCGAT +>gnl|uv|NGB00910.1 Ion Xpress Barcode 38 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGGAGGACGGACGAT +>gnl|uv|NGB00911.1 Ion Xpress Barcode 39 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAACAATCGGCGAT 
+>gnl|uv|NGB00912.1 Ion Xpress Barcode 40 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGACATAATCGAT +>gnl|uv|NGB00913.1 Ion Xpress Barcode 41 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCCACTTCGCGAT +>gnl|uv|NGB00914.1 Ion Xpress Barcode 42 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGAGCACGAATCGAT +>gnl|uv|NGB00915.1 Ion Xpress Barcode 43 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTTGACACCGCGAT +>gnl|uv|NGB00916.1 Ion Xpress Barcode 44 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTGGAGGCCAGCGAT +>gnl|uv|NGB00917.1 Ion Xpress Barcode 45 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGGAGCTTCCTCGAT +>gnl|uv|NGB00918.1 Ion Xpress Barcode 46 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCAGTCCGAACGAT +>gnl|uv|NGB00919.1 Ion Xpress Barcode 47 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTAAGGCAACCACGAT +>gnl|uv|NGB00920.1 Ion Xpress Barcode 48 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCTAAGAGACGAT +>gnl|uv|NGB00921.1 Ion Xpress Barcode 49 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTAACATAACGAT +>gnl|uv|NGB00922.1 Ion Xpress Barcode 50 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGGACAATGGCGAT +>gnl|uv|NGB00923.1 Ion Xpress Barcode 51 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTGAGCCTATTCGAT +>gnl|uv|NGB00924.1 Ion Xpress Barcode 52 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCGCATGGAACGAT +>gnl|uv|NGB00925.1 Ion Xpress Barcode 53 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGGCAATCCTCGAT +>gnl|uv|NGB00926.1 Ion Xpress Barcode 54 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCGGAGAATCGCGAT +>gnl|uv|NGB00927.1 Ion Xpress Barcode 55 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCACCTCCTCGAT +>gnl|uv|NGB00928.1 Ion Xpress Barcode 56 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGCATTAATTCGAT +>gnl|uv|NGB00929.1 Ion Xpress Barcode 57 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGGCAACGGCGAT +>gnl|uv|NGB00930.1 Ion Xpress Barcode 58 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTAGAACACGAT +>gnl|uv|NGB00931.1 Ion Xpress Barcode 59 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTTGATGTTCGAT +>gnl|uv|NGB00932.1 Ion Xpress Barcode 60 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTAGCTCTTCGAT +>gnl|uv|NGB00933.1 Ion Xpress Barcode 61 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCACTCGGATCGAT +>gnl|uv|NGB00934.1 Ion Xpress Barcode 62 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCCTGCTTCACGAT +>gnl|uv|NGB00935.1 Ion Xpress Barcode 63 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCTTAGAGTTCGAT +>gnl|uv|NGB00936.1 Ion Xpress Barcode 64 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGAGTTCCGACGAT +>gnl|uv|NGB00937.1 Ion Xpress Barcode 65 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTGGCACATCGAT +>gnl|uv|NGB00938.1 Ion Xpress Barcode 66 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCGCAATCATCGAT +>gnl|uv|NGB00939.1 Ion Xpress Barcode 67 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCCTACCAGTCGAT +>gnl|uv|NGB00940.1 Ion Xpress Barcode 68 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCAAGAAGTTCGAT +>gnl|uv|NGB00941.1 Ion Xpress Barcode 69 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCAATTGGCGAT +>gnl|uv|NGB00942.1 Ion Xpress Barcode 70 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCTACTGGTCGAT +>gnl|uv|NGB00943.1 Ion Xpress Barcode 71 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGGCTCCGACGAT +>gnl|uv|NGB00944.1 Ion Xpress Barcode 72 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAAGGCCACACGAT +>gnl|uv|NGB00945.1 Ion Xpress Barcode 73 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCTGCCTGTCGAT +>gnl|uv|NGB00946.1 Ion Xpress Barcode 74 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGATCGGTTCGAT +>gnl|uv|NGB00947.1 Ion Xpress Barcode 75 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCAGGAATACGAT 
+>gnl|uv|NGB00948.1 Ion Xpress Barcode 76 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGGAAGAACCTCGAT +>gnl|uv|NGB00949.1 Ion Xpress Barcode 77 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGAAGCGATTCGAT +>gnl|uv|NGB00950.1 Ion Xpress Barcode 78 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGCCAATTCTCGAT +>gnl|uv|NGB00951.1 Ion Xpress Barcode 79 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCTGGTTGTCGAT +>gnl|uv|NGB00952.1 Ion Xpress Barcode 80 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCGAAGGCAGGCGAT +>gnl|uv|NGB00953.1 Ion Xpress Barcode 81 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCTGCCATTCGCGAT +>gnl|uv|NGB00954.1 Ion Xpress Barcode 82 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTGGCATCTCGAT +>gnl|uv|NGB00955.1 Ion Xpress Barcode 83 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAGGACATTCGAT +>gnl|uv|NGB00956.1 Ion Xpress Barcode 84 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTTCCATAACGAT +>gnl|uv|NGB00957.1 Ion Xpress Barcode 85 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCAGCCTCAACGAT +>gnl|uv|NGB00958.1 Ion Xpress Barcode 86 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTTGGTTATTCGAT +>gnl|uv|NGB00959.1 Ion Xpress Barcode 87 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTGGCTGGACGAT +>gnl|uv|NGB00960.1 Ion Xpress Barcode 88 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCCGAACACTTCGAT +>gnl|uv|NGB00961.1 Ion Xpress Barcode 89 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTGAATCTCGAT +>gnl|uv|NGB00962.1 Ion Xpress Barcode 90 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAACCACGGCGAT +>gnl|uv|NGB00963.1 Ion Xpress Barcode 91 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGGAAGGATGCGAT +>gnl|uv|NGB00964.1 Ion Xpress Barcode 92 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAGGAACCGCGAT +>gnl|uv|NGB00965.1 Ion Xpress Barcode 93 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCTTGTCCAATCGAT +>gnl|uv|NGB00966.1 Ion Xpress Barcode 94 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCGACAAGCGAT +>gnl|uv|NGB00967.1 Ion Xpress Barcode 95 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGCGGACAGATCGAT +>gnl|uv|NGB00968.1 Ion Xpress Barcode 96 A Adapter +CCATCTCATCCCTGCGTGTCTCCGACTCAGTTAAGCGGTCGAT +>gnl|uv|NGB00969.1 Illumina Single End Apapter 1 (Oligonucleotide sequence copyright 2007-2012 Illumina, Inc. All rights reserved.) 
+ACACTCTTTCCCTACACGACGCTGTTCCATCT +>gnl|uv|NGB00970.1 ABI SOLiD SAGE Dynabeads Oligo-dT EcoP Primer +CTGATCTAGAGGTACCGGATCCCAGCAGTTTTTTTTTTTTTTTTTTTTTTTTT +>gnl|uv|NGB00971.1 ABI SOLiD SAGE Adapter A +CTGCCCCGGGTTCCTCATTCTCTCAGCAGCATG +>gnl|uv|NGB00972.1 Pacific Biosciences Blunt Adapter +ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT +>gnl|uv|NGB00973.1 Pacific Biosciences C2 Primer +AAAAAAAAAAAAAAAAAATTAACGGAGGAGGAGGA +>gnl|uv|NGB00982.1 Universal primer-dN6 +GCCGGAGCTCTGCAGAATTCNNNNNN +>gnl|uv|NGB00983.1 Whole Transcriptome Amplification 5'-end tag +GTGGTGTGTTGGGTGTGTTTGGNNNNNNNNN +>gnl|uv|NGB01026.1 SISPA primer FR20RV +GCCGGAGCTCTGCAGATATC +>gnl|uv|NGB01029.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT1 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01030.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT2 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01031.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT3 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01032.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT4 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCACTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01033.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT5 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01034.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT6 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01035.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT7 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01036.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT8 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGATGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01037.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT9 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01038.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT10 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01039.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT11 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01040.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT12 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTACTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01041.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT13 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGGTTGTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01042.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT14 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCTCGGTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01043.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT15 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAAGCGTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01044.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT16 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGTCTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01045.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT17 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGTACCTTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01046.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT18 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTCTGTGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01047.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT19 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCTGCTGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01048.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT20 
+AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTGGAGGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01049.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT21 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAGCGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01050.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT22 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGATACGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01051.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT99 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGCTACCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01052.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT101 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGGTTGGACATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01053.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT25 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGCGATCTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01054.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT26 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTCCTGCTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01055.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT27 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGTGACTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01056.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT28 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTACAGGATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01057.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT29 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCTCAATATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01058.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT30 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGTGGTTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01059.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT31 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGTCTTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01060.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT32 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTCCATTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01061.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT33 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01062.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT34 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAACGCTGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01063.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT35 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTGGTATGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01064.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT36 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGAACTGGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01065.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT102 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCACAACATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01066.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT38 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCTCACGGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01067.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT39 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCAGGAGGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01068.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT40 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAAGTTCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01069.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT41 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCAGTCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01070.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT42 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGTATGCGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01071.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT43 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATTGAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01072.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT44 
+AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGGCTCAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01073.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT45 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTATGCCAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01074.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT46 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCAGATTCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01075.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT47 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTACTAGTCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01076.1 Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT48 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTCAGCTCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01077.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D701 +AATGATACGGCGACCACCGAGATCTACACATTACTCGACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01078.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D702 +AATGATACGGCGACCACCGAGATCTACACTCCGGAGAACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01079.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D703 +AATGATACGGCGACCACCGAGATCTACACCGCTCATTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01080.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D704 +AATGATACGGCGACCACCGAGATCTACACGAGATTCCACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01081.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D705 +AATGATACGGCGACCACCGAGATCTACACATTCAGAAACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01082.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D706 +AATGATACGGCGACCACCGAGATCTACACGAATTCGTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01083.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D707 +AATGATACGGCGACCACCGAGATCTACACCTGAAGCTACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01084.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D708 +AATGATACGGCGACCACCGAGATCTACACTAATGCGCACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01085.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D709 +AATGATACGGCGACCACCGAGATCTACACCGGCTATGACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01086.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D710 +AATGATACGGCGACCACCGAGATCTACACTCCGCGAAACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01087.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D711 +AATGATACGGCGACCACCGAGATCTACACTCTCGCGCACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01088.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D712 +AATGATACGGCGACCACCGAGATCTACACAGCGATAGACACTCTTTCCCTACACGACGCTCTTCCGATCT +>gnl|uv|NGB01089.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D501 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTATAGCCTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01090.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D502 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATAGAGGCATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01091.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D503 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCTATCCTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01092.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D504 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTCTGAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01093.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D505 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGCGAAGATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01094.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D506 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATCTTAATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01095.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D507 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGGACGTATCTCGTATGCCGTCTTCTGCTTG +>gnl|uv|NGB01096.1 Rubicon Genomics ThruPLEX DNA-seq dual-index D508 +AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTACTGACATCTCGTATGCCGTCTTCTGCTTG
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/assets/dummy_file.txt Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,1 @@ +DuMmY
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/assets/dummy_file2.txt Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,1 @@ +DuMmY
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/check_samplesheet.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,188 @@ +#!/usr/bin/env python3 + +import argparse +import errno +import os +import sys + + +def parse_args(args=None): + Description = "Reformat samplesheet file and check its contents." + Epilog = "Example usage: python check_samplesheet.py <FILE_IN> <FILE_OUT>" + + parser = argparse.ArgumentParser(description=Description, epilog=Epilog) + parser.add_argument("FILE_IN", help="Input samplesheet file.") + parser.add_argument("FILE_OUT", help="Output file.") + return parser.parse_args(args) + + +def make_dir(path): + if len(path) > 0: + try: + os.makedirs(path) + except OSError as exception: + if exception.errno != errno.EEXIST: + raise exception + + +def print_error(error, context="Line", context_str=""): + error_str = f"ERROR: Please check samplesheet -> {error}" + if context != "" and context_str != "": + error_str = f"ERROR: Please check samplesheet -> {error}\n{context.strip()}: '{context_str.strip()}'" + print(error_str) + sys.exit(1) + + +def check_samplesheet(file_in, file_out): + """ + This function checks that the samplesheet follows the following structure: + + sample,fq1,fq2,strandedness + SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz,forward + SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz,forward + SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq,,forward + SAMPLE_SE,SAMPLE_SE_RUN1_2.fastq.gz,,forward + + For an example see: + https://github.com/nf-core/test-datasets/blob/rnaseq/samplesheet/v3.1/samplesheet_test.csv + """ + + sample_mapping_dict = {} + with open(file_in, "r", encoding="utf-8-sig") as fin: + + ## Check header + MIN_COLS = 3 + HEADER = ["sample", "fq1", "fq2", "strandedness"] + header = [x.strip('"') for x in fin.readline().strip().split(",")] + if header[: len(HEADER)] != HEADER: + print( + f"ERROR: Please check samplesheet header -> {','.join(header)} != {','.join(HEADER)}" + ) + sys.exit(1) + + ## Check sample entries + for line in fin: + if line.strip(): + lspl = [x.strip().strip('"') for x in line.strip().split(",")] + + ## Check valid number of columns per row + if len(lspl) < len(HEADER): + print_error( + f"Invalid number of columns (minimum = {len(HEADER)})!", + "Line", + line, + ) + + num_cols = len([x for x in lspl if x]) + if num_cols < MIN_COLS: + print_error( + f"Invalid number of populated columns (minimum = {MIN_COLS})!", + "Line", + line, + ) + + ## Check sample name entries + sample, fq1, fq2, strandedness = lspl[: len(HEADER)] + if sample.find(" ") != -1: + print( + f"WARNING: Spaces have been replaced by underscores for sample: {sample}" + ) + sample = sample.replace(" ", "_") + if not sample: + print_error("Sample entry has not been specified!", "Line", line) + + ## Check FastQ file extension + for fastq in [fq1, fq2]: + if fastq: + if fastq.find(" ") != -1: + print_error("FastQ file contains spaces!", "Line", line) + # if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"): + # print_error( + # "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!", + # "Line", + # line, + # ) + + ## Check strandedness + strandednesses = ["unstranded", "forward", "reverse"] + if strandedness: + if strandedness not in strandednesses: + print_error( + f"Strandedness must be one of '{', '.join(strandednesses)}'!", + "Line", + line, + ) + else: + print_error( + f"Strandedness has not been specified! 
Must be one of {', '.join(strandednesses)}.", + "Line", + line, + ) + + ## Auto-detect paired-end/single-end + sample_info = [] ## [single_end, fq1, fq2, strandedness] + if sample and fq1 and fq2: ## Paired-end short reads + sample_info = ["0", fq1, fq2, strandedness] + elif sample and fq1 and not fq2: ## Single-end short reads + sample_info = ["1", fq1, fq2, strandedness] + else: + print_error( + "Invalid combination of columns provided!", "Line", line + ) + + ## Create sample mapping dictionary = {sample: [[ single_end, fq1, fq2, strandedness ]]} + if sample not in sample_mapping_dict: + sample_mapping_dict[sample] = [sample_info] + else: + if sample_info in sample_mapping_dict[sample]: + print_error( + "Samplesheet contains duplicate rows!", "Line", line + ) + else: + sample_mapping_dict[sample].append(sample_info) + + ## Write validated samplesheet with appropriate columns + if len(sample_mapping_dict) > 0: + out_dir = os.path.dirname(file_out) + make_dir(out_dir) + with open(file_out, "w") as fout: + fout.write( + ",".join(["sample", "single_end", "fq1", "fq2", "strandedness"]) + "\n" + ) + for sample in sorted(sample_mapping_dict.keys()): + + ## Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end + if not all( + x[0] == sample_mapping_dict[sample][0][0] + for x in sample_mapping_dict[sample] + ): + print_error( + f"Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end!", + "Sample", + sample, + ) + + ## Check that multiple runs of the same sample are of the same strandedness + if not all( + x[-1] == sample_mapping_dict[sample][0][-1] + for x in sample_mapping_dict[sample] + ): + print_error( + f"Multiple runs of a sample must have the same strandedness!", + "Sample", + sample, + ) + + for idx, val in enumerate(sample_mapping_dict[sample]): + fout.write(",".join([f"{sample}_T{idx+1}"] + val) + "\n") + else: + print_error(f"No entries to process!", "Samplesheet: {file_in}") + + +def main(args=None): + args = parse_args(args) + check_samplesheet(args.FILE_IN, args.FILE_OUT) + + +if __name__ == "__main__": + sys.exit(main())
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/create_fasta_and_lineages.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,518 @@ +#!/usr/bin/env python3 + +import argparse +import gzip +import inspect +import logging +import os +import pprint +import re +import shutil +import ssl +import tempfile +from html.parser import HTMLParser +from urllib.request import urlopen + +from Bio import SeqIO +from Bio.Seq import Seq +from Bio.SeqRecord import SeqRecord + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +# HTMLParser override class to get fna.gz and gbff.gz +class NCBIHTMLParser(HTMLParser): + def __init__(self, *, convert_charrefs: bool = ...) -> None: + super().__init__(convert_charrefs=convert_charrefs) + self.reset() + self.href_data = list() + + def handle_data(self, data): + self.href_data.append(data) + + +# Download organelle FASTA and GenBank file. +def dl_mito_seqs_and_flat_files(url: str, suffix: re, out: os.PathLike) -> os.PathLike: + """ + Method to save .fna.gz and .gbff.gz files for the + RefSeq mitochondrion release. + """ + contxt = ssl.create_default_context() + contxt.check_hostname = False + contxt.verify_mode = ssl.CERT_NONE + + if url == None: + logging.error( + "Please provide the base URL where .fna.gz and .gbff.gz" + + "\nfiles for RefSeq mitochondrion can be found." + ) + exit(1) + + if os.path.exists(out): + for file in os.listdir(out): + file_path = os.path.join(out, file) + + if suffix.match(file_path) and os.path.getsize(file_path) > 0: + logging.info( + f"The required mitochondrion file(s)\n[{os.path.basename(file_path)}]" + + " already exists.\nSkipping download from NCBI..." + + "\nPlease use -f to delete and overwrite." + ) + return file_path + else: + os.makedirs(out) + + html_parser = NCBIHTMLParser() + logging.info(f"Finding latest NCBI RefSeq mitochondrion release at:\n{url}") + + with urlopen(url, context=contxt) as response: + with tempfile.NamedTemporaryFile(delete=False) as tmp_html_file: + shutil.copyfileobj(response, tmp_html_file) + + with open(tmp_html_file.name, "r") as html: + html_parser.feed("".join(html.readlines())) + + file = suffix.search("".join(html_parser.href_data)).group(0) + file_url = "/".join([url, file + ".gz"]) + file_at = os.path.join(out, file) + + logging.info(f"Found NCBI RefSeq mitochondrian file(s):\n{file_url}") + + logging.info(f"Saving to:\n{file_at}") + + with tempfile.NamedTemporaryFile(delete=False) as tmp_gz: + with urlopen(file_url, context=contxt) as response: + tmp_gz.write(response.read()) + + with open(file_at, "w") as fh: + with gzip.open(tmp_gz.name, "rb") as web_gz: + fh.write(web_gz.read().decode("utf-8")) + + html.close() + tmp_gz.close() + tmp_html_file.close() + os.unlink(tmp_gz.name) + os.unlink(tmp_html_file.name) + fh.close() + web_gz.close() + response.close() + + return file_at + + +def get_lineages(csv: os.PathLike, cols: list) -> list: + """ + Parse the output from `ncbitax2lin` tool and + return a dict of lineages where the key is + genusspeciesstrain. + """ + lineages = dict() + if csv == None or not (os.path.exists(csv) or os.path.getsize(csv) > 0): + logging.error( + f"The CSV file [{os.path.basename(csv)}] is empty or does not exist!" 
+ ) + exit(1) + + logging.info(f"Indexing {os.path.basename(csv)}...") + + with open(csv, "r") as csv_fh: + header_cols = csv_fh.readline().strip().split(",") + user_req_cols = [ + tcol_i for tcol_i, tcol in enumerate(header_cols) if tcol in cols + ] + cols_not_found = [tcol for tcol in cols if tcol not in header_cols] + raw_recs = 0 + + if len(cols_not_found) > 0: + logging.error( + f"The following columns do not exist in the" + + f"\nCSV file [ {os.path.basename(csv)} ]:\n" + + "".join(cols_not_found) + ) + exit(1) + elif len(user_req_cols) > 9: + logging.error( + f"Only a total of 9 columns are needed!" + + "\ntax_id,kindom,phylum,class,order,family,genus,species,strain" + ) + exit(1) + + for tax in csv_fh: + raw_recs += 1 + lcols = tax.strip().split(",") + + if bool(lcols[user_req_cols[8]]): + lineages[lcols[user_req_cols[8]]] = ",".join( + [lcols[l] for l in user_req_cols[1:]] + ) + elif bool(lcols[user_req_cols[7]]): + lineages[lcols[user_req_cols[7]]] = ",".join( + [lcols[l] for l in user_req_cols[1:8]] + [str()] + ) + + csv_fh.close() + return lineages, raw_recs + + +def from_genbank(gbk: os.PathLike, min_len: int) -> dict: + """ + Method to parse GenBank file and return + organism to latest accession mapping. + """ + accs2orgs = dict() + + if not (os.path.exists(gbk) or os.path.getsize(gbk) > 0): + logging.info( + f"The GenBank file [{os.path.basename(gbk)}] does not exist" + + "\nor is of size 0." + ) + exit(1) + + logging.info(f"Indexing {os.path.basename(gbk)}...") + + # a = open("./_accs", "w") + for record in SeqIO.parse(gbk, "genbank"): + if len(record.seq) < min_len: + continue + else: + # a.write(f"{record.id}\n") + accs2orgs[record.id] = record.annotations["organism"] + + return accs2orgs + + +def from_genbank_alt(gbk: os.PathLike) -> dict: + """ + Method to parse GenBank file and return + organism to latest accession mapping without + using BioPython's GenBank Scanner + """ + accs2orgs = dict() + accs = dict() + orgs = dict() + acc = False + acc_pat = re.compile(r"^VERSION\s+(.+)") + org_pat = re.compile(r"^\s+ORGANISM\s+(.+)") + + if not (os.path.exists(gbk) or os.path.getsize(gbk) > 0): + logging.info( + f"The GenBank file [{os.path.basename(gbk)}] does not exist" + + "\nor is of size 0." + ) + exit(1) + + logging.info( + f"Indexing {os.path.basename(gbk)} without using\nBioPython's GenBank Scanner..." + ) + + with open(gbk, "r") as gbk_fh: + for line in gbk_fh: + line = line.rstrip() + if line.startswith("VERSION") and acc_pat.match(line): + acc = acc_pat.match(line).group(1) + accs[acc] = 1 + if org_pat.match(line): + if acc and acc not in orgs.keys(): + orgs[acc] = org_pat.match(line).group(1) + elif acc and acc in orgs.keys(): + logging.error(f"Duplicate VERSION line: {acc}") + exit(1) + if len(accs.keys()) != len(orgs.keys()): + logging.error( + f"Got unequal number of organisms ({len(orgs.keys())})\n" + + f"and accessions ({len(accs.keys())})" + ) + exit(1) + else: + for acc in accs.keys(): + if acc not in orgs.keys(): + logging.error(f"ORAGANISM not found for accession: {acc}") + exit(1) + accs2orgs[acc] = orgs[acc] + + gbk_fh.close() + return accs2orgs + + +def write_fasta(seq: str, id: str, basedir: os.PathLike, suffix: str) -> None: + """ + Write sequence with no description to specified file. + """ + SeqIO.write( + SeqRecord(Seq(seq), id=id, description=str()), + os.path.join(basedir, id + suffix), + "fasta", + ) + + +# Main +def main() -> None: + """ + This script takes: + 1. Downloads the RefSeq Mitochrondrial GenBank and FASTA format files. + 2. 
Takes as input and output .csv.gz or .csv file generated by `ncbitax2lin`. + + and then generates a folder containing individual FASTA sequence files + per organelle, and a corresponding lineage file in CSV format. + """ + + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\n\n", + level=logging.DEBUG, + ) + + # Debug print. + ppp = pprint.PrettyPrinter(width=55) + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-csv", + dest="csv", + default=False, + required=True, + help="Absolute UNIX path to .csv or .csv.gz file which is generated " + + "\nby the `ncbitax2lin` tool.", + ) + parser.add_argument( + "-cols", + dest="lineage_cols", + default="tax_id,superkingdom,phylum,class,order,family,genus,species,strain", + required=False, + help="Taxonomic lineage will be built using these columns from the output of" + + "\n`ncbitax2lin` tool.", + ) + parser.add_argument( + "-url", + dest="url", + default="https://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion", + required=False, + help="Base URL from where NCBI RefSeq mitochondrion files will be downloaded\nfrom.", + ) + parser.add_argument( + "-out", + dest="out_folder", + default=os.path.join(os.getcwd(), "organelles"), + required=False, + help="By default, the output is written to this folder.", + ) + parser.add_argument( + "-f", + dest="force_write_out", + default=False, + action="store_true", + required=False, + help="Force overwrite output directory contents.", + ) + parser.add_argument( + "--fna-suffix", + dest="fna_suffix", + default=".fna", + required=False, + help="Suffix of the individual organelle FASTA files that will be saved.", + ) + parser.add_argument( + "-ml", + dest="fa_min_len", + default=200, + required=False, + help="Minimum length of the FASTA sequence for it to be considered for" + + "\nfurther processing", + ) + parser.add_argument( + "--skip-per-fa", + dest="skip_per_fa", + default=False, + required=False, + action="store_true", + help="Do not generate per sequence FASTA file.", + ) + parser.add_argument( + "--alt-gb-parser", + dest="alt_gb_parser", + default=False, + required=False, + action="store_true", + help="Use alternate GenBank parser instead of BioPython's.", + ) + + # Parse defaults + args = parser.parse_args() + csv = args.csv + out = args.out_folder + overwrite = args.force_write_out + fna_suffix = args.fna_suffix + url = args.url + tax_cols = args.lineage_cols + skip_per_fa = args.skip_per_fa + alt_gb_parser = args.alt_gb_parser + min_len = int(args.fa_min_len) + tcols_pat = re.compile(r"^[\w\,]+?\w$") + mito_fna_suffix = re.compile(r".*?\.genomic\.fna") + mito_gbff_suffix = re.compile(r".*?\.genomic\.gbff") + final_lineages = os.path.join(out, "lineages.csv") + lineages_not_found = os.path.join(out, "lineages_not_found.csv") + base_fasta_dir = os.path.join(out, "fasta") + + # Basic checks + if not overwrite and os.path.exists(out): + logging.warning( + f"Output destination [{os.path.basename(out)}] already exists!" + + "\nPlease use -f to delete and overwrite." + ) + elif overwrite and os.path.exists(out): + logging.info(f"Overwrite requested. 
Deleting {os.path.basename(out)}...") + shutil.rmtree(out) + + if not tcols_pat.match(tax_cols): + logging.error( + f"Supplied columns' names {tax_cols} should only have words (alphanumeric) separated by a comma." + ) + exit(1) + else: + tax_cols = re.sub("\n", "", tax_cols).split(",") + + # Get .fna and .gbk files + fna = dl_mito_seqs_and_flat_files(url, mito_fna_suffix, out) + gbk = dl_mito_seqs_and_flat_files(url, mito_gbff_suffix, out) + + # Get taxonomy from ncbitax2lin + lineages, raw_recs = get_lineages(csv, tax_cols) + + # Get parsed organisms and latest accession from GenBank file. + if alt_gb_parser: + accs2orgs = from_genbank_alt(gbk) + else: + accs2orgs = from_genbank(gbk, min_len) + + # # Finally, read FASTA and create individual FASTA if lineage exists. + logging.info(f"Creating new sequences and lineages...") + + l_fh = open(final_lineages, "w") + ln_fh = open(lineages_not_found, "w") + l_fh.write( + "identifiers,superkingdom,phylum,class,order,family,genus,species,strain\n" + ) + ln_fh.write("fna_id,gbk_org\n") + passed_lookup = 0 + failed_lookup = 0 + gbk_recs_missing = 0 + skipped_len_short = 0 + + if not os.path.exists(base_fasta_dir): + os.makedirs(base_fasta_dir) + + for record in SeqIO.parse(fna, "fasta"): + if len(record.seq) < min_len: + skipped_len_short += 1 + continue + elif record.id in accs2orgs.keys(): + org_words = accs2orgs[record.id].split(" ") + else: + gbk_recs_missing += 1 + continue + + genus_species = ( + " ".join(org_words[0:2]) if len(org_words) > 2 else " ".join(org_words[0:]) + ) + + if not skip_per_fa: + write_fasta(record.seq, record.id, base_fasta_dir, fna_suffix) + + if record.id in accs2orgs.keys() and accs2orgs[record.id] in lineages.keys(): + l_fh.write(",".join([record.id, lineages[accs2orgs[record.id]]]) + "\n") + passed_lookup += 1 + elif record.id in accs2orgs.keys() and genus_species in lineages.keys(): + if len(org_words) > 2: + l_fh.write( + ",".join( + [ + record.id, + lineages[genus_species].rstrip(","), + accs2orgs[record.id], + ] + ) + + "\n" + ) + else: + l_fh.write(",".join([record.id, lineages[genus_species]]) + "\n") + passed_lookup += 1 + else: + if len(org_words) > 2: + l_fh.write( + ",".join( + [ + record.id, + "", + "", + "", + "", + "", + org_words[0], + org_words[0] + " " + org_words[1], + accs2orgs[record.id], + ] + ) + + "\n" + ) + else: + l_fh.write( + ",".join( + [ + record.id, + "", + "", + "", + "", + "", + org_words[0], + accs2orgs[record.id], + "", + ] + ) + + "\n" + ) + ln_fh.write(",".join([record.id, accs2orgs[record.id]]) + "\n") + failed_lookup += 1 + + logging.info( + f"No. of raw records present in `ncbitax2lin` [{os.path.basename(csv)}]: {raw_recs}" + + f"\nNo. of valid records collected from `ncbitax2lin` [{os.path.basename(csv)}]: {len(lineages.keys())}" + + f"\nNo. of sequences skipped (Sequence length < {min_len}): {skipped_len_short}" + + f"\nNo. of records in FASTA [{os.path.basename(fna)}]: {passed_lookup + failed_lookup}" + + f"\nNo. of records in GenBank [{os.path.basename(gbk)}]: {len(accs2orgs.keys())}" + + f"\nNo. of FASTA records for which new lineages were created: {passed_lookup}" + + f"\nNo. of FASTA records for which only genus, species and/or strain information were created: {failed_lookup}" + + f"\nNo. 
of FASTA records for which no GenBank records exist: {gbk_recs_missing}" + ) + + if (passed_lookup + failed_lookup) != len(accs2orgs.keys()): + logging.error( + f"The number of FASTA records written [{passed_lookup + failed_lookup}]" + + f"\nis not equal to the number of GenBank records [{len(accs2orgs.keys())}]!" + ) + exit(1) + else: + logging.info("Successfully created lineages and FASTA records! Done!!") + + l_fh.close() + ln_fh.close() + + +if __name__ == "__main__": + main()
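As a rough guide to running `bin/create_fasta_and_lineages.py` above, here is a hedged invocation sketch; the `ncbitax2lin` CSV path is a placeholder, and the run downloads the RefSeq mitochondrion release from NCBI, so it needs network access.

```python
# Hedged example: drive create_fasta_and_lineages.py with its documented options.
# "ncbi_lineages.csv" stands in for the real ncbitax2lin output file.
import subprocess

subprocess.run(
    [
        "python3", "create_fasta_and_lineages.py",
        "-csv", "ncbi_lineages.csv",  # required: ncbitax2lin output
        "-out", "organelles",         # output folder (default shown)
        "-ml", "200",                 # minimum sequence length (default shown)
    ],
    check=True,
)
# Expected outputs under ./organelles: per-sequence FASTA files in fasta/,
# plus lineages.csv and lineages_not_found.csv.
```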
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/create_mqc_data_table.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,153 @@ +#!/usr/bin/env python + +import os +import sys +from textwrap import dedent + +import yaml + + +def main(): + """ + Takes a tab-delimited text file with a mandatory header + column and generates an HTML table. + """ + + args = sys.argv + if len(args) < 2 or len(args) >= 4: + print( + f"\nAt least one argument specifying the *.tblsum file is required.\n" + + "No more than 2 command-line arguments should be passed.\n" + ) + exit(1) + + table_sum_on = str(args[1]).lower() + table_sum_on_file = table_sum_on + ".tblsum.txt" + cell_colors = f"{table_sum_on}.cellcolors.yml" + + if len(args) == 3: + description = str(args[2]) + else: + description = "The results table shown here is a collection from all samples." + + if os.path.exists(cell_colors) and os.path.getsize(cell_colors) > 0: + with open(cell_colors, "r") as cc_yml: + cell_colors = yaml.safe_load(cc_yml) + else: + cell_colors = dict() + + if not ( + os.path.exists(table_sum_on_file) and os.path.getsize(table_sum_on_file) > 0 + ): + exit(0) + + with open(table_sum_on_file, "r") as tbl: + header = tbl.readline() + header_cols = header.strip().split("\t") + + html = [ + dedent( + f"""<script type="text/javascript"> + $(document).ready(function () {{ + $('#cpipes-process-custom-res-{table_sum_on}').DataTable({{ + scrollX: true, + fixedColumns: true, dom: 'Bfrtip', + buttons: [ + 'copy', + {{ + extend: 'print', + title: 'CPIPES: MultiQC Report: {table_sum_on}' + }}, + {{ + extend: 'excel', + filename: '{table_sum_on}_results', + }}, + {{ + extend: 'csv', + filename: '{table_sum_on}_results', + }} + ] + }}); + }}); + </script> + <div class="table-responsive"> + <style> + #cpipes-process-custom-res tr:nth-child(even) {{ + background-color: #f2f2f2; + }} + </style> + <table class="table" style="width:100%" id="cpipes-process-custom-res-{table_sum_on}"> + <thead> + <tr>""" + ) + ] + + for header_col in header_cols: + html.append( + dedent( + f""" + <th> {header_col} </th>""" + ) + ) + + html.append( + dedent( + """ + </tr> + </thead> + <tbody>""" + ) + ) + + for row in tbl: + html.append("<tr>\n") + data_cols = row.strip().split("\t") + if len(header_cols) != len(data_cols): + print( + f"\nWARN: Number of header columns ({len(header_cols)}) and data " + + f"columns ({len(data_cols)}) are not equal!\nWill append empty columns!\n" + ) + if len(header_cols) > len(data_cols): + data_cols += (len(header_cols) - len(data_cols)) * " " + print(len(data_cols)) + else: + header_cols += (len(data_cols) - len(header_cols)) * " " + + html.append( + dedent( + f""" + <td><samp>{data_cols[0]}</samp></td> + """ + ) + ) + + for data_col in data_cols[1:]: + data_col_w_color = f"""<td>{data_col}</td> + """ + if ( + table_sum_on in cell_colors.keys() + and data_col in cell_colors[table_sum_on].keys() + ): + data_col_w_color = f"""<td style="background-color: {cell_colors[table_sum_on][data_col]}">{data_col}</td> + """ + html.append(dedent(data_col_w_color)) + html.append("</tr>\n") + html.append("</tbody>\n") + html.append("</table>\n") + html.append("</div>\n") + + mqc_yaml = { + "id": f"{table_sum_on.upper()}_collated_table", + "section_name": f"{table_sum_on.upper()}", + "section_href": f"https://github.com/CFSAN-Biostatistics/nowayout", + "plot_type": "html", + "description": f"{description}", + "data": ("").join(html), + } + + with open(f"{table_sum_on.lower()}_mqc.yml", "w") as html_mqc: + yaml.dump(mqc_yaml, html_mqc, 
default_flow_style=False) + + +if __name__ == "__main__": + main()
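A minimal sketch of how `bin/create_mqc_data_table.py` above might be invoked (names are illustrative); it expects a `<name>.tblsum.txt` tab-delimited file in the working directory and emits `<name>_mqc.yml` for MultiQC custom content.

```python
# Hedged example: build a tiny tab-delimited summary and convert it to a
# MultiQC custom-content YAML. The table name and contents are made up.
import subprocess
from pathlib import Path

Path("demo.tblsum.txt").write_text(
    "Sample\tReads classified\n"
    "S1\t12345\n"
)

subprocess.run(
    ["python3", "create_mqc_data_table.py", "demo", "Example description."],
    check=True,
)
# Produces demo_mqc.yml wrapping the HTML DataTable generated by the script above.
```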
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/fasta_join.pl Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,88 @@ +#!/usr/bin/env perl + +# Kranti Konganti +# Takes in a gzipped multi-fasta file +# and joins contigs by 10 N's + +use strict; +use warnings; +use Cwd; +use Bio::SeqIO; +use Getopt::Long; +use File::Find; +use File::Basename; +use File::Spec::Functions; + +my ( $in_dir, $out_dir, $suffix, @uncatted_genomes ); + +GetOptions( + 'in_dir=s' => \$in_dir, + 'out_dir=s' => \$out_dir, + 'suffix=s' => \$suffix +) or die usage(); + +$in_dir = getcwd if ( !defined $in_dir ); +$out_dir = getcwd if ( !defined $out_dir ); +$suffix = '_genomic.fna.gz' if ( !defined $suffix ); + +find( + { + wanted => sub { + push @uncatted_genomes, $File::Find::name if ( $_ =~ m/$suffix$/ ); + } + }, + $in_dir +); + +if ( $out_dir ne getcwd && !-d $out_dir ) { + mkdir $out_dir || die "\nCannot create directory $out_dir: $!\n\n"; +} + +open( my $geno_path, '>genome_paths.txt' ) + || die "\nCannot open file genome_paths.txt: $!\n\n"; + +foreach my $uncatted_genome_path (@uncatted_genomes) { + my $catted_genome_header = '>' . basename( $uncatted_genome_path, $suffix ); + $catted_genome_header =~ s/(GC[AF]\_\d+\.\d+)\_*.*/$1/; + + my $catted_genome = + catfile( $out_dir, $catted_genome_header . '_scaffolded' . $suffix ); + + $catted_genome =~ s/\/\>(GC[AF])/\/$1/; + + print $geno_path "$catted_genome\n"; + + open( my $fh, "gunzip -c $uncatted_genome_path |" ) + || die "\nCannot create pipe for $uncatted_genome_path: $!\n\n"; + + open( my $fho, '|-', "gzip -c > $catted_genome" ) + || die "\nCannot pipe to gzip: $!\n\n"; + + my $seq_obj = Bio::SeqIO->new( + -fh => $fh, + -format => 'Fasta' + ); + + my $joined_seq = ''; + while ( my $seq = $seq_obj->next_seq ) { + $joined_seq = $joined_seq . 'NNNNNNNNNN' . $seq->seq; + } + + $joined_seq =~ s/NNNNNNNNNN$//; + $joined_seq =~ s/^NNNNNNNNNN//; + + # $joined_seq =~ s/.{80}\K/\n/g; + # $joined_seq =~ s/\n$//; + print $fho $catted_genome_header, "\n", $joined_seq, "\n"; + + $seq_obj->close(); + close $fh; + close $fho; +} + +sub usage { + print +"\nUsage: $0 [-in IN_DIR] [-ou OUT_DIR] [-su Filename Suffix for Header]\n\n"; + exit; +} +
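For `bin/fasta_join.pl` above, a hedged driver sketch follows (directory names are hypothetical); the Perl script itself needs BioPerl's Bio::SeqIO available.

```python
# Hedged example: scaffold every *_genomic.fna.gz under ./genomes into a single
# joined record (contigs separated by 10 N's) using fasta_join.pl. Paths are
# illustrative only.
import subprocess

subprocess.run(
    [
        "perl", "fasta_join.pl",
        "-in_dir", "genomes",
        "-out_dir", "scaffolded",
        "-suffix", "_genomic.fna.gz",
    ],
    check=True,
)
# Writes one *_scaffolded_genomic.fna.gz per input genome into ./scaffolded and
# records the output paths in genome_paths.txt.
```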
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/fastq_dir_to_samplesheet.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,177 @@ +#!/usr/bin/env python3 + +import os +import sys +import glob +import argparse +import re + + +def parse_args(args=None): + Description = "Generate samplesheet from a directory of FastQ files." + Epilog = "Example usage: python fastq_dir_to_samplesheet.py <FASTQ_DIR> <SAMPLESHEET_FILE>" + + parser = argparse.ArgumentParser(description=Description, epilog=Epilog) + parser.add_argument("FASTQ_DIR", help="Folder containing raw FastQ files.") + parser.add_argument("SAMPLESHEET_FILE", help="Output samplesheet file.") + parser.add_argument( + "-st", + "--strandedness", + type=str, + dest="STRANDEDNESS", + default="unstranded", + help="Value for 'strandedness' in samplesheet. Must be one of 'unstranded', 'forward', 'reverse'.", + ) + parser.add_argument( + "-r1", + "--read1_extension", + type=str, + dest="READ1_EXTENSION", + default="_R1_001.fastq.gz", + help="File extension for read 1.", + ) + parser.add_argument( + "-r2", + "--read2_extension", + type=str, + dest="READ2_EXTENSION", + default="_R2_001.fastq.gz", + help="File extension for read 2.", + ) + parser.add_argument( + "-se", + "--single_end", + dest="SINGLE_END", + action="store_true", + help="Single-end information will be auto-detected but this option forces paired-end FastQ files to be treated as single-end so only read 1 information is included in the samplesheet.", + ) + parser.add_argument( + "-sn", + "--sanitise_name", + dest="SANITISE_NAME", + action="store_true", + help="Whether to further sanitise FastQ file name to get sample id. Used in conjunction with --sanitise_name_delimiter and --sanitise_name_index.", + ) + parser.add_argument( + "-sd", + "--sanitise_name_delimiter", + type=str, + dest="SANITISE_NAME_DELIMITER", + default="_", + help="Delimiter to use to sanitise sample name.", + ) + parser.add_argument( + "-si", + "--sanitise_name_index", + type=int, + dest="SANITISE_NAME_INDEX", + default=1, + help="After splitting FastQ file name by --sanitise_name_delimiter all elements before this index (1-based) will be joined to create final sample name.", + ) + return parser.parse_args(args) + + +def fastq_dir_to_samplesheet( + fastq_dir, + samplesheet_file, + strandedness="unstranded", + read1_extension="_R1_001.fastq.gz", + read2_extension="_R2_001.fastq.gz", + single_end=False, + sanitise_name=False, + sanitise_name_delimiter="_", + sanitise_name_index=1, +): + def sanitize_sample(path, extension): + """Retrieve sample id from filename""" + sample = os.path.basename(path).replace(extension, "") + if sanitise_name: + if sanitise_name_index > 0: + sample = sanitise_name_delimiter.join( + os.path.basename(path).split(sanitise_name_delimiter)[ + :sanitise_name_index + ] + ) + # elif sanitise_name_index == -1: + # sample = os.path.basename(path)[ :os.path.basename(path).index('.') ] + return sample + + def get_fastqs(extension): + """ + Needs to be sorted to ensure R1 and R2 are in the same order + when merging technical replicates. Glob is not guaranteed to produce + sorted results. 
+ See also https://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered + """ + abs_fq_files = glob.glob(os.path.join(fastq_dir, f"**", f"*{extension}"), recursive=True) + return sorted( + [ + fq for _, fq in enumerate(abs_fq_files) if re.match('^((?!undetermined|unclassified|downloads).)*$', fq, flags=re.IGNORECASE) + ] + ) + + read_dict = {} + + ## Get read 1 files + for read1_file in get_fastqs(read1_extension): + sample = sanitize_sample(read1_file, read1_extension) + if sample not in read_dict: + read_dict[sample] = {"R1": [], "R2": []} + read_dict[sample]["R1"].append(read1_file) + + ## Get read 2 files + if not single_end: + for read2_file in get_fastqs(read2_extension): + sample = sanitize_sample(read2_file, read2_extension) + read_dict[sample]["R2"].append(read2_file) + + ## Write to file + if len(read_dict) > 0: + out_dir = os.path.dirname(samplesheet_file) + if out_dir and not os.path.exists(out_dir): + os.makedirs(out_dir) + + with open(samplesheet_file, "w") as fout: + header = ["sample", "fq1", "fq2", "strandedness"] + fout.write(",".join(header) + "\n") + for sample, reads in sorted(read_dict.items()): + for idx, read_1 in enumerate(reads["R1"]): + read_2 = "" + if idx < len(reads["R2"]): + read_2 = reads["R2"][idx] + sample_info = ",".join([sample, read_1, read_2, strandedness]) + fout.write(f"{sample_info}\n") + else: + error_str = ( + "\nWARNING: No FastQ files found so samplesheet has not been created!\n\n" + ) + error_str += "Please check the values provided for the:\n" + error_str += " - Path to the directory containing the FastQ files\n" + error_str += " - '--read1_extension' parameter\n" + error_str += " - '--read2_extension' parameter\n" + print(error_str) + sys.exit(1) + + +def main(args=None): + args = parse_args(args) + + strandedness = "unstranded" + if args.STRANDEDNESS in ["unstranded", "forward", "reverse"]: + strandedness = args.STRANDEDNESS + + fastq_dir_to_samplesheet( + fastq_dir=args.FASTQ_DIR, + samplesheet_file=args.SAMPLESHEET_FILE, + strandedness=strandedness, + read1_extension=args.READ1_EXTENSION, + read2_extension=args.READ2_EXTENSION, + single_end=args.SINGLE_END, + sanitise_name=args.SANITISE_NAME, + sanitise_name_delimiter=args.SANITISE_NAME_DELIMITER, + sanitise_name_index=args.SANITISE_NAME_INDEX, + ) + + +if __name__ == "__main__": + sys.exit(main())
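The sample id written to the samplesheet is derived from the FastQ file name: strip the read extension, and when `--sanitise_name` is given keep only the first `--sanitise_name_index` fields of the `--sanitise_name_delimiter`-separated name. A stand-alone restatement of that rule, with a made-up file name:

```python
# Stand-alone restatement of the sample-id sanitisation rule above.
import os


def sample_id(path, extension="_R1_001.fastq.gz",
              sanitise=True, delimiter="_", index=1):
    sample = os.path.basename(path).replace(extension, "")
    if sanitise and index > 0:
        sample = delimiter.join(os.path.basename(path).split(delimiter)[:index])
    return sample


# Hypothetical file name, for illustration only.
print(sample_id("/data/run1/sampleA_S1_L001_R1_001.fastq.gz"))  # sampleA
```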
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/gen_otf_genome.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,227 @@ +#!/usr/bin/env python3 + +# Kranti Konganti + +import argparse +import glob +import gzip +import inspect +import logging +import os +import pprint +import re + +# Set logging. +logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\n\n", + level=logging.DEBUG, +) + +# Debug print. +ppp = pprint.PrettyPrinter(width=50, indent=4) + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +def main() -> None: + """ + This script works only in the context of a Nextflow workflow. + It takes: + 1. A text file containing accessions or FASTA IDs, one per line and + then, + 2. Optionally, searches for a genome FASTA file in gzipped format in specified + search path, where the prefix of the filename is the accession or + FASTA ID from 1. and then, creates a new concatenated gzipped genome FASTA + file with all the genomes in the text file from 1. + 3. Creates a new FASTQ file with reads aligned to the accessions in the text + file from 1. + """ + + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-txt", + dest="accs_txt", + default=False, + required=True, + help="Absolute UNIX path to .txt file containing accessions\n" + + "FASTA IDs, one per line.", + ) + required.add_argument( + "-op", + dest="out_prefix", + default="CATTED_GENOMES", + required=True, + help="Set the output file prefix for .fna.gz and .txt\n" + "files.", + ) + parser.add_argument( + "-gd", + dest="genomes_dir", + default=False, + required=False, + help="Absolute UNIX path to a directory containing\n" + + "gzipped genome FASTA files or a file.\n", + ) + parser.add_argument( + "-gds", + dest="genomes_dir_suffix", + default="_scaffolded_genomic.fna.gz", + required=False, + help="Genome FASTA file suffix to search for\nin the directory mentioned using\n-gd.", + ) + parser.add_argument( + "-query", + dest="id_is_query", + default=False, + action="store_true", + required=False, + help="In the produced FASTQ file, should the FASTA ID should be of KMA query ID\n" + + "or template ID.", + ) + parser.add_argument( + "-txts", + dest="accs_suffix", + default="_template_hits.txt", + required=False, + help="The suffix of the file supplied with -txt option. 
It is assumed that the\n" + + "sample name is present in the file supplied with -txt option and the suffix\n" + + "will be stripped and stored in a file that logs samples which have no hits.", + ) + parser.add_argument( + "-frag_delim", + dest="frag_delim", + default="\t", + required=False, + help="The delimitor by which the fields are separated in *_frag.gz file.", + ) + + args = parser.parse_args() + accs_txt = args.accs_txt + genomes_dir = args.genomes_dir + genomes_dir_suffix = args.genomes_dir_suffix + id_is_query = args.id_is_query + out_prefix = args.out_prefix + accs_suffix = args.accs_suffix + frag_delim = args.frag_delim + accs_seen = dict() + cat_genomes_gz = os.path.join(os.getcwd(), out_prefix + "_" + genomes_dir_suffix) + cat_genomes_gz = re.sub("__", "_", str(cat_genomes_gz)) + frags_gz = os.path.join(os.getcwd(), out_prefix + ".frag.gz") + cat_reads_gz = os.path.join(os.getcwd(), out_prefix + "_aln_reads.fna.gz") + cat_reads_gz = re.sub("__", "_", cat_reads_gz) + + if ( + accs_txt + and os.path.exists(cat_genomes_gz) + and os.path.getsize(cat_genomes_gz) > 0 + ): + logging.error( + "A concatenated genome FASTA file,\n" + + f"{os.path.basename(cat_genomes_gz)} already exists in:\n" + + f"{os.getcwd()}\n" + + "Please remove or move it as we will not " + + "overwrite it." + ) + exit(1) + + if accs_txt and (not os.path.exists(accs_txt) or not os.path.getsize(accs_txt) > 0): + logging.error("File,\n" + f"{accs_txt}\ndoes not exist " + "or is empty!") + failed_sample_name = re.sub(accs_suffix, "", os.path.basename(accs_txt)) + with open( + os.path.join(os.getcwd(), "_".join([out_prefix, "FAILED.txt"])), "w" + ) as failed_sample_fh: + failed_sample_fh.write(f"{failed_sample_name}\n") + failed_sample_fh.close() + exit(0) + + # ppp.pprint(mash_hits) + empty_lines = 0 + empty_lines_msg = "" + + with open(accs_txt, "r") as accs_txt_fh: + for line in accs_txt_fh: + if line in ["\n", "\n\r"]: + empty_lines += 1 + continue + else: + line = line.strip() + + if line in accs_seen.keys(): + continue + else: + accs_seen[line] = 1 + accs_txt_fh.close() + + if genomes_dir: + if not os.path.isdir(genomes_dir): + logging.error("UNIX path\n" + f"{genomes_dir}\n" + "does not exist!") + exit(1) + if len(glob.glob(os.path.join(genomes_dir, "*" + genomes_dir_suffix))) <= 0: + logging.error( + "Genomes directory" + + f"{genomes_dir}" + + "\ndoes not seem to have any\n" + + f"files ending with suffix: {genomes_dir_suffix}" + ) + exit(1) + + with open(cat_genomes_gz, "wb") as genomes_out_gz: + for line in accs_seen.keys(): + genome_file = os.path.join(genomes_dir, line + genomes_dir_suffix) + + if not os.path.exists(genome_file) or os.path.getsize(genome_file) <= 0: + logging.error( + f"Genome file {os.path.basename(genome_file)} does not\n" + + "exits or is empty!" 
+ ) + exit(1) + else: + with open(genome_file, "rb") as genome_file_h: + genomes_out_gz.writelines(genome_file_h.readlines()) + genome_file_h.close() + genomes_out_gz.close() + + if ( + len(accs_seen.keys()) > 0 + and os.path.exists(frags_gz) + and os.path.getsize(frags_gz) > 0 + ): + with gzip.open( + cat_reads_gz, "wt", encoding="utf-8", compresslevel=6 + ) as cat_reads_gz_fh: + with gzip.open(frags_gz, "rb", compresslevel=6) as fragz_gz_fh: + fasta_id = 7 if id_is_query else 6 + for frag_line in fragz_gz_fh: + frag_lines = frag_line.decode("utf-8").strip().split(frag_delim) + # Per KMA specification, 6=template, 7=query, 1=read + cat_reads_gz_fh.write(f">{frag_lines[fasta_id]}\n{frag_lines[0]}\n") + fragz_gz_fh.close() + cat_reads_gz_fh.close() + + if empty_lines > 0: + empty_lines_msg = f"Skipped {empty_lines} empty line(s).\n" + + logging.info( + empty_lines_msg + + f"File {os.path.basename(cat_genomes_gz)}\n" + + f"written in:\n{os.getcwd()}\nDone! Bye!" + ) + + +if __name__ == "__main__": + main()
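The read-extraction step above streams the KMA `*.frag.gz` file and writes one FASTA record per aligned fragment, taking the header from the template column (or the query column with `-query`) and the sequence from the first column, per the field layout noted in the script's comment. A minimal stand-alone sketch of just that step (file names are hypothetical):

```python
# Sketch of the read-extraction step: stream a KMA *.frag.gz file and
# emit one FASTA record per fragment. Field layout (0-based) follows
# the script's comment: 0 = read sequence, 6 = template ID, 7 = query ID.
import gzip


def frags_to_fasta(frag_gz, out_fna_gz, id_is_query=False, delim="\t"):
    fasta_id_col = 7 if id_is_query else 6
    with gzip.open(frag_gz, "rt") as fin, gzip.open(out_fna_gz, "wt") as fout:
        for line in fin:
            fields = line.strip().split(delim)
            fout.write(f">{fields[fasta_id_col]}\n{fields[0]}\n")


# Hypothetical file names, for illustration only.
# frags_to_fasta("sampleA.frag.gz", "sampleA_aln_reads.fna.gz")
```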
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/gen_per_species_fa_from_bold.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,437 @@ +#!/usr/bin/env python3 + +import argparse +import gzip +import inspect +import logging +import os +import pprint +import re +import shutil +from collections import defaultdict +from typing import BinaryIO, TextIO, Union + +from Bio import SeqIO +from Bio.Seq import Seq +from Bio.SeqRecord import SeqRecord + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +def get_lineages(csv: os.PathLike, cols: list) -> list: + """ + Parse the output from `ncbitax2lin` tool and + return a dict of lineages where the key is + genusspeciesstrain. + """ + lineages = dict() + if csv == None or not (os.path.exists(csv) or os.path.getsize(csv) > 0): + logging.error( + f"The CSV file [{os.path.basename(csv)}] is empty or does not exist!" + ) + exit(1) + + logging.info(f"Indexing {os.path.basename(csv)}...") + + with open(csv, "r") as csv_fh: + header_cols = csv_fh.readline().strip().split(",") + user_req_cols = [ + tcol_i for tcol_i, tcol in enumerate(header_cols) if tcol in cols + ] + cols_not_found = [tcol for tcol in cols if tcol not in header_cols] + raw_recs = 0 + + if len(cols_not_found) > 0: + logging.error( + f"The following columns do not exist in the" + + f"\nCSV file [ {os.path.basename(csv)} ]:\n" + + "".join(cols_not_found) + ) + exit(1) + elif len(user_req_cols) > 9: + logging.error( + f"Only a total of 9 columns are needed!" + + "\ntax_id,kindom,phylum,class,order,family,genus,species,strain" + ) + exit(1) + + for tax in csv_fh: + raw_recs += 1 + lcols = tax.strip().split(",") + + if bool(lcols[user_req_cols[8]]): + lineages[lcols[user_req_cols[8]]] = ",".join( + [lcols[l] for l in user_req_cols[1:]] + ) + elif bool(lcols[user_req_cols[7]]): + lineages[lcols[user_req_cols[7]]] = ",".join( + [lcols[l] for l in user_req_cols[1:8]] + [str()] + ) + + csv_fh.close() + return lineages, raw_recs + + +def write_fasta(recs: list, basedir: os.PathLike, name: str, suffix: str) -> None: + """ + Write sequence with no description to a specified file. + """ + SeqIO.write( + recs, + os.path.join(basedir, name + suffix), + "fasta", + ) + + +def check_and_get_cols(pat: re, cols: str, delim: str) -> list: + """ + Check if header column matches the pattern and return + columns. + """ + if not pat.match(cols): + logging.error( + f"Supplied columns' names {cols} should only have words" + f"\n(alphanumeric) separated by: {delim}." + ) + exit(1) + else: + cols = re.sub("\n", "", cols).split(delim) + + return cols + + +def parse_tsv(fh: Union[TextIO, BinaryIO], tcols: list, delim: str) -> list: + """ + Parse the TSV file and produce the required per + species FASTA's. 
+ """ + records, sp2accs = (defaultdict(list), defaultdict(list)) + header = fh.readline().strip().split(delim) + raw_recs = 0 + + if not all(col in header for col in tcols): + logging.error( + "The following columns were not found in the" + + f"\nheader row of file {os.path.basename(fh.name)}\n" + + "\n".join([ele for ele in tcols if ele not in header]) + ) + + id_i, genus_i, species_i, strain_i, seq_i = [ + i for i, ele in enumerate(header) if ele in tcols + ] + + for record in fh: + raw_recs += 1 + + id = record.strip().split(delim)[id_i] + genus = record.strip().split(delim)[genus_i] + species = re.sub(r"[\/\\]+", "-", record.strip().split(delim)[species_i]) + strain = record.strip().split(delim)[strain_i] + seq = re.sub(r"[^ATGC]+", "", record.strip().split(delim)[seq_i], re.IGNORECASE) + + if re.match(r"None|Null", species, re.IGNORECASE): + continue + + # print(id) + # print(genus) + # print(species) + # print(strain) + # print(seq) + + records.setdefault(species, []).append( + SeqRecord(Seq(seq), id=id, description=str()) + ) + sp2accs.setdefault(species, []).append(id) + + logging.info(f"Collected FASTA records for {len(records.keys())} species'.") + fh.close() + return records, sp2accs, raw_recs + + +# Main +def main() -> None: + """ + This script takes: + 1. The TSV file from BOLD systems, + 2. Takes as input a .csv file generated by `ncbitax2lin`. + + and then generates a folder containing individual FASTA sequence files + per species. This is only possible if the full taxonomy of the barcode + sequence is present in the FASTA header. + """ + + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\r\r", + level=logging.DEBUG, + ) + + # Debug print. + ppp = pprint.PrettyPrinter(width=55) + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-tsv", + dest="tsv", + default=False, + required=True, + help="Absolute UNIX path to the TSV file from BOLD systems" + + "\nin uncompressed TXT format.", + ) + required.add_argument( + "-csv", + dest="csv", + default=False, + required=True, + help="Absolute UNIX path to .csv or .csv.gz file which is generated " + + "\nby the `ncbitax2lin` tool.", + ) + parser.add_argument( + "-out", + dest="out_folder", + default=os.path.join(os.getcwd(), "species"), + required=False, + help="By default, the output is written to this\nfolder.", + ) + parser.add_argument( + "-f", + dest="force_write_out", + default=False, + action="store_true", + required=False, + help="Force overwrite output directory contents.", + ) + parser.add_argument( + "-suffix", + dest="fna_suffix", + default=".fna", + required=False, + help="Suffix of the individual species FASTA files\nthat will be saved.", + ) + parser.add_argument( + "-ccols", + dest="csv_cols", + default="tax_id,superkingdom,phylum,class,order,family,genus,species,strain", + required=False, + help="Taxonomic lineage will be built using these columns from the output of" + + "\n`ncbitax2lin`\ntool.", + ) + parser.add_argument( + "-ccols-sep", + dest="csv_delim", + default=",", + required=False, + help="The delimitor of the fields in the CSV file.", + ) + parser.add_argument( + "-tcols", + dest="tsv_cols", + default="processid\tgenus\tspecies\tsubspecies\tnucraw", + required=False, + help="For each species, the 
nucletide sequences will be\naggregated.", + ) + parser.add_argument( + "-tcols-sep", + dest="tsv_delim", + default="\t", + required=False, + help="The delimitor of the fields in the TSV file.", + ) + + # Parse defaults + args = parser.parse_args() + tsv = args.tsv + csv = args.csv + csep = args.csv_delim + tsep = args.tsv_delim + csv_cols = args.csv_cols + tsv_cols = args.tsv_cols + out = args.out_folder + overwrite = args.force_write_out + fna_suffix = args.fna_suffix + ccols_pat = re.compile(f"^[\w\{csep}]+?\w$") + tcols_pat = re.compile(f"^[\w\{tsep}]+?\w$") + final_lineages = os.path.join(out, "lineages.csv") + lineages_not_found = os.path.join(out, "lineages_not_found.csv") + base_fasta_dir = os.path.join(out, "fasta") + + # Basic checks + if not overwrite and os.path.exists(out): + logging.warning( + f"Output destination [{os.path.basename(out)}] already exists!" + + "\nPlease use -f to delete and overwrite." + ) + elif overwrite and os.path.exists(out): + logging.info(f"Overwrite requested. Deleting {os.path.basename(out)}...") + shutil.rmtree(out) + + # Validate user requested columns + passed_ccols = check_and_get_cols(ccols_pat, csv_cols, csep) + passed_tcols = check_and_get_cols(tcols_pat, tsv_cols, tsep) + + # Get taxonomy from ncbitax2lin + lineages, raw_recs = get_lineages(csv, passed_ccols) + + # Finally, read BOLD tsv if lineage exists. + logging.info(f"Creating new squences per species...") + + if not os.path.exists(out): + os.makedirs(out) + + try: + gz_fh = gzip.open(tsv, "rt") + records, sp2accs, traw_recs = parse_tsv(gz_fh, passed_tcols, tsep) + except gzip.BadGzipFile: + logging.info(f"Input TSV file {os.path.basename(tsv)} is not in\nGZIP format.") + txt_fh = open(tsv, "r") + records, sp2accs, traw_recs = parse_tsv(txt_fh, passed_tcols, tsep) + + passed_tax_check = 0 + failed_tax_check = 0 + fasta_recs_written = 0 + l_fh = open(final_lineages, "w") + ln_fh = open(lineages_not_found, "w") + l_fh.write( + "identifiers,superkingdom,phylum,class,order,family,genus,species,strain\n" + ) + ln_fh.write("fna_id,parsed_org\n") + + if not os.path.exists(base_fasta_dir): + os.makedirs(base_fasta_dir) + + for genus_species in records.keys(): + fasta_recs_written += len(records[genus_species]) + write_fasta( + records[genus_species], + base_fasta_dir, + "_".join(genus_species.split(" ")), + fna_suffix, + ) + org_words = genus_species.split(" ") + + for id in sp2accs[genus_species]: + if genus_species in lineages.keys(): + this_line = ",".join([id, lineages[genus_species]]) + "\n" + + if len(org_words) > 2: + this_line = ( + ",".join( + [id, lineages[genus_species].rstrip(","), genus_species] + ) + + "\n" + ) + + l_fh.write(this_line) + passed_tax_check += 1 + else: + this_line = ( + ",".join( + [ + id, + "", + "", + "", + "", + "", + org_words[0], + genus_species, + "", + ] + ) + + "\n" + ) + if len(org_words) > 2: + this_line = ( + ",".join( + [ + id, + "", + "", + "", + "", + "", + org_words[0], + org_words[0] + " " + org_words[1], + genus_species, + ] + ) + + "\n" + ) + l_fh.write(this_line) + ln_fh.write(",".join([id, genus_species]) + "\n") + failed_tax_check += 1 + + logging.info( + f"No. of raw records present in `ncbitax2lin` [{os.path.basename(csv)}]: {raw_recs}" + + f"\nNo. of valid records collected from `ncbitax2lin` [{os.path.basename(csv)}]: {len(lineages.keys())}" + + f"\nNo. of raw records in TSV [{os.path.basename(tsv)}]: {traw_recs}" + + f"\nNo. of valid records in TSV [{os.path.basename(tsv)}]: {passed_tax_check + failed_tax_check}" + + f"\nNo. 
of FASTA records for which new lineages were created: {passed_tax_check}" + + f"\nNo. of FASTA records for which only genus, species and/or strain information were created: {failed_tax_check}" + ) + + if (passed_tax_check + failed_tax_check) != fasta_recs_written: + logging.error( + f"The number of input FASTA records [{fasta_recs_written}]" + + f"\nis not equal to number of lineages created [{passed_tax_check + failed_tax_check}]!" + ) + exit(1) + else: + logging.info("Succesfully created lineages and FASTA records! Done!!") + + +if __name__ == "__main__": + main() + +# ~/apps/nowayout/bin/gen_per_species_fa_from_bold.py -tsv BOLD_Public.05-Feb-2024.tsv -csv ../tax.csv ─╯ + +# ======================================================= +# 2024-02-08 21:37:28,541 - INFO +# ======================================================= +# Indexing tax.csv... + +# ======================================================= +# 2024-02-08 21:38:06,567 - INFO +# ======================================================= +# Creating new squences per species... + +# ======================================================= +# 2024-02-08 21:38:06,572 - INFO +# ======================================================= +# Input TSV file BOLD_Public.05-Feb-2024.tsv is not in +# GZIP format. + +# ======================================================= +# 2024-02-08 22:01:04,554 - INFO +# ======================================================= +# Collected FASTA records for 497421 species'. + +# ======================================================= +# 2024-02-08 22:24:35,000 - INFO +# ======================================================= +# No. of raw records present in `ncbitax2lin` [tax.csv]: 2550767 +# No. of valid records collected from `ncbitax2lin` [tax.csv]: 2134980 +# No. of raw records in TSV [BOLD_Public.05-Feb-2024.tsv]: 9735210 +# No. of valid records in TSV [BOLD_Public.05-Feb-2024.tsv]: 4988323 +# No. of FASTA records for which new lineages were created: 4069202 +# No. of FASTA records for which only genus, species and/or strain information were created: 919121 + +# ======================================================= +# 2024-02-08 22:24:35,001 - INFO +# ======================================================= +# Succesfully created lineages and FASTA records! Done!!
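At its core the script groups BOLD barcode records by their species string and writes one FASTA file per species. A minimal stand-alone version of that grouping, using Biopython as the script itself does; the records below are invented:

```python
# Minimal sketch of the per-species grouping: bucket SeqRecords by
# species name and write one FASTA file per bucket.
import os
from collections import defaultdict

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def write_per_species(records, out_dir, suffix=".fna"):
    """records: iterable of (accession, species, sequence) tuples."""
    buckets = defaultdict(list)
    for acc, species, seq in records:
        buckets[species].append(SeqRecord(Seq(seq), id=acc, description=""))
    os.makedirs(out_dir, exist_ok=True)
    for species, recs in buckets.items():
        out_fna = os.path.join(out_dir, "_".join(species.split()) + suffix)
        SeqIO.write(recs, out_fna, "fasta")


# Invented records, for illustration only.
write_per_species(
    [("AB000001.1", "Apis mellifera", "ATGCATGC"),
     ("AB000002.1", "Apis mellifera", "GGCCATAT"),
     ("AB000003.1", "Drosophila melanogaster", "TTAACCGG")],
    "species",
)
```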
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/gen_per_species_fa_from_lin.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,248 @@ +#!/usr/bin/env python3 + +import argparse +import gzip +import inspect +import logging +import os +import pprint +import re +import shutil +from collections import defaultdict +from typing import BinaryIO, TextIO, Union + +from Bio import SeqIO +from Bio.Seq import Seq +from Bio.SeqRecord import SeqRecord + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +def get_lineages(csv: os.PathLike) -> defaultdict: + """ + Parse the lineages.csv file and store a list of + accessions. + """ + lineages = dict() + if csv == None or not (os.path.exists(csv) or os.path.getsize(csv) > 0): + logging.error( + f"The CSV file [{os.path.basename(csv)}] is empty or does not exist!" + ) + exit(1) + + logging.info(f"Indexing {os.path.basename(csv)}...") + + with open(csv, "r") as csv_fh: + _ = csv_fh.readline().strip().split(",") + for line in csv_fh: + cols = line.strip().split(",") + + if len(cols) < 9: + logging.error( + f"The CSV file {os.path.basename(csv)} should have a mandatory 9 columns." + + "\n\nEx: identifiers,superkingdom,phylum,class,order,family,genus,species,strain" + + "\nAB211151.1,Eukaryota,Arthropoda,Malacostraca,Decapoda,Majidae,Chionoecetes,Chionoecetes opilio," + + f"\n\nGot:\n{line}" + ) + exit(1) + + lineages[cols[0]] = re.sub(r"\W+", "-", "_".join(cols[7].split(" "))) + + csv_fh.close() + return lineages + + +def write_fasta(recs: list, basedir: os.PathLike, name: str, suffix: str) -> None: + """ + Write sequence with no description to a specified file. + """ + SeqIO.write( + recs, + os.path.join(basedir, name + suffix), + "fasta", + ) + + +def parse_fasta(fh: Union[TextIO, BinaryIO], sp2accs: dict) -> list: + """ + Parse the sequences and create per species FASTA record. + """ + records = defaultdict() + logging.info("") + + for record in SeqIO.parse(fh, "fasta"): + + id = record.id + seq = record.seq + + if id in sp2accs.keys(): + records.setdefault(sp2accs[id], []).append( + SeqRecord(Seq(seq), id=id, description=str()) + ) + else: + print(f"Lineage row does not exist for accession: {id}") + + logging.info(f"Collected FASTA records for {len(records.keys())} species'.") + fh.close() + return records + + +# Main +def main() -> None: + """ + This script takes: + 1. The FASTA file and, + 2. Takes the corresponding lineages.csv file and, + + then generates a folder containing individual FASTA sequence files + per species. + """ + + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\r\r", + level=logging.DEBUG, + ) + + # Debug print. 
+ ppp = pprint.PrettyPrinter(width=55) + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-fa", + dest="fna", + default=False, + required=True, + help="Absolute UNIX path to the FASTA file that corresponds" + + "\nto the lineages.csv file.", + ) + required.add_argument( + "-csv", + dest="csv", + default=False, + required=True, + help="Absolute UNIX path to lineages.csv which has a guaranteed 9 " + + "\ncolumns with the first being an accession.", + ) + parser.add_argument( + "-out", + dest="out_folder", + default=os.path.join(os.getcwd(), "species"), + required=False, + help="By default, the output is written to this\nfolder.", + ) + parser.add_argument( + "-f", + dest="force_write_out", + default=False, + action="store_true", + required=False, + help="Force overwrite output directory contents.", + ) + parser.add_argument( + "-suffix", + dest="fna_suffix", + default=".fna", + required=False, + help="Suffix of the individual species FASTA files\nthat will be saved.", + ) + + # Parse defaults + args = parser.parse_args() + csv = args.csv + fna = args.fna + out = args.out_folder + overwrite = args.force_write_out + fna_suffix = args.fna_suffix + + # Basic checks + if not overwrite and os.path.exists(out): + logging.warning( + f"Output destination [{os.path.basename(out)}] already exists!" + + "\nPlease use -f to delete and overwrite." + ) + elif overwrite and os.path.exists(out): + logging.info(f"Overwrite requested. Deleting {os.path.basename(out)}...") + shutil.rmtree(out) + + # Get taxonomy from ncbitax2lin + lineages = get_lineages(csv) + + logging.info(f"Creating new squences per species...") + + if not os.path.exists(out): + os.makedirs(out) + + try: + gz_fh = gzip.open(fna, "rt") + fa_recs = parse_fasta(gz_fh, lineages) + except gzip.BadGzipFile: + logging.info( + f"Input FASTA file {os.path.basename(csv)} is not in\nGZIP format." + ) + txt_fh = open(fna, "r") + fa_recs = parse_fasta(txt_fh, lineages) + finally: + logging.info("Assigned FASTA records per species...") + + logging.info("Writing FASTA records per species...") + + for sp in fa_recs.keys(): + write_fasta(fa_recs[sp], out, sp, fna_suffix) + + +if __name__ == "__main__": + main() + +# ~/apps/nowayout/bin/gen_per_species_fa_from_bold.py -tsv BOLD_Public.05-Feb-2024.tsv -csv ../tax.csv ─╯ + +# ======================================================= +# 2024-02-08 21:37:28,541 - INFO +# ======================================================= +# Indexing tax.csv... + +# ======================================================= +# 2024-02-08 21:38:06,567 - INFO +# ======================================================= +# Creating new squences per species... + +# ======================================================= +# 2024-02-08 21:38:06,572 - INFO +# ======================================================= +# Input TSV file BOLD_Public.05-Feb-2024.tsv is not in +# GZIP format. + +# ======================================================= +# 2024-02-08 22:01:04,554 - INFO +# ======================================================= +# Collected FASTA records for 497421 species'. + +# ======================================================= +# 2024-02-08 22:24:35,000 - INFO +# ======================================================= +# No. of raw records present in `ncbitax2lin` [tax.csv]: 2550767 +# No. 
of valid records collected from `ncbitax2lin` [tax.csv]: 2134980 +# No. of raw records in TSV [BOLD_Public.05-Feb-2024.tsv]: 9735210 +# No. of valid records in TSV [BOLD_Public.05-Feb-2024.tsv]: 4988323 +# No. of FASTA records for which new lineages were created: 4069202 +# No. of FASTA records for which only genus, species and/or strain information were created: 919121 + +# ======================================================= +# 2024-02-08 22:24:35,001 - INFO +# ======================================================= +# Succesfully created lineages and FASTA records! Done!!
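The lineage index this script builds is simply a map from accession (column 1 of the 9-column lineages.csv) to a file-system-safe species name (column 8, spaces to underscores, other non-word characters to dashes); each FASTA record is then routed to the file named after its species. A small stand-alone illustration, reusing the example row from the script's own error message:

```python
# Sketch of the accession -> species-name key used to route records
# from a 9-column lineages.csv row.
import re


def species_key(csv_row):
    cols = csv_row.strip().split(",")
    assert len(cols) >= 9, "lineages.csv rows are expected to have 9 columns"
    species = re.sub(r"\W+", "-", "_".join(cols[7].split(" ")))
    return cols[0], species


acc, species = species_key(
    "AB211151.1,Eukaryota,Arthropoda,Malacostraca,Decapoda,Majidae,"
    "Chionoecetes,Chionoecetes opilio,"
)
print(acc, species)  # AB211151.1 Chionoecetes_opilio
```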
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/gen_salmon_tph_and_krona_tsv.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,523 @@ +#!/usr/bin/env python3 + +# Kranti Konganti +# 03/06/2024 + +import argparse +import glob +import inspect +import logging +import os +import pprint +import re +from collections import defaultdict + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +# Main +def main() -> None: + """ + The succesful execution of this script requires access to properly formatted + lineages.csv file which has no more than 9 columns. + + It takes the lineages.csv file, the *_hits.csv results from `sourmash gather` + mentioned with -smres option and and a root parent directory of the + `salmon quant` results mentioned with -sal option and generates a final + results table with the TPM values and a .krona.tsv file for each sample + to be used by KronaTools. + """ + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\n\n", + level=logging.DEBUG, + ) + + # Debug print. + ppp = pprint.PrettyPrinter(width=55) + prog_name = inspect.stack()[0].filename + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-sal", + dest="salmon_res_dir", + default=False, + required=True, + help="Absolute UNIX path to the parent directory that contains the\n" + + "`salmon quant` results. For example, if path to\n" + + "`quant.sf` is in /hpc/john_doe/test/salmon_res/sampleA/quant.sf, then\n" + + "use this command-line option as:\n" + + "-sal /hpc/john_doe/test/salmon_res", + ) + required.add_argument( + "-lin", + dest="lin", + default=False, + required=True, + help="Absolute UNIX Path to the lineages CSV file.\n" + + "This file should have only 9 columns.", + ) + required.add_argument( + "-smres", + dest="sm_res_dir", + default=False, + required=True, + help="Absolute UNIX path to the parent directory that contains the\n" + + "filtered `sourmas gather` results. 
For example, if path to\n" + + "`sampleA.csv` is in /hpc/john_doe/test/sourmash_gather/sampleA.csv,\n" + + "then use this command-line option as:\n" + + "-sal /hpc/john_doe/test", + ) + parser.add_argument( + "-op", + dest="out_prefix", + default="nowayout.tblsum", + required=False, + help="Set the output file(s) prefix for output(s) generated\n" + + "by this program.", + ) + parser.add_argument( + "-sf", + dest="scale_down_factor", + default=float(10000), + required=False, + help="Set the scaling factor by which TPM values are scaled\ndown.", + ) + parser.add_argument( + "-smres-suffix", + dest="sm_res_suffix", + default="_hits.csv", + required=False, + help="Find the `sourmash gather` result files ending in this\nsuffix.", + ) + parser.add_argument( + "-failed-suffix", + dest="failed_suffix", + default="_FAILED.txt", + required=False, + help="Find the sample names which failed classification stored\n" + + "inside the files ending in this suffix.", + ) + parser.add_argument( + "-num-lin-cols", + dest="num_lin_cols", + default=int(9), + required=False, + help="Number of columns expected in the lineages CSV file.", + ) + parser.add_argument( + "-lin-acc-regex", + dest="lin_acc_regex", + default=re.compile(r"\w+[\-\.]{1}[0-9]+"), + required=False, + help="The pattern of the lineage's accession.", + ) + + args = parser.parse_args() + salmon_res_dir = args.salmon_res_dir + sm_res_dir = args.sm_res_dir + sm_res_suffix = args.sm_res_suffix + failed_suffix = args.failed_suffix + out_prefix = args.out_prefix + lin = args.lin + num_lin_cols = args.num_lin_cols + acc_pat = args.lin_acc_regex + scale_down = float(args.scale_down_factor) + no_hit = "Unclassified" + no_hit_reads = "reads mapped to the database" + tpm_const = float(1000000.0000000000) + round_to = 10 + all_samples = set() + ( + lineage2sample, + unclassified2sample, + lineage2sm, + sm2passed, + reads_total, + per_taxon_reads, + lineages, + ) = ( + defaultdict(defaultdict), + defaultdict(defaultdict), + defaultdict(defaultdict), + defaultdict(defaultdict), + defaultdict(defaultdict), + defaultdict(defaultdict), + defaultdict(int), + ) + + salmon_comb_res = os.path.join(os.getcwd(), out_prefix + ".txt") + # salmon_comb_res_reads_mapped = os.path.join( + # os.getcwd(), re.sub(".tblsum", "_reads_mapped.tblsum", out_prefix) + ".txt" + # ) + salmon_comb_res_indiv_reads_mapped = os.path.join( + os.getcwd(), + re.sub(".tblsum", "_indiv_reads_mapped.tblsum", out_prefix) + ".txt", + ) + salmon_res_files = glob.glob( + os.path.join(salmon_res_dir, "*", "quant.sf"), recursive=True + ) + sample_res_files_failed = glob.glob( + os.path.join(salmon_res_dir, "*" + failed_suffix), recursive=True + ) + sm_res_files = glob.glob( + os.path.join(sm_res_dir, "*" + sm_res_suffix), recursive=True + ) + + # Basic checks + if lin and not (os.path.exists(lin) and os.path.getsize(lin) > 0): + logging.error( + "The lineages file,\n" + + f"{os.path.basename(lin)} does not exist or is empty!" 
+ ) + exit(1) + + if salmon_res_dir: + if not os.path.isdir(salmon_res_dir): + logging.error("UNIX path\n" + f"{salmon_res_dir}\n" + "does not exist!") + exit(1) + if len(salmon_res_files) <= 0: + with open(salmon_comb_res, "w") as salmon_comb_res_fh, open( + salmon_comb_res_indiv_reads_mapped, "w" + ) as salmon_comb_res_indiv_reads_mapped_fh: + salmon_comb_res_fh.write(f"Sample\n{no_hit} reads in all samples\n") + salmon_comb_res_indiv_reads_mapped_fh.write( + f"Sample\nNo {no_hit_reads} from all samples\n" + ) + salmon_comb_res_fh.close() + salmon_comb_res_indiv_reads_mapped_fh.close() + exit(0) + + # Only proceed if lineages.csv exists. + if lin and os.path.exists(lin) and os.path.getsize(lin) > 0: + lin_fh = open(lin, "r") + _ = lin_fh.readline() + + # Index lineages.csv + for line in lin_fh: + cols = line.strip().split(",") + + if len(cols) < num_lin_cols: + logging.error( + f"The file {os.path.basename(lin)} seems to\n" + + "be malformed. It contains less than required 9 columns." + ) + exit(1) + + if cols[0] in lineages.keys(): + continue + # logging.info( + # f"There is a duplicate accession [{cols[0]}]" + # + f" in the lineages file {os.path.basename(lin)}!" + # ) + elif acc_pat.match(cols[0]): + lineages[cols[0]] = ",".join(cols[1:]) + + lin_fh.close() + + # Index each samples' filtered sourmash results. + for sm_res_file in sm_res_files: + sample_name = re.sub(sm_res_suffix, "", os.path.basename(sm_res_file)) + + with open(sm_res_file, "r") as sm_res_fh: + _ = sm_res_fh.readline() + for line in sm_res_fh: + acc = acc_pat.findall(line.strip().split(",")[9]) + + if len(acc) == 0: + logging.info( + f"Got empty lineage accession: {acc}" + + f"\nRow elements: {line.strip().split(',')}" + ) + exit(1) + if len(acc) not in [1]: + logging.info( + f"Got more than one lineage accession: {acc}" + + f"\nRow elements: {line.strip().split(',')}" + ) + logging.info(f"Considering first element: {acc[0]}") + if acc[0] not in lineages.keys(): + logging.error( + f"The lineage accession {acc[0]} is not found in {os.path.basename(lin)}" + ) + exit(1) + lineage2sm[lineages[acc[0]]].setdefault(sample_name, 1) + sm2passed["sourmash_passed"].setdefault(sample_name, 1) + sm_res_fh.close() + + # Index each samples' salmon results. + for salmon_res_file in salmon_res_files: + sample_name = re.match( + r"(^.+?)((\_salmon\_res)|(\.salmon))$", + os.path.basename(os.path.dirname(salmon_res_file)), + )[1] + salmon_meta_json = os.path.join( + os.path.dirname(salmon_res_file), "aux_info", "meta_info.json" + ) + + if ( + not os.path.exists(salmon_meta_json) + or not os.path.getsize(salmon_meta_json) > 0 + ): + logging.error( + "The file\n" + + f"{salmon_meta_json}\ndoes not exist or is empty!\n" + + "Did `salmon quant` fail?" + ) + exit(1) + + if ( + not os.path.exists(salmon_res_file) + or not os.path.getsize(salmon_res_file) > 0 + ): + logging.error( + "The file\n" + + f"{salmon_res_file}\ndoes not exist or is empty!\n" + + "Did `salmon quant` fail?" 
+ ) + exit(1) + + # Initiate all_tpm, rem_tpm and reads_mapped + # all_tpm + reads_total[sample_name].setdefault("all_tpm", []).append(float(0.0)) + # rem_tpm + reads_total[sample_name].setdefault("rem_tpm", []).append(float(0.0)) + # reads_mapped + reads_total[sample_name].setdefault("reads_mapped", []).append(float(0.0)) + + with open(salmon_res_file, "r") as salmon_res_fh: + for line in salmon_res_fh.readlines(): + if re.match(r"^Name.+", line): + continue + cols = line.strip().split("\t") + ref_acc = cols[0] + tpm = cols[3] + num_reads_mapped = cols[4] + + ( + reads_total[sample_name] + .setdefault("all_tpm", []) + .append( + round(float(tpm), round_to), + ) + ) + + ( + reads_total[sample_name] + .setdefault("reads_mapped", []) + .append( + round(float(num_reads_mapped), round_to), + ) + ) + + if lineages[ref_acc] in lineage2sm.keys(): + ( + lineage2sample[lineages[ref_acc]] + .setdefault(sample_name, []) + .append(round(float(tpm), round_to)) + ) + ( + per_taxon_reads[sample_name] + .setdefault(lineages[ref_acc], []) + .append(round(float(num_reads_mapped))) + ) + else: + ( + reads_total[sample_name] + .setdefault("rem_tpm", []) + .append( + round(float(tpm), round_to), + ) + ) + + salmon_res_fh.close() + + # Index each samples' complete failure results i.e., 100% unclassified. + for sample_res_file_failed in sample_res_files_failed: + sample_name = re.sub( + failed_suffix, "", os.path.basename(sample_res_file_failed) + ) + with open("".join(sample_res_file_failed), "r") as no_calls_fh: + for line in no_calls_fh.readlines(): + if line in ["\n", "\n\r", "\r"]: + continue + unclassified2sample[sample_name].setdefault(no_hit, tpm_const) + no_calls_fh.close() + + # Finally, write all results. + for sample in sorted(reads_total.keys()) + sorted(unclassified2sample.keys()): + all_samples.add(sample) + + # Check if sourmash results exist but salmon `quant` failed + # and if so, set the sample to 100% Unclassified as well. + for sample in sm2passed["sourmash_passed"].keys(): + if sample not in all_samples: + unclassified2sample[sample].setdefault(no_hit, tpm_const) + all_samples.add(sample) + + # Write total number of reads mapped to nowayout database. + # with open(salmon_comb_res_reads_mapped, "w") as nowo_reads_mapped_fh: + # nowo_reads_mapped_fh.write( + # "\t".join( + # [ + # "Sample", + # "All reads", + # "Classified reads", + # "Unclassified reads (Reads failed thresholds )", + # ] + # ) + # ) + + # for sample in all_samples: + # if sample in reads_total.keys(): + # nowo_reads_mapped_fh.write( + # "\n" + # + "\t".join( + # [ + # f"\n{sample}", + # f"{int(sum(reads_total[sample]['reads_mapped']))}", + # f"{int(reads_total[sample]['reads_mapped'])}", + # f"{int(reads_total[sample]['rem_tpm'])}", + # ], + # ) + # ) + # else: + # nowo_reads_mapped_fh.write(f"\n{sample}\t{int(0.0)}") + # nowo_reads_mapped_fh.close() + + # Write scaled down TPM values for each sample. + with open(salmon_comb_res, "w") as salmon_comb_res_fh, open( + salmon_comb_res_indiv_reads_mapped, "w" + ) as salmon_comb_res_indiv_reads_mapped_fh: + salmon_comb_res_fh.write("Lineage\t" + "\t".join(all_samples) + "\n") + salmon_comb_res_indiv_reads_mapped_fh.write( + "Lineage\t" + "\t".join(all_samples) + "\n" + ) + + # Write *.krona.tsv header for all samples. 
+ for sample in all_samples: + krona_fh = open( + os.path.join(salmon_res_dir, sample + ".krona.tsv"), "w" + ) + krona_fh.write( + "\t".join( + [ + "fraction", + "superkingdom", + "phylum", + "class", + "order", + "family", + "genus", + "species", + ] + ) + ) + krona_fh.close() + + # Write the TPM values (TPM/scale_down) for valid lineages. + for lineage in lineage2sm.keys(): + salmon_comb_res_fh.write(lineage) + salmon_comb_res_indiv_reads_mapped_fh.write(lineage) + + for sample in all_samples: + krona_fh = open( + os.path.join(salmon_res_dir, sample + ".krona.tsv"), "a" + ) + + if sample in unclassified2sample.keys(): + salmon_comb_res_fh.write(f"\t0.0") + salmon_comb_res_indiv_reads_mapped_fh.write(f"\t0") + elif sample in lineage2sample[lineage].keys(): + reads = sum(per_taxon_reads[sample][lineage]) + tpm = sum(lineage2sample[lineage][sample]) + tph = round(tpm / scale_down, round_to) + lineage2sample[sample].setdefault("hits_tpm", []).append( + float(tpm) + ) + + salmon_comb_res_fh.write(f"\t{tph}") + salmon_comb_res_indiv_reads_mapped_fh.write(f"\t{reads}") + krona_lin_row = lineage.split(",") + + if len(krona_lin_row) > num_lin_cols - 1: + logging.error( + "Taxonomy columns are more than 8 for the following lineage:" + + f"{krona_lin_row}" + ) + exit(1) + else: + krona_fh.write( + "\n" + + str(round((tpm / tpm_const), round_to)) + + "\t" + + "\t".join(krona_lin_row[:-1]) + ) + else: + salmon_comb_res_fh.write(f"\t0.0") + salmon_comb_res_indiv_reads_mapped_fh.write(f"\t0") + krona_fh.close() + + salmon_comb_res_fh.write("\n") + salmon_comb_res_indiv_reads_mapped_fh.write(f"\n") + + # Finally write TPH (TPM/scale_down) for Unclassified + # Row = Unclassified / No reads mapped to the database ... + salmon_comb_res_fh.write(f"{no_hit}") + salmon_comb_res_indiv_reads_mapped_fh.write(f"Total {no_hit_reads}") + + for sample in all_samples: + krona_ufh = open( + os.path.join(salmon_res_dir, sample + ".krona.tsv"), "a" + ) + # krona_ufh.write("\t") + if sample in unclassified2sample.keys(): + salmon_comb_res_fh.write( + f"\t{round((unclassified2sample[sample][no_hit] / scale_down), round_to)}" + ) + salmon_comb_res_indiv_reads_mapped_fh.write(f"\t0") + krona_ufh.write( + f"\n{round((unclassified2sample[sample][no_hit] / tpm_const), round_to)}" + ) + else: + trace_tpm = tpm_const - sum(reads_total[sample]["all_tpm"]) + trace_tpm = float(f"{trace_tpm:.{round_to}f}") + if trace_tpm <= 0: + trace_tpm = float(0.0) + tph_unclassified = float( + f"{(sum(reads_total[sample]['rem_tpm']) + trace_tpm) / scale_down:{round_to}f}" + ) + krona_unclassified = float( + f"{(sum(reads_total[sample]['rem_tpm']) + trace_tpm) / tpm_const:{round_to}f}" + ) + salmon_comb_res_fh.write(f"\t{tph_unclassified}") + salmon_comb_res_indiv_reads_mapped_fh.write( + f"\t{int(sum(sum(per_taxon_reads[sample].values(), [])))}" + ) + krona_ufh.write(f"\n{krona_unclassified}") + krona_ufh.write("\t" + "\t".join(["unclassified"] * (num_lin_cols - 2))) + krona_ufh.close() + + salmon_comb_res_fh.close() + salmon_comb_res_indiv_reads_mapped_fh.close() + # ppp.pprint(lineage2sample) + # ppp.pprint(lineage2sm) + # ppp.pprint(reads_total) + + +if __name__ == "__main__": + main()
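The two values written per lineage and sample are plain rescalings of the summed Salmon TPM: the table entry is TPM divided by the `-sf` scale-down factor (10000 by default), and the Krona fraction is TPM divided by 1,000,000, both rounded to 10 decimal places. A worked example with invented TPM values:

```python
# Worked example of the per-lineage arithmetic: table value ("TPH") is
# sum(TPM) / scale_down; the Krona fraction is sum(TPM) / 1e6.
ROUND_TO = 10
TPM_CONST = 1_000_000.0


def tph_and_krona_fraction(tpms, scale_down=10_000.0):
    tpm = sum(tpms)
    return round(tpm / scale_down, ROUND_TO), round(tpm / TPM_CONST, ROUND_TO)


# Invented TPM values for one lineage in one sample.
print(tph_and_krona_fraction([1250.5, 310.25]))  # (0.156075, 0.00156075)
```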
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/gen_sim_abn_table.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,191 @@ +#!/usr/bin/env python3 + +# Kranti Konganti + +import argparse +import glob +import inspect +import logging +import os +import pprint +import re +from collections import defaultdict + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +# Main +def main() -> None: + """ + This script will take the final taxonomic classification files and create a + global relative abundance type file in the current working directory. The + relative abundance type files should be in CSV or TSV format and should have + the lineage or taxonomy in first column and samples in the subsequent columns. + """ + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\n\n", + level=logging.DEBUG, + ) + + # Debug print. + ppp = pprint.PrettyPrinter(width=55) + prog_name = inspect.stack()[0].filename + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-abn", + dest="rel_abn_dir", + default=False, + required=True, + help="Absolute UNIX path to the parent directory that contains the\n" + + "abundance type files.", + ) + parser.add_argument( + "-op", + dest="out_prefix", + default="nowayout.tblsum", + required=False, + help="Set the output file(s) prefix for output(s) generated\nby this program.", + ) + parser.add_argument( + "-header", + dest="header", + action="store_true", + default=True, + required=False, + help="Do the relative abundance files have a header.", + ) + parser.add_argument( + "-filepat", + dest="file_pat", + default="*.lineage_summary.tsv", + required=False, + help="Files will be searched by this suffix for merged output generation\nby this program.", + ) + parser.add_argument( + "-failedfilepat", + dest="failed_file_pat", + default="*FAILED.txt", + required=False, + help="Files will be searched by this suffix for merged output generation\nby this program.", + ) + parser.add_argument( + "-delim", + dest="delim", + default="\t", + required=False, + help="The delimitor by which the fields are separated in the file.", + ) + + args = parser.parse_args() + rel_abn_dir = args.rel_abn_dir + is_header = args.header + out_prefix = args.out_prefix + file_pat = args.file_pat + failed_file_pat = args.failed_file_pat + delim = args.delim + suffix = re.sub(r"^\*", "", file_pat) + rel_abn_comb = os.path.join(os.getcwd(), out_prefix + ".txt") + rel_abn_files = glob.glob(os.path.join(rel_abn_dir, file_pat)) + failed_rel_abn_files = glob.glob(os.path.join(rel_abn_dir, failed_file_pat)) + empty_results = "Relative abundance results did not pass thresholds" + sample2lineage, seen_lineage = (defaultdict(defaultdict), defaultdict(int)) + + if len(rel_abn_files) == 0: + logging.info( + "Unable to find any files with .tsv extentsion.\nNow trying .csv extension." + ) + rel_abn_files = glob.glob(os.path.join(rel_abn_dir, "*.csv")) + delim = "," + + if len(failed_rel_abn_files) == 0: + logging.info( + f"Unable to find any files with patttern {failed_file_pat}.\n" + + "The failed samples will not appear in the final aggregate file." 
+ ) + + if rel_abn_dir: + if not os.path.isdir(rel_abn_dir): + logging.error("UNIX path\n" + f"{rel_abn_dir}\n" + "does not exist!") + exit(1) + if len(rel_abn_files) <= 0: + with open(rel_abn_comb, "w") as rel_abn_comb_fh: + rel_abn_comb_fh.write(f"Sample\n{empty_results} in any samples\n") + rel_abn_comb_fh.close() + exit(0) + + for failed_rel_abn in failed_rel_abn_files: + with open(failed_rel_abn, "r") as failed_fh: + sample2lineage[failed_fh.readline().strip()].setdefault( + "unclassified", [] + ).append(float("1.0")) + failed_fh.close() + + for rel_abn_file in rel_abn_files: + sample_name = re.match(r"(^.+?)\..*$", os.path.basename(rel_abn_file))[1] + + with open(rel_abn_file, "r") as rel_abn_fh: + if is_header: + sample_names = rel_abn_fh.readline().strip().split(delim)[1:] + if len(sample_names) > 2: + logging.error( + "The individual relative abundance file has more " + + "\nthan 1 sample. This is rare in the context of running the " + + "\n nowayout Nextflow workflow." + ) + exit(1) + elif len(sample_names) < 2: + sample_name = re.sub(suffix, "", os.path.basename(rel_abn_file)) + logging.info( + "Seems like there is no sample name in the lineage summary file." + + f"\nTherefore, sample name has been extracted from file name: {sample_name}." + ) + else: + sample_name = sample_names[0] + + for line in rel_abn_fh.readlines(): + cols = line.strip().split(delim) + lineage = cols[0] + abn = cols[1] + sample2lineage[sample_name].setdefault(lineage, []).append( + float(abn) + ) + seen_lineage[lineage] = 1 + + with open(rel_abn_comb, "w") as rel_abn_comb_fh: + samples = sorted(sample2lineage.keys()) + rel_abn_comb_fh.write(f"Lineage{delim}" + delim.join(samples) + "\n") + + for lineage in sorted(seen_lineage.keys()): + rel_abn_comb_fh.write(lineage) + for sample in samples: + if lineage in sample2lineage[sample].keys(): + rel_abn_comb_fh.write( + delim + + "".join( + [str(abn) for abn in sample2lineage[sample][lineage]] + ) + ) + else: + rel_abn_comb_fh.write(f"{delim}0.0") + rel_abn_comb_fh.write("\n") + rel_abn_comb_fh.close() + + +if __name__ == "__main__": + main()
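The aggregation itself is a simple pivot: each per-sample lineage summary contributes one column, and lineages missing from a sample are filled with 0.0. A stand-alone sketch of that pivot over invented in-memory values:

```python
# Sketch of the pivot performed above: one column per sample, one row
# per lineage, 0.0 where a sample has no value for that lineage.
def merge_abundances(per_sample, delim="\t"):
    """per_sample: dict of sample -> dict of lineage -> abundance."""
    lineages = sorted({lin for vals in per_sample.values() for lin in vals})
    samples = sorted(per_sample)
    rows = ["Lineage" + delim + delim.join(samples)]
    for lin in lineages:
        vals = (str(per_sample[s].get(lin, 0.0)) for s in samples)
        rows.append(lin + delim + delim.join(vals))
    return "\n".join(rows)


# Invented per-sample abundances, for illustration only.
print(merge_abundances({
    "sampleA": {"Apis mellifera": 0.9, "unclassified": 0.1},
    "sampleB": {"Drosophila melanogaster": 1.0},
}))
```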
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/remove_dup_fasta_ids.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,201 @@ +#!/usr/bin/env python3 + +import argparse +import gzip +import inspect +import logging +import os +import pprint +import shutil +from typing import BinaryIO, TextIO, Union + +from Bio import SeqIO +from Bio.Seq import Seq +from Bio.SeqRecord import SeqRecord +from genericpath import isdir + + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses( + argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter +): + pass + + +def write_fasta(seq: str, id: str, fh: Union[TextIO, BinaryIO]) -> None: + """ + Write sequence with no description to specified file. + """ + SeqIO.write( + SeqRecord(Seq(seq), id=id, description=str()), + fh, + "fasta", + ) + + +# Main +def main() -> None: + """ + This script takes: + 1. A FASTA file in gzip or non-gzip (ASCII TXT) format and + + and then generates a new FASTA file with duplicate FASTA IDs replaced + with a unique ID. + """ + + # Set logging. + logging.basicConfig( + format="\n" + + "=" * 55 + + "\n%(asctime)s - %(levelname)s\n" + + "=" * 55 + + "\n%(message)s\n\n", + level=logging.DEBUG, + ) + + # Debug print. + ppp = pprint.PrettyPrinter(width=55) + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-fna", + dest="fna", + default=False, + required=True, + help="Absolute UNIX path to .fna or .fna.gz file.", + ) + parser.add_argument( + "-lin", + dest="lineages", + default=False, + required=False, + help="Absolute UNIX path to lineages.csv file for which the" + + "\nthe duplicate IDs will be made unique corresponding to" + + "\nthe FASTA IDs", + ) + parser.add_argument( + "-outdir", + dest="out_folder", + default=os.getcwd(), + required=False, + help="By default, the output is written to this\nfolder.", + ) + parser.add_argument( + "-f", + dest="force_write_out", + default=False, + action="store_true", + required=False, + help="Force overwrite the output file.", + ) + parser.add_argument( + "--fna-suffix", + dest="fna_suffix", + default=".fna", + required=False, + help="Suffix of the output FASTA file.", + ) + + # Parse defaults + args = parser.parse_args() + fna = args.fna + lineages = args.lineages + outdir = args.out_folder + overwrite = args.force_write_out + fna_suffix = args.fna_suffix + new_fna = os.path.join( + outdir, os.path.basename(fna).split(".")[0] + "_dedup_ids" + fna_suffix + ) + lin_header = False + new_lin = False + seen_ids = dict() + seen_lineages = dict() + + # Basic checks + if not overwrite and os.path.exists(new_fna): + logging.warning( + f"Output destination [{os.path.basename(new_fna)}] already exists!" + + "\nPlease use -f to delete and overwrite." + ) + elif overwrite and os.path.exists(new_fna): + logging.info(f"Overwrite requested. Deleting {os.path.basename(new_fna)}...") + if os.path.isdir(new_fna): + shutil.rmtree(new_fna) + else: + os.remove(new_fna) + + # Prepare for writing + new_fna_fh = open(new_fna, "+at") + + # If lineages file is mentioned, index it. 
+ if lineages and os.path.exists(lineages) and os.path.getsize(lineages) > 0: + new_lin = os.path.join(os.getcwd(), os.path.basename(lineages) + "_dedup.csv") + new_lin_fh = open(new_lin, "w") + with open(lineages, "r") as l_fh: + lin_header = l_fh.readline() + for line in l_fh: + cols = line.strip().split(",") + if len(cols) < 9: + logging.error( + f"The row in the lineages file {os.path.basename(lineages)}" + + f"\ndoes not have 9 required columns: {len(cols)}" + + f"\n\n{lin_header.strip()}\n{line.strip()}" + ) + exit(1) + elif len(cols) > 9: + logging.info( + f"The row in the lineages file {os.path.basename(lineages)}" + + f"\nhas more than 9 required columns: {len(cols)}" + + f"\nRetaining only 9 columns of the following 10 columns." + + f"\n\n{lin_header.strip()}\n{line.strip()}" + ) + + if cols[0] not in seen_lineages.keys(): + seen_lineages[cols[0]] = ",".join(cols[1:9]) + + new_lin_fh.write(lin_header) + l_fh.close() + + # Read FASTA and create unique FASTA IDs. + logging.info(f"Creating new FASTA with unique IDs.") + try: + fna_fh = gzip.open(fna, "rt") + _ = fna_fh.readline() + except gzip.BadGzipFile: + logging.info( + f"Input FASTA file {os.path.basename(fna)} is not in\nGZIP format." + + "\nAttempting text parsing." + ) + fna_fh = open(fna, "r") + + for record in SeqIO.parse(fna_fh, format="fasta"): + seq_id = record.id + + if record.id not in seen_ids.keys(): + seen_ids[record.id] = 1 + else: + seen_ids[record.id] += 1 + + if seen_ids[seq_id] > 1: + seq_id = str(record.id) + str(seen_ids[record.id]) + + if new_lin: + new_lin_fh.write(",".join([seq_id, seen_lineages[record.id]]) + "\n") + + write_fasta(record.seq, seq_id, new_fna_fh) + + if new_lin: + new_lin_fh.close() + + logging.info("Done!") + + +if __name__ == "__main__": + + main()
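The renaming rule above is: the first occurrence of a FASTA ID is kept as-is, and every later duplicate gets its running occurrence count appended. A stand-alone restatement with invented IDs:

```python
# Stand-alone restatement of the ID de-duplication rule: first
# occurrence unchanged, later duplicates get the occurrence count appended.
from collections import Counter


def dedup_ids(ids):
    seen = Counter()
    deduped = []
    for fasta_id in ids:
        seen[fasta_id] += 1
        deduped.append(fasta_id if seen[fasta_id] == 1 else f"{fasta_id}{seen[fasta_id]}")
    return deduped


# Invented IDs, for illustration only.
print(dedup_ids(["AB1.1", "AB2.1", "AB1.1", "AB1.1"]))
# ['AB1.1', 'AB2.1', 'AB1.12', 'AB1.13']
```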
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/bin/sourmash_filter_hits.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,193 @@ +#!/usr/bin/env python3 + +# Kranti Konganti + +import argparse +import gzip +import inspect +import logging +import os +import pprint +import re + +# Set logging. +logging.basicConfig( + format="\n" + "=" * 55 + "\n%(asctime)s - %(levelname)s\n" + "=" * 55 + "\n%(message)s\n\n", + level=logging.DEBUG, +) + +# Debug print. +ppp = pprint.PrettyPrinter(width=50, indent=4) + +# Multiple inheritence for pretty printing of help text. +class MultiArgFormatClasses(argparse.RawTextHelpFormatter, argparse.ArgumentDefaultsHelpFormatter): + pass + + +def write_failures(prefix: str, file: os.PathLike) -> None: + with open(file, "w") as outfile_failed_fh: + outfile_failed_fh.write(f"{prefix}\n") + outfile_failed_fh.close() + + +def main() -> None: + """ + This script will take the CSV output of `sourmash search` and `sourmash gather` + and will return a column's value filtered by requested column name and its value + """ + + prog_name = os.path.basename(inspect.stack()[0].filename) + + parser = argparse.ArgumentParser( + prog=prog_name, description=main.__doc__, formatter_class=MultiArgFormatClasses + ) + + required = parser.add_argument_group("required arguments") + + required.add_argument( + "-csv", + dest="csv", + default=False, + required=True, + help="Absolute UNIX path to CSV file containing output from\n" + + "`sourmash gather` or `sourmash search`", + ) + required.add_argument( + "-extract", + dest="extract", + required=False, + default="name", + help="Extract this column's value which matches the filters.\n" + + "Controlled by -fcn and -fcv.", + ) + parser.add_argument( + "-all", + dest="alllines", + required=False, + default=False, + action="store_true", + help="Instead of just the column value, print entire row.", + ) + parser.add_argument( + "-fcn", + dest="filter_col_name", + default="f_match", + required=False, + help="Column name by which the filtering of rows\nshould be applied.", + ) + parser.add_argument( + "-fcv", + dest="filter_col_val", + default="0", + required=False, + help="Only rows where the column (defined by --fcn)\nsatisfies this value will be\n" + + "will be considered. This can be numeric, regex\nor a string value.", + ) + parser.add_argument( + "-gt", + dest="gt", + default=True, + required=False, + action="store_true", + help="Apply greater than or equal to condition on\nnumeric values of --fcn column.", + ) + parser.add_argument( + "-lt", + dest="lt", + default=False, + required=False, + action="store_true", + help="Apply less than or equal to condition on\nnumeric values of --fcn column.", + ) + + args = parser.parse_args() + csv = args.csv + ex = args.extract + all_lines = args.alllines + fcn = args.filter_col_name + fcv = args.filter_col_val + gt = args.gt + lt = args.lt + hits = set() + hit_lines = set() + empty_lines = 0 + + outfile_prefix = re.sub(r"(^.*?)\.csv\.gz", r"\1", os.path.basename(csv)) + outfile_failed = os.path.join(os.getcwd(), "_".join([outfile_prefix, "FAILED.txt"])) + + if csv and (not os.path.exists(csv) or not os.path.getsize(csv) > 0): + logging.error( + "The CSV file,\n" + f"{os.path.basename(csv)} does not exists or\nis of size zero." 
+ ) + write_failures(outfile_prefix, outfile_failed) + exit(0) + + if all_lines: + outfile = os.path.join(os.getcwd(), "_".join([outfile_prefix, "hits.csv"])) + else: + outfile = os.path.join(os.getcwd(), "_".join([outfile_prefix, "template_hits.txt"])) + + with gzip.open(csv, "rb") as csv_fh: + header_cols = dict( + [ + (col, ele) + for ele, col in enumerate(csv_fh.readline().decode("utf-8").strip().split(",")) + ] + ) + + if fcn and ex not in header_cols.keys(): + logging.info( + f"The header row in file\n{os.path.basename(csv)}\n" + + "does not have a column whose names are:\n" + + f"-fcn: {fcn} and -extract: {ex}" + ) + exit(1) + + for line in csv_fh: + line = line.decode("utf-8") + + if line in ["\n", "\n\r"]: + empty_lines += 1 + continue + + cols = [x.strip() for x in line.strip().split(",")] + investigate = float(format(float(cols[header_cols[fcn]]), '.10f')) + fcv = float(fcv) + + if re.match(r"[\d\.]+", str(investigate)): + if gt and investigate >= fcv: + hits.add(cols[header_cols[ex]]) + hit_lines.add(line.strip()) + elif lt and investigate <= fcv: + hits.add(cols[header_cols[ex]]) + hit_lines.add(line.strip()) + elif investigate == fcv: + hits.add(cols[header_cols[ex]]) + hit_lines.add(line.strip()) + + csv_fh.close() + + if len(hits) >= 1: + with open(outfile, "w") as outfile_fh: + outfile_fh.write(",".join(header_cols.keys()) + "\n") + if all_lines: + outfile_fh.write("\n".join(hit_lines) + "\n") + else: + outfile_fh.writelines("\n".join(hits) + "\n") + outfile_fh.close() + else: + write_failures(outfile_prefix, outfile_failed) + + if empty_lines > 0: + empty_lines_msg = f"Skipped {empty_lines} empty line(s).\n" + + logging.info( + empty_lines_msg + + f"File {os.path.basename(csv)}\n" + + f"written in:\n{os.getcwd()}\nDone! Bye!" + ) + exit(0) + + +if __name__ == "__main__": + main()
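The filter keeps rows of the gzipped `sourmash gather`/`search` CSV whose `f_match` column (by default) passes the requested threshold and collects the `name` column from those rows. A minimal stand-alone sketch of the same filter, using `csv.DictReader` instead of the script's manual split (the input file name is hypothetical):

```python
# Sketch of the hit filter: keep rows of a gzipped sourmash CSV whose
# f_match value meets the threshold, and collect their `name` column.
import csv
import gzip


def filter_hits(csv_gz, threshold=0.0, filter_col="f_match", extract_col="name"):
    hits = set()
    with gzip.open(csv_gz, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            if float(row[filter_col]) >= threshold:
                hits.add(row[extract_col])
    return hits


# Hypothetical input, for illustration only.
# print(filter_hits("sampleA.csv.gz", threshold=0.1))
```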
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/base.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,58 @@ +plugins { + id 'nf-amazon' +} + +params { + fs = File.separator + cfsanpipename = 'CPIPES' + center = 'CFSAN, FDA.' + libs = "${projectDir}${params.fs}lib" + modules = "${projectDir}${params.fs}modules" + projectconf = "${projectDir}${params.fs}conf" + assetsdir = "${projectDir}${params.fs}assets" + subworkflows = "${projectDir}${params.fs}subworkflows" + workflows = "${projectDir}${params.fs}workflows" + workflowsconf = "${workflows}${params.fs}conf" + routines = "${libs}${params.fs}routines" + toolshelp = "${libs}${params.fs}help" + swmodulepath = "${params.fs}nfs${params.fs}software${params.fs}modules" + tracereportsdir = "${launchDir}${params.fs}${cfsanpipename}-${params.pipeline}${params.fs}nextflow-reports" + dummyfile = "${projectDir}${params.fs}assets${params.fs}dummy_file.txt" + dummyfile2 = "${projectDir}${params.fs}assets${params.fs}dummy_file2.txt" + max_cpus = 10 + linewidth = 80 + pad = 32 + pipeline = null + help = null + input = null + output = null + metadata = null + publish_dir_mode = "copy" + publish_dir_overwrite = true + user_email = null +} + +dag { + enabled = true + file = "${params.tracereportsdir}${params.fs}${params.pipeline}_dag.html" + overwrite = true +} + +report { + enabled = true + file = "${params.tracereportsdir}${params.fs}${params.pipeline}_exec_report.html" + overwrite = true +} + +trace { + enabled = true + file = "${params.tracereportsdir}${params.fs}${params.pipeline}_exec_trace.txt" + overwrite = true +} + +timeline { + enabled = true + file = "${params.tracereportsdir}${params.fs}${params.pipeline}_exec_timeline.html" + overwrite = true +} +
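As a quick, illustrative check of how the `File.separator`-based path parameters above compose (the checkout location below is hypothetical):

```groovy
// Illustrative only: resolving a few of the path params on a POSIX system.
def fs         = File.separator                 // '/'
def projectDir = '/opt/cpipes'                  // hypothetical checkout path
def libs       = "${projectDir}${fs}lib"
def toolshelp  = "${libs}${fs}help"

assert libs      == '/opt/cpipes/lib'
assert toolshelp == '/opt/cpipes/lib/help'
// tracereportsdir resolves under the launch directory, e.g.
// <launchDir>/CPIPES-nowayout/nextflow-reports when --pipeline nowayout is used.
```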
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/computeinfra.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,155 @@ +standard { + process.executor = 'local' + process.cpus = 1 + params.enable_conda = false + params.enable_module = true + singularity.enabled = false + docker.enabled = false +} + +stdkondagac { + process.executor = 'local' + process.cpus = 4 + params.enable_conda = true + conda.enabled = true + conda.useMicromamba = true + params.enable_module = false + singularity.enabled = false + docker.enabled = false +} + +stdcingularitygac { + process.executor = 'local' + process.cpus = 4 + params.enable_conda = false + params.enable_module = false + singularity.enabled = true + singularity.autoMounts = true + singularity.runOptions = "-B ${params.input} -B ${params.bcs_root_dbdir}" + docker.enabled = false +} + +raven { + process.executor = 'slurm' + process.queue = 'prod' + process.memory = '10GB' + process.cpus = 4 + params.enable_conda = false + params.enable_module = true + singularity.enabled = false + docker.enabled = false + clusterOptions = '--signal B:USR2' +} + +eprod { + process.executor = 'slurm' + process.queue = 'lowmem,midmem,bigmem' + process.memory = '10GB' + process.cpus = 4 + params.enable_conda = false + params.enable_module = true + singularity.enabled = false + docker.enabled = false + clusterOptions = '--signal B:USR2' +} + +eprodkonda { + process.executor = 'slurm' + process.queue = 'lowmem,midmem,bigmem' + process.memory = '10GB' + process.cpus = 4 + params.enable_conda = true + conda.enabled = true + conda.useMicromamba = true + params.enable_module = false + singularity.enabled = false + singularity.autoMounts = true + singularity.runOptions = "-B ${params.input} -B ${params.bcs_root_dbdir}" + docker.enabled = false + clusterOptions = '--signal B:USR2' +} + +eprodcingularity { + process.executor = 'slurm' + process.queue = 'lowmem,midmem,bigmem' + process.memory = '10GB' + process.cpus = 4 + params.enable_conda = false + params.enable_module = false + singularity.enabled = true + singularity.autoMounts = true + singularity.runOptions = "-B ${params.input} -B ${params.bcs_root_dbdir}" + docker.enabled = false + clusterOptions = '--signal B:USR2' +} + +cingularity { + process.executor = 'slurm' + process.queue = 'prod' + process.memory = '10GB' + process.cpus = 4 + singularity.enabled = true + singularity.autoMounts = true + singularity.runOptions = "-B ${params.input} -B ${params.bcs_root_dbdir}" + docker.enabled = false + params.enable_conda = false + params.enable_module = false + clusterOptions = '--signal B:USR2' +} + +cingularitygac { + process.executor = 'slurm' + executor.$slurm.exitReadTimeout = 120000 + process.queue = 'centriflaken' + process.cpus = 4 + singularity.enabled = true + singularity.autoMounts = true + singularity.runOptions = "-B ${params.input} -B ${params.bcs_root_dbdir}" + docker.enabled = false + params.enable_conda = false + params.enable_module = false + clusterOptions = '-n 1 --signal B:USR2' +} + +konda { + process.executor = 'slurm' + process.queue = 'prod' + process.memory = '10GB' + process.cpus = 4 + singularity.enabled = false + docker.enabled = false + params.enable_conda = true + conda.enabled = true + conda.useMicromamba = true + params.enable_module = false + clusterOptions = '--signal B:USR2' +} + +kondagac { + process.executor = 'slurm' + executor.$slurm.exitReadTimeout = 120000 + process.queue = 'centriflaken' + process.cpus = 4 + singularity.enabled = false + docker.enabled = false + 
params.enable_conda = true + conda.enabled = true + conda.useMicromamba = true + params.enable_module = false + clusterOptions = '-n 1 --signal B:USR2' +} + +cfsanawsbatch { + process.executor = 'awsbatch' + process.queue = 'cfsan-nf-batch-job-queue' + aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws' + aws.batch.region = 'us-east-1' + aws.batch.volumes = ['/hpc/db:/hpc/db:ro', '/hpc/scratch:/hpc/scratch:rw'] + singularity.enabled = false + singularity.autoMounts = true + docker.enabled = true + params.enable_conda = false + conda.enabled = false + conda.useMicromamba = false + params.enable_module = false +}
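The Singularity-based profiles above bind `params.input` and `params.bcs_root_dbdir` into the container via `singularity.runOptions`, so both must point at readable host paths. A hedged sketch with hypothetical values:

```groovy
// Hypothetical values; adjust to the actual host layout before running.
params {
    input          = '/hpc/scratch/my_run/fastq'   // bound via -B ${params.input}
    bcs_root_dbdir = '/hpc/db'                     // bound via -B ${params.bcs_root_dbdir}
}
```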
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/fastq.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,9 @@ +params { + fq_filter_by_len = "4000" + fq_suffix = ".fastq.gz" + fq2_suffix = false + fq_strandedness = "unstranded" + fq_single_end = false + fq_filename_delim = "_" + fq_filename_delim_idx = "1" +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/logtheseparams.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,17 @@ +params { + logtheseparams = [ + "${params.metadata}" ? 'metadata' : null, + "${params.input}" ? 'input' : null, + "${params.output}" ? 'output' : null, + "${params.fq_suffix}" ? 'fq_suffix' : null, + "${params.fq2_suffix}" ? 'fq2_suffix' : null, + "${params.fq_strandedness}" ? 'fq_strandedness' : null, + "${params.fq_single_end}" ? 'fq_single_end' : null, + "${params.fq_filter_by_len}" ? 'fq_filter_by_len' : null, + "${params.fq_filename_delim}" ? 'fq_filename_delim' : null, + "${params.fq_filename_delim_idx}" ? 'fq_filename_delim_idx' : null, + 'enable_conda', + 'enable_module', + 'max_cpus' + ] +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/manifest.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,8 @@ +manifest { + author = 'Kranti.Konganti@fda.hhs.gov' + homePage = 'https://cfsan-git.fda.gov/Kranti.Konganti/cpipes' + name = 'CPIPES' + version = '0.8.0' + nextflowVersion = '>=23.04' + description = 'Modular Nextflow pipelines at CFSAN, FDA.' +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/modules.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,122 @@ +process { + publishDir = [ + path: { + "${task.process.tokenize(':')[-1].toLowerCase()}" == "multiqc" ? + "${params.output}${params.fs}${params.pipeline.toLowerCase()}-${task.process.tokenize(':')[-1].toLowerCase()}" : + "${params.output}${params.fs}${task.process.tokenize(':')[-1].toLowerCase()}" + }, + mode: params.publish_dir_mode, + overwrite: params.publish_dir_overwrite, + saveAs: { filename -> filename =~ /^versions.yml|.+?_mqc.*/ ? null : filename } + ] + + errorStrategy = { + ![0].contains(task.exitStatus) ? dynamic_retry(task.attempt, 10) : 'finish' + } + + maxRetries = 1 + resourceLabels = {[ + process: task.process, + memoryRequested: task.memory.toString(), + cpusRequested: task.cpus.toString() + ]} + + withLabel: 'process_femto' { + cpus = { 1 * task.attempt } + memory = { 1.GB * task.attempt } + time = { 1.h * task.attempt } + } + + withLabel: 'process_pico' { + cpus = { min_cpus(2) * task.attempt } + memory = { 4.GB * task.attempt } + time = { 2.h * task.attempt } + } + + withLabel: 'process_nano' { + cpus = { min_cpus(4) * task.attempt } + memory = { 8.GB * task.attempt } + time = { 4.h * task.attempt } + } + + withLabel: 'process_micro' { + cpus = { min_cpus(8) * task.attempt } + memory = { 16.GB * task.attempt } + time = { 8.h * task.attempt } + } + + withLabel: 'process_only_mem_low' { + cpus = { 1 * task.attempt } + memory = { 60.GB * task.attempt } + time = { 20.h * task.attempt } + } + + withLabel: 'process_only_mem_medium' { + cpus = { 1 * task.attempt } + memory = { 100.GB * task.attempt } + time = { 30.h * task.attempt } + } + + withLabel: 'process_only_mem_high' { + cpus = { 1 * task.attempt } + memory = { 128.GB * task.attempt } + time = { 60.h * task.attempt } + } + + withLabel: 'process_low' { + cpus = { min_cpus(10) * task.attempt } + memory = { 60.GB * task.attempt } + time = { 20.h * task.attempt } + } + + withLabel: 'process_medium' { + cpus = { min_cpus(10) * task.attempt } + memory = { 100.GB * task.attempt } + time = { 30.h * task.attempt } + } + + withLabel: 'process_high' { + cpus = { min_cpus(10) * task.attempt } + memory = { 128.GB * task.attempt } + time = { 60.h * task.attempt } + } + + withLabel: 'process_higher' { + cpus = { min_cpus(10) * task.attempt } + memory = { 256.GB * task.attempt } + time = { 60.h * task.attempt } + } + + withLabel: 'process_gigantic' { + cpus = { min_cpus(10) * task.attempt } + memory = { 512.GB * task.attempt } + time = { 60.h * task.attempt } + } +} + +if ( (params.input || params.metadata ) && params.pipeline ) { + try { + includeConfig "${params.workflowsconf}${params.fs}process${params.fs}${params.pipeline}.process.config" + } catch (Exception e) { + System.err.println('-'.multiply(params.linewidth) + "\n" + + "\033[0;31m${params.cfsanpipename} - ERROR\033[0m\n" + + '-'.multiply(params.linewidth) + "\n" + "\033[0;31mCould not load " + + "default pipeline's process configuration. Please provide a pipeline \n" + + "name using the --pipeline option.\n\033[0m" + '-'.multiply(params.linewidth) + "\n") + System.exit(1) + } +} + +// Function will return after sleeping for some time. +// Sleep time increases exponentially by task attempt. 
+def dynamic_retry(task_retry_num, factor_by) {
+    // sleep(Math.pow(2, task_retry_num.toInteger()) * factor_by.toInteger() as long)
+    sleep(Math.pow(1.27, task_retry_num.toInteger()) as long)
+    return 'retry'
+}
+
+// Function that caps the number of CPU cores requested
+// by the user at params.max_cpus.
+def min_cpus(cores) {
+    return Math.min(cores as int, "${params.max_cpus}" as int)
+}
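A small worked example of how the label-based resources, `min_cpus()` and `dynamic_retry()` above combine on a retried task; the numbers are illustrative and assume `params.max_cpus = 10` from `base.config`:

```groovy
// Illustrative re-statement of the helpers above, outside of Nextflow.
def maxCpus = 10
def minCpus = { int cores -> Math.min(cores, maxCpus) }
def attempt = 2                                  // second attempt after one retry

// A 'process_micro' task on attempt 2:
assert minCpus(8) * attempt == 16                // cpus
// memory: 16.GB * 2 -> 32 GB, time: 8.h * 2 -> 16 h

// dynamic_retry(2, 10) sleeps about Math.pow(1.27, 2) = 1.6 ms (truncated to
// 1 ms by the long cast) before returning 'retry', so the active back-off is mild.
```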
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/conf/multiqc/nowayout_mqc.yml Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,65 @@ +title: CPIPES Report +intro_text: > + CPIPES (CFSAN PIPELINES) is a modular bioinformatics data analysis project at CFSAN, FDA based on NEXTFLOW DSL2. +report_comment: > + This report has been generated by the <a href="https://github.com/CFSAN-Biostatistics/sequoia/blob/master/readme/Workflow_Name_Placeholder.md" target="_blank">CPIPES - Workflow_Name_Placeholder</a> + analysis pipeline. Only certain tables and plots are reported here. For complete results, please refer to the analysis pipeline output directory. +report_header_info: + - CPIPES Version: CPIPES_Version_Placeholder + - Workflow: Workflow_Name_Placeholder + - Workflow Version: Workflow_Version_Placeholder + - Conceived By: "Kranti Konganti" + - Input Directory: Workflow_Input_Placeholder + - Output Directory: Workflow_Output_Placeholder +show_analysis_paths: False +show_analysis_time: False +disable_version_detection: true +report_section_order: + kraken: + order: -994 + NOWAYOUT_collated_table: + order: -995 + NOWAYOUT_INDIV_READS_MAPPED_collated_table: + order: -996 + fastp: + order: -997 + fastqc: + order: -998 + software_versions: + order: -999 + +export_plots: true + +# Run only these modules +run_modules: + - fastqc + - fastp + - kraken + - custom_content + +module_order: + - kraken: + name: "SOURMASH TAX METAGENOME" + href: "https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-tax-metagenome-summarize-metagenome-content-from-gather-results" + doi: "10.21105/joss.00027" + info: > + section of the report shows how <b>reads</b> are approximately classified. + Please note that the plot title below is shown as + <b>Kraken2: Top taxa</b> since <code>kreport</code> fornat was used + to create Kraken-style reports with <code>sourmash tax metagenome</code>. + path_filters: + - "*.kreport.txt" + - fastqc: + name: "FastQC" + info: > + section of the report shows FastQC results <b>before</b> adapter trimming + on SE reads or on merged PE reads. + path_filters: + - "*_fastqc.zip" + - fastp: + name: "fastp" + info: > + section of the report shows read statistics <b>before</b> and <b>after</b> adapter trimming + with <code>fastp</code> on SE reads or on merged PE reads. + path_filters: + - "*.fastp.json"
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/cpipes Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,58 @@ +#!/usr/bin/env nextflow + +/* +---------------------------------------------------------------------------------------- + cfsan-dev/cpipes +---------------------------------------------------------------------------------------- + NAME : CPIPES + DESCRIPTION : Modular Nextflow pipelines at CFSAN, FDA. + GITLAB : https://xxxxxxxxxx/Kranti.Konganti/cpipes-framework + JIRA : https://xxxxxxxxxx/jira/projects/CPIPES/ + CONTRIBUTORS : Kranti Konganti +---------------------------------------------------------------------------------------- +*/ + +// Enable DSL 2 +nextflow.enable.dsl = 2 + +// Default routines for MAIN +include { pipelineBanner; stopNow; } from "${params.routines}" + +// Our banner for CPIPES +log.info pipelineBanner() + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + INCLUDE ALL WORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +switch ("${params.pipeline}") { + case "nowayout": + include { NOWAYOUT } from "${params.workflows}${params.fs}${params.pipeline}" + break + default: + stopNow("PLEASE MENTION A PIPELINE NAME. Ex: --pipeline nowayout") +} + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RUN ALL WORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +workflow { + // THIS IS REPETETIVE BUT WE ARE NOT ALLOWED TO INCLUDE "INCLUDE" + // INSIDE WORKFLOW + switch ("${params.pipeline}") { + case "nowayout": + NOWAYOUT() + break + } +} + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + THE END +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/
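The entry script dispatches on `--pipeline` twice: once to `include` the matching workflow and once to call it inside the anonymous `workflow` block, so registering a new workflow means adding a matching `case` to both switch blocks above. A minimal, self-contained sketch of that dispatch-by-name pattern (any pipeline name other than `nowayout` is hypothetical):

```groovy
// Illustrative only: the switch-on-name pattern reduced to plain Groovy.
def pipeline = 'nowayout'

switch (pipeline) {
    case 'nowayout':
        println 'would include and run NOWAYOUT()'
        break
    default:
        println 'PLEASE MENTION A PIPELINE NAME. Ex: --pipeline nowayout'
}
```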
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/dbcheck Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,128 @@ +#!/usr/bin/env bash + +########################################################## +# Constants +########################################################## +GREEN=$(tput setaf 2) +RED=$(tput setaf 1) +CYAN=$(tput setaf 6) +CLRESET=$(tput sgr0) +prog_name="nowayout" +dbBuild="03182024" +dbPath="/hpc/db/${prog_name}/$dbBuild" +taxonomyPath="$dbPath/taxonomy" + +usage() +{ + echo + echo usage: "$0" [-h] + echo + echo "Check for species presence in ${prog_name} database(s)." + echo + echo 'Example usage:' + echo + echo 'dbcheck -l' + echo 'dbcheck -g Cathartus' + echo 'dbcheck -d mitomine -g Cathartus' + echo 'dbcheck -d mitomine -s "Cathartus quadriculus"' + echo + echo 'Options:' + echo " -l : List ${prog_name} databases" + echo ' -d : Search this database. Default: mitomine.' + echo ' -g : Genus to search for.' + echo ' -s : "Genus Species" to search for.' + echo ' -h : Show this help message and exit' + echo + echo "$1" +} + +while getopts ":d:g:s:l" OPT; do + case "${OPT}" in + l) + listdb="list" + ;; + d) + dbname=${OPTARG} + ;; + g) + genus=${OPTARG} + ;; + s) + species=${OPTARG} + ;; + ?) + usage + exit 0 + ;; + esac +done + + + +if [ -n "$listdb" ]; then + num_dbs=$(find "$taxonomyPath" -type d | tail -n+2 | wc -l) + echo "==============================================" + + db_num="1" + find $taxonomyPath -type d | tail -n+2 | while read -r db; do + dbName=$(basename "$db") + echo "${db_num}. $dbName" + db_num=$(( db_num + 1 )) + done + echo "==============================================" + echo "Number of ${prog_name} databases: $num_dbs" + echo "==============================================" + + exit 0 +fi + + + +if [ -z "$dbname" ]; then + dbname="mitomine" +fi + +if [[ -n "$genus" && -n "$species" ]]; then + usage "ERROR: Only one of -g or -s needs to be defined!" + exit 1 +elif [ -n "$genus" ]; then + check="$genus" +elif [ -n "$species" ]; then + check="$species" +else + check="" +fi + +if [ -z "$check" ]; then + usage "ERROR: -g or -s is required! check:$check" + exit 1 +fi + +lineages="$taxonomyPath/$dbname/lineages.csv" + +echo +echo -e "Checking ${dbname} for ${CYAN}${check}${CLRESET}...\nPlease wait..." +echo + +num=$(grep -F ",$check," "$lineages" | cut -f1 -d, | sort -u | wc -l) +num_species=$(tail -n+2 "$lineages" | cut -f8 -d, | sort -u | wc -l) +num_entries=$(tail -n+2 "$lineages" | wc -l) + +echo "$dbname brief stats" +echo "==============================================" +echo "DB Build: $dbBuild" +echo "Number of unique species: $num_species" +echo "Number of accessions in database: $num_entries" +echo "==============================================" + + +if [ "$num" -gt 0 ]; then + echo + echo "${GREEN}$check is present in ${dbname}${CLRESET}." + echo "Number of accessions representing $check: $num" + echo "==============================================" +else + echo "${RED}$check is absent in ${dbname}${CLRESET}." + echo -e "No worries. Please request the developer of\n${prog_name} to augment the database!" + echo "==============================================" +fi \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/fastp.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,280 @@ +// Help text for fastp within CPIPES. + +def fastpHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'fastp_run': [ + clihelp: 'Run fastp tool. Default: ' + + (params.fastp_run ?: false), + cliflag: null, + clivalue: null + ], + 'fastp_failed_out': [ + clihelp: 'Specify whether to store reads that cannot pass the filters. ' + + "Default: ${params.fastp_failed_out}", + cliflag: null, + clivalue: null + ], + 'fastp_merged_out': [ + clihelp: 'Specify whether to store merged output or not. ' + + "Default: ${params.fastp_merged_out}", + cliflag: null, + clivalue: null + ], + 'fastp_overlapped_out': [ + clihelp: 'For each read pair, output the overlapped region if it has no mismatched base. ' + + "Default: ${params.fastp_overlapped_out}", + cliflag: '--overlapped_out', + clivalue: (params.fastp_overlapped_out ?: '') + ], + 'fastp_6': [ + clihelp: "Indicate that the input is using phred64 scoring (it'll be converted to phred33, " + + 'so the output will still be phred33). ' + + "Default: ${params.fastp_6}", + cliflag: '-6', + clivalue: (params.fastp_6 ? ' ' : '') + ], + 'fastp_reads_to_process': [ + clihelp: 'Specify how many reads/pairs are to be processed. Default value 0 means ' + + 'process all reads. ' + + "Default: ${params.fastp_reads_to_process}", + cliflag: '--reads_to_process', + clivalue: (params.fastp_reads_to_process ?: '') + ], + 'fastp_fix_mgi_id': [ + clihelp: 'The MGI FASTQ ID format is not compatible with many BAM operation tools, ' + + 'enable this option to fix it. ' + + "Default: ${params.fastp_fix_mgi_id}", + cliflag: '--fix_mgi_id', + clivalue: (params.fastp_fix_mgi_id ? ' ' : '') + ], + 'fastp_A': [ + clihelp: 'Disable adapter trimming. On by default. ' + + "Default: ${params.fastp_A}", + cliflag: '-A', + clivalue: (params.fastp_A ? ' ' : '') + ], + 'fastp_adapter_fasta': [ + clihelp: 'Specify a FASTA file to trim both read1 and read2 (if PE) by all the sequences ' + + 'in this FASTA file. ' + + "Default: ${params.fastp_adapter_fasta}", + cliflag: '--adapter_fasta', + clivalue: (params.fastp_adapter_fasta ?: '') + ], + 'fastp_f': [ + clihelp: 'Trim how many bases in front of read1. ' + + "Default: ${params.fastp_f}", + cliflag: '-f', + clivalue: (params.fastp_f ?: '') + ], + 'fastp_t': [ + clihelp: 'Trim how many bases at the end of read1. ' + + "Default: ${params.fastp_t}", + cliflag: '-t', + clivalue: (params.fastp_t ?: '') + ], + 'fastp_b': [ + clihelp: 'Max length of read1 after trimming. ' + + "Default: ${params.fastp_b}", + cliflag: '-b', + clivalue: (params.fastp_b ?: '') + ], + 'fastp_F': [ + clihelp: 'Trim how many bases in front of read2. ' + + "Default: ${params.fastp_F}", + cliflag: '-F', + clivalue: (params.fastp_F ?: '') + ], + 'fastp_T': [ + clihelp: 'Trim how many bases at the end of read2. ' + + "Default: ${params.fastp_T}", + cliflag: '-T', + clivalue: (params.fastp_T ?: '') + ], + 'fastp_B': [ + clihelp: 'Max length of read2 after trimming. ' + + "Default: ${params.fastp_B}", + cliflag: '-B', + clivalue: (params.fastp_B ?: '') + ], + 'fastp_dedup': [ + clihelp: 'Enable deduplication to drop the duplicated reads/pairs. ' + + "Default: ${params.fastp_dedup}", + cliflag: '--dedup', + clivalue: (params.fastp_dedup ? 
' ' : '') + ], + 'fastp_dup_calc_accuracy': [ + clihelp: 'Accuracy level to calculate duplication (1~6), higher level uses more memory ' + + '(1G, 2G, 4G, 8G, 16G, 24G). Default 1 for no-dedup mode, and 3 for dedup mode. ' + + "Default: ${params.fastp_dup_calc_accuracy}", + cliflag: '--dup_calc_accuracy', + clivalue: (params.fastp_dup_calc_accuracy ?: '') + ], + 'fastp_poly_g_min_len': [ + clihelp: 'The minimum length to detect polyG in the read tail. ' + + "Default: ${params.fastp_poly_g_min_len}", + cliflag: '--poly_g_min_len', + clivalue: (params.fastp_poly_g_min_len ?: '') + ], + 'fastp_G': [ + clihelp: 'Disable polyG tail trimming. ' + + "Default: ${params.fastp_G}", + cliflag: '-G', + clivalue: (params.fastp_G ? ' ' : '') + ], + 'fastp_x': [ + clihelp: "Enable polyX trimming in 3' ends. " + + "Default: ${params.fastp_x}", + cliflag: 'x=', + clivalue: (params.fastp_x ? ' ' : '') + ], + 'fastp_poly_x_min_len': [ + clihelp: 'The minimum length to detect polyX in the read tail. ' + + "Default: ${params.fastp_poly_x_min_len}", + cliflag: '--poly_x_min_len', + clivalue: (params.fastp_poly_x_min_len ?: '') + ], + 'fastp_cut_front': [ + clihelp: "Move a sliding window from front (5') to tail, drop the bases in the window " + + 'if its mean quality < threshold, stop otherwise. ' + + "Default: ${params.fastp_cut_front}", + cliflag: '--cut_front', + clivalue: (params.fastp_cut_front ? ' ' : '') + ], + 'fastp_cut_tail': [ + clihelp: "Move a sliding window from tail (3') to front, drop the bases in the window " + + 'if its mean quality < threshold, stop otherwise. ' + + "Default: ${params.fastp_cut_tail}", + cliflag: '--cut_tail', + clivalue: (params.fastp_cut_tail ? ' ' : '') + ], + 'fastp_cut_right': [ + clihelp: "Move a sliding window from tail, drop the bases in the window and the right part " + + 'if its mean quality < threshold, and then stop. ' + + "Default: ${params.fastp_cut_right}", + cliflag: '--cut_right', + clivalue: (params.fastp_cut_right ? ' ' : '') + ], + 'fastp_W': [ + clihelp: "Sliding window size shared by --fastp_cut_front, --fastp_cut_tail and " + + '--fastp_cut_right. ' + + "Default: ${params.fastp_W}", + cliflag: '--cut_window_size', + clivalue: (params.fastp_W ?: '') + ], + 'fastp_M': [ + clihelp: "The mean quality requirement shared by --fastp_cut_front, --fastp_cut_tail and " + + '--fastp_cut_right. ' + + "Default: ${params.fastp_M}", + cliflag: '--cut_mean_quality', + clivalue: (params.fastp_M ?: '') + ], + 'fastp_q': [ + clihelp: 'The quality value below which a base should is not qualified. ' + + "Default: ${params.fastp_q}", + cliflag: '-q', + clivalue: (params.fastp_q ?: '') + ], + 'fastp_u': [ + clihelp: 'What percent of bases are allowed to be unqualified. ' + + "Default: ${params.fastp_u}", + cliflag: '-u', + clivalue: (params.fastp_u ?: '') + ], + 'fastp_n': [ + clihelp: "How many N's can a read have. " + + "Default: ${params.fastp_n}", + cliflag: '-n', + clivalue: (params.fastp_n ?: '') + ], + 'fastp_e': [ + clihelp: "If the full reads' average quality is below this value, then it is discarded. " + + "Default: ${params.fastp_e}", + cliflag: '-e', + clivalue: (params.fastp_e ?: '') + ], + 'fastp_l': [ + clihelp: 'Reads shorter than this length will be discarded. ' + + "Default: ${params.fastp_l}", + cliflag: '-l', + clivalue: (params.fastp_l ?: '') + ], + 'fastp_max_len': [ + clihelp: 'Reads longer than this length will be discarded. 
' + + "Default: ${params.fastp_max_len}", + cliflag: '--length_limit', + clivalue: (params.fastp_max_len ?: '') + ], + 'fastp_y': [ + clihelp: 'Enable low complexity filter. The complexity is defined as the percentage ' + + 'of bases that are different from its next base (base[i] != base[i+1]). ' + + "Default: ${params.fastp_y}", + cliflag: '-y', + clivalue: (params.fastp_y ? ' ' : '') + ], + 'fastp_Y': [ + clihelp: 'The threshold for low complexity filter (0~100). Ex: A value of 30 means ' + + '30% complexity is required. ' + + "Default: ${params.fastp_Y}", + cliflag: '-Y', + clivalue: (params.fastp_Y ?: '') + ], + 'fastp_U': [ + clihelp: 'Enable Unique Molecular Identifier (UMI) pre-processing. ' + + "Default: ${params.fastp_U}", + cliflag: '-U', + clivalue: (params.fastp_U ? ' ' : '') + ], + 'fastp_umi_loc': [ + clihelp: 'Specify the location of UMI, can be one of ' + + 'index1/index2/read1/read2/per_index/per_read. ' + + "Default: ${params.fastp_umi_loc}", + cliflag: '--umi_loc', + clivalue: (params.fastp_umi_loc ?: '') + ], + 'fastp_umi_len': [ + clihelp: 'If the UMI is in read1 or read2, its length should be provided. ' + + "Default: ${params.fastp_umi_len}", + cliflag: '--umi_len', + clivalue: (params.fastp_umi_len ?: '') + ], + 'fastp_umi_prefix': [ + clihelp: 'If specified, an underline will be used to connect prefix and UMI ' + + '(i.e. prefix=UMI, UMI=AATTCG, final=UMI_AATTCG). ' + + "Default: ${params.fastp_umi_prefix}", + cliflag: '--umi_prefix', + clivalue: (params.fastp_umi_prefix ?: '') + ], + 'fastp_umi_skip': [ + clihelp: 'If the UMI is in read1 or read2, fastp can skip several bases following the UMI. ' + + "Default: ${params.fastp_umi_skip}", + cliflag: '--umi_skip', + clivalue: (params.fastp_umi_skip ?: '') + ], + 'fastp_p': [ + clihelp: 'Enable overrepresented sequence analysis. ' + + "Default: ${params.fastp_p}", + cliflag: '-p', + clivalue: (params.fastp_p ? ' ' : '') + ], + 'fastp_P': [ + clihelp: 'One in this many number of reads will be computed for overrepresentation analysis ' + + '(1~10000), smaller is slower. ' + + "Default: ${params.fastp_P}", + cliflag: '-P', + clivalue: (params.fastp_P ?: '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/gsalkronapy.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,58 @@ +// Help text for `gen_salmon_tph_and_krona_tsv.py` (gsalkronapy) within CPIPES. + +def gsalkronapyHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'gsalkronapy_run': [ + clihelp: 'Run the `gen_salmon_tph_and_krona_tsv.py` script. Default: ' + + (params.gsalkronapy_run ?: false), + cliflag: null, + clivalue: null + ], + 'gsalkronapy_sf': [ + clihelp: 'Set the scaling factor by which TPM values ' + + 'are scaled down.' + + " Default: ${params.gsalkronapy_sf}", + cliflag: '-sf', + clivalue: (params.gsalkronapy_sf ?: '') + ], + 'gsalkronapy_smres_suffix': [ + clihelp: 'Find the `sourmash gather` result files ' + + 'ending in this suffix.' + + " Default: ${params.gsalkronapy_smres_suffix}", + cliflag: '-smres-suffix', + clivalue: (params.gsalkronapy_smres_suffix ?: '') + ], + 'gsalkronapy_failed_suffix': [ + clihelp: 'Find the sample names which failed classification stored ' + + 'inside the files ending in this suffix.' + + " Default: ${params.gsalkronapy_failed_suffix}", + cliflag: '-failed-suffix', + clivalue: (params.gsalkronapy_failed_suffix ?: '') + ], + 'gsalkronapy_num_lin_cols': [ + clihelp: 'Number of columns expected in the lineages CSV file. ' + + " Default: ${params.gsalkronapy_num_lin_cols}", + cliflag: '-num-lin-cols', + clivalue: (params.gsalkronapy_num_lin_cols ?: '') + ], + 'gsalkronapy_lin_regex': [ + clihelp: 'Number of columns expected in the lineages CSV file. ' + + " Default: ${params.gsalkronapy_num_lin_cols}", + cliflag: '-num-lin-cols', + clivalue: (params.gsalkronapy_num_lin_cols ?: '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/gsatpy.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,32 @@ +// Help text for gen_sim_abn_table.py (gsat) within CPIPES. + +def gsatpyHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'gsatpy_run': [ + clihelp: 'Run the gen_sim_abn_table.py script. Default: ' + + (params.gsatpy_run ?: false), + cliflag: null, + clivalue: null + ], + 'gsatpy_header': [ + clihelp: 'Does the taxonomic summary result files have ' + + 'a header line. ' + + " Default: ${params.gsatpy_header}", + cliflag: '-header', + clivalue: (params.gsatpy_header ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/kmaalign.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,200 @@ +// Help text for kma align within CPIPES. + +def kmaalignHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'kmaalign_run': [ + clihelp: 'Run kma tool. Default: ' + + (params.kmaalign_run ?: false), + cliflag: null, + clivalue: null + ], + 'kmaalign_int': [ + clihelp: 'Input file has interleaved reads. ' + + " Default: ${params.kmaalign_int}", + cliflag: '-int', + clivalue: (params.kmaalign_int ? ' ' : '') + ], + 'kmaalign_ef': [ + clihelp: 'Output additional features. ' + + "Default: ${params.kmaalign_ef}", + cliflag: '-ef', + clivalue: (params.kmaalign_ef ? ' ' : '') + ], + 'kmaalign_vcf': [ + clihelp: 'Output vcf file. 2 to apply FT. ' + + "Default: ${params.kmaalign_vcf}", + cliflag: '-vcf', + clivalue: (params.kmaalign_vcf ? ' ' : '') + ], + 'kmaalign_sam': [ + clihelp: 'Output SAM, 4/2096 for mapped/aligned. ' + + "Default: ${params.kmaalign_sam}", + cliflag: '-sam', + clivalue: (params.kmaalign_sam ? ' ' : '') + ], + 'kmaalign_nc': [ + clihelp: 'No consensus file. ' + + "Default: ${params.kmaalign_nc}", + cliflag: '-nc', + clivalue: (params.kmaalign_nc ? ' ' : '') + ], + 'kmaalign_na': [ + clihelp: 'No aln file. ' + + "Default: ${params.kmaalign_na}", + cliflag: '-na', + clivalue: (params.kmaalign_na ? ' ' : '') + ], + 'kmaalign_nf': [ + clihelp: 'No frag file. ' + + "Default: ${params.kmaalign_nf}", + cliflag: '-nf', + clivalue: (params.kmaalign_nf ? ' ' : '') + ], + 'kmaalign_a': [ + clihelp: 'Output all template mappings. ' + + "Default: ${params.kmaalign_a}", + cliflag: '-a', + clivalue: (params.kmaalign_a ? ' ' : '') + ], + 'kmaalign_and': [ + clihelp: 'Use both -mrs and p-value on consensus. ' + + "Default: ${params.kmaalign_and}", + cliflag: '-and', + clivalue: (params.kmaalign_and ? ' ' : '') + ], + 'kmaalign_oa': [ + clihelp: 'Use neither -mrs or p-value on consensus. ' + + "Default: ${params.kmaalign_oa}", + cliflag: '-oa', + clivalue: (params.kmaalign_oa ? ' ' : '') + ], + 'kmaalign_bc': [ + clihelp: 'Minimum support to call bases. ' + + "Default: ${params.kmaalign_bc}", + cliflag: '-bc', + clivalue: (params.kmaalign_bc ?: '') + ], + 'kmaalign_bcNano': [ + clihelp: 'Altered indel calling for ONT data. ' + + "Default: ${params.kmaalign_bcNano}", + cliflag: '-bcNano', + clivalue: (params.kmaalign_bcNano ? ' ' : '') + ], + 'kmaalign_bcd': [ + clihelp: 'Minimum depth to call bases. ' + + "Default: ${params.kmaalign_bcd}", + cliflag: '-bcd', + clivalue: (params.kmaalign_bcd ?: '') + ], + 'kmaalign_bcg': [ + clihelp: 'Maintain insignificant gaps. ' + + "Default: ${params.kmaalign_bcg}", + cliflag: '-bcg', + clivalue: (params.kmaalign_bcg ? ' ' : '') + ], + 'kmaalign_ID': [ + clihelp: 'Minimum consensus ID. ' + + "Default: ${params.kmaalign_ID}", + cliflag: '-ID', + clivalue: (params.kmaalign_ID ?: '') + ], + 'kmaalign_md': [ + clihelp: 'Minimum depth. ' + + "Default: ${params.kmaalign_md}", + cliflag: '-md', + clivalue: (params.kmaalign_md ?: '') + ], + 'kmaalign_dense': [ + clihelp: 'Skip insertion in consensus. ' + + "Default: ${params.kmaalign_dense}", + cliflag: '-dense', + clivalue: (params.kmaalign_dense ? ' ' : '') + ], + 'kmaalign_ref_fsa': [ + clihelp: 'Use Ns on indels. ' + + "Default: ${params.kmaalign_ref_fsa}", + cliflag: '-ref_fsa', + clivalue: (params.kmaalign_ref_fsa ? ' ' : '') + ], + 'kmaalign_Mt1': [ + clihelp: 'Map everything to one template. 
' + + "Default: ${params.kmaalign_Mt1}", + cliflag: '-Mt1', + clivalue: (params.kmaalign_Mt1 ? ' ' : '') + ], + 'kmaalign_1t1': [ + clihelp: 'Map one query to one template. ' + + "Default: ${params.kmaalign_1t1}", + cliflag: '-1t1', + clivalue: (params.kmaalign_1t1 ? ' ' : '') + ], + 'kmaalign_mrs': [ + clihelp: 'Minimum relative alignment score. ' + + "Default: ${params.kmaalign_mrs}", + cliflag: '-mrs', + clivalue: (params.kmaalign_mrs ?: '') + ], + 'kmaalign_mrc': [ + clihelp: 'Minimum query coverage. ' + + "Default: ${params.kmaalign_mrc}", + cliflag: '-mrc', + clivalue: (params.kmaalign_mrc ?: '') + ], + 'kmaalign_mp': [ + clihelp: 'Minimum phred score of trailing and leading bases. ' + + "Default: ${params.kmaalign_mp}", + cliflag: '-mp', + clivalue: (params.kmaalign_mp ?: '') + ], + 'kmaalign_mq': [ + clihelp: 'Set the minimum mapping quality. ' + + "Default: ${params.kmaalign_mq}", + cliflag: '-mq', + clivalue: (params.kmaalign_mq ?: '') + ], + 'kmaalign_eq': [ + clihelp: 'Minimum average quality score. ' + + "Default: ${params.kmaalign_eq}", + cliflag: '-eq', + clivalue: (params.kmaalign_eq ?: '') + ], + 'kmaalign_5p': [ + clihelp: 'Trim 5 prime by this many bases. ' + + "Default: ${params.kmaalign_5p}", + cliflag: '-5p', + clivalue: (params.kmaalign_5p ?: '') + ], + 'kmaalign_3p': [ + clihelp: 'Trim 3 prime by this many bases ' + + "Default: ${params.kmaalign_3p}", + cliflag: '-3p', + clivalue: (params.kmaalign_3p ?: '') + ], + 'kmaalign_apm': [ + clihelp: 'Sets both -pm and -fpm ' + + "Default: ${params.kmaalign_apm}", + cliflag: '-apm', + clivalue: (params.kmaalign_apm ?: '') + ], + 'kmaalign_cge': [ + clihelp: 'Set CGE penalties and rewards ' + + "Default: ${params.kmaalign_cge}", + cliflag: '-cge', + clivalue: (params.kmaalign_cge ? ' ' : '') + ], + + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/lib/help/kraken2.nf Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,72 @@
+// Help text for kraken2 within CPIPES.
+
+def kraken2Help(params) {
+
+    Map tool = [:]
+    Map toolspecs = [:]
+    tool.text = [:]
+    tool.helpparams = [:]
+
+    toolspecs = [
+        'kraken2_db': [
+            clihelp: "Absolute path to kraken database. Default: ${params.kraken2_db}",
+            cliflag: '--db',
+            clivalue: null
+        ],
+        'kraken2_confidence': [
+            clihelp: 'Confidence score threshold which must be ' +
+                "between 0 and 1. Default: ${params.kraken2_confidence}",
+            cliflag: '--confidence',
+            clivalue: (params.kraken2_confidence ?: '')
+        ],
+        'kraken2_quick': [
+            clihelp: "Quick operation (use first hit or hits). Default: ${params.kraken2_quick}",
+            cliflag: '--quick',
+            clivalue: (params.kraken2_quick ? ' ' : '')
+        ],
+        'kraken2_use_mpa_style': [
+            clihelp: "Report output like Kraken 1's " +
+                "kraken-mpa-report. Default: ${params.kraken2_use_mpa_style}",
+            cliflag: '--use-mpa-style',
+            clivalue: (params.kraken2_use_mpa_style ? ' ' : '')
+        ],
+        'kraken2_minimum_base_quality': [
+            clihelp: 'Minimum base quality used in classification, ' +
                "which is only effective with FASTQ input. Default: ${params.kraken2_minimum_base_quality}",
+            cliflag: '--minimum-base-quality',
+            clivalue: (params.kraken2_minimum_base_quality ?: '')
+        ],
+        'kraken2_report_zero_counts': [
+            clihelp: 'Report counts for ALL taxa, even if counts are zero. ' +
+                "Default: ${params.kraken2_report_zero_counts}",
+            cliflag: '--report-zero-counts',
+            clivalue: (params.kraken2_report_zero_counts ? ' ' : '')
+        ],
+        'kraken2_report_minimizer_data': [
+            clihelp: 'Report minimizer and distinct minimizer count ' +
+                'information in addition to the normal Kraken report. ' +
+                "Default: ${params.kraken2_report_minimizer_data}",
+            cliflag: '--report-minimizer-data',
+            clivalue: (params.kraken2_report_minimizer_data ? ' ' : '')
+        ],
+        'kraken2_use_names': [
+            clihelp: 'Print scientific names instead of just taxids. ' +
+                "Default: ${params.kraken2_use_names}",
+            cliflag: '--use-names',
+            clivalue: (params.kraken2_use_names ? ' ' : '')
+        ],
+        'kraken2_extract_bug': [
+            clihelp: 'Extract the reads or contigs belonging to this bug. ' +
+                "Default: ${params.kraken2_extract_bug}",
+            cliflag: null,
+            clivalue: null
+        ]
+    ]
+
+    toolspecs.each {
+        k, v -> tool.text['--' + k] = "${v.clihelp}"
+            tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ]
+    }
+
+    return tool
+}
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/kronaktimporttext.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,44 @@ +// Help text for ktImportText (krona) within CPIPES. + +def kronaktimporttextHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'krona_ktIT_run': [ + clihelp: 'Run the ktImportText (ktIT) from krona. Default: ' + + (params.krona_ktIT_run ?: false), + cliflag: null, + clivalue: null + ], + 'krona_ktIT_n': [ + clihelp: 'Name of the highest level. ' + + "Default: ${params.krona_ktIT_n}", + cliflag: '-n', + clivalue: (params.krona_ktIT_n ?: '') + ], + 'krona_ktIT_q': [ + clihelp: 'Input file(s) do not have a field for quantity. ' + + "Default: ${params.krona_ktIT_q}", + cliflag: '-q', + clivalue: (params.krona_ktIT_q ? ' ' : '') + ], + 'krona_ktIT_c': [ + clihelp: 'Combine data from each file, rather than creating separate datasets ' + + 'within the chart. ' + + "Default: ${params.krona_ktIT_c}", + cliflag: '-c', + clivalue: (params.krona_ktIT_c ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/salmonidx.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,91 @@ +// Help text for salmon index within CPIPES. + +def salmonidxHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'salmonidx_run': [ + clihelp: 'Run `salmon index` tool. Default: ' + + (params.salmonidx_run ?: false), + cliflag: null, + clivalue: null + ], + 'salmonidx_k': [ + clihelp: 'The size of k-mers that should be used for the ' + + " quasi index. Default: ${params.salmonidx_k}", + cliflag: '-k', + clivalue: (params.salmonidx_k ?: '') + ], + 'salmonidx_gencode': [ + clihelp: 'This flag will expect the input transcript FASTA ' + + 'to be in GENCODE format, and will split the transcript ' + + 'name at the first `|` character. These reduced names ' + + 'will be used in the output and when looking for these ' + + 'transcripts in a gene to transcript GTF.' + + " Default: ${params.salmonidx_gencode}", + cliflag: '--gencode', + clivalue: (params.salmonidx_gencode ? ' ' : '') + ], + 'salmonidx_features': [ + clihelp: 'This flag will expect the input reference to be in the ' + + 'tsv file format, and will split the feature name at the first ' + + '`tab` character. These reduced names will be used in the output ' + + 'and when looking for the sequence of the features. GTF.' + + " Default: ${params.salmonidx_features}", + cliflag: '--features', + clivalue: (params.salmonidx_features ? ' ' : '') + ], + 'salmonidx_keepDuplicates': [ + clihelp: 'This flag will disable the default indexing behavior of ' + + 'discarding sequence-identical duplicate transcripts. If this ' + + 'flag is passed then duplicate transcripts that appear in the ' + + 'input will be retained and quantified separately.' + + " Default: ${params.salmonidx_keepDuplicates}", + cliflag: '--keepDuplicates', + clivalue: (params.salmonidx_keepDuplicates ? ' ' : '') + ], + 'salmonidx_keepFixedFasta': [ + clihelp: 'Retain the fixed fasta file (without short ' + + 'transcripts and duplicates, clipped, etc.) generated ' + + "during indexing. Default: ${params.salmonidx_keepFixedFasta}", + cliflag: '--keepFixedFasta', + clivalue: (params.salmonidx_keepFixedFasta ?: '') + ], + 'salmonidx_filterSize': [ + clihelp: 'The size of the Bloom filter that will be used ' + + 'by TwoPaCo during indexing. The filter will be of ' + + 'size 2^{filterSize}. A value of -1 means that the ' + + 'filter size will be automatically set based on the ' + + 'number of distinct k-mers in the input, as estimated by ' + + "nthll. Default: ${params.salmonidx_filterSize}", + cliflag: '--filterSize', + clivalue: (params.salmonidx_filterSize ?: '') + ], + 'salmonidx_sparse': [ + clihelp: 'Build the index using a sparse sampling of k-mer ' + + 'positions This will require less memory (especially ' + + 'during quantification), but will take longer to construct' + + 'and can slow down mapping / alignment.' + + " Default: ${params.salmonidx_sparse}", + cliflag: '--sparse', + clivalue: (params.salmonidx_sparse ? ' ' : '') + ], + 'salmonidx_n': [ + clihelp: 'Do not clip poly-A tails from the ends of target ' + + "sequences. Default: ${params.salmonidx_n}", + cliflag: '-n', + clivalue: (params.salmonidx_n ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/seqkitgrep.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,75 @@ +// Help text for seqkit `grep` within CPIPES. + +def seqkitgrepHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'seqkit_grep_run': [ + clihelp: 'Run the seqkit `grep` tool. Default: ' + + (params.seqkit_grep_run ?: false), + cliflag: null, + clivalue: null + ], + 'seqkit_grep_n': [ + clihelp: 'Match by full name instead of just ID. ' + + "Default: " + (params.seqkit_grep_n ?: 'undefined'), + cliflag: '--seqkit_grep_n', + clivalue: (params.seqkit_grep_n ? ' ' : '') + ], + 'seqkit_grep_s': [ + clihelp: 'Search subseq on seq, both positive and negative ' + + 'strand are searched, and mismatch allowed using flag --seqkit_grep_m. ' + + "Default: " + (params.seqkit_grep_s ?: 'undefined'), + cliflag: '--seqkit_grep_s', + clivalue: (params.seqkit_grep_s ? ' ' : '') + ], + 'seqkit_grep_c': [ + clihelp: 'Input is circular genome ' + + "Default: " + (params.seqkit_grep_c ?: 'undefined'), + cliflag: '--seqkit_grep_c', + clivalue: (params.seqkit_grep_c ? ' ' : '') + ], + 'seqkit_grep_C': [ + clihelp: 'Just print a count of matching records. With the ' + + '--seqkit_grep_v flag, count non-matching records. ' + + "Default: " + (params.seqkit_grep_v ?: 'undefined'), + cliflag: '--seqkit_grep_v', + clivalue: (params.seqkit_grep_v ? ' ' : '') + ], + 'seqkit_grep_i': [ + clihelp: 'Ignore case while using seqkit grep. ' + + "Default: " + (params.seqkit_grep_i ?: 'undefined'), + cliflag: '--seqkit_grep_i', + clivalue: (params.seqkit_grep_i ? ' ' : '') + ], + 'seqkit_grep_v': [ + clihelp: 'Invert the match i.e. select non-matching records. ' + + "Default: " + (params.seqkit_grep_v ?: 'undefined'), + cliflag: '--seqkit_grep_v', + clivalue: (params.seqkit_grep_v ? ' ' : '') + ], + 'seqkit_grep_m': [ + clihelp: 'Maximum mismatches when matching by sequence. ' + + "Default: " + (params.seqkit_grep_m ?: 'undefined'), + cliflag: '--seqkit_grep_m', + clivalue: (params.seqkit_grep_v ?: '') + ], + 'seqkit_grep_r': [ + clihelp: 'Input patters are regular expressions. ' + + "Default: " + (params.seqkit_grep_m ?: 'undefined'), + cliflag: '--seqkit_grep_m', + clivalue: (params.seqkit_grep_v ?: '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/seqkitrmdup.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,61 @@ +// Help text for seqkit rmdup within CPIPES. + +def seqkitrmdupHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'seqkit_rmdup_run': [ + clihelp: 'Remove duplicate sequences using seqkit rmdup. Default: ' + + (params.seqkit_rmdup_run ?: false), + cliflag: null, + clivalue: null + ], + 'seqkit_rmdup_n': [ + clihelp: 'Match and remove duplicate sequences by full name instead of just ID. ' + + "Default: ${params.seqkit_rmdup_n}", + cliflag: '-n', + clivalue: (params.seqkit_rmdup_n ? ' ' : '') + ], + 'seqkit_rmdup_s': [ + clihelp: 'Match and remove duplicate sequences by sequence content. ' + + "Default: ${params.seqkit_rmdup_s}", + cliflag: '-s', + clivalue: (params.seqkit_rmdup_s ? ' ' : '') + ], + 'seqkit_rmdup_d': [ + clihelp: 'Save the duplicated sequences to a file. ' + + "Default: ${params.seqkit_rmdup_d}", + cliflag: null, + clivalue: null + ], + 'seqkit_rmdup_D': [ + clihelp: 'Save the number and list of duplicated sequences to a file. ' + + "Default: ${params.seqkit_rmdup_D}", + cliflag: null, + clivalue: null + ], + 'seqkit_rmdup_i': [ + clihelp: 'Ignore case while using seqkit rmdup. ' + + "Default: ${params.seqkit_rmdup_i}", + cliflag: '-i', + clivalue: (params.seqkit_rmdup_i ? ' ' : '') + ], + 'seqkit_rmdup_P': [ + clihelp: "Only consider positive strand (i.e. 5') when comparing by sequence content. " + + "Default: ${params.seqkit_rmdup_P}", + cliflag: '-P', + clivalue: (params.seqkit_rmdup_P ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/lib/help/sfhpy.nf Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,58 @@
+// Help text for sourmash_filter_hits.py (sfhpy) within CPIPES.
+def sfhpyHelp(params) {
+
+    Map tool = [:]
+    Map toolspecs = [:]
+    tool.text = [:]
+    tool.helpparams = [:]
+
+    toolspecs = [
+        'sfhpy_run': [
+            clihelp: 'Run the sourmash_filter_hits.py ' +
+                'script. Default: ' +
+                (params.sfhpy_run ?: false),
+            cliflag: null,
+            clivalue: null
+        ],
+        'sfhpy_fcn': [
+            clihelp: 'Column name by which filtering of rows should be applied. ' +
+                "Default: ${params.sfhpy_fcn}",
+            cliflag: '-fcn',
+            clivalue: (params.sfhpy_fcn ?: '')
+        ],
+        'sfhpy_fcv': [
+            clihelp: 'Remove genomes whose match with the query FASTQ is less than ' +
+                'this much. ' +
+                "Default: ${params.sfhpy_fcv}",
+            cliflag: '-fcv',
+            clivalue: (params.sfhpy_fcv ?: '')
+        ],
+        'sfhpy_gt': [
+            clihelp: 'Apply greater than or equal to condition on numeric values of ' +
+                '--sfhpy_fcn column. ' +
+                "Default: ${params.sfhpy_gt}",
+            cliflag: '-gt',
+            clivalue: (params.sfhpy_gt ? ' ' : '')
+        ],
+        'sfhpy_lt': [
+            clihelp: 'Apply less than or equal to condition on numeric values of ' +
+                '--sfhpy_fcn column. ' +
+                "Default: ${params.sfhpy_lt}",
+            cliflag: '-lt',
+            clivalue: (params.sfhpy_lt ? ' ' : '')
+        ],
+        'sfhpy_all': [
+            clihelp: 'Instead of just the column value, print entire row. ' +
+                "Default: ${params.sfhpy_all}",
+            cliflag: '-all',
+            clivalue: (params.sfhpy_all ? ' ' : '')
+        ],
+    ]
+
+    toolspecs.each {
+        k, v -> tool.text['--' + k] = "${v.clihelp}"
+            tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ]
+    }
+
+    return tool
+}
\ No newline at end of file
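These `sfhpy_*` parameters map directly onto the flags that `bin/sourmash_filter_hits.py` (earlier in this changeset) defines. With hypothetical values, the resulting call would filter `sourmash gather` rows roughly as sketched here:

```groovy
// Hypothetical parameter values; the flags come from the cliflag entries above.
def sfhpy = [fcn: 'f_match', fcv: '0.1', gt: true]

def flags = "-fcn ${sfhpy.fcn} -fcv ${sfhpy.fcv}" + (sfhpy.gt ? ' -gt' : ' -lt')
assert flags == '-fcn f_match -fcv 0.1 -gt'
// i.e. keep only rows whose f_match column is >= 0.1
```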
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/sourmashgather.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,86 @@ +// Help text for sourmash gather within CPIPES.mashsketch + +def sourmashgatherHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'sourmashgather_run': [ + clihelp: 'Run `sourmash gather` tool. Default: ' + + (params.sourmashgather_run ?: false), + cliflag: null, + clivalue: null + ], + 'sourmashgather_n': [ + clihelp: 'Number of results to report. ' + + 'By default, will terminate at --sourmashgather_thr_bp value. ' + + "Default: ${params.sourmashgather_n}", + cliflag: '-n', + clivalue: (params.sourmashgather_n ?: '') + ], + 'sourmashgather_thr_bp': [ + clihelp: 'Reporting threshold (in bp) for estimated overlap with remaining query. ' + + "Default: ${params.sourmashgather_thr_bp}", + cliflag: '--threshold-bp', + clivalue: (params.sourmashgather_thr_bp ?: '') + ], + 'sourmashgather_ani_ci': [ + clihelp: 'Output confidence intervals for ANI estimates. ' + + "Default: ${params.sourmashgather_ani_ci}", + cliflag: '--estimate-ani-ci', + clivalue: (params.sourmashgather_ani_ci ? ' ' : '') + ], + 'sourmashgather_k': [ + clihelp: 'The k-mer size to select. ' + + "Default: ${params.sourmashgather_k}", + cliflag: '-k', + clivalue: (params.sourmashgather_k ?: '') + ], + 'sourmashgather_dna': [ + clihelp: 'Choose DNA signature. ' + + "Default: ${params.sourmashgather_dna}", + cliflag: '--dna', + clivalue: (params.sourmashgather_dna ? ' ' : '') + ], + 'sourmashgather_rna': [ + clihelp: 'Choose RNA signature. ' + + "Default: ${params.sourmashgather_rna}", + cliflag: '--rna', + clivalue: (params.sourmashgather_rna ? ' ' : '') + ], + 'sourmashgather_nuc': [ + clihelp: 'Choose Nucleotide signature. ' + + "Default: ${params.sourmashgather_nuc}", + cliflag: '--nucleotide', + clivalue: (params.sourmashgather_nuc ? ' ' : '') + ], + 'sourmashgather_scaled': [ + clihelp: 'Scaled value should be between 100 and 1e6. ' + + "Default: ${params.sourmashgather_scaled}", + cliflag: '--scaled', + clivalue: (params.sourmashgather_scaled ?: '') + ], + 'sourmashgather_inc_pat': [ + clihelp: 'Search only signatures that match this pattern in name, filename, or md5. ' + + "Default: ${params.sourmashgather_inc_pat}", + cliflag: '--include-db-pattern', + clivalue: (params.sourmashgather_inc_pat ?: '') + ], + 'sourmashgather_exc_pat': [ + clihelp: 'Search only signatures that do not match this pattern in name, filename, or md5. ' + + "Default: ${params.sourmashgather_exc_pat}", + cliflag: '--exclude-db-pattern', + clivalue: (params.sourmashgather_exc_pat ?: '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/sourmashsearch.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,134 @@ +// Help text for sourmash search within CPIPES. + +def sourmashsearchHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'sourmashsearch_run': [ + clihelp: 'Run `sourmash search` tool. Default: ' + + (params.sourmashsearch_run ?: false), + cliflag: null, + clivalue: null + ], + 'sourmashsearch_n': [ + clihelp: 'Number of results to report. ' + + 'By default, will terminate at --sourmashsearch_thr value. ' + + "Default: ${params.sourmashsearch_n}", + cliflag: '-n', + clivalue: (params.sourmashsearch_n ?: '') + ], + 'sourmashsearch_thr': [ + clihelp: 'Reporting threshold (similarity) to return results. ' + + "Default: ${params.sourmashsearch_thr}", + cliflag: '--threshold', + clivalue: (params.sourmashsearch_thr ?: '') + ], + 'sourmashsearch_contain': [ + clihelp: 'Score based on containment rather than similarity. ' + + "Default: ${params.sourmashsearch_contain}", + cliflag: '--containment', + clivalue: (params.sourmashsearch_contain ? ' ' : '') + ], + 'sourmashsearch_maxcontain': [ + clihelp: 'Score based on max containment rather than similarity. ' + + "Default: ${params.sourmashsearch_contain}", + cliflag: '--max-containment', + clivalue: (params.sourmashsearch_maxcontain ? ' ' : '') + ], + 'sourmashsearch_ignoreabn': [ + clihelp: 'Do NOT use k-mer abundances if present. ' + + "Default: ${params.sourmashsearch_ignoreabn}", + cliflag: '--ignore-abundance', + clivalue: (params.sourmashsearch_ignoreabn ? ' ' : '') + ], + 'sourmashsearch_ani_ci': [ + clihelp: 'Output confidence intervals for ANI estimates. ' + + "Default: ${params.sourmashsearch_ani_ci}", + cliflag: '--estimate-ani-ci', + clivalue: (params.sourmashsearch_ani_ci ? ' ' : '') + ], + 'sourmashsearch_k': [ + clihelp: 'The k-mer size to select. ' + + "Default: ${params.sourmashsearch_k}", + cliflag: '-k', + clivalue: (params.sourmashsearch_k ?: '') + ], + 'sourmashsearch_protein': [ + clihelp: 'Choose a protein signature. ' + + "Default: ${params.sourmashsearch_protein}", + cliflag: '--protein', + clivalue: (params.sourmashsearch_protein ? ' ' : '') + ], + 'sourmashsearch_noprotein': [ + clihelp: 'Do not choose a protein signature. ' + + "Default: ${params.sourmashsearch_noprotein}", + cliflag: '--no-protein', + clivalue: (params.sourmashsearch_noprotein ? ' ' : '') + ], + 'sourmashsearch_dayhoff': [ + clihelp: 'Choose Dayhoff-encoded amino acid signatures. ' + + "Default: ${params.sourmashsearch_dayhoff}", + cliflag: '--dayhoff', + clivalue: (params.sourmashsearch_dayhoff ? ' ' : '') + ], + 'sourmashsearch_nodayhoff': [ + clihelp: 'Do not choose Dayhoff-encoded amino acid signatures. ' + + "Default: ${params.sourmashsearch_nodayhoff}", + cliflag: '--no-dayhoff', + clivalue: (params.sourmashsearch_nodayhoff ? ' ' : '') + ], + 'sourmashsearch_hp': [ + clihelp: 'Choose hydrophobic-polar-encoded amino acid signatures. ' + + "Default: ${params.sourmashsearch_hp}", + cliflag: '--hp', + clivalue: (params.sourmashsearch_hp ? ' ' : '') + ], + 'sourmashsearch_nohp': [ + clihelp: 'Do not choose hydrophobic-polar-encoded amino acid signatures. ' + + "Default: ${params.sourmashsearch_nohp}", + cliflag: '--no-hp', + clivalue: (params.sourmashsearch_nohp ? ' ' : '') + ], + 'sourmashsearch_dna': [ + clihelp: 'Choose DNA signature. ' + + "Default: ${params.sourmashsearch_dna}", + cliflag: '--dna', + clivalue: (params.sourmashsearch_dna ? 
' ' : '') + ], + 'sourmashsearch_nodna': [ + clihelp: 'Do not choose DNA signature. ' + + "Default: ${params.sourmashsearch_nodna}", + cliflag: '--no-dna', + clivalue: (params.sourmashsearch_nodna ? ' ' : '') + ], + 'sourmashsearch_scaled': [ + clihelp: 'Scaled value should be between 100 and 1e6. ' + + "Default: ${params.sourmashsearch_scaled}", + cliflag: '--scaled', + clivalue: (params.sourmashsearch_scaled ?: '') + ], + 'sourmashsearch_inc_pat': [ + clihelp: 'Search only signatures that match this pattern in name, filename, or md5. ' + + "Default: ${params.sourmashsearch_inc_pat}", + cliflag: '--include-db-pattern', + clivalue: (params.sourmashsearch_inc_pat ?: '') + ], + 'sourmashsearch_exc_pat': [ + clihelp: 'Search only signatures that do not match this pattern in name, filename, or md5. ' + + "Default: ${params.sourmashsearch_exc_pat}", + cliflag: '--exclude-db-pattern', + clivalue: (params.sourmashsearch_exc_pat ?: '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/sourmashsketch.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,61 @@ +// Help text for sourmash sketch dna within CPIPES. + +def sourmashsketchHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'sourmashsketch_run': [ + clihelp: 'Run `sourmash sketch dna` tool. Default: ' + + (params.sourmashsketch_run ?: false), + cliflag: null, + clivalue: null + ], + 'sourmashsketch_mode': [ + clihelp: "Select which type of signatures to be created: dna, protein, fromfile or translate. " + + "Default: ${params.sourmashsketch_mode}", + cliflag: "${params.sourmashsketch_mode}", + clivalue: ' ' + ], + 'sourmashsketch_p': [ + clihelp: 'Signature parameters to use. ' + + "Default: ${params.sourmashsketch_p}", + cliflag: '-p', + clivalue: (params.sourmashsketch_p ?: '') + ], + 'sourmashsketch_file': [ + clihelp: '<path> A text file containing a list of sequence files to load. ' + + "Default: ${params.sourmashsketch_file}", + cliflag: '--from-file', + clivalue: (params.sourmashsketch_file ?: '') + ], + 'sourmashsketch_f': [ + clihelp: 'Recompute signatures even if the file exists. ' + + "Default: ${params.sourmashsketch_f}", + cliflag: '-f', + clivalue: (params.sourmashsketch_f ? ' ' : '') + ], + 'sourmashsketch_name': [ + clihelp: 'Name the signature generated from each file after the first record in the file. ' + + "Default: ${params.sourmashsketch_name}", + cliflag: '--name-from-first', + clivalue: (params.sourmashsketch_name ? ' ' : '') + ], + 'sourmashsketch_randomize': [ + clihelp: 'Shuffle the list of input files randomly. ' + + "Default: ${params.sourmashsketch_randomize}", + cliflag: '--randomize', + clivalue: (params.sourmashsketch_randomize ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
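The `sourmashsketch_p` value is handed to `sourmash sketch` after `-p`, which expects a comma-separated parameter string (k-mer size, scaled value, abundance tracking). An illustration with hypothetical values, not this changeset's defaults:

```groovy
// Hypothetical override; the actual defaults live in the workflow's process config.
params {
    sourmashsketch_mode = 'dna'
    sourmashsketch_p    = 'abund,scaled=1000,k=51'
}
// Roughly: sourmash sketch dna -p abund,scaled=1000,k=51 <input reads>
```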
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/help/sourmashtaxmetagenome.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,69 @@ +// Help text for sourmash tax metagenome within CPIPES. + +def sourmashtaxmetagenomeHelp(params) { + + Map tool = [:] + Map toolspecs = [:] + tool.text = [:] + tool.helpparams = [:] + + toolspecs = [ + 'sourmashtaxmetagenome_run': [ + clihelp: 'Run `sourmash tax metagenome` tool. Default: ' + + (params.sourmashtaxmetagenome_run ?: false), + cliflag: null, + clivalue: null + ], + 'sourmashtaxmetagenome_t': [ + clihelp: "Taxonomy CSV file. " + + "Default: ${params.sourmashtaxmetagenome_t}", + cliflag: '-t', + clivalue: (params.sourmashtaxmetagenome_t ?: '') + ], + 'sourmashtaxmetagenome_r': [ + clihelp: 'For non-default output formats: Summarize genome' + + ' taxonomy at this rank and above. Note that the taxonomy CSV must' + + ' contain lineage information at this rank.' + + " Default: ${params.sourmashtaxmetagenome_r}", + cliflag: '-r', + clivalue: (params.sourmashtaxmetagenome_r ?: '') + ], + 'sourmashtaxmetagenome_F': [ + clihelp: 'Choose output format. ' + + "Default: ${params.sourmashtaxmetagenome_F}", + cliflag: '--output-format', + clivalue: (params.sourmashtaxmetagenome_F ?: '') + ], + 'sourmashtaxmetagenome_f': [ + clihelp: 'Continue past errors in taxonomy database loading. ' + + "Default: ${params.sourmashtaxmetagenome_f}", + cliflag: '-f', + clivalue: (params.sourmashtaxmetagenome_f ?: '') + ], + 'sourmashtaxmetagenome_kfi': [ + clihelp: 'Do not split identifiers on whitespace. ' + + "Default: ${params.sourmashtaxmetagenome_kfi}", + cliflag: '--keep-full-identifiers', + clivalue: (params.sourmashtaxmetagenome_kfi ? ' ' : '') + ], + 'sourmashtaxmetagenome_kiv': [ + clihelp: 'After splitting identifiers do not remove accession versions. ' + + "Default: ${params.sourmashtaxmetagenome_kiv}", + cliflag: '--keep-identifier-versions', + clivalue: (params.sourmashtaxmetagenome_kiv ?: '') + ], + 'sourmashtaxmetagenome_fomt': [ + clihelp: 'Fail quickly if taxonomy is not available for an identifier. ' + + "Default: ${params.sourmashtaxmetagenome_fomt}", + cliflag: '--fail-on-missing-taxonomy', + clivalue: (params.sourmashtaxmetagenome_fomt ? ' ' : '') + ] + ] + + toolspecs.each { + k, v -> tool.text['--' + k] = "${v.clihelp}" + tool.helpparams[k] = [ cliflag: "${v.cliflag}", clivalue: v.clivalue ] + } + + return tool +} \ No newline at end of file
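All three sourmash help maps above follow the same convention: `tool.text` carries the `--option` help strings and `tool.helpparams` carries the `cliflag`/`clivalue` pairs. As a minimal sketch (the helper function and the literal values below are illustrative, not part of CPIPES), a caller could flatten `helpparams` into a command-line fragment like this:

```groovy
// Hypothetical sketch: flatten a helpparams-style map into a CLI fragment.
// The map literal below uses made-up values; real entries come from the
// *Help(params) functions above.
def buildCliArgs(Map helpparams) {
    helpparams.collect { name, spec ->
        (spec.cliflag && spec.clivalue) ? "${spec.cliflag} ${spec.clivalue}".trim() : ''
    }.findAll { it }.join(' ')
}

def helpparams = [
    sourmashsearch_n  : [cliflag: '-n', clivalue: 10],
    sourmashsearch_thr: [cliflag: '--threshold', clivalue: 0.85],
    sourmashsearch_dna: [cliflag: '--dna', clivalue: ' ']
]

println buildCliArgs(helpparams) // -n 10 --threshold 0.85 --dna
```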
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/lib/routines.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,391 @@ +// Hold methods to print: +// 1. Colored logo. +// 2. Summary of parameters. +// 3. Single dashed line. +// 4. Double dashed line. +// + +import groovy.json.JsonSlurper +import nextflow.config.ConfigParser +// import groovy.json.JsonOutput + +// ASCII logo +def pipelineBanner() { + + def padding = (params.pad) ?: 30 + Map fgcolors = getANSIColors() + + def banner = [ + name: "${fgcolors.magenta}${workflow.manifest.name}${fgcolors.reset}", + author: "${fgcolors.cyan}${workflow.manifest.author}${fgcolors.reset}", + // workflow: "${fgcolors.magenta}${params.pipeline}${fgcolors.reset}", + version: "${fgcolors.green}${workflow.manifest.version}${fgcolors.reset}", + center: "${fgcolors.green}${params.center}${fgcolors.reset}", + pad: padding + ] + + manifest = addPadding(banner) + + return """${fgcolors.white}${dashedLine(type: '=')}${fgcolors.magenta} + (o) + ___ _ __ _ _ __ ___ ___ + / __|| '_ \\ | || '_ \\ / _ \\/ __| +| (__ | |_) || || |_) || __/\\__ \\ + \\___|| .__/ |_|| .__/ \\___||___/ + | | | | + |_| |_|${fgcolors.reset} +${dashedLine()} +${fgcolors.blue}A collection of modular pipelines at CFSAN, FDA.${fgcolors.reset} +${dashedLine()} +${manifest} +${dashedLine(type: '=')} +""".stripIndent() +} + +// Add padding to keys so that +// they indent nicely on the +// terminal +def addPadding(values) { + + def pad = (params.pad) ?: 30 + values.pad = pad + + def padding = values.pad.toInteger() + def nocapitalize = values.nocapitalize + def stopnow = values.stopNow + def help = values.help + + values.removeAll { + k, v -> [ + 'nocapitalize', + 'pad', + 'stopNow', + 'help' + ].contains(k) + } + + values.keySet().each { k -> + v = values[k] + s = params.linewidth - (pad + 5) + if (v.toString().size() > s && !stopnow) { + def sen = '' + // v.toString().findAll(/.{1,${s}}\b(?:\W*|\s*)/).each { + // sen += ' '.multiply(padding + 2) + it + '\n' + // } + v.toString().eachMatch(/.{1,${s}}(?=.*)\b|\w+/) { + sen += ' '.multiply(padding + 2) + it.trim() + '\n' + } + values[k] = ( + help ? sen.replaceAll(/^(\n|\s)*/, '') : sen.trim() + ) + } else { + values[k] = (help ? v + "\n" : v) + } + k = k.replaceAll(/\./, '_') + } + + return values.findResults { + k, v -> nocapitalize ? 
+ k.padRight(padding) + ': ' + v : + k.capitalize().padRight(padding) + ': ' + v + }.join("\n") +} + +// Method for error messages +def stopNow(msg) { + + Map fgcolors = getANSIColors() + Map errors = [:] + + if (msg == null) { + msg = "Unknown error" + } + + errors['stopNow'] = true + errors["${params.cfsanpipename} - ${params.pipeline} - ERROR"] = """ +${fgcolors.reset}${dashedLine()} +${fgcolors.red}${msg}${fgcolors.reset} +${dashedLine()} +""".stripIndent() + // println dashedLine() // defaults to stdout + // log.info addPadding(errors) // prints to stdout + exit 1, "\n" + dashedLine() + + "${fgcolors.red}\n" + addPadding(errors) +} + +// Method to validate 4 required parameters +// if input for entry point is FASTQ files +def validateParamsForFASTQ() { + switch (params) { + case { params.metadata == null && params.input == null }: + stopNow("Either metadata CSV file with 5 required columns\n" + + "in order: sample, fq1, fq2, strandedness, single_end or \n" + + "input directory of only FASTQ files (gzipped or unzipped) should be provided\n" + + "using --metadata or --input options.\n" + + "None of these two options were provided!") + break + case { params.metadata != null && params.input != null }: + stopNow("Either metadata or input directory of FASTQ files\n" + + "should be provided using --metadata or --input options.\n" + + "Using both these options is not allowed!") + break + case { params.output == null }: + stopNow("Please mention output directory to store all results " + + "using --output option!") + break + } + return 1 +} + +// Method to print summary of parameters +// before running +def summaryOfParams() { + + def pipeline_specific_config = new ConfigParser().setIgnoreIncludes(true).parse( + file("${params.workflowsconf}${params.fs}${params.pipeline}.config").text + ) + Map fgcolors = getANSIColors() + Map globalparams = [:] + Map localparams = params.subMap( + pipeline_specific_config.params.keySet().toList() + params.logtheseparams + ) + + if (localparams !instanceof Map) { + stopNow("Need a Map of paramters. 
We got: " + localparams.getClass()) + } + + if (localparams.size() != 0) { + localparams['nocapitalize'] = true + globalparams['nocapitalize'] = true + globalparams['nextflow_version'] = "${nextflow.version}" + globalparams['nextflow_build'] = "${nextflow.build}" + globalparams['nextflow_timestamp'] = "${nextflow.timestamp}" + globalparams['workflow_projectDir'] = "${workflow.projectDir}" + globalparams['workflow_launchDir'] = "${workflow.launchDir}" + globalparams['workflow_workDir'] = "${workflow.workDir}" + globalparams['workflow_container'] = "${workflow.container}" + globalparams['workflow_containerEngine'] = "${workflow.containerEngine}" + globalparams['workflow_runName'] = "${workflow.runName}" + globalparams['workflow_sessionId'] = "${workflow.sessionId}" + globalparams['workflow_profile'] = "${workflow.profile}" + globalparams['workflow_start'] = "${workflow.start}" + globalparams['workflow_commandLine'] = "${workflow.commandLine}" + return """${dashedLine()} +Summary of the current workflow (${fgcolors.magenta}${params.pipeline}${fgcolors.reset}) parameters +${dashedLine()} +${addPadding(localparams)} +${dashedLine()} +${fgcolors.cyan}N E X T F L O W${fgcolors.reset} - ${fgcolors.magenta}${params.cfsanpipename}${fgcolors.reset} - Runtime metadata +${dashedLine()} +${addPadding(globalparams)} +${dashedLine()}""".stripIndent() + } + return 1 +} + +// Method to display +// Return dashed line either '-' +// type or '=' type +def dashedLine(Map defaults = [:]) { + + Map fgcolors = getANSIColors() + def line = [color: 'white', type: '-'] + + if (!defaults.isEmpty()) { + line.putAll(defaults) + } + + return fgcolors."${line.color}" + + "${line.type}".multiply(params.linewidth) + + fgcolors.reset +} + +// Return slurped keys parsed from JSON +def slurpJson(file) { + def slurped = null + def jsonInst = new JsonSlurper() + + try { + slurped = jsonInst.parse(new File ("${file}")) + } + catch (Exception e) { + log.error 'Please check your JSON schema. Invalid JSON file: ' + file + } + + // Declare globals for the nanofactory + // workflow. + return [keys: slurped.keySet().toList(), cparams: slurped] +} + +// Default help text in a map if the entry point +// to a pipeline is FASTQ files. +def fastqEntryPointHelp() { + + Map helptext = [:] + Map fgcolors = getANSIColors() + + helptext['Workflow'] = "${fgcolors.magenta}${params.pipeline}${fgcolors.reset}" + helptext['Author'] = "${fgcolors.cyan}${params.workflow_built_by}${fgcolors.reset}" + helptext['Version'] = "${fgcolors.green}${params.workflow_version}${fgcolors.reset}\n" + helptext['Usage'] = "cpipes --pipeline ${params.pipeline} [options]\n" + helptext['Required'] = "" + helptext['--input'] = "Absolute path to directory containing FASTQ files. " + + "The directory should contain only FASTQ files as all the " + + "files within the mentioned directory will be read. " + + "Ex: --input /path/to/fastq_pass" + helptext['--output'] = "Absolute path to directory where all the pipeline " + + "outputs should be stored. Ex: --output /path/to/output" + helptext['Other options'] = "" + helptext['--metadata'] = "Absolute path to metadata CSV file containing five " + + "mandatory columns: sample,fq1,fq2,strandedness,single_end. The fq1 and fq2 " + + "columns contain absolute paths to the FASTQ files. This option can be used in place " + + "of --input option. This is rare. 
Ex: --metadata samplesheet.csv" + helptext['--fq_suffix'] = "The suffix of FASTQ files (Unpaired reads or R1 reads or Long reads) if " + + "an input directory is mentioned via --input option. Default: ${params.fq_suffix}" + helptext['--fq2_suffix'] = "The suffix of FASTQ files (Paired-end reads or R2 reads) if an input directory is mentioned via " + + "--input option. Default: ${params.fq2_suffix}" + helptext['--fq_filter_by_len'] = "Remove FASTQ reads that are less than this many bases. " + + "Default: ${params.fq_filter_by_len}" + helptext['--fq_strandedness'] = "The strandedness of the sequencing run. This is mostly needed " + + "if your sequencing run is RNA-SEQ. For most of the other runs, it is probably safe to use " + + "unstranded for the option. Default: ${params.fq_strandedness}" + helptext['--fq_single_end'] = "SINGLE-END information will be auto-detected but this option forces " + + "PAIRED-END FASTQ files to be treated as SINGLE-END so only read 1 information is included in " + + "auto-generated samplesheet. Default: ${params.fq_single_end}" + helptext['--fq_filename_delim'] = "Delimiter by which the file name is split to obtain sample name. " + + "Default: ${params.fq_filename_delim}" + helptext['--fq_filename_delim_idx'] = "After splitting FASTQ file name by using the --fq_filename_delim option," + + " all elements before this index (1-based) will be joined to create final sample name." + + " Default: ${params.fq_filename_delim_idx}" + + return helptext +} + +// Show concise help text if configured within the main workflow. +def conciseHelp(def tool = null) { + Map fgcolors = getANSIColors() + + tool ?= "fastp" + tools = tool?.tokenize(',') + + return """ +${dashedLine()} +Show configurable CLI options for each tool within ${fgcolors.magenta}${params.pipeline}${fgcolors.reset} +${dashedLine()} +Ex: cpipes --pipeline ${params.pipeline} --help +""" + (tools.size() > 1 ? "Ex: cpipes --pipeline ${params.pipeline} --help ${tools[0]}" + + """ +Ex: cpipes --pipeline ${params.pipeline} --help ${tools[0]},${tools[1]} +${dashedLine()}""".stripIndent() : """Ex: cpipes --pipeline ${params.pipeline} --help ${tool} +${dashedLine()}""".stripIndent()) + +} + +// Wrap help text with the following options +def wrapUpHelp() { + + return [ + 'Help options' : "", + '--help': "Display this message.\n", + 'help': true, + 'nocapitalize': true + ] +} + +// Method to send email on workflow complete. +def sendMail() { + + if (params.user_email == null) { + return 1 + } + + def pad = (params.pad) ?: 30 + def contact_emails = [ + stakeholder: (params.workflow_blueprint_by ?: 'Not defined'), + author: (params.workflow_built_by ?: 'Not defined') + ] + def msg = """ +${pipelineBanner()} +${summaryOfParams()} +${params.cfsanpipename} - ${params.pipeline} +${dashedLine()} +Please check the following directory for N E X T F L O W +reports. You can view the HTML files directly by double clicking +them on your workstation. +${dashedLine()} +${params.tracereportsdir} +${dashedLine()} +Please send any bug reports to CFSAN Dev Team or the author or +the stakeholder of the current pipeline. +${dashedLine()} +Error messages (if any) +${dashedLine()} +${workflow.errorMessage} +${workflow.errorReport} +${dashedLine()} +Contact emails +${dashedLine()} +${addPadding(contact_emails)} +${dashedLine()} +Thank you for using ${params.cfsanpipename} - ${params.pipeline}! 
+${dashedLine()} +""".stripIndent() + + def mail_cmd = [ + 'sendmail', + '-f', 'noreply@gmail.com', + '-F', 'noreply', + '-t', "${params.user_email}" + ] + + def email_subject = "${params.cfsanpipename} - ${params.pipeline}" + Map fgcolors = getANSIColors() + + if (workflow.success) { + email_subject += ' completed successfully!' + } + else if (!workflow.success) { + email_subject += ' has failed!' + } + + try { + ['env', 'bash'].execute() << """${mail_cmd.join(' ')} +Subject: ${email_subject} +Mime-Version: 1.0 +Content-Type: text/html +<pre> +${msg.replaceAll(/\x1b\[[0-9;]*m/, '')} +</pre> +""".stripIndent() + } catch (all) { + def warning_msg = "${fgcolors.yellow}${params.cfsanpipename} - ${params.pipeline} - WARNING" + .padRight(pad) + ':' + log.info """ +${dashedLine()} +${warning_msg} +${dashedLine()} +Could not send mail with the sendmail command! +${dashedLine()} +""".stripIndent() + } + return 1 +} + +// Set ANSI colors for any and all +// STDOUT or STDERR +def getANSIColors() { + + Map fgcolors = [:] + + fgcolors['reset'] = "\033[0m" + fgcolors['black'] = "\033[0;30m" + fgcolors['red'] = "\033[0;31m" + fgcolors['green'] = "\033[0;32m" + fgcolors['yellow'] = "\033[0;33m" + fgcolors['blue'] = "\033[0;34m" + fgcolors['magenta'] = "\033[0;35m" + fgcolors['cyan'] = "\033[0;36m" + fgcolors['white'] = "\033[0;37m" + + return fgcolors +}
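A hedged usage sketch for the helpers above: `addPadding` right-pads the keys of a Map and `dashedLine` draws a separator line. The summary map below is illustrative; `params.pad` and `params.linewidth` are expected to be defined by the pipeline configuration.

```groovy
// Hypothetical usage of addPadding() and dashedLine(); the map contents are
// illustrative and params.pad / params.linewidth must already be set.
def summary = [
    'Pipeline'  : 'nowayout',
    'Output dir': '/path/to/output',
    nocapitalize: true
]

log.info dashedLine(type: '=')
log.info addPadding(summary)
log.info dashedLine()
```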
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/bwa/mem/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,50 @@ +process BWA_MEM { + tag "$meta.id" + label 'process_micro' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}bwa${params.fs}0.7.17" : null) + conda (params.enable_conda ? "bioconda::bwa=0.7.17 conda-forge::perl" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bwa:0.7.17--he4a0461_11' : + 'quay.io/biocontainers/bwa:0.7.17--he4a0461_11' }" + + input: + tuple val(meta), path(reads), path(index) + val index2 + + output: + tuple val(meta), path("*.sam"), emit: aligned_sam + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def this_index = (index ?: index2) + """ + + if [ "${params.fq_single_end}" = "false" ]; then + bwa mem \\ + $args \\ + -t $task.cpus \\ + $this_index \\ + ${reads[0]} ${reads[1]} > ${prefix}.aligned.sam + else + bwa mem \\ + $args \\ + -t $task.cpus \\ + -a \\ + $this_index \\ + $reads > ${prefix}.aligned.sam + + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//') + END_VERSIONS + """ +} \ No newline at end of file
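The module forwards `task.ext.args` verbatim to `bwa mem`, so run-specific options can be supplied from `nextflow.config`. A hedged example (the flags shown are standard `bwa mem` options chosen for illustration):

```groovy
// Hypothetical nextflow.config snippet; -M and -k are standard bwa mem flags.
process {
    withName: 'BWA_MEM' {
        ext.args = '-M -k 19'
    }
}
```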
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/cat/fastq/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,96 @@ +# NextFlow DSL2 Module + +```bash +CAT_FASTQ +``` + +## Description + +Concatenates a list of FASTQ files. Produces 2 files per sample (`id:`) if `single_end` is `false` as mentioned in the metadata Groovy Map. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a list of FASTQ files of input type `path` (`reads`) to be concatenated. + +Ex: + +```groovy +[ [id: 'sample1', single_end: true], ['/data/sample1/f_L001.fq', '/data/sample1/f_L002.fq'] ] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTQ file. + +Ex: + +```groovy +[ id: 'FAL00870', strandedness: 'unstranded', single_end: true ] +``` + +\ + + +#### `reads` + +Type: `path` + +NextFlow input type of `path` pointing to list of FASTQ files. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'CAT_FASTQ' { + ext.args = '--genome_size 5.5m' +} +``` + +\ + + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and list of concatenated FASTQ files (`catted_reads`). + +\ + + +#### `catted_reads` + +Type: `path` + +NextFlow output type of `path` pointing to the concatenated FASTQ files per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/cat/fastq/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,89 @@ +process CAT_FASTQ { + tag "$meta.id" + label 'process_micro' + + conda (params.enable_conda ? "conda-forge::sed=4.7 conda-forge::gzip" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img' : + 'biocontainers/biocontainers:v1.2.0_cv1' }" + + input: + tuple val(meta), path(reads, stageAs: "input*/*") + + output: + tuple val(meta), path("*.merged.fastq.gz"), emit: catted_reads + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def readList = reads.collect{ it.toString() } + def is_in_gz = readList[0].endsWith('.gz') + def gz_or_ungz = (is_in_gz ? '' : ' | gzip') + def pigz_or_ungz = (is_in_gz ? '' : " | pigz -p ${task.cpus}") + if (meta.single_end) { + if (readList.size > 1) { + """ + zcmd="gzip" + zver="" + + if type pigz > /dev/null 2>&1; then + cat ${readList.join(' ')} ${pigz_or_ungz} > ${prefix}.merged.fastq.gz + zcmd="pigz" + zver=\$( echo \$( \$zcmd --version 2>&1 ) | sed -e '1!d' | sed "s/\$zcmd //" ) + else + cat ${readList.join(' ')} ${gz_or_ungz} > ${prefix}.merged.fastq.gz + zcmd="gzip" + + if [ "${workflow.containerEngine}" != "null" ]; then + zver=\$( echo \$( \$zcmd --help 2>&1 ) | sed -e '1!d; s/ (.*\$//' ) + else + zver=\$( echo \$( \$zcmd --version 2>&1 ) | sed "s/^.*(\$zcmd) //; s/\$zcmd //; s/ Copyright.*\$//" ) + fi + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$( echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//' ) + \$zcmd: \$zver + END_VERSIONS + """ + } + } else { + if (readList.size > 2) { + def read1 = [] + def read2 = [] + readList.eachWithIndex{ v, ix -> ( ix & 1 ? read2 : read1 ) << v } + """ + zcmd="gzip" + zver="" + + if type pigz > /dev/null 2>&1; then + cat ${read1.join(' ')} ${pigz_or_ungz} > ${prefix}_1.merged.fastq.gz + cat ${read2.join(' ')} ${pigz_or_ungz} > ${prefix}_2.merged.fastq.gz + zcmd="pigz" + zver=\$( echo \$( \$zcmd --version 2>&1 ) | sed -e '1!d' | sed "s/\$zcmd //" ) + else + cat ${read1.join(' ')} ${gz_or_ungz} > ${prefix}_1.merged.fastq.gz + cat ${read2.join(' ')} ${gz_or_ungz} > ${prefix}_2.merged.fastq.gz + zcmd="gzip" + + if [ "${workflow.containerEngine}" != "null" ]; then + zver=\$( echo \$( \$zcmd --help 2>&1 ) | sed -e '1!d; s/ (.*\$//' ) + else + zver=\$( echo \$( \$zcmd --version 2>&1 ) | sed "s/^.*(\$zcmd) //; s/\$zcmd //; s/ Copyright.*\$//" ) + fi + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$( echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//' ) + \$zcmd: \$zver + END_VERSIONS + """ + } + } +} \ No newline at end of file
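A hedged workflow-level sketch of calling `CAT_FASTQ` (the include path and file paths are illustrative). Note that for paired-end input the module splits the incoming list by position, so files must be ordered R1, R2, R1, R2, ... across lanes.

```groovy
// Hypothetical usage; paths are illustrative. Even-indexed files are treated
// as R1 and odd-indexed files as R2 by the module's eachWithIndex split.
include { CAT_FASTQ } from './modules/cat/fastq/main'

workflow {
    ch_reads = Channel.of(
        [
            [id: 'sample1', single_end: false],
            [
                file('/data/sample1/s1_L001_R1.fastq.gz'),
                file('/data/sample1/s1_L001_R2.fastq.gz'),
                file('/data/sample1/s1_L002_R1.fastq.gz'),
                file('/data/sample1/s1_L002_R2.fastq.gz')
            ]
        ]
    )

    CAT_FASTQ(ch_reads)
    CAT_FASTQ.out.catted_reads.view()
}
```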
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/cat/tables/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,88 @@ +# NextFlow DSL2 Module + +```bash +TABLE_SUMMARY +``` + +## Description + +Concatenates a list of tables (CSV or TAB delimited) in `.txt` or `.csv` format. The table files to be concatenated **must** have a header as the header from one of the table files will be used as the header for the concatenated result table file. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of `val` table key (`table_sum_on`) and a list of table files of input type `path` (`tables`) to be concatenated. For this module to work, a `bin` directory with the script `create_mqc_data_table.py` should be present where the NextFlow script using this DSL2 module will be run. This `python` script will convert the aggregated table to `.yml` format to be used with `multiqc`. + +Ex: + +```groovy +[ ['ectyper'], ['/data/sample1/f1_ectyper.txt', '/data/sample2/f2_ectyper.txt'] ] +``` + +\ + + +#### `table_sum_on` + +Type: `val` + +A single key defining what tables are being concatenated. For example, if all the `ectyper` results are being concatenated for all samples, then this can be `ectyper`. + +Ex: + +```groovy +[ ['ectyper'], ['/data/sample1/f1_ectyper.txt', '/data/sample2/f2_ectyper.txt'] ] +``` + +\ + + +#### `tables` + +Type: `path` + +NextFlow input type of `path` pointing to a list of tables (files) to be concatenated. + +\ + + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of table key (`table_sum_on` from `input:`) and list of concatenated table files (`tblsummed`). + +\ + + +#### `tblsummed` + +Type: `path` + +NextFlow output type of `path` pointing to the concatenated table files per table key (Ex: `ectyper`). + +\ + + +#### `mqc_yml` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing table contents in `YAML` format which can be used to inject this table as part of the `multiqc` report. + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/cat/tables/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,58 @@ +process TABLE_SUMMARY { + tag "$table_sum_on" + label 'process_low' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? "conda-forge::python=3.9 conda-forge::pyyaml conda-forge::coreutils" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0' }" + + input: + tuple val(table_sum_on), path(tables) + + output: + tuple val(table_sum_on), path("*.tblsum.txt"), emit: tblsummed + path "*_mqc.yml" , emit: mqc_yml + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when || tables + + script: + def args = task.ext.args ?: '' + def onthese = tables.collect().join('\\n') + """ + filenum="1" + header="" + + echo -e "$onthese" | while read -r file; do + + if [ "\${filenum}" == "1" ]; then + header=\$( head -n1 "\${file}" ) + echo -e "\${header}" > ${table_sum_on}.tblsum.txt + fi + + tail -n+2 "\${file}" >> ${table_sum_on}.tblsum.txt + + filenum=\$((filenum+1)) + done + + create_mqc_data_table.py $table_sum_on ${workflow.manifest.name} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bash: \$( bash --version 2>&1 | sed '1!d; s/^.*version //; s/ (.*\$//' ) + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + + headver=\$( head --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + tailver=\$( tail --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + + cat <<-END_VERSIONS >> versions.yml + head: \$headver + tail: \$tailver + END_VERSIONS + """ +} \ No newline at end of file
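A hedged usage sketch for `TABLE_SUMMARY`, mirroring the tuple shape shown in the README above (the key and paths are illustrative; `create_mqc_data_table.py` must be available in the calling pipeline's `bin` directory):

```groovy
// Hypothetical usage; the table key and file paths are illustrative.
include { TABLE_SUMMARY } from './modules/cat/tables/main'

workflow {
    ch_tables = Channel.of(
        [ ['ectyper'], [ file('/data/sample1/f1_ectyper.txt'),
                         file('/data/sample2/f2_ectyper.txt') ] ]
    )
    TABLE_SUMMARY(ch_tables)
}
```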
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/custom/dump_software_versions/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,57 @@ +# NextFlow DSL2 Module + +```bash +DUMP_SOFTWARE_VERSIONS +``` + +## Description + +Given a `YAML`-format file, produce a final `.yml` file which has unique entries and a corresponding `.mqc.yml` file for use with `multiqc`. + +\ + + +### `input:` + +___ + +Type: `path` + +Takes in an input of type `path` (`versions`) pointing to the file to be used to produce a final `.yml` file without any duplicate entries and a `.mqc.yml` file. Generally, this is passed by mixing `versions` from various runtime channels and finally passed to this module to produce a final software versions list. + +Ex: + +```groovy +[ '/hpc/scratch/test/work/9b/e7bf7e28806419c1c9a571dacd1f67/versions.yml' ] +``` + +\ + + +### `output:` + +___ + +#### `yml` + +Type: `path` + +NextFlow output type of `path` pointing to a `YAML` file with software versions. + +\ + + +#### `mqc_yml` + +Type: `path` + +NextFlow output type of `path` pointing to the `.mqc.yml` file which can be used to produce a software versions table with `multiqc`. + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/custom/dump_software_versions/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,26 @@ +process DUMP_SOFTWARE_VERSIONS { + tag "${params.pipeline} software versions" + label 'process_pico' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? "conda-forge::python=3.9 conda-forge::pyyaml" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-ca258a039fcd88610bc4e297b13703e8be53f5ca:d638c4f85566099ea0c74bc8fddc6f531fe56753-0' : + 'quay.io/biocontainers/mulled-v2-ca258a039fcd88610bc4e297b13703e8be53f5ca:d638c4f85566099ea0c74bc8fddc6f531fe56753-0' }" + + input: + path versions + + output: + path "software_versions.yml" , emit: yml + path "software_versions_mqc.yml", emit: mqc_yml + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + template 'dumpsoftwareversions.py' +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/custom/dump_software_versions/templates/dumpsoftwareversions.py Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,101 @@ +#!/usr/bin/env python + +import platform +import subprocess +from textwrap import dedent + +import yaml + + +def _make_versions_html(versions): + html = [ + dedent( + """\\ + <link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/v/dt/jszip-2.5.0/dt-1.12.1/b-2.2.3/b-colvis-2.2.3/b-html5-2.2.3/b-print-2.2.3/fc-4.1.0/r-2.3.0/sc-2.0.6/sb-1.3.3/sp-2.0.1/datatables.min.css"/> + <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/pdfmake/0.1.36/pdfmake.min.js"></script> + <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/pdfmake/0.1.36/vfs_fonts.js"></script> + <script type="text/javascript" src="https://cdn.datatables.net/v/dt/jszip-2.5.0/dt-1.12.1/b-2.2.3/b-colvis-2.2.3/b-html5-2.2.3/b-print-2.2.3/fc-4.1.0/r-2.3.0/sc-2.0.6/sb-1.3.3/sp-2.0.1/datatables.min.js"></script> + <style> + #cpipes-software-versions tbody:nth-child(even) { + background-color: #f2f2f2; + } + </style> + <table class="table" style="width:100%" id="cpipes-software-versions"> + <thead> + <tr> + <th> Process Name </th> + <th> Software </th> + <th> Version </th> + </tr> + </thead> + """ + ) + ] + for process, tmp_versions in sorted(versions.items()): + html.append("<tbody>") + for i, (tool, version) in enumerate(sorted(tmp_versions.items())): + html.append( + dedent( + f"""\\ + <tr> + <td><samp>{process if (i == 0) else ''}</samp></td> + <td><samp>{tool}</samp></td> + <td><samp>{version}</samp></td> + </tr> + """ + ) + ) + html.append("</tbody>") + html.append("</table>") + return "\\n".join(html) + + +versions_this_module = {} +versions_this_module["${task.process}"] = { + "python": platform.python_version(), + "yaml": yaml.__version__, +} + +with open("$versions") as f: + versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) + versions_by_process.update(versions_this_module) + +# aggregate versions by the module name (derived from fully-qualified process name) +versions_by_module = {} +for process, process_versions in versions_by_process.items(): + module = process.split(":")[-1] + try: + assert versions_by_module[module] == process_versions, ( + "We assume that software versions are the same between all modules. " + "If you see this error-message it means you discovered an edge-case " + "and should open an issue in nf-core/tools. 
" + ) + except KeyError: + versions_by_module[module] = process_versions + +versions_by_module["CPIPES"] = { + "Nextflow": "$workflow.nextflow.version", + "$workflow.manifest.name": "$workflow.manifest.version", + "${params.pipeline}": "${params.workflow_version}", +} + +versions_mqc = { + "id": "software_versions", + "section_name": "${workflow.manifest.name} Software Versions", + "section_href": "https://cfsan-git.fda.gov/Kranti.Konganti/${workflow.manifest.name.toLowerCase()}", + "plot_type": "html", + "description": "Collected at run time from the software output (STDOUT/STDERR).", + "data": _make_versions_html(versions_by_module), +} + +with open("software_versions.yml", "w") as f: + yaml.dump(versions_by_module, f, default_flow_style=False) + +# print('sed -i -e "' + "s%'%%g" + '" *.yml') +subprocess.run('sed -i -e "' + "s%'%%g" + '" software_versions.yml', shell=True) + +with open("software_versions_mqc.yml", "w") as f: + yaml.dump(versions_mqc, f, default_flow_style=False) + +with open("versions.yml", "w") as f: + yaml.dump(versions_this_module, f, default_flow_style=False)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/fastp/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,96 @@ +process FASTP { + tag "$meta.id" + label 'process_low' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}fastp${params.fs}0.23.2" : null) + conda (params.enable_conda ? "bioconda::fastp=0.23.2 conda-forge::isa-l" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/fastp:0.23.2--h79da9fb_0' : + 'quay.io/biocontainers/fastp:0.23.2--h79da9fb_0' }" + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path('*.fastp.fastq.gz') , emit: passed_reads, optional: true + tuple val(meta), path('*.fail.fastq.gz') , emit: failed_reads, optional: true + tuple val(meta), path('*.merged.fastq.gz'), emit: merged_reads, optional: true + tuple val(meta), path('*.json') , emit: json + tuple val(meta), path('*.html') , emit: html + tuple val(meta), path('*.log') , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def fail_fastq = params.fastp_failed_out && meta.single_end ? "--failed_out ${prefix}.fail.fastq.gz" : params.fastp_failed_out && !meta.single_end ? "--unpaired1 ${prefix}_1.fail.fastq.gz --unpaired2 ${prefix}_2.fail.fastq.gz" : '' + // Added soft-links to original fastqs for consistent naming in MultiQC + // Use single ended for interleaved. Add --interleaved_in in config. + if ( task.ext.args?.contains('--interleaved_in') ) { + """ + [ ! -f ${prefix}.fastq.gz ] && ln -sf $reads ${prefix}.fastq.gz + + fastp \\ + --stdout \\ + --in1 ${prefix}.fastq.gz \\ + --thread $task.cpus \\ + --json ${prefix}.fastp.json \\ + --html ${prefix}.fastp.html \\ + $fail_fastq \\ + $args \\ + 2> ${prefix}.fastp.log \\ + | gzip -c > ${prefix}.fastp.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g") + END_VERSIONS + """ + } else if (meta.single_end) { + """ + [ ! -f ${prefix}.fastq.gz ] && ln -sf $reads ${prefix}.fastq.gz + + fastp \\ + --in1 ${prefix}.fastq.gz \\ + --out1 ${prefix}.fastp.fastq.gz \\ + --thread $task.cpus \\ + --json ${prefix}.fastp.json \\ + --html ${prefix}.fastp.html \\ + $fail_fastq \\ + $args \\ + 2> ${prefix}.fastp.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g") + END_VERSIONS + """ + } else { + def merge_fastq = params.fastp_merged_out ? "-m --merged_out ${prefix}.merged.fastq.gz" : '' + """ + [ ! -f ${prefix}_1.fastq.gz ] && ln -sf ${reads[0]} ${prefix}_1.fastq.gz + [ ! -f ${prefix}_2.fastq.gz ] && ln -sf ${reads[1]} ${prefix}_2.fastq.gz + fastp \\ + --in1 ${prefix}_1.fastq.gz \\ + --in2 ${prefix}_2.fastq.gz \\ + --out1 ${prefix}_1.fastp.fastq.gz \\ + --out2 ${prefix}_2.fastp.fastq.gz \\ + --json ${prefix}.fastp.json \\ + --html ${prefix}.fastp.html \\ + $fail_fastq \\ + $merge_fastq \\ + --thread $task.cpus \\ + --detect_adapter_for_pe \\ + $args \\ + 2> ${prefix}.fastp.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g") + END_VERSIONS + """ + } +} \ No newline at end of file
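Additional read-trimming and filtering options reach `fastp` through `task.ext.args`, while `params.fastp_failed_out` and `params.fastp_merged_out` toggle the failed-read and merged-read outputs. A hedged `nextflow.config` example (the flags are standard `fastp` options, shown only for illustration):

```groovy
// Hypothetical nextflow.config snippet; all flags are standard fastp options.
process {
    withName: 'FASTP' {
        ext.args = '--qualified_quality_phred 30 --length_required 50 --cut_tail'
    }
}
```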
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/fastqc/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,113 @@ +# NextFlow DSL2 Module + +```bash +FASTQC +``` + +## Description + +Run the `fastqc` tool on reads in FASTQ format. Produces an HTML report file and a `.zip` file containing plots and data used to produce the plots. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a list of reads of type `path` (`reads`) per sample (`id:`). + +Ex: + +```groovy +[ + [ id: 'FAL00870', + strandedness: 'unstranded', + single_end: true, + centrifuge_x: '/hpc/db/centrifuge/2022-04-12/ab' + ], + '/hpc/scratch/test/FAL000870/f1.merged.fq.gz' +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTQ file. + +Ex: + +```groovy +[ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true +] +``` + +\ + + +#### `reads` + +Type: `path` + +NextFlow input type of `path` pointing to FASTQ files on which `fastqc` should be run. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'FASTQC' { + ext.args = '--nano' +} +``` + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and list of `fastqc` result files. + +\ + + +#### `html` + +Type: `path` + +NextFlow output type of `path` pointing to the `fastqc` report file in HTML format per sample (`id:`). + +\ + + +#### `zip` + +Type: `path` + +NextFlow output type of `path` pointing to the zipped `fastqc` results per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/fastqc/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,48 @@ +process FASTQC { + tag "$meta.id" + label 'process_low' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}fastqc${params.fs}0.11.9" : null) + conda (params.enable_conda ? "conda-forge::perl bioconda::fastqc=0.11.9" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' : + 'quay.io/biocontainers/fastqc:0.11.9--0' }" + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*.html"), emit: html + tuple val(meta), path("*.zip") , emit: zip + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + // Add soft-links to original FastQs for consistent naming in pipeline + def prefix = task.ext.prefix ?: "${meta.id}" + if (meta.single_end) { + """ + [ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz + fastqc $args --threads $task.cpus ${prefix}.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS + """ + } else { + """ + [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz + [ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz + fastqc $args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS + """ + } +} \ No newline at end of file
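A hedged workflow-level sketch of calling `FASTQC` (the include path is illustrative; the metadata map and FASTQ path reuse the example from the README above):

```groovy
// Hypothetical usage; the metadata map and file path are illustrative.
include { FASTQC } from './modules/fastqc/main'

workflow {
    FASTQC(
        Channel.of([ [id: 'FAL00870', single_end: true],
                     file('/hpc/scratch/test/FAL000870/f1.merged.fq.gz') ])
    )
    FASTQC.out.html.view()
}
```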
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/gen_samplesheet/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,55 @@ +# NextFlow DSL2 Module + +```bash +GEN_SAMPLESHEET +``` + +## Description + +Generates a sample sheet in CSV format that contains required fields to be used to construct a Groovy Map of metadata. It requires as input, an absolute UNIX path to a folder containing only FASTQ files. This module requires the `fastq_dir_to_samplesheet.py` script to be present in the `bin` folder from where the NextFlow script including this module will be executed. + +\ + + +### `input:` + +___ + +Type: `val` + +Takes in the absolute UNIX path to a folder containing only FASTQ files (`inputdir`). + +Ex: + +```groovy +'/hpc/scratch/test/reads' +``` + +\ + + +### `output:` + +___ + +Type: `path` + +NextFlow output of type `path` pointing to auto-generated CSV sample sheet (`csv`). + +\ + + +#### `csv` + +Type: `path` + +NextFlow output type of `path` pointing to auto-generated CSV sample sheet for all FASTQ files present in the folder given by NextFlow input type of `val` (`inputdir`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/gen_samplesheet/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,41 @@ +process GEN_SAMPLESHEET { + tag "${inputdir.simpleName}" + label "process_pico" + + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? "conda-forge::python=3.9.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/python:3.9--1' : + 'quay.io/biocontainers/python:3.9--1' }" + + input: + val inputdir + + output: + path '*.csv' , emit: csv + path 'versions.yml', emit: versions + + when: + task.ext.when == null || task.ext.when + + // This script (fastq_dir_to_samplesheet.py) is distributed + // as part of the pipeline nf-core/rnaseq/bin/. MIT License. + script: + def this_script_args = (params.fq_single_end ? ' -se' : '') + this_script_args += (params.fq_suffix ? " -r1 '${params.fq_suffix}'" : '') + this_script_args += (params.fq2_suffix ? " -r2 '${params.fq2_suffix}'" : '') + + """ + fastq_dir_to_samplesheet.py -sn \\ + -st '${params.fq_strandedness}' \\ + -sd '${params.fq_filename_delim}' \\ + -si ${params.fq_filename_delim_idx} \\ + ${this_script_args} \\ + ${inputdir} autogen_samplesheet.csv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
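A hedged usage sketch for `GEN_SAMPLESHEET` (the directory path is illustrative; `fastq_dir_to_samplesheet.py` must be present in the calling pipeline's `bin` folder):

```groovy
// Hypothetical usage; the input directory is illustrative.
include { GEN_SAMPLESHEET } from './modules/gen_samplesheet/main'

workflow {
    GEN_SAMPLESHEET( Channel.of('/hpc/scratch/test/reads') )
    GEN_SAMPLESHEET.out.csv.view()
}
```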
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/kma/align/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,135 @@ +# NextFlow DSL2 Module + +```bash +KMA_ALIGN +``` + +## Description + +Run the `kma` aligner on input FASTQ files with a pre-formatted `kma` index. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a list of reads of type `path` (`reads`) and a corresponding pre-formatted `kma` index folder per sample (`id:`). + +Ex: + +```groovy +[ + [ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: false + ], + [ + '/hpc/scratch/test/f1.R1.fq.gz', + '/hpc/scratch/test/f1.R2.fq.gz' + ], + '/path/to/kma/index/folder' +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTQ file. + +Ex: + +```groovy +[ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true +] +``` + +\ + + +#### `reads` + +Type: `path` + +NextFlow input type of `path` pointing to FASTQ files (single-end or paired-end) on which `kma` should be run. + +\ + + +#### `index` + +Type: `path` + +NextFlow input type of `path` pointing to the folder containing `kma` index files. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'KMA_ALIGN' { + ext.args = '-mint2' +} +``` + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and `kma` result files. + +\ + + +#### `res` + +Type: `path` + +NextFlow output type of `path` pointing to the `.res` file from `kma` per sample (`id:`). + +\ + + +#### `mapstat` + +Type: `path` + +NextFlow output type of `path` pointing to the `.mapstat` file from `kma` per sample (`id:`). Optional: `true` + +\ + + +#### `hits` + +Type: `path` + +NextFlow output type of `path` pointing to a `*_template_hits.txt` file containing only hit IDs. Optional: `true` + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/kma/align/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,73 @@ +process KMA_ALIGN { + tag "$meta.id" + label 'process_low' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}kma${params.fs}1.4.4" : null) + conda (params.enable_conda ? "conda-forge::libgcc-ng bioconda::kma=1.4.3" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/kma:1.4.3--h7132678_1': + 'quay.io/biocontainers/kma:1.4.3--h7132678_1' }" + + input: + tuple val(meta), path(reads), path(index) + + output: + path "${meta.id}_kma_res" + tuple val(meta), path("${meta.id}_kma_res${params.fs}*.res") , emit: res + tuple val(meta), path("${meta.id}_kma_res${params.fs}*.mapstat") , emit: mapstat, optional: true + tuple val(meta), path("${meta.id}_kma_res${params.fs}*.frag.gz") , emit: frags, optional: true + tuple val(meta), path("${meta.id}_kma_res${params.fs}*_template_hits.txt"), emit: hits, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reads_in = (meta.single_end ? "-i $reads" : "-ipe ${reads[0]} ${reads[1]}") + def db = (meta.kma_t_db ?: "${index}") + def db_basename = (meta.kma_t_db ? '' : "${params.fs}${index.baseName}") + def get_hit_accs = (meta.get_kma_hit_accs ? 'true' : 'false') + def res_dir = prefix + '_kma_res' + reads_in = (params.kmaalign_int ? "-int $reads" : "$reads_in") + """ + mkdir -p $res_dir || exit 1 + kma \\ + $args \\ + -t_db $db$db_basename \\ + -t $task.cpus \\ + -o $res_dir${params.fs}$prefix \\ + $reads_in + + if [ "$get_hit_accs" == "true" ]; then + grep -v '^#' $res_dir${params.fs}${prefix}.res | \\ + grep -E -o '^[[:alnum:]]+\\-*\\.*[0-9]+' > $res_dir${params.fs}${prefix}_template_hits.txt || true + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + kma: \$( kma -v | sed -e 's%KMA-%%' ) + END_VERSIONS + + mkdirver="" + cutver="" + grepver="" + + if [ "${workflow.containerEngine}" != "null" ]; then + mkdirver=\$( mkdir --help 2>&1 | sed -e '1!d; s/ (.*\$//' | cut -f1-2 -d' ' ) + cutver="\$mkdirver" + grepver="\$mkdirver" + else + mkdirver=\$( mkdir --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + cutver=\$( cut --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + grepver=\$( echo \$(grep --version 2>&1) | sed 's/^.*(GNU grep) //; s/ Copyright.*\$//' ) + fi + + cat <<-END_VERSIONS >> versions.yml + mkdir: \$mkdirver + cut: \$cutver + grep: \$grepver + END_VERSIONS + """ +} \ No newline at end of file
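Extra `kma` alignment options are passed through `task.ext.args`, in the same way the README above shows for `-mint2`. A hedged example (the flags are standard `kma` options, chosen for illustration):

```groovy
// Hypothetical nextflow.config snippet; -mem_mode and -1t1 are standard kma flags.
process {
    withName: 'KMA_ALIGN' {
        ext.args = '-mem_mode -1t1'
    }
}
```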
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/kma/index/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,86 @@ +# NextFlow DSL2 Module + +```bash +KMA_INDEX +``` + +## Description + +Run `kma index` on the input FASTA file. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a FASTA file of type `path` (`fasta`) per sample (`id:`). + +Ex: + +```groovy +[ + [ + id: 'FAL00870', + ], + '/path/to/FAL00870_contigs.fasta' +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTA file. + +Ex: + +```groovy +[ + id: 'FAL00870' +] +``` + +\ + + +#### `fasta` + +Type: `path` + +NextFlow input type of `path` pointing to the FASTA file on which the `kma index` command should be run. + +\ + + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and a folder containing `kma index` files. + +\ + + +#### `idx` + +Type: `path` + +NextFlow output type of `path` pointing to the folder containing `kma index` files per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/kma/index/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,57 @@ +process KMA_INDEX { + tag "$meta.id" + label 'process_nano' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}kma${params.fs}1.4.4" : null) + conda (params.enable_conda ? "conda-forge::libgcc-ng bioconda::kma=1.4.3 conda-forge::coreutils" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/kma:1.4.3--h7132678_1': + 'quay.io/biocontainers/kma:1.4.3--h7132678_1' }" + + input: + tuple val(meta), path(fasta) + + output: + tuple val(meta), path("${meta.id}_kma_idx"), emit: idx + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_kma_idx" + def add_to_db = (meta.kmaindex_t_db ? "-t_db ${meta.kmaindex_t_db}" : '') + """ + mkdir -p $prefix && cd $prefix || exit 1 + kma \\ + index \\ + $args \\ + $add_to_db \\ + -i ../$fasta \\ + -o $prefix + cd .. || exit 1 + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + kma: \$( kma -v | sed -e 's%KMA-%%' ) + END_VERSIONS + + mkdirver="" + cutver="" + + if [ "${workflow.containerEngine}" != "null" ]; then + mkdirver=\$( mkdir --help 2>&1 | sed -e '1!d; s/ (.*\$//' | cut -f1-2 -d' ' ) + cutver="\$mkdirver" + else + mkdirver=\$( mkdir --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + cutver=\$( cut --version 2>&1 | sed '1!d; s/^.*(GNU coreutils//; s/) //;' ) + fi + + cat <<-END_VERSIONS >> versions.yml + mkdir: \$mkdirver + cut: \$cutver + cd: \$( bash --version 2>&1 | sed '1!d; s/^.*version //; s/ (.*\$//' ) + END_VERSIONS + """ +} \ No newline at end of file
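A hedged chaining sketch: build a per-sample `kma` index and join it back to the reads channel on the shared metadata map before aligning (include paths and file paths are illustrative):

```groovy
// Hypothetical chaining of KMA_INDEX and KMA_ALIGN; paths are illustrative.
include { KMA_INDEX } from './modules/kma/index/main'
include { KMA_ALIGN } from './modules/kma/align/main'

workflow {
    def meta = [id: 'FAL00870', single_end: false]

    ch_fasta = Channel.of([ meta, file('/path/to/FAL00870_contigs.fasta') ])
    ch_reads = Channel.of([ meta, [ file('/hpc/scratch/test/f1.R1.fq.gz'),
                                    file('/hpc/scratch/test/f1.R2.fq.gz') ] ])

    KMA_INDEX(ch_fasta)
    // Join on the shared meta map, yielding tuple(meta, reads, index).
    KMA_ALIGN(ch_reads.join(KMA_INDEX.out.idx))
}
```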
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/krona/ktimporttext/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,39 @@ +process KRONA_KTIMPORTTEXT { + tag "$meta.id" + label 'process_nano' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}krona${params.fs}2.8.1" : null) + conda (params.enable_conda ? "conda-forge::curl bioconda::krona=2.8.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/krona:2.8.1--pl5321hdfd78af_1': + 'quay.io/biocontainers/krona:2.8.1--pl5321hdfd78af_1' }" + + input: + tuple val(meta), path(report) + + output: + tuple val(meta), path ('*.html'), emit: html + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def krona_suffix = params.krona_res_suffix ?: '.krona.tsv' + def reports = report.collect { + it = it.toString() + ',' + it.toString().replaceAll(/(.*)${krona_suffix}$/, /$1/) + }.sort().join(' ') + """ + ktImportText \\ + $args \\ + -o ${prefix}.html \\ + $reports + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + krona: \$( echo \$(ktImportText 2>&1) | sed 's/^.*KronaTools //g; s/- ktImportText.*\$//g') + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/multiqc/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,67 @@ +# NextFlow DSL2 Module + +```bash +MULTIQC +``` + +## Description + +Generate an aggregated [**MultiQC**](https://multiqc.info/) report. This particular module **will only work** within the framework of `cpipes`, as it uses many `cpipes`-related UNIX absolute paths to store and retrieve **MultiQC**-related configuration files and `cpipes` context-aware metadata. It also uses a custom logo with filename `FDa-Logo-Blue---medium-01.png` which should be located inside an `assets` folder from where the NextFlow script including this module will be executed. + +\ + + +### `input:` + +___ + +Type: `path` + +Takes in NextFlow input type of `path` which points to many log files that **MultiQC** should parse. + +Ex: + +```groovy +[ '/data/sample1/centrifuge/cent_output.txt', '/data/sample1/kraken/kraken_output.txt' ] +``` + +\ + + +### `output:` + +___ + +#### `report` + +Type: `path` + +Outputs a NextFlow output type of `path` pointing to the location of the final **MultiQC** HTML report. + +\ + + +#### `data` + +Type: `path` + +NextFlow output type of `path` pointing to the data files folder generated by **MultiQC** which were used to generate the plots and the HTML report. + +\ + + +#### `plots` + +Type: `path` +Optional: `true` + +NextFlow output type of `path` pointing to the plots folder generated by **MultiQC**. + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/multiqc/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,51 @@ +process MULTIQC { + label 'process_micro' + tag 'MultiQC' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}multiqc${params.fs}1.19" : null) + conda (params.enable_conda ? 'conda-forge::python=3.11 conda-forge::spectra conda-forge::lzstring conda-forge::imp bioconda::multiqc=1.19' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.19--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0' }" + + input: + path multiqc_files + + output: + path "*multiqc*" + path "*multiqc_report.html", emit: report + path "*_data" , emit: data + path "*_plots" , emit: plots, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + cp ${params.projectconf}${params.fs}multiqc${params.fs}${params.pipeline}_mqc.yml cpipes_mqc_config.yml + sed -i -e 's/Workflow_Name_Placeholder/${params.pipeline}/g; s/Workflow_Version_Placeholder/${params.workflow_version}/g' cpipes_mqc_config.yml + sed -i -e 's/CPIPES_Version_Placeholder/${workflow.manifest.version}/g; s%Workflow_Output_Placeholder%${params.output}%g' cpipes_mqc_config.yml + sed -i -e 's%Workflow_Input_Placeholder%${params.input}%g' cpipes_mqc_config.yml + + multiqc --interactive -c cpipes_mqc_config.yml -f $args . + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) + END_VERSIONS + + sedver="" + + if [ "${workflow.containerEngine}" != "null" ]; then + sedver=\$( sed --help 2>&1 | sed -e '1!d; s/ (.*\$//' ) + else + sedver=\$( echo \$(sed --version 2>&1) | sed 's/^.*(GNU sed) //; s/ Copyright.*\$//' ) + fi + + cat <<-END_VERSIONS >> versions.yml + sed: \$sedver + END_VERSIONS + """ +} \ No newline at end of file
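A hedged sketch of assembling the `multiqc_files` input: mix the per-module log outputs into one channel and collect them into a single list before calling `MULTIQC` (the channels mixed here are illustrative, not a fixed list):

```groovy
// Hypothetical collection of MultiQC inputs; the mixed channels are examples.
ch_multiqc_files = Channel.empty()
    .mix(FASTQC.out.zip.map { meta, zips -> zips })
    .mix(FASTP.out.json.map { meta, json -> json })
    .mix(DUMP_SOFTWARE_VERSIONS.out.mqc_yml)

MULTIQC(ch_multiqc_files.collect())
```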
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/nowayout_results/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,45 @@ +process NOWAYOUT_RESULTS { + tag "nowayout aggregate" + label "process_pico" + + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? 'conda-forge::python=3.11 conda-forge::spectra conda-forge::lzstring conda-forge::imp bioconda::multiqc=1.19' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.19--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0' }" + + input: + path pass_and_fail_rel_abn_files + path lineage_csv + + output: + path '*.tblsum.txt', emit: mqc_txt, optional: true + path '*_mqc.json' , emit: mqc_json, optional: true + path '*_mqc.yml' , emit: mqc_yml, optional: true + path '*.tsv' , emit: tsv, optional: true + path 'versions.yml', emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + gen_salmon_tph_and_krona_tsv.py \\ + $args \\ + -sal "." \\ + -smres "." \\ + -lin $lineage_csv + + create_mqc_data_table.py \\ + "nowayout" "The results shown here are <code>salmon quant</code> TPM values scaled down by a factor of ${params.gsalkronapy_sf}." + + create_mqc_data_table.py \\ + "nowayout_indiv_reads_mapped" "The results shown here are the number of reads mapped (post threshold filters) per taxon to the <code>nowayout</code>'s custom <code>${params.db_mode}</code> database for each sample." + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/otf_genome/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,39 @@ +process OTF_GENOME { + tag "$meta.id" + label "process_nano" + + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? "conda-forge::python=3.10.4" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/python:3.10.4' : + 'quay.io/biocontainers/python:3.10.4' }" + + input: + tuple val(meta), path(kma_hits), path(kma_fragz) + + output: + tuple val(meta), path('*_scaffolded_genomic.fna.gz'), emit: genomes_fasta, optional: true + tuple val(meta), path('*_aln_reads.fna.gz') , emit: reads_extracted, optional: true + path '*FAILED.txt' , emit: failed, optional: true + path 'versions.yml' , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + args += (kma_hits ? " -txt ${kma_hits}" : '') + args += (params.tuspy_gd ? " -gd ${params.tuspy_gd}" : '') + args += (prefix ? " -op ${prefix}" : '') + + """ + gen_otf_genome.py \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/salmon/index/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,88 @@ +# NextFlow DSL2 Module + +```bash +SALMON_INDEX +``` + +## Description + +Run `salmon index` command on input FASTA file. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a FASTA file of type `path` (`genome_fasta`) per sample (`id:`). + +Ex: + +```groovy +[ + [ + id: 'FAL00870' + ], + [ + '/hpc/scratch/test/FAL00870_contigs.fasta', + ] +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the genome FASTA file. + +Ex: + +```groovy +[ + id: 'FAL00870' +] +``` + +\ + + +#### `genome_fasta` + +Type: `path` + +NextFlow input type of `path` pointing to the FASTA file (gzipped or unzipped) on which `salmon index` should be run. + +\ + + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and a folder containing `salmon index` result files. + +\ + + +#### `idx` + +Type: `path` + +NextFlow output type of `path` pointing to the `salmon index` result files per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
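As a usage illustration, a minimal sketch of wiring this module into a workflow (the include path and channel contents are hypothetical, following the `[ meta, fasta ]` shape described above):

```groovy
// Minimal sketch: feed [ meta, genome_fasta ] tuples to SALMON_INDEX and
// collect the per-sample index folder from the `idx` output.
include { SALMON_INDEX } from "${params.modules}${params.fs}salmon${params.fs}index${params.fs}main"

workflow {
    ch_genome = Channel.of(
        [ [ id: 'FAL00870' ], file('/hpc/scratch/test/FAL00870_contigs.fasta') ]
    )
    SALMON_INDEX( ch_genome )
    SALMON_INDEX.out.idx.view()
}
```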
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/modules/salmon/index/main.nf Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,40 @@
+process SALMON_INDEX {
+    tag "$meta.id"
+    label "process_micro"
+
+    module (params.enable_module ? "${params.swmodulepath}${params.fs}salmon${params.fs}1.10.0" : null)
+    conda (params.enable_conda ? 'conda-forge::libgcc-ng bioconda::salmon=1.10.1' : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/salmon:1.10.1--h7e5ed60_1' :
+        'quay.io/biocontainers/salmon:1.10.1--h7e5ed60_1' }"
+
+    input:
+    tuple val(meta), path(genome_fasta)
+
+    output:
+    tuple val(meta), path("${meta.id}_salmon_idx"), emit: idx
+    path "versions.yml"                           , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}_salmon_idx"
+    def decoys_file = file( meta.salmon_decoys )
+    def decoys = !("${decoys_file.simpleName}" ==~ 'dummy_file.*') && decoys_file.exists() ? "--decoys ${meta.salmon_decoys}" : ''
+    """
+    salmon \\
+        index \\
+        $decoys \\
+        --threads $task.cpus \\
+        $args \\
+        --index $prefix \\
+        --transcripts $genome_fasta
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g")
+    END_VERSIONS
+    """
+}
\ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/salmon/quant/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,118 @@ +# NextFlow DSL2 Module + +```bash +SALMON_QUANT +``` + +## Description + +Run `salmon quant` in `reads` or `alignments` mode. The inputs can be either the alignment (Ex: `.bam`) files or read (Ex: `.fastq.gz`) files. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and either an alignment file or reads file and a `salmon index` or a transcript FASTA file per sample (`id:`). + +Ex: + +```groovy +[ + [ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true + ], + [ + '/hpc/scratch/test/FAL00870_R1.fastq.gz' + ], + [ + '/hpc/scratch/test/salmon_idx_for_FAL00870' + ] +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the input setup for `salmon quant`. + +Ex: + +```groovy +[ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true +] +``` + +\ + + +#### `reads_or_bam` + +Type: `path` + +NextFlow input type of `path` pointing to either an alignment file (Ex: `.bam`) or a reads file (Ex: `.fastq.gz`) on which `salmon quant` should be run. + +\ + + +#### `index_or_tr_fasta` + +Type: `path` + +NextFlow input type of `path` pointing to either a folder containing `salmon index` files or a trasnscript FASTA file. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'SALMON_QUANT' { + ext.args = '--vbPrior 0.02' +} +``` + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and a folder containing `salmon quant` result files. + +\ + + +#### `results` + +Type: `path` + +NextFlow output type of `path` pointing to the `salmon quant` result files per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
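As a usage illustration, a minimal sketch of pairing each sample's reads with its `salmon index` before calling the module (channel names are hypothetical; the join assumes identical `meta` maps on both channels):

```groovy
// Minimal sketch: join reads and index by meta, producing the
// [ meta, reads_or_bam, index_or_tr_fasta ] tuple SALMON_QUANT expects.
include { SALMON_QUANT } from "${params.modules}${params.fs}salmon${params.fs}quant${params.fs}main"

workflow {
    def meta = [ id: 'FAL00870', strandedness: 'unstranded', single_end: true ]
    reads_ch = Channel.of( [ meta, file('/hpc/scratch/test/FAL00870_R1.fastq.gz') ] )
    idx_ch   = Channel.of( [ meta, file('/hpc/scratch/test/salmon_idx_for_FAL00870') ] )

    SALMON_QUANT( reads_ch.join( idx_ch ) )
}
```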
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/salmon/quant/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,75 @@ +process SALMON_QUANT { + tag "$meta.id" + label "process_micro" + + module (params.enable_module ? "${params.swmodulepath}${params.fs}salmon${params.fs}1.10.0" : null) + conda (params.enable_conda ? 'conda-forge::libgcc-ng bioconda::salmon=1.10.1' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/salmon:1.10.1--h7e5ed60_1' : + 'quay.io/biocontainers/salmon:1.10.1--h7e5ed60_1' }" + input: + tuple val(meta), path(reads_or_bam), path(index_or_tr_fasta) + + output: + tuple val(meta), path("${meta.id}_salmon_res"), emit: results + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_salmon_res" + def reference = "--index $index_or_tr_fasta" + def lib_type = (meta.salmon_lib_type ?: '') + def alignment_mode = (meta.salmon_alignment_mode ?: '') + def gtf = (meta.salmon_gtf ? "--geneMap ${meta.salmon_gtf}" : '') + def input_reads =(meta.single_end || !reads_or_bam[1] ? "-r $reads_or_bam" : "-1 ${reads_or_bam[0]} -2 ${reads_or_bam[1]}") + + // Use path(reads_or_bam) to point to BAM and path(index_or_tr_fasta) to point to transcript fasta + // if using salmon DSL2 module in alignment-based mode. + // By default, this module will be run in selective-alignment-based mode of salmon. + if (alignment_mode) { + reference = "-t $index_or_tr_fasta" + input_reads = "-a $reads_or_bam" + } + + def strandedness_opts = [ + 'A', 'U', 'SF', 'SR', + 'IS', 'IU' , 'ISF', 'ISR', + 'OS', 'OU' , 'OSF', 'OSR', + 'MS', 'MU' , 'MSF', 'MSR' + ] + + def strandedness = 'A' + + if (lib_type) { + if (strandedness_opts.contains(lib_type)) { + strandedness = lib_type + } else { + log.info "[Salmon Quant] Invalid library type specified '--libType=${lib_type}', defaulting to auto-detection with '--libType=A'." + } + } else { + strandedness = meta.single_end ? 'U' : 'IU' + if (meta.strandedness == 'forward') { + strandedness = meta.single_end ? 'SF' : 'ISF' + } else if (meta.strandedness == 'reverse') { + strandedness = meta.single_end ? 'SR' : 'ISR' + } + } + """ + salmon quant \\ + --threads $task.cpus \\ + --libType=$strandedness \\ + $gtf \\ + $args \\ + -o $prefix \\ + $reference \\ + $input_reads + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g") + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/samplesheet_check/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,55 @@ +# NextFlow DSL2 Module + +```bash +SAMPLESHEET_CHECK +``` + +## Description + +Checks the validity of the sample sheet in CSV format to make sure there are required mandatory fields. This module generally succeeds `GEN_SAMPLESHEET` module as part of the `cpipes` pipelines to make sure that all fields of the columns are properly formatted to be used as Groovy Map for `meta` which is of input type `val`. This module requires the `check_samplesheet.py` script to be present in the `bin` folder from where the NextFlow script including this module will be executed + +\ + + +### `input:` + +___ + +Type: `path` + +Takes in the absolute UNIX path to the sample sheet in CSV format (`samplesheet`). + +Ex: + +```groovy +'/hpc/scratch/test/reads/output/gen_samplesheet/autogen_samplesheet.csv' +``` + +\ + + +### `output:` + +___ + +Type: `path` + +NextFlow output of type `path` pointing to properly formatted CSV sample sheet (`csv`). + +\ + + +#### `csv` + +Type: `path` + +NextFlow output type of `path` pointing to auto-generated CSV sample sheet for all FASTQ files present in the folder given by NextFlow input type of `val` (`inputdir`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
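For reference, a minimal sketch of a sample sheet this module would accept, with the five mandatory columns described in the pipeline help (paths are placeholders; `fq2` is left empty for single-end samples):

```csv
sample,fq1,fq2,strandedness,single_end
FAL00870,/hpc/scratch/test/FAL00870_R1.fastq.gz,/hpc/scratch/test/FAL00870_R2.fastq.gz,unstranded,false
FAL00871,/hpc/scratch/test/FAL00871_R1.fastq.gz,,unstranded,true
```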
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/samplesheet_check/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,32 @@ +process SAMPLESHEET_CHECK { + tag "$samplesheet" + label "process_femto" + + module (params.enable_module ? "${params.swmodulepath}${params.fs}python${params.fs}3.8.1" : null) + conda (params.enable_conda ? "conda-forge::python=3.9.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/python:3.9--1' : + 'quay.io/biocontainers/python:3.9--1' }" + + input: + path samplesheet + + output: + path '*.csv' , emit: csv + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: // This script is bundled with the pipeline, in nf-core/rnaseq/bin/ + """ + check_samplesheet.py \\ + $samplesheet \\ + samplesheet.valid.csv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/modules/samtools/fastq/main.nf Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,54 @@
+process SAMTOOLS_FASTQ {
+    tag "$meta.id"
+    label 'process_micro'
+
+    module (params.enable_module ? "${params.swmodulepath}${params.fs}samtools${params.fs}1.13" : null)
+    conda (params.enable_conda ? "bioconda::samtools=1.18 bioconda::htslib=1.18 conda-forge::bzip2" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/samtools:1.18--h50ea8bc_1' :
+        'quay.io/biocontainers/samtools:1.18--h50ea8bc_1' }"
+
+    input:
+    tuple val(meta), path(input)
+    val(interleave)
+
+    output:
+    tuple val(meta), path("*_{1,2}.fastq.gz")     , optional:true, emit: fastq
+    tuple val(meta), path("mapped_refs.txt")      , optional:true, emit: mapped_refs
+    tuple val(meta), path("*_interleaved.fastq")  , optional:true, emit: interleaved
+    tuple val(meta), path("*_singleton.fastq.gz") , optional:true, emit: singleton
+    tuple val(meta), path("*_other.fastq.gz")     , optional:true, emit: other
+    path "versions.yml"                           , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def args2 = task.ext.args2 ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def output = ( interleave && ! meta.single_end ) ? "> ${prefix}_interleaved.fastq" :
+        meta.single_end ? "-1 ${prefix}_1.fastq.gz -s ${prefix}_singleton.fastq.gz" :
+        "-1 ${prefix}_1.fastq.gz -2 ${prefix}_2.fastq.gz -s ${prefix}_singleton.fastq.gz"
+    """
+    samtools \\
+        fastq \\
+        $args \\
+        --threads ${task.cpus-1} \\
+        -0 ${prefix}_other.fastq.gz \\
+        $input \\
+        $output
+
+    samtools \\
+        view \\
+        $args2 \\
+        --threads ${task.cpus-1} \\
+        $input \\
+        | grep -v '*' | cut -f3 | sort -u > mapped_refs.txt
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
+    END_VERSIONS
+    """
+}
\ No newline at end of file
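The process feeds `task.ext.args` to `samtools fastq` and `task.ext.args2` to the `samtools view` call that lists references with mapped reads. A hedged sketch of a config override using standard SAM flag filters (it mirrors the commented-out example in the nowayout process-scope config later in this changeset):

```groovy
// Hypothetical override: keep only mapped records when converting back to
// FASTQ, and exclude unmapped records when listing mapped reference names.
process {
    withName: 'SAMTOOLS_FASTQ' {
        ext.args  = (params.fq_single_end ? '-F 4' : '-f 2')
        ext.args2 = '-F 4'
    }
}
```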
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/seqkit/grep/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,113 @@ +# NextFlow DSL2 Module + +```bash +SEQKIT_GREP +``` + +## Description + +Run `seqkit grep` command on reads in FASTQ format. Produces a filtered FASTQ file as per the filter strategy in the supplied input file. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a list of reads of type `path` (`reads`) per sample (`id:`). + +Ex: + +```groovy +[ + [ id: 'FAL00870', + strandedness: 'unstranded', + single_end: true, + centrifuge_x: '/hpc/db/centrifuge/2022-04-12/ab' + ], + '/hpc/scratch/test/FAL000870/f1.merged.fq.gz' +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTQ file. + +Ex: + +```groovy +[ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true +] +``` + +\ + + +#### `reads` + +Type: `path` + +NextFlow input type of `path` pointing to FASTQ files on which `seqkit grep` should be run. + +\ + + +#### `pattern_file` + +Type: path + +NextFlow input type of `path` pointing to the pattern file which has the patterns, one per line, by which FASTQ sequence ids should be searched and whose reads will be extracted. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'SEQKIT_GREP' { + ext.args = '--only-positive-strand' +} +``` + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and and filtered gzipped FASTQ file. + +\ + + +#### `fastx` + +Type: `path` + +NextFlow output type of `path` pointing to the FASTQ format filtered gzipped file per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
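For reference, a minimal sketch of a pattern file (read IDs are hypothetical). The companion `main.nf` treats a first line of `DuMmY` as "no filtering"; otherwise it keeps only the first whitespace-delimited field of each line as the sequence ID to search for:

```text
read_00001 optional_annotation_ignored
read_00042
read_00107
```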
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/seqkit/grep/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,90 @@ +process SEQKIT_GREP { + tag "$meta.id" + label 'process_low' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}seqkit${params.fs}2.2.0" : null) + conda (params.enable_conda ? "bioconda::seqkit=2.2.0 conda-forge::sed=4.7 conda-forge::coreutils" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/seqkit:2.1.0--h9ee0642_0': + 'quay.io/biocontainers/seqkit:2.1.0--h9ee0642_0' }" + + input: + tuple val(meta), path(reads), path(pattern_file) + + output: + tuple val(meta), path("*.gz"), emit: fastx + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def num_read_files = reads.toList().size() + def extension = "fastq" + if ("$reads" ==~ /.+\.fasta|.+\.fasta.gz|.+\.fa|.+\.fa.gz|.+\.fas|.+\.fas.gz|.+\.fna|.+\.fna.gz/) { + extension = "fasta" + } + + if (meta.single_end || num_read_files == 1) { + """ + pattern_file_contents=\$(sed '1!d' $pattern_file) + if [ "\$pattern_file_contents" != "DuMmY" ]; then + cut -f1 -d " " $pattern_file > ${prefix}.seqids.txt + additional_args="-f ${prefix}.seqids.txt $args" + else + additional_args="$args" + fi + + seqkit \\ + grep \\ + -j $task.cpus \\ + -o ${prefix}.seqkit-grep.${extension}.gz \\ + \$additional_args \\ + $reads + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqkit: \$( seqkit | sed '3!d; s/Version: //' ) + END_VERSIONS + """ + } else { + """ + pattern_file_contents=\$(sed '1!d' $pattern_file) + if [ "\$pattern_file_contents" != "DuMmY" ]; then + additional_args="-f $pattern_file $args" + else + additional_args="$args" + fi + + seqkit \\ + grep \\ + -j $task.cpus \\ + -o ${prefix}.R1.seqkit-grep.${extension}.gz \\ + \$additional_args \\ + ${reads[0]} + + seqkit \\ + grep \\ + -j $task.cpus \\ + -o ${prefix}.R2.seqkit-grep.${extension}.gz \\ + \$additional_args \\ + ${reads[1]} + + seqkit \\ + pair \\ + -j $task.cpus \\ + -1 ${prefix}.R1.seqkit-grep.${extension}.gz \\ + -2 ${prefix}.R2.seqkit-grep.${extension}.gz + + rm ${prefix}.R1.seqkit-grep.${extension}.gz + rm ${prefix}.R2.seqkit-grep.${extension}.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqkit: \$( seqkit | sed '3!d; s/Version: //' ) + END_VERSIONS + """ + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/seqkit/seq/README.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,104 @@ +# NextFlow DSL2 Module + +```bash +SEQKIT_SEQ +``` + +## Description + +Run `seqkit seq` command on reads in FASTQ format. Produces a filtered FASTQ file as per the filter strategy mentioned using the `ext.args` within the process scope. + +\ + + +### `input:` + +___ + +Type: `tuple` + +Takes in the following tuple of metadata (`meta`) and a list of reads of type `path` (`reads`) per sample (`id:`). + +Ex: + +```groovy +[ + [ id: 'FAL00870', + strandedness: 'unstranded', + single_end: true, + centrifuge_x: '/hpc/db/centrifuge/2022-04-12/ab' + ], + '/hpc/scratch/test/FAL000870/f1.merged.fq.gz' +] +``` + +\ + + +#### `meta` + +Type: Groovy Map + +A Groovy Map containing the metadata about the FASTQ file. + +Ex: + +```groovy +[ + id: 'FAL00870', + strandedness: 'unstranded', + single_end: true +] +``` + +\ + + +#### `reads` + +Type: `path` + +NextFlow input type of `path` pointing to FASTQ files on which `seqkit seq` should be run. + +\ + + +#### `args` + +Type: Groovy String + +String of optional command-line arguments to be passed to the tool. This can be mentioned in `process` scope within `withName:process_name` block using `ext.args` option within your `nextflow.config` file. + +Ex: + +```groovy +withName: 'SEQKIT_SEQ' { + ext.args = '--max-len 4000' +} +``` + +### `output:` + +___ + +Type: `tuple` + +Outputs a tuple of metadata (`meta` from `input:`) and filtered gzipped FASTQ file. + +\ + + +#### `fastx` + +Type: `path` + +NextFlow output type of `path` pointing to the FASTQ format filtered gzipped file per sample (`id:`). + +\ + + +#### `versions` + +Type: `path` + +NextFlow output type of `path` pointing to the `.yml` file storing software versions for this process.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/seqkit/seq/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,75 @@ +process SEQKIT_SEQ { + tag "$meta.id" + label 'process_micro' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}seqkit${params.fs}2.2.0" : null) + conda (params.enable_conda ? "bioconda::seqkit=2.2.0" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/seqkit:2.1.0--h9ee0642_0': + 'quay.io/biocontainers/seqkit:2.1.0--h9ee0642_0' }" + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*.gz"), emit: fastx + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + def extension = "fastq" + if ("$reads" ==~ /.+\.fasta|.+\.fasta.gz|.+\.fa|.+\.fa.gz|.+\.fas|.+\.fas.gz|.+\.fna|.+\.fna.gz/) { + extension = "fasta" + } + + if (meta.single_end) { + """ + seqkit \\ + seq \\ + -j $task.cpus \\ + -o ${prefix}.seqkit-seq.${extension}.gz \\ + $args \\ + $reads + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqkit: \$( seqkit | sed '3!d; s/Version: //' ) + END_VERSIONS + """ + } else { + """ + seqkit \\ + seq \\ + -j $task.cpus \\ + -o ${prefix}.R1.seqkit-seq.${extension}.gz \\ + $args \\ + ${reads[0]} + + seqkit \\ + seq \\ + -j $task.cpus \\ + -o ${prefix}.R2.seqkit-seq.${extension}.gz \\ + $args \\ + ${reads[1]} + + seqkit \\ + pair \\ + -j $task.cpus \\ + -1 ${prefix}.R1.seqkit-seq.${extension}.gz \\ + -2 ${prefix}.R2.seqkit-seq.${extension}.gz + + rm ${prefix}.R1.seqkit-seq.${extension}.gz + rm ${prefix}.R2.seqkit-seq.${extension}.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqkit: \$( seqkit | sed '3!d; s/Version: //' ) + END_VERSIONS + """ + } +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/sourmash/gather/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,60 @@ +process SOURMASH_GATHER { + tag "$meta.id" + label 'process_nano' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}sourmash${params.fs}4.6.1" : null) + conda (params.enable_conda ? "conda-forge::python bioconda::sourmash=4.6.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/sourmash:4.6.1--hdfd78af_0': + 'quay.io/biocontainers/sourmash:4.6.1--hdfd78af_0' }" + + input: + tuple val(meta), path(signature), path(database) + val save_unassigned + val save_matches_sig + val save_prefetch + val save_prefetch_csv + + output: + tuple val(meta), path("*_hits.csv") , emit: result , optional: true + tuple val(meta), path("*_unassigned.sig.zip"), emit: unassigned , optional: true + tuple val(meta), path("*_matches.sig.zip") , emit: matches , optional: true + tuple val(meta), path("*_prefetch.sig.zip") , emit: prefetch , optional: true + tuple val(meta), path("*_prefetch.csv.gz") , emit: prefetchcsv , optional: true + tuple val(meta), path("*FAILED.txt") , emit: failed , optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def unassigned = save_unassigned ? "--output-unassigned ${prefix}_unassigned.sig.zip" : '' + def matches = save_matches_sig ? "--save-matches ${prefix}_matches.sig.zip" : '' + def prefetch = save_prefetch ? "--save-prefetch ${prefix}_prefetch.sig.zip" : '' + def prefetchcsv = save_prefetch_csv ? "--save-prefetch-csv ${prefix}_prefetch.csv.gz" : '' + + """ + sourmash gather \\ + $args \\ + --output ${prefix}.csv.gz \\ + ${unassigned} \\ + ${matches} \\ + ${prefetch} \\ + ${prefetchcsv} \\ + ${signature} \\ + ${database} + + sourmash_filter_hits.py \\ + $args2 \\ + -csv ${prefix}.csv.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sourmash: \$(echo \$(sourmash --version 2>&1) | sed 's/^sourmash //' ) + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
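Besides the `[ meta, signature, database ]` tuple, the process takes four booleans that switch the optional `sourmash gather` outputs on or off. A minimal call sketch (the channel name is hypothetical):

```groovy
// Minimal sketch: keep the unassigned and matched signatures, skip the
// prefetch signature and prefetch CSV outputs.
SOURMASH_GATHER(
    ch_sig_and_db,   // [ meta, query_signature, database ]
    true,            // save_unassigned   -> *_unassigned.sig.zip
    true,            // save_matches_sig  -> *_matches.sig.zip
    false,           // save_prefetch
    false            // save_prefetch_csv
)
```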
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/sourmash/search/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,55 @@ +process SOURMASH_SEARCH { + tag "$meta.id" + label 'process_micro' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}sourmash${params.fs}4.6.1" : null) + conda (params.enable_conda ? "conda-forge::python bioconda::sourmash=4.6.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/sourmash:4.6.1--hdfd78af_0': + 'quay.io/biocontainers/sourmash:4.6.1--hdfd78af_0' }" + + input: + tuple val(meta), path(signature), path(database) + val save_matches_sig + + output: + tuple val(meta), path("*.csv.gz") , emit: result , optional: true + tuple val(meta), path("*_scaffolded_genomic.fna.gz"), emit: genomes_fasta, optional: true + tuple val(meta), path("*_matches.sig.zip") , emit: matches , optional: true + path "*FAILED.txt" , emit: failed , optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def matches = save_matches_sig ? "--save-matches ${prefix}_matches.sig.zip" : '' + def gd = params.tuspy_gd ? "-gd ${params.tuspy_gd}" : '' + + """ + sourmash search \\ + $args \\ + --output ${prefix}.csv.gz \\ + ${matches} \\ + ${signature} \\ + ${database} + + sourmash_filter_hits.py \\ + $args2 \\ + -csv ${prefix}.csv.gz + + gen_otf_genome.py \\ + $gd \\ + -op ${prefix} \\ + -txt ${prefix}_template_hits.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sourmash: \$(echo \$(sourmash --version 2>&1) | sed 's/^sourmash //' ) + python: \$( python --version | sed 's/Python //g' ) + END_VERSIONS + """ +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/modules/sourmash/sketch/main.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,43 @@ +process SOURMASH_SKETCH { + tag "$meta.id" + label 'process_nano' + + module (params.enable_module ? "${params.swmodulepath}${params.fs}sourmash${params.fs}4.6.1" : null) + conda (params.enable_conda ? "conda-forge::python bioconda::sourmash=4.6.1" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/sourmash:4.6.1--hdfd78af_0': + 'quay.io/biocontainers/sourmash:4.6.1--hdfd78af_0' }" + + input: + tuple val(meta), path(sequence) + val singleton + val merge + val db_or_query + + output: + tuple val(meta), path("*.{query,db}.sig"), emit: signatures + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + // required defaults for the tool to run, but can be overridden + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def merge_sig = merge ? "--merge ${meta.id}" : '' + def singleton = singleton ? '--singleton' : '' + """ + sourmash sketch \\ + $args \\ + $merge_sig \\ + $singleton \\ + --output "${prefix}.${db_or_query}.sig" \\ + $sequence + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sourmash: \$(echo \$(sourmash --version 2>&1) | sed 's/^sourmash //' ) + END_VERSIONS + """ +} \ No newline at end of file
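The sketch parameters (molecule type, k-mer size, scaling, abundance tracking) are passed through `task.ext.args`; the nowayout workflow config assembles them from `params.sourmash_scale` and `params.sourmash_k`. A hedged sketch of the equivalent static override:

```groovy
// Hypothetical override mirroring the nowayout 'strict' defaults
// (sourmashsketch_mode = 'dna', scaled = 100, k = 71, abundance tracking on).
process {
    withName: 'SOURMASH_SKETCH' {
        ext.args = "dna -p 'abund,scaled=100,k=71'"
    }
}
```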
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/modules/sourmash/tax/metagenome/main.nf Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,40 @@
+process SOURMASH_TAX_METAGENOME {
+    tag "$meta.id"
+    label 'process_nano'
+
+    module (params.enable_module ? "${params.swmodulepath}${params.fs}sourmash${params.fs}4.6.1" : null)
+    conda (params.enable_conda ? "conda-forge::python bioconda::sourmash=4.6.1" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/sourmash:4.6.1--hdfd78af_0':
+        'quay.io/biocontainers/sourmash:4.6.1--hdfd78af_0' }"
+
+    input:
+    tuple val(meta), path(csv), path(lineage)
+
+    output:
+    tuple val(meta), path("*.txt"), emit: txt, optional: true
+    tuple val(meta), path("*.tsv"), emit: tsv, optional: true
+    tuple val(meta), path("*.csv"), emit: csv, optional: true
+    path "versions.yml"           , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    // required defaults for the tool to run, but can be overridden
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def output_format = args.findAll(/(--output-format\s+[\w\,]+)\s*/).join("").replaceAll(/\,/, / --output-format /)
+    args = args.replaceAll(/--output-format\s+[\w\,]+\s*/, /${output_format}/)
+    """
+    sourmash tax metagenome \\
+        $args \\
+        -g $csv \\
+        --output-base $prefix
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        sourmash: \$(echo \$(sourmash --version 2>&1) | sed 's/^sourmash //' )
+    END_VERSIONS
+    """
+}
\ No newline at end of file
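The script block rewrites a comma-separated `--output-format` list in `task.ext.args` into repeated `--output-format` flags before invoking `sourmash tax metagenome`. A hedged config sketch (the flag values assume sourmash 4.x report formats):

```groovy
// Hypothetical override: request two report formats; the module expands the
// comma-separated list into `--output-format csv_summary --output-format krona`.
process {
    withName: 'SOURMASH_TAX_METAGENOME' {
        ext.args = '--rank species --output-format csv_summary,krona'
    }
}
```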
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/nextflow.config Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,41 @@
+// Main driver script
+manifest.mainScript = 'cpipes'
+
+def fs = File.separator
+def pd = "${projectDir}"
+
+// Global parameters
+includeConfig "${pd}${fs}conf${fs}manifest.config"
+includeConfig "${pd}${fs}conf${fs}base.config"
+
+// Include FASTQ config to prepare for a case when the entry point is
+// FASTQ metadata CSV or FASTQ input directory
+includeConfig "${pd}${fs}conf${fs}fastq.config"
+
+if (params.pipeline != null) {
+    try {
+        includeConfig "${params.workflowsconf}${fs}${params.pipeline}.config"
+    } catch (Exception e) {
+        System.err.println('-'.multiply(params.linewidth) + "\n" +
+            "\033[0;31m${params.cfsanpipename} - ERROR\033[0m\n" +
+            '-'.multiply(params.linewidth) + "\n" + "\033[0;31mCould not load " +
+            "default pipeline configuration. Please provide a pipeline \n" +
+            "name using the --pipeline option.\n\033[0m" + '-'.multiply(params.linewidth) + "\n")
+        System.exit(1)
+    }
+}
+
+// Include modules' config last.
+includeConfig "${pd}${fs}conf${fs}logtheseparams.config"
+includeConfig "${pd}${fs}conf${fs}modules.config"
+
+// Conda and Singularity cache directories
+conda.cacheDir = "${pd}${fs}kondagac_cache"
+singularity.cacheDir = "${pd}${fs}cingularitygac_cache"
+
+// Clean up after successful run
+// cleanup = true
+
+profiles {
+    includeConfig "${pd}${fs}conf${fs}computeinfra.config"
+}
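The pipeline-specific configuration named by `--pipeline` is pulled in from `params.workflowsconf`; if that include fails, the run aborts with the error above. A hedged sketch of the minimal shape such a per-pipeline config file takes (compare `workflows/conf/nowayout.config` later in this changeset; names and values here are placeholders):

```groovy
// Hypothetical ${params.workflowsconf}/<pipeline>.config skeleton: a params
// block carrying that pipeline's defaults, loaded only when --pipeline <name>
// is supplied on the command line.
params {
    workflow_conceived_by = 'Author Name'
    workflow_built_by     = 'Author Name'
    workflow_version      = '0.5.0'
    // tool switches and thresholds for this pipeline go here
}
```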
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/readme/centriflaken.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,276 @@ +# CPIPES (CFSAN PIPELINES) + +## The modular pipeline repository at CFSAN, FDA + +**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**, +mostly for bioinformatics data analysis at **CFSAN, FDA.** + +--- + +### **centriflaken** + +--- +Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli. + +#### Workflow Usage + +```bash +module load cpipes/0.4.0 + +cpipes --pipeline centriflaken [options] +``` + +Example: Run the default `centriflaken` pipeline with taxa of interest as *E. coli*. + +```bash +cd /hpc/scratch/$USER +mkdir nf-cpipes +cd nf-cpipes +cpipes --pipeline centriflaken --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov' +``` + +Example: Run the `centriflaken` pipeline with taxa of interest as *Salmonella*. In this mode, `SerotypeFinder` tool will be replaced with `SeqSero2` tool. + +```bash +cd /hpc/scratch/$USER +mkdir nf-cpipes +cd nf-cpipes +cpipes --pipeline centriflaken --centrifuge_extract_bug 'Salmonella' --input /path/to/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov' +``` + +#### `centriflaken` Help + +```text +[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken --help +N E X T F L O W ~ version 21.12.1-edge +Launching `/nfs/software/apps/cpipes/0.4.0/cpipes` [crazy_euler] - revision: 72db279311 +================================================================================ + (o) + ___ _ __ _ _ __ ___ ___ + / __|| '_ \ | || '_ \ / _ \/ __| +| (__ | |_) || || |_) || __/\__ \ + \___|| .__/ |_|| .__/ \___||___/ + | | | | + |_| |_| +-------------------------------------------------------------------------------- +A collection of modular pipelines at CFSAN, FDA. +-------------------------------------------------------------------------------- +Name : CPIPES +Author : Kranti.Konganti@fda.hhs.gov +Version : 0.4.0 +Center : CFSAN, FDA. +================================================================================ + +Workflow : centriflaken + +Author : Kranti.Konganti@fda.hhs.gov + +Version : 0.2.1 + + +Usage : cpipes --pipeline centriflaken [options] + + +Required : + +--input : Absolute path to directory containing FASTQ + files. The directory should contain only + FASTQ files as all the files within the + mentioned directory will be read. Ex: -- + input /path/to/fastq_pass + +--output : Absolute path to directory where all the + pipeline outputs should be stored. Ex: -- + output /path/to/output + +Other options : + +--metadata : Absolute path to metadata CSV file + containing five mandatory columns: sample, + fq1,fq2,strandedness,single_end. The fq1 + and fq2 columns contain absolute paths to + the FASTQ files. This option can be used in + place of --input option. This is rare. Ex: -- + metadata samplesheet.csv + +--fq_suffix : The suffix of FASTQ files (Unpaired reads + or R1 reads or Long reads) if an input + directory is mentioned via --input option. + Default: .fastq.gz + +--fq2_suffix : The suffix of FASTQ files (Paired-end reads + or R2 reads) if an input directory is + mentioned via --input option. Default: + false + +--fq_filter_by_len : Remove FASTQ reads that are less than this + many bases. Default: 4000 + +--fq_strandedness : The strandedness of the sequencing run. + This is mostly needed if your sequencing + run is RNA-SEQ. 
For most of the other runs, + it is probably safe to use unstranded for + the option. Default: unstranded + +--fq_single_end : SINGLE-END information will be auto- + detected but this option forces PAIRED-END + FASTQ files to be treated as SINGLE-END so + only read 1 information is included in auto- + generated samplesheet. Default: false + +--fq_filename_delim : Delimiter by which the file name is split + to obtain sample name. Default: _ + +--fq_filename_delim_idx : After splitting FASTQ file name by using + the --fq_filename_delim option, all + elements before this index (1-based) will + be joined to create final sample name. + Default: 1 + +--kraken2_db : Absolute path to kraken database. Default: / + hpc/db/kraken2/standard-210914 + +--kraken2_confidence : Confidence score threshold which must be + between 0 and 1. Default: 0.0 + +--kraken2_quick : Quick operation (use first hit or hits). + Default: false + +--kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa- + report. Default: false + +--kraken2_minimum_base_quality : Minimum base quality used in classification + which is only effective with FASTQ input. + Default: 0 + +--kraken2_report_zero_counts : Report counts for ALL taxa, even if counts + are zero. Default: false + +--kraken2_report_minmizer_data : Report minimizer and distinct minimizer + count information in addition to normal + Kraken report. Default: false + +--kraken2_use_names : Print scientific names instead of just + taxids. Default: true + +--kraken2_extract_bug : Extract the reads or contigs beloging to + this bug. Default: Escherichia coli + +--centrifuge_x : Absolute path to centrifuge database. + Default: /hpc/db/centrifuge/2022-04-12/ab + +--centrifuge_save_unaligned : Save SINGLE-END reads that did not align. + For PAIRED-END reads, save read pairs that + did not align concordantly. Default: false + +--centrifuge_save_aligned : Save SINGLE-END reads that aligned. For + PAIRED-END reads, save read pairs that + aligned concordantly. Default: false + +--centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default: + false + +--centrifuge_extract_bug : Extract this bug from centrifuge results. + Default: Escherichia coli + +--centrifuge_ignore_quals : Treat all quality values as 30 on Phred + scale. Default: false + +--flye_pacbio_raw : Input FASTQ reads are PacBio regular CLR + reads (<20% error) Defaut: false + +--flye_pacbio_corr : Input FASTQ reads are PacBio reads that + were corrected with other methods (<3% + error). Default: false + +--flye_pacbio_hifi : Input FASTQ reads are PacBio HiFi reads (<1% + error). Default: false + +--flye_nano_raw : Input FASTQ reads are ONT regular reads, + pre-Guppy5 (<20% error). Default: true + +--flye_nano_corr : Input FASTQ reads are ONT reads that were + corrected with other methods (<3% error). + Default: false + +--flye_nano_hq : Input FASTQ reads are ONT high-quality + reads: Guppy5+ SUP or Q20 (<5% error). + Default: false + +--flye_genome_size : Estimated genome size (for example, 5m or 2. + 6g). Default: 5.5m + +--flye_polish_iter : Number of genome polishing iterations. + Default: false + +--flye_meta : Do a metagenome assembly (unenven coverage + mode). Default: true + +--flye_min_overlap : Minimum overlap between reads. Default: + false + +--flye_scaffold : Enable scaffolding using assembly graph. + Default: false + +--serotypefinder_run : Run SerotypeFinder tool. Default: true + +--serotypefinder_x : Generate extended output files. 
Default: + true + +--serotypefinder_db : Path to SerotypeFinder databases. Default: / + hpc/db/serotypefinder/2.0.2 + +--serotypefinder_min_threshold : Minimum percent identity (in float) + required for calling a hit. Default: 0.85 + +--serotypefinder_min_cov : Minumum percent coverage (in float) + required for calling a hit. Default: 0.80 + +--seqsero2_run : Run SeqSero2 tool. Default: false + +--seqsero2_t : '1' for interleaved paired-end reads, '2' + for separated paired-end reads, '3' for + single reads, '4' for genome assembly, '5' + for nanopore reads (fasta/fastq). Default: + 4 + +--seqsero2_m : Which workflow to apply, 'a'(raw reads + allele micro-assembly), 'k'(raw reads and + genome assembly k-mer). Default: k + +--seqsero2_c : SeqSero2 will only output serotype + prediction without the directory containing + log files. Default: false + +--seqsero2_s : SeqSero2 will not output header in + SeqSero_result.tsv. Default: false + +--mlst_run : Run MLST tool. Default: true + +--mlst_minid : DNA %identity of full allelle to consider ' + similar' [~]. Default: 95 + +--mlst_mincov : DNA %cov to report partial allele at all [?]. + Default: 10 + +--mlst_minscore : Minumum score out of 100 to match a scheme. + Default: 50 + +--abricate_run : Run ABRicate tool. Default: true + +--abricate_minid : Minimum DNA %identity. Defaut: 90 + +--abricate_mincov : Minimum DNA %coverage. Defaut: 80 + +--abricate_datadir : ABRicate databases folder. Defaut: /hpc/db/ + abricate/1.0.1/db + +Help options : + +--help : Display this message. +``` + +### **BETA** + +--- +The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/readme/centriflaken_hy.md Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,367 @@ +# CPIPES (CFSAN PIPELINES) + +## The modular pipeline repository at CFSAN, FDA + +**CPIPES** (CFSAN PIPELINES) is a collection of modular pipelines based on **NEXTFLOW**, +mostly for bioinformatics data analysis at **CFSAN, FDA.** + +--- + +### **centriflaken_hy** + +--- +`centriflaken_hy` is a variant of the original `centriflaken` pipeline but for Illumina short reads either single-end or paired-end. + +#### Workflow Usage + +```bash +module load cpipes/0.4.0 + +cpipes --pipeline centriflaken_hy [options] +``` + +Example: Run the default `centriflaken_hy` pipeline with taxa of interest as *E. coli*. + +```bash +cd /hpc/scratch/$USER +mkdir nf-cpipes +cd nf-cpipes +cpipes --pipeline centriflaken_hy --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov' +``` + +Example: Run the `centriflaken_hy` pipeline with taxa of interest as *Salmonella*. In this mode, `SerotypeFinder` tool will be replaced with `SeqSero2` tool. + +```bash +cd /hpc/scratch/$USER +mkdir nf-cpipes +cd nf-cpipes +cpipes --pipeline centriflaken_hy --centrifuge_extract_bug 'Salmonella' --input /path/to/illumina/fastq/dir --output /path/to/output --user_email 'Kranti.Konganti@fda.hhs.gov' +``` + +#### `centriflaken_hy` Help + +```text +[Kranti.Konganti@login2-slurm ]$ cpipes --pipeline centriflaken_hy --help +N E X T F L O W ~ version 21.12.1-edge +Launching `/home/Kranti.Konganti/apps/cpipes/cpipes` [soggy_curie] - revision: 72db279311 +================================================================================ + (o) + ___ _ __ _ _ __ ___ ___ + / __|| '_ \ | || '_ \ / _ \/ __| +| (__ | |_) || || |_) || __/\__ \ + \___|| .__/ |_|| .__/ \___||___/ + | | | | + |_| |_| +-------------------------------------------------------------------------------- +A collection of modular pipelines at CFSAN, FDA. +-------------------------------------------------------------------------------- +Name : CPIPES +Author : Kranti.Konganti@fda.hhs.gov +Version : 0.4.0 +Center : CFSAN, FDA. +================================================================================ + +Workflow : centriflaken_hy + +Author : Kranti.Konganti@fda.hhs.gov + +Version : 0.4.0 + + +Usage : cpipes --pipeline centriflaken_hy [options] + + +Required : + +--input : Absolute path to directory containing FASTQ + files. The directory should contain only + FASTQ files as all the files within the + mentioned directory will be read. Ex: -- + input /path/to/fastq_pass + +--output : Absolute path to directory where all the + pipeline outputs should be stored. Ex: -- + output /path/to/output + +Other options : + +--metadata : Absolute path to metadata CSV file + containing five mandatory columns: sample, + fq1,fq2,strandedness,single_end. The fq1 + and fq2 columns contain absolute paths to + the FASTQ files. This option can be used in + place of --input option. This is rare. Ex: -- + metadata samplesheet.csv + +--fq_suffix : The suffix of FASTQ files (Unpaired reads + or R1 reads or Long reads) if an input + directory is mentioned via --input option. + Default: _R1_001.fastq.gz + +--fq2_suffix : The suffix of FASTQ files (Paired-end reads + or R2 reads) if an input directory is + mentioned via --input option. Default: + _R2_001.fastq.gz + +--fq_filter_by_len : Remove FASTQ reads that are less than this + many bases. Default: 75 + +--fq_strandedness : The strandedness of the sequencing run. 
+ This is mostly needed if your sequencing + run is RNA-SEQ. For most of the other runs, + it is probably safe to use unstranded for + the option. Default: unstranded + +--fq_single_end : SINGLE-END information will be auto- + detected but this option forces PAIRED-END + FASTQ files to be treated as SINGLE-END so + only read 1 information is included in auto- + generated samplesheet. Default: false + +--fq_filename_delim : Delimiter by which the file name is split + to obtain sample name. Default: _ + +--fq_filename_delim_idx : After splitting FASTQ file name by using + the --fq_filename_delim option, all + elements before this index (1-based) will + be joined to create final sample name. + Default: 1 + +--seqkit_rmdup_run : Remove duplicate sequences using seqkit + rmdup. Default: false + +--seqkit_rmdup_n : Match and remove duplicate sequences by + full name instead of just ID. Defaut: false + +--seqkit_rmdup_s : Match and remove duplicate sequences by + sequence content. Defaut: true + +--seqkit_rmdup_d : Save the duplicated sequences to a file. + Defaut: false + +--seqkit_rmdup_D : Save the number and list of duplicated + sequences to a file. Defaut: false + +--seqkit_rmdup_i : Ignore case while using seqkit rmdup. + Defaut: false + +--seqkit_rmdup_P : Only consider positive strand (i.e. 5') + when comparing by sequence content. Defaut: + false + +--kraken2_db : Absolute path to kraken database. Default: / + hpc/db/kraken2/standard-210914 + +--kraken2_confidence : Confidence score threshold which must be + between 0 and 1. Default: 0.0 + +--kraken2_quick : Quick operation (use first hit or hits). + Default: false + +--kraken2_use_mpa_style : Report output like Kraken 1's kraken-mpa- + report. Default: false + +--kraken2_minimum_base_quality : Minimum base quality used in classification + which is only effective with FASTQ input. + Default: 0 + +--kraken2_report_zero_counts : Report counts for ALL taxa, even if counts + are zero. Default: false + +--kraken2_report_minmizer_data : Report minimizer and distinct minimizer + count information in addition to normal + Kraken report. Default: false + +--kraken2_use_names : Print scientific names instead of just + taxids. Default: true + +--kraken2_extract_bug : Extract the reads or contigs beloging to + this bug. Default: Escherichia coli + +--centrifuge_x : Absolute path to centrifuge database. + Default: /hpc/db/centrifuge/2022-04-12/ab + +--centrifuge_save_unaligned : Save SINGLE-END reads that did not align. + For PAIRED-END reads, save read pairs that + did not align concordantly. Default: false + +--centrifuge_save_aligned : Save SINGLE-END reads that aligned. For + PAIRED-END reads, save read pairs that + aligned concordantly. Default: false + +--centrifuge_out_fmt_sam : Centrifuge output should be in SAM. Default: + false + +--centrifuge_extract_bug : Extract this bug from centrifuge results. + Default: Escherichia coli + +--centrifuge_ignore_quals : Treat all quality values as 30 on Phred + scale. Default: false + +--megahit_run : Run MEGAHIT assembler. Default: true + +--megahit_min_count : <int>. Minimum multiplicity for filtering ( + k_min+1)-mers. Defaut: false + +--megahit_k_list : Comma-separated list of kmer size. All + values must be odd, in the range 15-255, + increment should be <= 28. Ex: '21,29,39,59, + 79,99,119,141'. Default: false + +--megahit_no_mercy : Do not add mercy k-mers. Default: false + +--megahit_bubble_level : <int>. Intensity of bubble merging (0-2), 0 + to disable. 
Default: false + +--megahit_merge_level : <l,s>. Merge complex bubbles of length <= l* + kmer_size and similarity >= s. Default: + false + +--megahit_prune_level : <int>. Strength of low depth pruning (0-3). + Default: false + +--megahit_prune_depth : <int>. Remove unitigs with avg k-mer depth + less than this value. Default: false + +--megahit_low_local_ratio : <float>. Ratio threshold to define low + local coverage contigs. Default: false + +--megahit_max_tip_len : <int>. remove tips less than this value [< + int> * k]. Default: false + +--megahit_no_local : Disable local assembly. Default: false + +--megahit_kmin_1pass : Use 1pass mode to build SdBG of k_min. + Default: false + +--megahit_preset : <str>. Override a group of parameters. + Valid values are meta-sensitive which + enforces '--min-count 1 --k-list 21,29,39, + 49,...,129,141', meta-large (large & + complex metagenomes, like soil) which + enforces '--k-min 27 --k-max 127 --k-step + 10'. Default: meta-sensitive + +--megahit_mem_flag : <int>. SdBG builder memory mode. 0: minimum; + 1: moderate; 2: use all memory specified. + Default: 2 + +--megahit_min_contig_len : <int>. Minimum length of contigs to output. + Default: false + +--spades_run : Run SPAdes assembler. Default: false + +--spades_isolate : This flag is highly recommended for high- + coverage isolate and multi-cell data. + Defaut: false + +--spades_sc : This flag is required for MDA (single-cell) + data. Default: false + +--spades_meta : This flag is required for metagenomic data. + Default: true + +--spades_bio : This flag is required for biosytheticSPAdes + mode. Default: false + +--spades_corona : This flag is required for coronaSPAdes mode. + Default: false + +--spades_rna : This flag is required for RNA-Seq data. + Default: false + +--spades_plasmid : Runs plasmidSPAdes pipeline for plasmid + detection. Default: false + +--spades_metaviral : Runs metaviralSPAdes pipeline for virus + detection. Default: false + +--spades_metaplasmid : Runs metaplasmidSPAdes pipeline for plasmid + detection in metagenomics datasets. Default: + false + +--spades_rnaviral : This flag enables virus assembly module + from RNA-Seq data. Default: false + +--spades_iontorrent : This flag is required for IonTorrent data. + Default: false + +--spades_only_assembler : Runs only the SPAdes assembler module ( + without read error correction). Default: + false + +--spades_careful : Tries to reduce the number of mismatches + and short indels in the assembly. Default: + false + +--spades_cov_cutoff : Coverage cutoff value (a positive float + number). Default: false + +--spades_k : List of k-mer sizes (must be odd and less + than 128). Default: false + +--spades_hmm : Directory with custom hmms that replace the + default ones (very rare). Default: false + +--serotypefinder_run : Run SerotypeFinder tool. Default: true + +--serotypefinder_x : Generate extended output files. Default: + true + +--serotypefinder_db : Path to SerotypeFinder databases. Default: / + hpc/db/serotypefinder/2.0.2 + +--serotypefinder_min_threshold : Minimum percent identity (in float) + required for calling a hit. Default: 0.85 + +--serotypefinder_min_cov : Minumum percent coverage (in float) + required for calling a hit. Default: 0.80 + +--seqsero2_run : Run SeqSero2 tool. Default: false + +--seqsero2_t : '1' for interleaved paired-end reads, '2' + for separated paired-end reads, '3' for + single reads, '4' for genome assembly, '5' + for nanopore reads (fasta/fastq). 
Default: + 4 + +--seqsero2_m : Which workflow to apply, 'a'(raw reads + allele micro-assembly), 'k'(raw reads and + genome assembly k-mer). Default: k + +--seqsero2_c : SeqSero2 will only output serotype + prediction without the directory containing + log files. Default: false + +--seqsero2_s : SeqSero2 will not output header in + SeqSero_result.tsv. Default: false + +--mlst_run : Run MLST tool. Default: true + +--mlst_minid : DNA %identity of full allelle to consider ' + similar' [~]. Default: 95 + +--mlst_mincov : DNA %cov to report partial allele at all [?]. + Default: 10 + +--mlst_minscore : Minumum score out of 100 to match a scheme. + Default: 50 + +--abricate_run : Run ABRicate tool. Default: true + +--abricate_minid : Minimum DNA %identity. Defaut: 90 + +--abricate_mincov : Minimum DNA %coverage. Defaut: 80 + +--abricate_datadir : ABRicate databases folder. Defaut: /hpc/db/ + abricate/1.0.1/db + +Help options : + +--help : Display this message. +``` + +### **BETA** + +--- +The development of the modular structure and flow is an ongoing effort and may change depending on assessment of various computational topics and other considerations.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/subworkflows/process_fastq.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,144 @@ +// Include any necessary methods and modules +include { stopNow; validateParamsForFASTQ } from "${params.routines}" +include { GEN_SAMPLESHEET } from "${params.modules}${params.fs}gen_samplesheet${params.fs}main" +include { SAMPLESHEET_CHECK } from "${params.modules}${params.fs}samplesheet_check${params.fs}main" +include { CAT_FASTQ } from "${params.modules}${params.fs}cat${params.fs}fastq${params.fs}main" +include { SEQKIT_SEQ } from "${params.modules}${params.fs}seqkit${params.fs}seq${params.fs}main" + +// Validate 4 required workflow parameters if +// FASTQ files are the input for the +// entry point. +validateParamsForFASTQ() + +// Start the subworkflow +workflow PROCESS_FASTQ { + main: + versions = Channel.empty() + input_ch = Channel.empty() + reads = Channel.empty() + + def input = file( (params.input ?: params.metadata) ) + + if (params.input) { + def fastq_files = [] + + if (params.fq_suffix == null) { + stopNow("We need to know what suffix the FASTQ files ends with inside the\n" + + "directory. Please use the --fq_suffix option to indicate the file\n" + + "suffix by which the files are to be collected to run the pipeline on.") + } + + if (params.fq_strandedness == null) { + stopNow("We need to know if the FASTQ files inside the directory\n" + + "are sequenced using stranded or non-stranded sequencing. This is generally\n" + + "required if the sequencing experiment is RNA-SEQ. For almost all of the other\n" + + "cases, you can probably use the --fq_strandedness unstranded option to indicate\n" + + "that the reads are unstranded.") + } + + if (params.fq_filename_delim == null || params.fq_filename_delim_idx == null) { + stopNow("We need to know the delimiter of the filename of the FASTQ files.\n" + + "By default the filename delimiter is _ (underscore). This delimiter character\n" + + "is used to split and assign a group name. The group name can be controlled by\n" + + "using the --fq_filename_delim_idx option (1-based). For example, if the FASTQ\n" + + "filename is WT_REP1_001.fastq, then to create a group WT, use the following\n" + + "options: --fq_filename_delim _ --fq_filename_delim_idx 1") + } + + if (!input.exists()) { + stopNow("The input directory,\n${params.input}\ndoes not exist!") + } + + input.eachFileRecurse { + it.name.endsWith("${params.fq_suffix}") ? fastq_files << it : fastq_files << null + } + + if (fastq_files.findAll{ it != null }.size() == 0) { + stopNow("The input directory,\n${params.input}\nis empty! 
or does not " + + "have FASTQ files ending with the suffix: ${params.fq_suffix}") + } + + GEN_SAMPLESHEET( Channel.fromPath(params.input, type: 'dir') ) + GEN_SAMPLESHEET.out.csv.set{ input_ch } + versions.mix( GEN_SAMPLESHEET.out.versions ) + .set { versions } + } else if (params.metadata) { + if (!input.exists()) { + stopNow("The metadata CSV file,\n${params.metadata}\ndoes not exist!") + } + + if (input.size() <= 0) { + stopNow("The metadata CSV file,\n${params.metadata}\nis empty!") + } + + Channel.fromPath(params.metadata, type: 'file') + .set { input_ch } + } + + SAMPLESHEET_CHECK( input_ch ) + .csv + .splitCsv( header: true, sep: ',') + .map { create_fastq_channel(it) } + .groupTuple(by: [0]) + .branch { + meta, fastq -> + single : fastq.size() == 1 + return [ meta, fastq.flatten() ] + multiple : fastq.size() > 1 + return [ meta, fastq.flatten() ] + } + .set { reads } + + CAT_FASTQ( reads.multiple ) + .catted_reads + .mix( reads.single ) + .set { processed_reads } + + if (params.fq_filter_by_len.toInteger() > 0) { + SEQKIT_SEQ( processed_reads ) + .fastx + .set { processed_reads } + + versions.mix( SEQKIT_SEQ.out.versions.first().ifEmpty(null) ) + .set { versions } + } + + versions.mix( + SAMPLESHEET_CHECK.out.versions, + CAT_FASTQ.out.versions.first().ifEmpty(null) + ) + .set { versions } + + emit: + processed_reads + versions +} + +// Function to get list of [ meta, [ fq1, fq2 ] ] +def create_fastq_channel(LinkedHashMap row) { + + def meta = [:] + meta.id = row.sample + meta.single_end = row.single_end.toBoolean() + meta.strandedness = row.strandedness + meta.id = meta.id.split(params.fq_filename_delim)[0..params.fq_filename_delim_idx.toInteger() - 1] + .join(params.fq_filename_delim) + meta.id = (meta.id =~ /\./ ? meta.id.take(meta.id.indexOf('.')) : meta.id) + + def array = [] + + if (!file(row.fq1).exists()) { + stopNow("Please check input metadata CSV. The following Read 1 FASTQ file does not exist!" + + "\n${row.fq1}") + } + if (meta.single_end) { + array = [ meta, [ file(row.fq1) ] ] + } else { + if (!file(row.fq2).exists()) { + stopNow("Please check input metadata CSV. The following Read 2 FASTQ file does not exist!" + + "\n${row.fq2}") + } + array = [ meta, [ file(row.fq1), file(row.fq2) ] ] + } + return array +} \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/subworkflows/prodka.nf Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,45 @@ +// Include any necessary methods and modules +include { PRODIGAL } from "${params.modules}${params.fs}prodigal${params.fs}main" +include { PROKKA } from "${params.modules}${params.fs}prokka${params.fs}main" + +// Start the subworkflow +workflow PRODKA { + take: + trained_asm + predict_asm + + main: + PRODIGAL( + trained_asm, + (params.prodigal_f ?: 'gbk') + ) + + PROKKA( + predict_asm + .join(PRODIGAL.out.proteins) + .join(PRODIGAL.out.trained) + ) + + PRODIGAL.out.versions + .mix( PROKKA.out.versions ) + .set{ versions } + emit: + prodigal_gene_annots = PRODIGAL.out.gene_annotations + prodigal_fna = PRODIGAL.out.cds + prodigal_faa = PRODIGAL.out.proteins + prodigal_all_gene_annots = PRODIGAL.out.all_gene_annotations + prodigal_trained = PRODIGAL.out.trained + prokka_gff = PROKKA.out.gff + prokka_gbk = PROKKA.out.gbk + prokka_fna = PROKKA.out.fna + prokka_sqn = PROKKA.out.sqn + prokka_ffn = PROKKA.out.ffn + prokka_fsa = PROKKA.out.fsa + prokka_faa = PROKKA.out.faa + prokka_tbl = PROKKA.out.tbl + prokka_err = PROKKA.out.err + prokka_log = PROKKA.out.log + prokka_txt = PROKKA.out.txt + prokka_tsv = PROKKA.out.tsv + versions +}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/workflows/conf/nowayout.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,180 @@ +params { + workflow_conceived_by = 'Kranti Konganti' + workflow_built_by = 'Kranti Konganti' + workflow_version = '0.5.0' + db_mode = 'mitomine' + db_root = '/galaxy/cfsan-centriflaken-db/nowayout' + nowo_thresholds = 'strict' + fastp_run = true + fastp_failed_out = false + fastp_merged_out = false + fastp_overlapped_out = false + fastp_6 = false + fastp_reads_to_process = 0 + fastp_fix_mgi_id = false + fastp_A = false + fastp_use_custom_adapters = false + fastp_adapter_fasta = (params.fastp_use_custom_adapters ? "${projectDir}" + + File.separator + + 'assets' + + File.separator + + 'adaptors.fa' : false) + fastp_f = 0 + fastp_t = 0 + fastp_b = 0 + fastp_F = 0 + fastp_T = 0 + fastp_B = 0 + fastp_dedup = true + fastp_dup_calc_accuracy = 6 + fastp_poly_g_min_len = 10 + fastp_G = true + fastp_x = false + fastp_poly_x_min_len = 10 + fastp_cut_front = true + fastp_cut_tail = false + fastp_cut_right = true + fastp_W = 20 + fastp_M = 30 + fastp_q = 30 + fastp_u = 40 + fastp_n = 5 + fastp_e = 0 + fastp_l = 35 + fastp_max_len = 0 + fastp_y = true + fastp_Y = 30 + fastp_U = false + fastp_umi_loc = false + fastp_umi_len = false + fastp_umi_prefix = false + fastp_umi_skip = false + fastp_p = true + fastp_P = 20 + kmaalign_run = true + kmaalign_idx = ("${params.db_root}" + + File.separator + + "kma" + + File.separator + + "${params.db_mode}") + kmaalign_ignorequals = false + kmaalign_int = false + kmaalign_ef = false + kmaalign_vcf = false + kmaalign_sam = false + kmaalign_nc = true + kmaalign_na = true + kmaalign_nf = false + kmaalign_a = false + kmaalign_and = true + kmaalign_oa = false + kmaalign_bc = false + kmaalign_bcNano = false + kmaalign_bcd = false + kmaalign_bcg = false + kmaalign_ID = (params.nowo_thresholds =~ /strict|mild/ ? 85.0 : 50.0) + kmaalign_md = false + kmaalign_dense = false + kmaalign_ref_fsa = false + kmaalign_Mt1 = false + kmaalign_1t1 = false + kmaalign_mrs = (params.nowo_thresholds ==~ /strict/ ? 0.99 : 0.90) + kmaalign_mrc = (params.nowo_thresholds ==~ /strict/ ? 0.99 : 0.90) + kmaalign_mp = (params.nowo_thresholds ==~ /strict/ ? 30 : 20) + kmaalign_eq = (params.nowo_thresholds ==~ /strict/ ? 30 : 20) + kmaalign_mrs = (params.nowo_thresholds ==~ /mild/ ? 0.90 : params.kmaalign_mrs) + kmaalign_mrc = (params.nowo_thresholds ==~ /mild/ ? 0.90 : params.kmaalign_mrc) + kmaalign_mp = (params.nowo_thresholds ==~ /mild/ ? 20 : params.kmaalign_mp) + kmaalign_eq = (params.nowo_thresholds ==~ /mild/ ? 20 : params.kmaalign_eq) + kmaalign_mp = (params.kmaalign_ignorequals ? 0 : params.kmaalign_mp) + kmaalign_eq = (params.kmaalign_ignorequals ? 
0 : params.kmaalign_eq) + kmaalign_mq = false + kmaalign_5p = false + kmaalign_3p = false + kmaalign_apm = false + kmaalign_cge = false + tuspy_gd = false + seqkit_grep_run = true + seqkit_grep_n = false + seqkit_grep_s = false + seqkit_grep_c = false + seqkit_grep_C = false + seqkit_grep_i = false + seqkit_grep_v = false + seqkit_grep_m = false + seqkit_grep_r = false + salmonidx_run = true + salmonidx_k = false + salmonidx_gencode = false + salmonidx_features = false + salmonidx_keepDuplicates = true + salmonidx_keepFixedFasta = false + salmonidx_filterSize = false + salmonidx_sparse = false + salmonidx_n = true + salmonidx_decoys = false + salmonalign_libtype = 'SF' + ref_fna = ("${params.db_root}" + + File.separator + + "reference" + + File.separator + + "${params.db_mode}" + + ".fna") + sourmash_k = (params.nowo_thresholds ==~ /strict/ ? 71 : 51) + sourmash_scale = (params.nowo_thresholds ==~ /strict/ ? 100 : 100) + sourmashsketch_run = true + sourmashsketch_mode = 'dna' + sourmashsketch_file = false + sourmashsketch_f = false + sourmashsketch_name = false + sourmashsketch_p = "'abund,scaled=${params.sourmash_scale},k=${params.sourmash_k}'" + sourmashsketch_randomize = false + sourmashgather_run = (params.sourmashsketch_run ?: false) + sourmashgather_n = false + sourmashgather_thr_bp = (params.nowo_thresholds ==~ /strict/ ? 100 : 100) + sourmashgather_ignoreabn = false + sourmashgather_prefetch = false + sourmashgather_noprefetch = false + sourmashgather_ani_ci = true + sourmashgather_k = "${params.sourmash_k}" + sourmashgather_protein = false + sourmashgather_rna = false + sourmashgather_nuc = false + sourmashgather_noprotein = false + sourmashgather_dayhoff = false + sourmashgather_nodayhoff = false + sourmashgather_hp = false + sourmashgather_nohp = false + sourmashgather_dna = true + sourmashgather_nodna = false + sourmashgather_scaled = false + sourmashgather_inc_pat = false + sourmashgather_exc_pat = false + sfhpy_run = true + sfhpy_fcn = 'f_match' + sfhpy_fcv = (params.nowo_thresholds ==~ /strict/ ? "0.8" : "0.5") + sfhpy_gt = true + sfhpy_lt = false + sfhpy_all = true + lineages_csv = ("${params.db_root}" + + File.separator + + "taxonomy" + + File.separator + + "${params.db_mode}" + + File.separator + + "lineages.csv") + gsalkronapy_run = true + gsalkronapy_sf = 10000 + gsalkronapy_smres_suffix = false + gsalkronapy_failed_suffix = false + gsalkronapy_num_lin_cols = false + gsalkronapy_lin_regex = false + krona_ktIT_run = true + krona_ktIT_n = 'all' + krona_ktIT_q = false + krona_ktIT_c = false + krona_res_suffix = '.krona.tsv' + fq_filter_by_len = 0 + fq_suffix = (params.fq_single_end ? '.fastq.gz' : '_R1_001.fastq.gz') + fq2_suffix = '_R2_001.fastq.gz' +} \ No newline at end of file
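The threshold-dependent values in nowayout.config above hinge on two different Groovy regex operators: `==~` requires the whole string to match the pattern, while `=~` (used for kmaalign_ID) only requires the pattern to be found somewhere in the string. A standalone Groovy sketch with an example value (not part of the pipeline) illustrates the difference.

```groovy
// Example value; the real value comes from params.nowo_thresholds.
def thresholds = 'strict'

// ==~ (match operator) is true only when the WHOLE string matches the pattern.
assert thresholds ==~ /strict/
assert !('semi-strict' ==~ /strict/)

// =~ (find operator) is truthy when the pattern occurs ANYWHERE in the string,
// which is why kmaalign_ID treats both 'strict' and 'mild' the same way.
assert 'semi-strict' =~ /strict|mild/

// The same ternary style as the config entries above.
def kmaalign_mrs = (thresholds ==~ /strict/ ? 0.99 : 0.90)
assert kmaalign_mrs == 0.99
```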
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/0.5.0/workflows/conf/process/nowayout.process.config Mon Mar 31 14:50:40 2025 -0400 @@ -0,0 +1,121 @@ +process { + withName: 'SEQKIT_SEQ' { + ext.args = [ + params.fq_filter_by_len ? "-m ${params.fq_filter_by_len}" : '' + ].join(' ').trim() + } + + // withName: 'SAMTOOLS_FASTQ' { + // ext.args = (params.fq_single_end ? '-F 4' : '-f 2') + // } + + if (params.fastp_run) { + withName: 'FASTP' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}fastp.nf").fastpHelp(params).helpparams + ) + } + } + + if (params.kmaalign_run) { + withName: 'KMA_ALIGN' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}kmaalign.nf").kmaalignHelp(params).helpparams + ) + } + } + + if (params.seqkit_grep_run) { + withName: 'SEQKIT_GREP' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}seqkitgrep.nf").seqkitgrepHelp(params).helpparams + ) + } + } + + if (params.salmonidx_run){ + withName: 'SALMON_INDEX' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}salmonidx.nf").salmonidxHelp(params).helpparams + ) + } + + withName: 'SALMON_QUANT' { + errorStrategy = 'ignore' + ext.args = '--minAssignedFrags 1' + } + } + + if (params.sourmashsketch_run) { + withName: 'SOURMASH_SKETCH' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}sourmashsketch.nf").sourmashsketchHelp(params).helpparams + ) + } + } + + if (params.sourmashgather_run) { + withName: 'SOURMASH_GATHER' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}sourmashgather.nf").sourmashgatherHelp(params).helpparams + ) + + if (params.sfhpy_run) { + ext.args2 = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}sfhpy.nf").sfhpyHelp(params).helpparams + ) + } + } + } + + // if (params.sourmashtaxmetagenome_run) { + // withName: 'SOURMASH_TAX_METAGENOME' { + // ext.args = addParamsToSummary( + // loadThisFunction("${params.toolshelp}${params.fs}sourmashtaxmetagenome.nf").sourmashtaxmetagenomeHelp(params).helpparams + // ) + // } + // } + + if (params.gsalkronapy_run) { + withName: 'NOWAYOUT_RESULTS' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}gsalkronapy.nf").gsalkronapyHelp(params).helpparams + ) + } + } + + if (params.krona_ktIT_run) { + withName: 'KRONA_KTIMPORTTEXT' { + ext.args = addParamsToSummary( + loadThisFunction("${params.toolshelp}${params.fs}kronaktimporttext.nf").kronaktimporttextHelp(params).helpparams + ) + } + } +} + +// Method to instantiate a new function parser +// Need to refactor using ScriptParser... another day +def loadThisFunction (func_file) { + GroovyShell grvy_sh = new GroovyShell() + def func = grvy_sh.parse(new File ( func_file ) ) + return func +} + +// Method to add relevant final parameters to summary log +def addParamsToSummary(Map params_to_add = [:]) { + + if (!params_to_add.isEmpty()) { + def not_null_params_to_add = params_to_add.findAll { + it.value.clivalue != null && + it.value.clivalue != '[:]' && + it.value.clivalue != '' + } + + params.logtheseparams += not_null_params_to_add.keySet().toList() + + return not_null_params_to_add.collect { + "${it.value.cliflag} ${it.value.clivalue.toString().replaceAll(/(?:^\s+|\s+$)/, '')}" + }.join(' ').trim() + } + return 1 +} \ No newline at end of file
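addParamsToSummary() above builds each process's ext.args string from the helpparams map returned by the corresponding tool-help file under params.toolshelp. A minimal sketch of the assumed cliflag/clivalue entry shape follows; the entries are hypothetical and only mirror the filter/collect/join logic, not the params.logtheseparams bookkeeping.

```groovy
// Hypothetical helpparams entries; the real maps come from the
// *Help(params).helpparams calls in the tool-help files.
def helpparams = [
    fastp_l: [ cliflag: '-l', clivalue: 35   ],
    fastp_q: [ cliflag: '-q', clivalue: 30   ],
    fastp_x: [ cliflag: '-x', clivalue: ''   ],   // dropped: empty value
    fastp_A: [ cliflag: '-A', clivalue: null ]    // dropped: null value
]

// Same filter/collect/join logic as addParamsToSummary() above.
def ext_args = helpparams
    .findAll { it.value.clivalue != null && it.value.clivalue != '[:]' && it.value.clivalue != '' }
    .collect { "${it.value.cliflag} ${it.value.clivalue.toString().trim()}" }
    .join(' ')

assert ext_args == '-l 35 -q 30'
```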
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/0.5.0/workflows/nowayout.nf	Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,340 @@
+// Define any required imports for this specific workflow
+import java.nio.file.Paths
+import java.util.zip.GZIPInputStream
+import java.io.FileInputStream
+import nextflow.file.FileHelper
+
+
+// Include any necessary methods
+include { \
+    summaryOfParams; stopNow; fastqEntryPointHelp; sendMail; \
+    addPadding; wrapUpHelp } from "${params.routines}"
+include { fastpHelp } from "${params.toolshelp}${params.fs}fastp"
+include { kmaalignHelp } from "${params.toolshelp}${params.fs}kmaalign"
+include { seqkitgrepHelp } from "${params.toolshelp}${params.fs}seqkitgrep"
+include { salmonidxHelp } from "${params.toolshelp}${params.fs}salmonidx"
+include { sourmashsketchHelp } from "${params.toolshelp}${params.fs}sourmashsketch"
+include { sourmashgatherHelp } from "${params.toolshelp}${params.fs}sourmashgather"
+include { sfhpyHelp } from "${params.toolshelp}${params.fs}sfhpy"
+include { gsalkronapyHelp } from "${params.toolshelp}${params.fs}gsalkronapy"
+include { kronaktimporttextHelp } from "${params.toolshelp}${params.fs}kronaktimporttext"
+
+// Exit if help requested before any subworkflows
+if (params.help) {
+    log.info help()
+    exit 0
+}
+
+
+// Include any necessary modules and subworkflows
+include { PROCESS_FASTQ } from "${params.subworkflows}${params.fs}process_fastq"
+include { FASTP } from "${params.modules}${params.fs}fastp${params.fs}main"
+include { KMA_ALIGN } from "${params.modules}${params.fs}kma${params.fs}align${params.fs}main"
+include { OTF_GENOME } from "${params.modules}${params.fs}otf_genome${params.fs}main"
+include { SEQKIT_GREP } from "${params.modules}${params.fs}seqkit${params.fs}grep${params.fs}main"
+include { SALMON_INDEX } from "${params.modules}${params.fs}salmon${params.fs}index${params.fs}main"
+include { SALMON_QUANT } from "${params.modules}${params.fs}salmon${params.fs}quant${params.fs}main"
+include { SOURMASH_SKETCH } from "${params.modules}${params.fs}sourmash${params.fs}sketch${params.fs}main"
+include { SOURMASH_SKETCH \
+    as REDUCE_DB_IDX } from "${params.modules}${params.fs}sourmash${params.fs}sketch${params.fs}main"
+include { SOURMASH_GATHER } from "${params.modules}${params.fs}sourmash${params.fs}gather${params.fs}main"
+include { NOWAYOUT_RESULTS } from "${params.modules}${params.fs}nowayout_results${params.fs}main"
+include { KRONA_KTIMPORTTEXT } from "${params.modules}${params.fs}krona${params.fs}ktimporttext${params.fs}main"
+include { DUMP_SOFTWARE_VERSIONS } from "${params.modules}${params.fs}custom${params.fs}dump_software_versions${params.fs}main"
+include { MULTIQC } from "${params.modules}${params.fs}multiqc${params.fs}main"
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    INPUTS AND ANY CHECKS FOR THE NOWAYOUT WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+def reads_platform = 0
+reads_platform += (params.input ? 1 : 0)
+
+if (reads_platform < 1) {
+    stopNow("Please provide at least one absolute path to an input folder that contains\n" +
+        "FASTQ files using the --input option.\n" +
+        "Ex: --input (Illumina or Generic short reads in FASTQ format)")
+}
+
+params.fastp_adapter_fasta ? checkMetadataExists(params.fastp_adapter_fasta, 'Adapter sequences FASTA') : null
+checkMetadataExists(params.lineages_csv, 'Lineages CSV')
+checkMetadataExists(params.kmaalign_idx, 'KMA Indices')
+checkMetadataExists(params.ref_fna, 'FASTA reference')
+
+ch_sourmash_lin = file( params.lineages_csv )
+
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    RUN THE NOWAYOUT WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+workflow NOWAYOUT {
+    main:
+        log.info summaryOfParams()
+
+        PROCESS_FASTQ()
+
+        PROCESS_FASTQ.out.versions
+            .set { software_versions }
+
+        PROCESS_FASTQ.out.processed_reads
+            .set { ch_processed_reads }
+
+        ch_processed_reads
+            .map { meta, fastq ->
+                meta.get_kma_hit_accs = true
+                meta.salmon_decoys = params.dummyfile
+                meta.salmon_lib_type = (params.salmonalign_libtype ?: false)
+                meta.kma_t_db = params.kmaalign_idx
+                [ meta, fastq ]
+            }
+            .filter { meta, fastq ->
+                fq_file = ( fastq.getClass().toString() =~ /ArrayList/ ? fastq : [ fastq ] )
+                fq_gzip = new GZIPInputStream( new FileInputStream( fq_file[0].toAbsolutePath().toString() ) )
+                fq_gzip.read() != -1
+            }
+            .set { ch_processed_reads }
+
+        FASTP( ch_processed_reads )
+
+        FASTP.out.json
+            .map { meta, json ->
+                json
+            }
+            .collect()
+            .set { ch_multiqc }
+
+        KMA_ALIGN(
+            FASTP.out.passed_reads
+                .map { meta, fastq ->
+                    [meta, fastq, []]
+                }
+        )
+
+        OTF_GENOME(
+            KMA_ALIGN.out.hits
+                .join(KMA_ALIGN.out.frags)
+        )
+
+        OTF_GENOME.out.reads_extracted
+            .filter { meta, fasta ->
+                fa_file = ( fasta.getClass().toString() =~ /ArrayList/ ? fasta : [ fasta ] )
+                fa_gzip = new GZIPInputStream( new FileInputStream( fa_file[0].toAbsolutePath().toString() ) )
+                fa_gzip.read() != -1
+            }
+            .set { ch_mito_aln_reads }
+
+        SEQKIT_GREP(
+            KMA_ALIGN.out.hits
+                .filter { meta, mapped_refs ->
+                    patterns = file( mapped_refs )
+                    patterns.size() > 0
+                }
+                .map { meta, mapped_refs ->
+                    [meta, params.ref_fna, mapped_refs]
+                }
+        )
+
+        SALMON_INDEX( SEQKIT_GREP.out.fastx )
+
+        SALMON_QUANT(
+            ch_mito_aln_reads
+                .join( SALMON_INDEX.out.idx )
+        )
+
+        REDUCE_DB_IDX(
+            SEQKIT_GREP.out.fastx,
+            true,
+            false,
+            'db'
+        )
+
+        SOURMASH_SKETCH(
+            ch_mito_aln_reads,
+            false,
+            false,
+            'query'
+        )
+
+        SOURMASH_GATHER(
+            SOURMASH_SKETCH.out.signatures
+                .join( REDUCE_DB_IDX.out.signatures ),
+            [], [], [], []
+        )
+
+        // SOURMASH_TAX_METAGENOME(
+        //     SOURMASH_GATHER.out.result
+        //         .groupTuple(by: [0])
+        //         .map { meta, csv ->
+        //             [ meta, csv, ch_sourmash_lin ]
+        //         }
+        // )
+
+        // SOURMASH_TAX_METAGENOME.out.csv
+        //     .map { meta, csv ->
+        //         csv
+        //     }
+        //     .set { ch_lin_csv }
+
+        // SOURMASH_TAX_METAGENOME.out.tsv
+        //     .tap { ch_lin_krona }
+        //     .map { meta, tsv ->
+        //         tsv
+        //     }
+        //     .tap { ch_lin_tsv }
+
+        SOURMASH_GATHER.out.result
+            .groupTuple(by: [0])
+            .map { meta, csv ->
+                [ csv ]
+            }
+            .concat(
+                SALMON_QUANT.out.results
+                    .map { meta, salmon_res ->
+                        [ salmon_res ]
+                    }
+            )
+            .concat(
+                SOURMASH_GATHER.out.failed
+                    .map { meta, failed ->
+                        [ failed ]
+                    }
+            )
+            .concat( OTF_GENOME.out.failed )
+            .collect()
+            .flatten()
+            .collect()
+            .set { ch_gene_abn }
+
+        NOWAYOUT_RESULTS( ch_gene_abn, ch_sourmash_lin )
+
+        NOWAYOUT_RESULTS.out.tsv
+            .flatten()
+            .filter { tsv -> tsv.toString() =~ /.*${params.krona_res_suffix}$/ }
+            .map { tsv ->
+                meta = [:]
+                meta.id = "${params.cfsanpipename}_${params.pipeline}_krona"
+                [ meta, tsv ]
+            }
+            .groupTuple(by: [0])
+            .set { ch_lin_krona }
+
+        // ch_lin_tsv
+        //     .mix( ch_lin_csv )
+        //     .collect()
+        //     .set { ch_lin_summary }
+
+        // SOURMASH_TAX_METAGENOME.out.txt
+        //     .map { meta, txt ->
+        //         txt
+        //     }
+        //     .collect()
+        //     .set { ch_lin_kreport }
+
+        // NOWAYOUT_RESULTS(
+        //     ch_lin_summary
+        //         .concat( SOURMASH_GATHER.out.failed )
+        //         .concat( OTF_GENOME.out.failed )
+        //         .collect()
+        // )
+
+        KRONA_KTIMPORTTEXT( ch_lin_krona )
+
+        DUMP_SOFTWARE_VERSIONS(
+            software_versions
+                .mix (
+                    FASTP.out.versions,
+                    KMA_ALIGN.out.versions,
+                    SEQKIT_GREP.out.versions,
+                    REDUCE_DB_IDX.out.versions,
+                    SOURMASH_SKETCH.out.versions,
+                    SOURMASH_GATHER.out.versions,
+                    SALMON_INDEX.out.versions,
+                    SALMON_QUANT.out.versions,
+                    NOWAYOUT_RESULTS.out.versions,
+                    KRONA_KTIMPORTTEXT.out.versions
+                )
+                .unique()
+                .collectFile(name: 'collected_versions.yml')
+        )
+
+        DUMP_SOFTWARE_VERSIONS.out.mqc_yml
+            .concat(
+                ch_multiqc,
+                NOWAYOUT_RESULTS.out.mqc_yml
+            )
+            .collect()
+            .flatten()
+            .collect()
+            .set { ch_multiqc }
+
+        MULTIQC( ch_multiqc )
+}
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    ON COMPLETE, SHOW GORY DETAILS OF ALL PARAMS WHICH WILL BE HELPFUL TO DEBUG
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+workflow.onComplete {
+    if (workflow.success) {
+        sendMail()
+    }
+}
+
+workflow.onError {
+    sendMail()
+}
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    METHOD TO CHECK METADATA EXISTENCE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+def checkMetadataExists(file_path, msg) {
+    file_path_obj = file( file_path )
+
+    if (msg.toString().find(/(?i)KMA/)) {
+        if (!file_path_obj.parent.exists() || file_path_obj.parent.size() == 0) {
+            stopNow("Please check if your ${msg}\n" +
+                "[ ${file_path} ]\nexists and that the files are not of size 0.")
+        }
+    }
+    else if (!file_path_obj.exists() || file_path_obj.size() == 0) {
+        stopNow("Please check if your ${msg} file\n" +
+            "[ ${file_path} ]\nexists and is not of size 0.")
+    }
+}
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    HELP TEXT METHODS FOR THE NOWAYOUT WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+def help() {
+
+    Map helptext = [:]
+
+    helptext.putAll (
+        fastqEntryPointHelp() +
+        fastpHelp(params).text +
+        kmaalignHelp(params).text +
+        seqkitgrepHelp(params).text +
+        salmonidxHelp(params).text +
+        sourmashsketchHelp(params).text +
+        sourmashgatherHelp(params).text +
+        sfhpyHelp(params).text +
+        gsalkronapyHelp(params).text +
+        kronaktimporttextHelp(params).text +
+        wrapUpHelp()
+    )
+
+    return addPadding(helptext)
+}
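The two filter{} blocks in the NOWAYOUT workflow above drop samples whose gzipped FASTQ or FASTA is empty by trying to read a single byte through a GZIPInputStream. A small standalone Groovy sketch of that check, with a hypothetical file name and the stream explicitly closed, is shown below.

```groovy
// Standalone sketch of the "skip empty gzipped file" check used in the
// filter{} blocks above; read() returns -1 only when the decompressed
// stream is empty. The file name here is hypothetical.
import java.util.zip.GZIPInputStream

def fq = new File('sampleA_R1_001.fastq.gz')
boolean nonEmpty = false

new GZIPInputStream(new FileInputStream(fq)).withCloseable { gz ->
    nonEmpty = (gz.read() != -1)
}

println(nonEmpty ? "keep ${fq.name}" : "drop ${fq.name}")
```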
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/hfp_nowayout.xml	Mon Mar 31 14:50:40 2025 -0400
@@ -0,0 +1,199 @@
+<tool id="hfp_nowayout" name="nowayout" version="0.5.0+galaxy24">
+    <description>An automated workflow to identify mitochondrial reads and classify Eukaryotes.</description>
+    <requirements>
+        <container type="docker">quay.io/biocontainers/nextflow:24.10.4--hdfd78af_0</container>
+    </requirements>
+    <version_command>nextflow -version</version_command>
+    <command detect_errors="exit_code"><![CDATA[
+        input_path=\$(pwd)"/cpipes-input";
+        mkdir -p "\${input_path}" || exit 1;
+        #import re
+        #if (str($input_read_type_cond.input_read_type) == "single_long"):
+            #for _, $unpaired in enumerate($input_read_type_cond.input):
+                #set read1 = str($unpaired.name)
+                #if not str($unpaired.name).endswith(('.fastq', '.fastq.gz')):
+                    #set read1_ext = re.sub('fastqsanger', 'fastq', str($unpaired.ext))
+                    #set read1 = str($unpaired.name) + str('.') + $read1_ext
+                #end if
+                ln -sf '$unpaired' "\${input_path}/$read1";
+            #end for
+        #elif (str($input_read_type_cond.input_read_type) == "paired"):
+            #for _, $pair in enumerate($input_read_type_cond.input_pair)
+                #set read_R1 = re.sub('\:forward', '_forward', str($pair.forward.name))
+                #set read_R2 = re.sub('\:reverse', '_reverse', str($pair.reverse.name))
+                #set read_R1_ext = re.sub('fastqsanger', 'fastq', str($pair.forward.ext))
+                #set read_R2_ext = re.sub('fastqsanger', 'fastq', str($pair.reverse.ext))
+                #if not str($pair.forward.name).endswith(('.fastq', '.fastq.gz')):
+                    #set read_R1 = $read_R1 + str('.') + $read_R1_ext
+                #end if
+                #if not str($pair.reverse.name).endswith(('.fastq', '.fastq.gz')):
+                    #set read_R2 = $read_R2 + str('.') + $read_R2_ext
+                #end if
+                ln -sf '$pair.forward' "\${input_path}/$read_R1";
+                ln -sf '$pair.reverse' "\${input_path}/$read_R2";
+            #end for
+        #end if
+        $__tool_directory__/0.5.0/cpipes
+            --pipeline nowayout
+            --input \${input_path}
+            --output cpipes-output
+            --fq_suffix '${input_read_type_cond.fq_suffix}'
+        #if (str($input_read_type_cond.input_read_type) == "single_long"):
+            --fq_single_end true
+        #elif (str($input_read_type_cond.input_read_type) == "paired"):
+            --fq_single_end false --fq2_suffix '${input_read_type_cond.fq2_suffix}'
+        #end if
+            --db_mode $nowo_db_mode
+            --nowo_thresholds $nowo_thresholds
+            --fq_filename_delim '${fq_filename_delim}'
+            --fq_filename_delim_idx $fq_filename_delim_idx
+            -profile gxkubernetes;
+        mv './cpipes-output/nowayout-multiqc/multiqc_report.html' './multiqc_report.html' || exit 1;
+        mv './cpipes-output/krona_ktimporttext/CPIPES_nowayout_krona.html' './CPIPES_nowayout_krona.html' || exit 1;
+        rm -rf ./cpipes-output || exit 1;
+        rm -rf ./work || exit 1;
+    ]]></command>
+    <inputs>
+        <conditional name="input_read_type_cond">
+            <param name="input_read_type" type="select" label="Select the read collection type">
+                <option value="single_long" selected="true">Single-End short reads</option>
+                <option value="paired">Paired-End short reads</option>
+            </param>
+            <when value="single_long">
+                <param name="input" type="data_collection" collection_type="list" format="fastq,fastq.gz"
+                    label="Dataset list of unpaired short reads or long reads" />
+                <param name="fq_suffix" value=".fastq.gz" type="text" label="Suffix of the Single-End FASTQ"/>
+            </when>
+            <when value="paired">
+                <param name="input_pair" type="data_collection" collection_type="list:paired" format="fastq,fastq.gz" label="List of Dataset pairs" />
+                <param name="fq_suffix" value="_R1_001.fastq.gz" type="text" label="Suffix of the R1 FASTQ"
+                    help="For any data sets downloaded from NCBI into Galaxy, change this to the _forward.fastq.gz suffix."/>
+                <param name="fq2_suffix" value="_R2_001.fastq.gz" type="text" label="Suffix of the R2 FASTQ"
+                    help="For any data sets downloaded from NCBI into Galaxy, change this to the _reverse.fastq.gz suffix."/>
+            </when>
+        </conditional>
+        <param name="nowo_db_mode" type="select" label="Select the database to use with nowayout"
+            help="Please see the section below about the available databases.">
+            <option value="mitomine" selected="true">mitomine</option>
+            <option value="cytox1">cytox1</option>
+            <option value="voucher">voucher</option>
+            <option value="ganoderma">ganoderma</option>
+            <option value="listeria">listeria</option>
+        </param>
+        <param name="nowo_thresholds" type="select" label="Select the type of base quality thresholds to be set with nowayout"
+            help="The default value sets the strictest thresholds, which tend to filter out most of the false-positive hits.">
+            <option value="strict" selected="true">strict</option>
+            <option value="relax">relax</option>
+        </param>
+        <param name="fq_filename_delim" type="text" value="_" label="File name delimiter by which samples are grouped together (--fq_filename_delim)"
+            help="This is the delimiter by which samples are grouped together for display in the final MultiQC report. For example, if your input data sets are mango_replicate1.fastq.gz, mango_replicate2.fastq.gz, orange_replicate1_maryland.fastq.gz and orange_replicate2_maryland.fastq.gz, then to create the two samples mango and orange, the value for --fq_filename_delim would be _ (underscore) and the value for --fq_filename_delim_idx would be 1, since you want to group by the first word (i.e. mango or orange) after splitting the file name on _ (underscore)."/>
+        <param name="fq_filename_delim_idx" type="integer" value="1" label="File name delimiter index (--fq_filename_delim_idx)" />
+    </inputs>
+    <outputs>
+        <data name="krona_chart" format="html" label="nowayout: Krona Chart on ${on_string}" from_work_dir="CPIPES_nowayout_krona.html"/>
+        <data name="multiqc_report" format="html" label="nowayout: MultiQC Report on ${on_string}" from_work_dir="multiqc_report.html"/>
+    </outputs>
+    <tests>
+        <!--Test 01: long reads-->
+        <test expect_num_outputs="2">
+            <param name="input">
+                <collection type="list">
+                    <element name="FAL11127.fastq.gz" value="FAL11127.fastq.gz" />
+                    <element name="FAL11341.fastq.gz" value="FAL11341.fastq.gz" />
+                    <element name="FAL11342.fastq.gz" value="FAL11342.fastq.gz" />
+                </collection>
+            </param>
+            <param name="fq_suffix" value=".fastq.gz"/>
+            <output name="multiqc_report" file="multiqc_report.html" ftype="html" compare="sim_size"/>
+            <!-- <output name="assembled_mags" file="FAL11127.assembly_filtered.contigs.fasta" ftype="fasta" compare="sim_size"/> -->
+        </test>
+    </tests>
+    <help><![CDATA[
+
+.. class:: infomark
+
+**Purpose**
+
+nowayout is a mitochondrial metagenomics classifier for Eukaryotes.
+It uses a custom kma database to identify mitochondrial reads and
+performs read classification, which is then reinforced by a further
+round of classification with sourmash.
+
+It is written in Nextflow and is part of the modular data analysis pipelines (CFSAN PIPELINES or CPIPES for short) at HFP.
+
+
+----
+
+.. class:: infomark
+
+**Databases**
+
+  - mitomine: A large database that works in almost all scenarios.
+  - cytox1: A collection of non-redundant COXI genes from NCBI.
+  - voucher: A collection of non-redundant voucher sequences from NCBI.
+  - ganoderma: A collection of non-redundant mtDNA sequences of Ganoderma fungi.
+  - listeria: A collection of organelle sequences and other rRNA genes for Listeria.
+
+
+----
+
+.. class:: infomark
+
+**Testing and Validation**
+
+The CPIPES - nowayout Nextflow pipeline has been wrapped to make it work in Galaxy.
+It takes either a paired or an unpaired list of short-read data sets as input and generates a MultiQC report
+containing relative abundances in the context of the number of mitochondrial reads identified. It also
+generates a Krona chart for each sample. The pipeline has been tested on multiple internal insect
+mixture samples. All of the original testing and validation was done on the command line on the
+HFP Reedling HPC Cluster.
+
+
+----
+
+.. class:: infomark
+
+**Please note**
+
+ ::
+
+  - nowayout only works on Illumina short reads (paired or unpaired).
+  - nowayout uses a custom kma database named mitomine.
+  - The custom database will be incrementally augmented and refined over time.
+  - mitomine stats:
+      Contains ~ 2.93M non-redundant mitochondrial and voucher sequences.
+      Represents ~ 717K unique species.
+  - Other databases are also available but will seldom be updated.
+
+----
+
+.. class:: infomark
+
+**Outputs**
+
+The main output file is:
+
+ ::
+
+  - MultiQC Report: Contains a brief summary report, including the number of mitochondrial reads identified
+                    per sample and relative abundances in the context of the total number of mitochondrial reads
+                    identified.
+                    Please note that due to MultiQC customizations, the preview (eye icon) will not
+                    work within Galaxy for the MultiQC report. Please download the file by clicking
+                    on the floppy icon and view it in your browser on your local desktop/workstation.
+                    You can export the tables and plots from the downloaded MultiQC report.
+
+    ]]></help>
+    <citations>
+        <citation type="bibtex">
+            @article{nowayout,
+            author = {Konganti, Kranti},
+            year = {2025},
+            month = {May},
+            title = {nowayout: An automated mitochondrial read classifier for Eukaryotes.},
+            journal = {Manuscript in preparation},
+            doi = {10.3389/xxxxxxxxxxxxxxxxxx},
+            url = {https://xxxxxxx/articles/10.3389/xxxxxxxxxxxx/full}}
+        </citation>
+    </citations>
+</tool>
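The --fq_filename_delim and --fq_filename_delim_idx help above describes how replicate FASTQ files are collapsed into samples for the MultiQC report. A short Groovy sketch of that grouping rule, using the mango/orange example file names from the help text (this is an illustration, not the pipeline's actual implementation), is shown below.

```groovy
// Hypothetical sketch of the sample grouping described by
// --fq_filename_delim / --fq_filename_delim_idx in the tool help above.
def delim = '_'
def idx   = 1
def fastqs = [
    'mango_replicate1.fastq.gz',
    'mango_replicate2.fastq.gz',
    'orange_replicate1_maryland.fastq.gz',
    'orange_replicate2_maryland.fastq.gz'
]

// Split each file name on the delimiter and keep the first `idx` field(s)
// as the sample name, so replicates collapse into one MultiQC sample.
def samples = fastqs.groupBy { it.tokenize(delim)[0..<idx].join(delim) }

assert samples.keySet() == ['mango', 'orange'] as Set
```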