# HG changeset patch
# User galaxytrakr
# Date 1774296292 0
# Node ID 2897d365dd624b500ae3be76e73042f73e36c9d2
# Parent 76192dc490d29a64588559db5754e753e42ec4b5
planemo upload for repository https://github.com/CFSAN-Biostatistics/galaxytrakr-tools commit 619ebd7e6a24be0ec6c2728511290f43b0bad89f

diff -r 76192dc490d2 -r 2897d365dd62 aws_sra.xml
--- a/aws_sra.xml Mon Mar 23 19:52:43 2026 +0000
+++ b/aws_sra.xml Mon Mar 23 20:04:52 2026 +0000
@@ -1,5 +1,5 @@
-
- Fetch SRA data files from NCBI's public AWS S3 buckets
+
+ Fetches SRA runs from AWS and converts them to FASTQ
 awscli
@@ -7,308 +7,98 @@
 pigz
- aws --version
+ fasterq-dump --version
+
+ '$output_list'
-
- ## ── DOWNLOAD RAW mode ────────────────────────────────────────────────────
- #elif $action.mode == 'copy'
- aws s3 cp
- --no-sign-request
- #if $action.recursive
- --recursive
- #end if
- '${s3_base}/${ $action.s3_key.strip("/") }'
- '$output_data'
-
- ## ── FASTQ DUMP mode (sra-pub-run-odp only) ───────────────────────────────
- #elif $action.mode == 'fastq_dump'
- #set $acc = $action.accession.strip()
-
- mkdir -p sra_cache &&
- aws s3 cp --no-sign-request '${s3_base}/sra/${acc}/${acc}' ./sra_cache/${acc} &&
- mkdir -p fastq_out &&
- fasterq-dump --outdir ./fastq_out --temp . --threads \${GALAXY_SLOTS:-4} --split-files ./sra_cache/${acc} &&
- pigz -p \${GALAXY_SLOTS:-4} ./fastq_out/*.fastq &&
- #if $action.layout == 'paired'
- cp ./fastq_out/${acc}_1.fastq.gz '$output_r1' &&
- cp ./fastq_out/${acc}_2.fastq.gz '$output_r2'
- #else
- cp ./fastq_out/${acc}.fastq.gz '$output_r1'
- #end if
- #end if
+ #end for
 ]]>
-


- action['mode'] == 'list'
- action['mode'] == 'copy'
- action['mode'] == 'fastq_dump'
- action['mode'] == 'fastq_dump' and action['layout'] == 'paired'
+ layout == 'paired'


 _1`` (R1 / forward) and ``_2`` (R2 / reverse)
-for paired-end runs, matching ``fasterq-dump``'s native ``--split-files`` naming.
-Single-end runs produce only ``_1``.
-
-*Fetching multiple accessions and building a paired collection*
-
-Run this tool **once per accession** — either manually or by using Galaxy's dataset
-collection mapping to fan out over a list of accession identifiers. Keeping one job per
-accession means a failed download does not affect the others.
-
-Once all jobs are complete your history will contain datasets labelled::
-
-    SRR000001_1 SRR000001_2
-    SRR000002_1 SRR000002_2
-    ...
+This tool can be run on a single SRA accession or a list of accessions provided as a text file (one per line).

-Use **Galaxy's "Build List of Dataset Pairs"** tool to assemble these into a
-``list:paired`` collection. Galaxy will auto-detect the ``_1`` / ``_2`` suffixes
-and propose pairings — confirm and name the collection, then pass it directly to
-any downstream tool that accepts a paired collection (aligners, QC tools, etc.).
-
-.. warning::
-
-    This tool cannot auto-detect read layout from the accession. Check the SRA record
-    at https://www.ncbi.nlm.nih.gov/sra before running. Selecting the wrong layout will
-    produce incorrect output.
-
------
-
-**Notes**
-
-- All S3 requests are made without AWS credentials (``--no-sign-request``).
-- There is typically a **1–2 day lag** between an accession appearing in SRA Search and
-  being available in the S3 buckets.
-- Controlled-access dbGaP data (``sra-ca-run-odp``) requires AWS credentials and is
-  **not** supported by this tool.
-- ``fasterq-dump`` and ``pigz`` both use ``\${GALAXY_SLOTS}`` threads. Allocate more
-  cores in your job configuration to speed up conversion of large runs.
-
-.. _NCBI Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/sra
+Outputs are automatically organized into collections suitable for downstream analysis.
 ]]>
@@ -316,7 +106,7 @@
 @misc{ncbi_sra_aws,
 title = {{NCBI} {SRA} on {AWS} Open Data},
 author = {{National Center for Biotechnology Information}},
- howpublished = {\url{https://registry.opendata.aws/ncbi-sra/}},
+ howpublished = {\\url{https://registry.opendata.aws/ncbi-sra/}},
 note = {Accessed via AWS S3 without credentials}
 }
@@ -327,12 +117,11 @@
 {International Nucleotide Sequence Database Collaboration}},
 journal = {Nucleic Acids Research},
 volume = {39},
- number = {suppl\_1},
+ number = {suppl\\\_1},
 pages = {D19--D21},
 year = {2011},
 doi = {10.1093/nar/gkq1019}
 }
-