# HG changeset patch
# User galaxytrakr
# Date 1774321781 0
# Node ID 73ee30eb273ad3e3491571abc6c32e09afdfaae7
# Parent 569a598c7e689898ad75923fe13bd14ed1d3c42a
planemo upload for repository https://github.com/CFSAN-Biostatistics/galaxytrakr-tools commit d0c08749588099d40db3c23bfd554800ac307a30

diff -r 569a598c7e68 -r 73ee30eb273a aws_sra.xml
--- a/aws_sra.xml	Tue Mar 24 01:43:32 2026 +0000
+++ b/aws_sra.xml	Tue Mar 24 03:09:41 2026 +0000
@@ -1,91 +1,145 @@
-
+
 Fetches one or more SRA runs from AWS S3 and converts them to FASTQ
+ sra-tools
+ pigz
 awscli
- sra-tools
- pigz
 fasterq-dump --version
- accessions.txt &&
-
- ## Loop over each clean accession
- for acc in $(cat accessions.txt);
- do
-     echo "Processing accession: $acc" &&
-
-     ## 1. Create unique directories for this accession
-     mkdir -p sra_cache_${acc} fastq_out_${acc} &&
-
-     ## 2. Download the file from S3 using aws s3 cp
-     aws s3 cp --no-sign-request "s3://sra-pub-run-odp/sra/${acc}/${acc}" ./sra_cache_${acc}/ &&
-
-     ## 3. Convert with fasterq-dump
-     fasterq-dump --outdir ./fastq_out_${acc} --temp . --threads \${GALAXY_SLOTS:-4} --split-files ./sra_cache_${acc}/${acc} &&
-
-     ## 4. Compress with pigz
-     pigz -p \${GALAXY_SLOTS:-4} ./fastq_out_${acc}/*.fastq &&
-
-     ## 5. Move outputs for collection discovery
-     #if str($layout) == 'paired'
-         mv ./fastq_out_${acc}/${acc}_1.fastq.gz '$output_r1.files_path/${acc}_1.fastq.gz' &&
-         mv ./fastq_out_${acc}/${acc}_2.fastq.gz '$output_r2.files_path/${acc}_2.fastq.gz'
-     #else
-         mv ./fastq_out_${acc}/${acc}.fastq.gz '$output_r1.files_path/${acc}.fastq.gz'
-     #end if &&
-
-     ## 6. Clean up
-     rm -rf sra_cache_${acc} fastq_out_${acc}
- done
+ accessions &&
+ #else:
+     grep '^[[:space:]]*[ESD]RR[0-9]\{1,\}[[:space:]]*$' '${input.file_list}' > accessions &&
+ #end if
+ mkdir -p output &&
+ mkdir -p outputOther &&
+ for acc in \$(cat ./accessions);
+ do (
+     echo "Processing accession: \$acc" &&
+     mkdir -p sra_cache_\${acc} &&
+     aws s3 cp --no-sign-request "s3://sra-pub-run-odp/sra/\${acc}/\${acc}" ./sra_cache_\${acc}/\${acc} &&
+     fasterq-dump -e \${GALAXY_SLOTS:-4} -t . --split-3 ./sra_cache_\${acc}/\${acc} &&
+     rm -rf sra_cache_\${acc} &&
+     count="\$(ls \${acc}*.fastq 2>/dev/null | wc -l)" &&
+     echo "Found \$count fastq file(s) for \$acc" &&
+     data=(\$(ls \${acc}*.fastq 2>/dev/null)) &&
+     if [ "\$count" -eq 1 ]; then
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${data[0]}" > output/"\${acc}".fastqsanger.gz &&
+         rm "\${data[0]}";
+     elif [ -e "\${acc}".fastq ]; then
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${acc}".fastq > outputOther/"\${acc}"__single.fastqsanger.gz &&
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${acc}"_1.fastq > output/"\${acc}"_1.fastqsanger.gz &&
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${acc}"_2.fastq > output/"\${acc}"_2.fastqsanger.gz &&
+         rm "\${acc}"*.fastq;
+     elif [ "\$count" -eq 2 ]; then
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${acc}"_1.fastq > output/"\${acc}"_1.fastqsanger.gz &&
+         pigz -cqp \${GALAXY_SLOTS:-4} "\${acc}"_2.fastq > output/"\${acc}"_2.fastqsanger.gz &&
+         rm "\${acc}"*.fastq;
+     else
+         for file in \${data[*]}; do
+             pigz -cqp \${GALAXY_SLOTS:-4} "\$file" > outputOther/"\$file"sanger.gz &&
+             rm "\$file";
+         done;
+     fi
+ ); done;
+ echo "Done with all accessions."
 ]]>
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
+
+
-
-
- layout == 'paired'
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
@@ -93,22 +147,10 @@
 @misc{ncbi_sra_aws,
   title = {{NCBI} {SRA} on {AWS} Open Data},
   author = {{National Center for Biotechnology Information}},
-  howpublished = {\\url{https://registry.opendata.aws/ncbi-sra/}},
+  howpublished = {\url{https://registry.opendata.aws/ncbi-sra/}},
   note = {Accessed via AWS S3 without credentials}
 }
-
-@article{sra_toolkit,
-  title = {The {NCBI} {SRA} and portable data in biology},
-  author = {Leinonen, Rasko and Sugawara, Hideaki and Shumway, Martin and
-            {International Nucleotide Sequence Database Collaboration}},
-  journal = {Nucleic Acids Research},
-  volume = {39},
-  number = {suppl\\\_1},
-  pages = {D19--D21},
-  year = {2011},
-  doi = {10.1093/nar/gkq1019}
-}
-
+ 10.1093/nar/gkq1019