Metadata-Version: 2.1
Name: bio
Version: 1.6.2
Summary: bio
Home-page: https://github.com/ialbert/bio
Author: Istvan Albert
Author-email: istvan.albert@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython >=1.80
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: mygene
Requires-Dist: pandas
Requires-Dist: pooch
Requires-Dist: gprofiler-official

# bio: making bioinformatics fun again

`bio` - command-line utilities to make bioinformatics explorations more enjoyable.

`bio` is a bioinformatics toy to play with.

Like LEGO pieces that match one another, `bio` aims to provide you with commands that naturally fit together and let you express your intent with short, explicit and simple commands. It is a project in an exploratory phase; we'd welcome input and suggestions on what it should grow up into.

## What does this software do?

If you've ever done bioinformatics, you know how even seemingly straightforward tasks require multiple steps, arcane incantations, and various other preparations that slow down progress.

Even well-defined, supposedly simple tasks can take a seemingly inordinate number of complicated steps. The `bio` package is meant to relieve that tedium.

## Usage examples

    # Fetch GenBank data
    bio fetch NC_045512 MN996532 > genomes.gb

    # Convert the first ten bases of the genomes to FASTA.
    bio fasta genomes.gb --end 10

    # Align the coding sequences for the S protein
    bio fasta genomes.gb --gene S --protein | bio align | head

    # Print the GFF record that corresponds to the coding sequence for gene S
    bio gff genomes.gb --gene S

    # Show the descendants of taxid 117565
    bio taxon 117565 | head

    # Show the lineage of a taxonomic rank.
    bio taxon 117565 --lineage | head

    # Get metadata on a viral sample
    bio meta 11138 -H | head

    # Define a sequence ontology term
    bio define exon

    # Define a gene ontology term
    bio define food vacuole

## Documentation

Detailed documentation is maintained at

* https://www.bioinfo.help/

## Quick install

`bio` works on Linux and Mac computers, and on Windows when using the Windows Subsystem for Linux.

    pip install bio --upgrade

See more details in the [documentation][docs].

## `bio` is stream oriented

`bio` supports stream-oriented programming, where the output of one task may be chained into the next.
Take the example above, but now start with a file `acc.txt` that contains just the accession numbers:

    NC_045512
    MN996532

We can run `bio` to generate a VCF file with the variants in the nucleotide sequence that codes for the S protein like so:

    cat acc.txt | bio fetch | bio fasta --gene S | bio align --vcf | head

This prints:

    ##fileformat=VCFv4.2
    ##FORMAT=
    ##FILTER=
    ##INFO=
    ##contig=
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT QHR63300.2
    YP_009724390.1 33 33C/T C T . PASS TYPE=SNP GT 1
    YP_009724390.1 54 54T/A T A . PASS TYPE=SNP GT 1
    YP_009724390.1 60 60C/T C T . PASS TYPE=SNP GT 1
    YP_009724390.1 69 69A/G A G . PASS TYPE=SNP GT 1

## Who is `bio` designed for?

The software was written to teach bioinformatics and is the companion software to the [Biostar Handbook][handbook] textbook. The targeted audience comprises:

- Students learning about bioinformatics.
- Bioinformatics educators who need a platform to demonstrate bioinformatics concepts.
- Scientists working with large numbers of similar genomes (bacterial/viral strains).
- Scientists who need to investigate and understand the precise details of a genomic region closely.

The ideas and motivations fueling `bio` were developed while educating the many cohorts of students who used the handbook in the classroom. `bio` is an opinionated take on how bioinformatics, particularly data representation and access, should be simplified and streamlined.
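
## Summarizing the VCF output

Because `bio` writes to standard output, the VCF records from the stream-oriented example can be piped into ordinary Unix tools. The snippet below is a minimal sketch and not part of `bio` itself: the helper names `vcf_sample` and `summarize_vcf` are hypothetical, and the sample records are copied from the output shown earlier so that the example runs without `bio` or network access.

```shell
# Hypothetical helper: emits a few records copied from the VCF output shown
# earlier. Real VCF is tab-separated; awk's default field splitting handles
# both tabs and spaces.
vcf_sample() {
cat <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT QHR63300.2
YP_009724390.1 33 33C/T C T . PASS TYPE=SNP GT 1
YP_009724390.1 54 54T/A T A . PASS TYPE=SNP GT 1
YP_009724390.1 60 60C/T C T . PASS TYPE=SNP GT 1
YP_009724390.1 69 69A/G A G . PASS TYPE=SNP GT 1
EOF
}

# Hypothetical helper: skips header lines (those starting with '#'), then
# counts how often each REF>ALT substitution occurs (columns 4 and 5).
summarize_vcf() {
    awk '!/^#/ { count[$4 ">" $5]++ }
         END   { for (k in count) print k, count[k] }' | sort
}

vcf_sample | summarize_vcf
```

With `bio` installed, the same function could presumably be placed at the end of the pipeline instead of `head`, for example `cat acc.txt | bio fetch | bio fasta --gene S | bio align --vcf | summarize_vcf`.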

[handbook]: https://www.biostarhandbook.com/
[docs]: https://www.bioinfo.help/

## Development

If you clone the repository, we recommend that you install it as a development package with:

    python setup.py develop

## Testing

`bio` can test itself. To run all tests, execute:

    bio test

Tests are automatically built from a shell script that mimics real-life usage scenarios.

* https://github.com/ialbert/bio/blob/master/test/usage.sh

## Generating documentation

To generate the docs, you will need the `bookdown` package:

    conda install r-bookdown r-servr

To view the docs in a browser:

    make

then visit http://localhost:8000