jpayne@68: Metadata-Version: 2.1
jpayne@68: Name: bio
jpayne@68: Version: 1.6.2
jpayne@68: Summary: bio
jpayne@68: Home-page: https://github.com/ialbert/bio
jpayne@68: Author: Istvan Albert
jpayne@68: Author-email: istvan.albert@gmail.com
jpayne@68: Classifier: Programming Language :: Python :: 3
jpayne@68: Classifier: License :: OSI Approved :: MIT License
jpayne@68: Classifier: Operating System :: OS Independent
jpayne@68: Requires-Python: >=3.7
jpayne@68: Description-Content-Type: text/markdown
jpayne@68: License-File: LICENSE
jpayne@68: Requires-Dist: biopython >=1.80
jpayne@68: Requires-Dist: requests
jpayne@68: Requires-Dist: tqdm
jpayne@68: Requires-Dist: mygene
jpayne@68: Requires-Dist: pandas
jpayne@68: Requires-Dist: pooch
jpayne@68: Requires-Dist: gprofiler-official
jpayne@68: 
jpayne@68: # bio: making bioinformatics fun again
jpayne@68: 
jpayne@68: `bio` - command-line utilities to make bioinformatics explorations more enjoyable.
jpayne@68: 
jpayne@68: `bio` is a bioinformatics toy to play with.
jpayne@68: 
jpayne@68: Like LEGO pieces that match one another, `bio` aims to provide you with commands that naturally fit together and let you express your intent with short, explicit, and simple commands. The project is in an exploratory phase, and we welcome input and suggestions on what it should grow into.
jpayne@68: 
jpayne@68: ## What does this software do?
jpayne@68: 
jpayne@68: 
jpayne@68: If you've ever done bioinformatics, you know how even seemingly straightforward tasks require multiple steps, arcane incantations, and various other preparations that slow down progress. 
jpayne@68: 
jpayne@68: Even well-defined, supposedly simple tasks can take a seemingly inordinate number of complicated steps. The `bio` package is meant to solve that tedium. 
jpayne@68: 
jpayne@68: ## Usage examples
jpayne@68: 
jpayne@68:     # Fetch GenBank data
jpayne@68:     bio fetch NC_045512 MN996532 > genomes.gb
jpayne@68: 
jpayne@68:     # Convert the first ten bases of the genomes to FASTA.
jpayne@68:     bio fasta genomes.gb --end 10
jpayne@68: 
jpayne@68:     # Align the protein sequences of gene S
jpayne@68:     bio fasta genomes.gb --gene S --protein | bio align | head
jpayne@68: 
jpayne@68:     # Print the GFF record that corresponds to the coding sequence for gene S
jpayne@68:     bio gff genomes.gb --gene S 
jpayne@68: 
jpayne@68:     # Show the descendants of taxid 117565
jpayne@68:     bio taxon 117565 | head
jpayne@68: 
jpayne@68:     # Show the lineage of taxid 117565
jpayne@68:     bio taxon 117565 --lineage | head
jpayne@68: 
jpayne@68:     # Get metadata on a viral sample
jpayne@68:     bio meta 11138 -H | head
jpayne@68: 
jpayne@68:     # Define a sequence ontology term
jpayne@68:     bio define exon
jpayne@68: 
jpayne@68:     # Define a gene ontology term
jpayne@68:     bio define food vacuole
jpayne@68: 
jpayne@68: ## Documentation
jpayne@68: 
jpayne@68: Detailed documentation is maintained at
jpayne@68: 
jpayne@68: * https://www.bioinfo.help/
jpayne@68: 
jpayne@68: ## Quick install
jpayne@68:     
jpayne@68: `bio` works on Linux and Mac computers, and on Windows when using the Windows Subsystem for Linux (WSL).
jpayne@68: 
jpayne@68:     pip install bio --upgrade
jpayne@68:             
jpayne@68: See more details in the [documentation][docs].
jpayne@68: 
jpayne@68: ## `bio` is stream oriented
jpayne@68: 
jpayne@68: `bio` supports stream-oriented programming, where the output of one command can be piped into the next. Take the example above,
jpayne@68: but now start with a file `acc.txt` that contains just the accession numbers:
jpayne@68: 
jpayne@68:     NC_045512
jpayne@68:     MN996532
jpayne@68: 
jpayne@68: We can then run `bio` to generate a VCF file listing the variants in the nucleotides that encode the S protein:
jpayne@68: 
jpayne@68:     cat acc.txt | bio fetch | bio fasta --gene S | bio align --vcf | head
jpayne@68: 
jpayne@68: to print:
jpayne@68: 
jpayne@68:     ##fileformat=VCFv4.2
jpayne@68:     ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
jpayne@68:     ##FILTER=<ID=PASS,Description="All filters passed">
jpayne@68:     ##INFO=<ID=TYPE,Number=1,Type=String,Description="Type of the variant">
jpayne@68:     ##contig=<ID=YP_009724390.1,length=3822,assembly=YP_009724390.1>
jpayne@68:     #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  QHR63300.2
jpayne@68:     YP_009724390.1  33      33C/T   C       T       .       PASS    TYPE=SNP        GT      1
jpayne@68:     YP_009724390.1  54      54T/A   T       A       .       PASS    TYPE=SNP        GT      1
jpayne@68:     YP_009724390.1  60      60C/T   C       T       .       PASS    TYPE=SNP        GT      1
jpayne@68:     YP_009724390.1  69      69A/G   A       G       .       PASS    TYPE=SNP        GT      1
jpayne@68: 
jpayne@68: 
jpayne@68: ## Who is `bio` designed for?
jpayne@68: 
jpayne@68: The software was written to teach bioinformatics and is the companion software to the [Biostar Handbook][handbook] textbook. The targeted audience comprises:
jpayne@68: 
jpayne@68: - Students learning about bioinformatics.
jpayne@68: - Bioinformatics educators who need a platform to demonstrate bioinformatics concepts. 
jpayne@68: - Scientists working with large numbers of similar genomes (bacterial/viral strains).
jpayne@68: - Scientists who need to investigate and understand the precise details of a genomic region closely.
jpayne@68: 
jpayne@68: The ideas and motivations fueling `bio` have been developed while educating the many cohorts of students who used the handbook in the classroom. `bio` is an opinionated take on how bioinformatics, particularly data representation and access, should be simplified and streamlined.
jpayne@68: 
jpayne@68: [handbook]: https://www.biostarhandbook.com/
jpayne@68: [docs]: https://www.bioinfo.help/
jpayne@68: 
jpayne@68: ## Development
jpayne@68: 
jpayne@68: If you clone the repository, we recommend that you install it as a development package with:
jpayne@68: 
jpayne@68:     python setup.py develop
jpayne@68:     
jpayne@68: ## Testing
jpayne@68: 
jpayne@68: `bio` can test itself. To run all tests, execute:
jpayne@68: 
jpayne@68:     bio test
jpayne@68: 
jpayne@68: Tests are automatically built from a shell script that mimics real-life usage scenarios.
jpayne@68: 
jpayne@68: * https://github.com/ialbert/bio/blob/master/test/usage.sh
jpayne@68: 
jpayne@68: ## Generating documentation
jpayne@68: 
jpayne@68: To generate the docs, you will need the `bookdown` package:
jpayne@68: 
jpayne@68:     conda install r-bookdown r-servr
jpayne@68:     
jpayne@68: To view the docs in a browser, run:
jpayne@68:     
jpayne@68:     make 
jpayne@68:     
jpayne@68: then visit http://localhost:8000
jpayne@68: