Metadata-Version: 2.1
Name: bio
Version: 1.6.2
Summary: bio
Home-page: https://github.com/ialbert/bio
Author: Istvan Albert
Author-email: istvan.albert@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython >=1.80
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: mygene
Requires-Dist: pandas
Requires-Dist: pooch
Requires-Dist: gprofiler-official

# bio: making bioinformatics fun again

`bio` - command-line utilities to make bioinformatics explorations more enjoyable.

`bio` is a bioinformatics toy to play with.

Like LEGO pieces that match one another, `bio` aims to provide you with commands that naturally fit together and let you express your intent with short, explicit, and simple commands. The project is in an exploratory phase; we welcome input and suggestions on what it should grow into.

## What does this software do?

If you've ever done bioinformatics, you know that even seemingly straightforward tasks require multiple steps, arcane incantations, and various other preparations that slow down progress.

Even well-defined, supposedly simple tasks can take an inordinate number of complicated steps. The `bio` package is meant to relieve that tedium.

## Usage examples

    # Fetch GenBank data
    bio fetch NC_045512 MN996532 > genomes.gb

    # Convert the first ten bases of the genomes to FASTA.
    bio fasta genomes.gb --end 10

    # Align the coding sequences for the S protein
    bio fasta genomes.gb --gene S --protein | bio align | head

    # Print the GFF record that corresponds to the coding sequence for gene S
    bio gff genomes.gb --gene S

    # Show the descendants of taxid 117565
    bio taxon 117565 | head

    # Show the lineage of a taxonomic rank.
    bio taxon 117565 --lineage | head

    # Get metadata on a viral sample
    bio meta 11138 -H | head

    # Define a sequence ontology term
    bio define exon

    # Define a gene ontology term
    bio define food vacuole

## Documentation

Detailed documentation is maintained at

* https://www.bioinfo.help/

## Quick install

`bio` works on Linux and Mac computers, and on Windows when using the Windows Subsystem for Linux (WSL).

    pip install bio --upgrade

See more details in the [documentation][docs].

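The package metadata declares `Requires-Python: >=3.7`, so a quick interpreter check before installing can save a confusing `pip` error. The following is a minimal sketch only; the helper name `meets_minimum` is made up for illustration and is not part of `bio`.

```python
import sys

# Minimum version taken from the Requires-Python field of the package metadata.
MINIMUM = (3, 7)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True when a (major, minor, ...) version tuple satisfies the minimum.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is recent enough to install bio")
    else:
        print("Python 3.7 or newer is required")
```
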
## `bio` is stream oriented

`bio` supports stream-oriented programming, where the output of one task can be piped into the next. Take the example above, but now start with a file `acc.txt` that contains just the accession numbers:

    NC_045512
    MN996532

We can run `bio` to generate a VCF file with the variants in the nucleotide sequence that encodes the S protein, like so:

    cat acc.txt | bio fetch | bio fasta --gene S | bio align --vcf | head

which prints:

    ##fileformat=VCFv4.2
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FILTER=<ID=PASS,Description="All filters passed">
    ##INFO=<ID=TYPE,Number=1,Type=String,Description="Type of the variant">
    ##contig=<ID=YP_009724390.1,length=3822,assembly=YP_009724390.1>
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT QHR63300.2
    YP_009724390.1 33 33C/T C T . PASS TYPE=SNP GT 1
    YP_009724390.1 54 54T/A T A . PASS TYPE=SNP GT 1
    YP_009724390.1 60 60C/T C T . PASS TYPE=SNP GT 1
    YP_009724390.1 69 69A/G A G . PASS TYPE=SNP GT 1

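Because the pipeline ends in plain text, its output can be handed to any other tool. The sketch below reads VCF records like the ones printed above and extracts the position and alleles. It is an illustration only and not part of `bio`: the function name `parse_vcf` is hypothetical, and the sample assumes the columns are tab-delimited, as the VCF specification requires.

```python
import io

# Two records copied from the output shown above, written with explicit tabs
# (assumption: the real output is tab-delimited, as the VCF spec requires).
SAMPLE = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tQHR63300.2\n"
    "YP_009724390.1\t33\t33C/T\tC\tT\t.\tPASS\tTYPE=SNP\tGT\t1\n"
    "YP_009724390.1\t54\t54T/A\tT\tA\t.\tPASS\tTYPE=SNP\tGT\t1\n"
)


def parse_vcf(stream):
    """Yield (chrom, pos, ref, alt) for each data line, skipping '#' header lines."""
    for line in stream:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


records = list(parse_vcf(io.StringIO(SAMPLE)))
for chrom, pos, ref, alt in records:
    print(f"{chrom}:{pos} {ref}>{alt}")
```

To use the same function at the end of a live pipeline, pass `sys.stdin` to `parse_vcf` instead of the in-memory sample.
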
## Who is `bio` designed for?

The software was written to teach bioinformatics and is the companion software to the [Biostar Handbook][handbook] textbook. The target audience includes:

- Students learning about bioinformatics.
- Bioinformatics educators who need a platform to demonstrate bioinformatics concepts.
- Scientists working with large numbers of similar genomes (bacterial/viral strains).
- Scientists who need to investigate and understand the precise details of a genomic region.

The ideas and motivations behind `bio` were developed while teaching the many cohorts of students who used the handbook in the classroom. `bio` is an opinionated take on how bioinformatics, particularly data representation and access, should be simplified and streamlined.

[handbook]: https://www.biostarhandbook.com/
[docs]: https://www.bioinfo.help/

## Development

If you clone the repository, we recommend that you install it as a development package with:

    python setup.py develop

## Testing

`bio` can test itself. To run all tests, execute:

    bio test

Tests are automatically built from a shell script that mimics real-life usage scenarios.

* https://github.com/ialbert/bio/blob/master/test/usage.sh

## Generating documentation

To generate the docs, you will need the `bookdown` package:

    conda install r-bookdown r-servr

To view the docs in a browser, run:

    make

Then visit http://localhost:8000