Metadata-Version: 2.1
Name: pybedtools
Version: 0.11.0
Summary: Wrapper around BEDTools for bioinformatics work
Home-page: https://github.com/daler/pybedtools
Download-URL:
Maintainer: Ryan Dale
Maintainer-email: ryan.dale@nih.gov
License: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE.txt
Requires-Dist: pysam
Requires-Dist: numpy


Overview
--------

.. image:: https://badge.fury.io/py/pybedtools.svg?style=flat
    :target: https://badge.fury.io/py/pybedtools

.. image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg
    :target: https://bioconda.github.io

The `BEDTools suite of programs <http://bedtools.readthedocs.org/>`_ is widely
used for genomic interval manipulation or "genome algebra". `pybedtools` wraps
and extends BEDTools and offers feature-level manipulations from within
Python.

See full online documentation, including installation instructions, at
https://daler.github.io/pybedtools/.

The GitHub repo is at https://github.com/daler/pybedtools.

Why `pybedtools`?
-----------------

The following example prints the names of genes that are less than 5 kb away
from intergenic SNPs:

.. code-block:: python

    from pybedtools import BedTool

    snps = BedTool('snps.bed.gz')  # [1]
    genes = BedTool('hg19.gff')    # [1]

    intergenic_snps = snps.subtract(genes)                        # [2]
    nearby = genes.closest(intergenic_snps, d=True, stream=True)  # [2, 3]

    for gene in nearby:             # [4]
        if int(gene[-1]) < 5000:    # [4]
            print(gene.name)        # [4]

Useful features shown here include:

* `[1]` support for all BEDTools-supported formats (here, gzipped BED and GFF)
* `[2]` wrapping of all BEDTools programs and arguments (here, `subtract` and
  `closest`, passing the `-d` flag to `closest`)
* `[3]` streaming results (like Unix pipes, here specified by `stream=True`)
* `[4]` iterating over results while accessing feature data by index or by
  attribute (here, `[-1]` and `.name`)

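To make the logic of the analysis concrete, here is a minimal pure-Python
sketch of the same three steps on toy data. It does not use `pybedtools` or
BEDTools. The function names, the tuple layout, and the coordinates are all
hypothetical, and the distance is defined here simply as the gap between
half-open intervals, which may differ from the BEDTools convention in edge
cases:

.. code-block:: python

    def overlaps(a, b):
        """True if half-open intervals (chrom, start, end) share a base."""
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


    def gap(a, b):
        """Bases separating two intervals on one chromosome; 0 if they overlap."""
        return max(0, a[1] - b[2], b[1] - a[2])


    def genes_near_intergenic_snps(genes, snps, max_dist=5000):
        # Step 1: keep SNPs that overlap no gene. For single-base SNPs this
        # plays the role of `subtract` in the example above.
        intergenic = [s for s in snps
                      if not any(overlaps(s, g[:3]) for g in genes)]
        names = []
        for chrom, start, end, name in genes:
            # Step 2: distance to the closest intergenic SNP on the same
            # chromosome (the role of `closest` with d=True).
            dists = [gap((chrom, start, end), s)
                     for s in intergenic if s[0] == chrom]
            # Step 3: keep genes whose closest SNP is under the threshold.
            if dists and min(dists) < max_dist:
                names.append(name)
        return names


    # Toy data: (chrom, start, end, name) for genes, (chrom, start, end) for SNPs.
    genes = [("chr1", 1000, 2000, "geneA"), ("chr1", 50000, 60000, "geneB")]
    snps = [("chr1", 1500, 1501), ("chr1", 3000, 3001), ("chr1", 70000, 70001)]

    print(genes_near_intergenic_snps(genes, snps))  # prints ['geneA']

The nested loops are quadratic and only suitable for illustration; BEDTools
exists precisely so that these operations scale to genome-sized inputs.
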
In contrast, here is the same analysis using shell scripting. Note that this
requires knowledge of Perl, bash, and awk. The run time is identical to the
`pybedtools` version above:

.. code-block:: bash

    snps=snps.bed.gz
    genes=hg19.gff
    intergenic_snps=/tmp/intergenic_snps

    # Count the fields in the SNP file so the distance column can be located
    snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
    gene_fields=9
    distance_field=$(($gene_fields + $snp_fields + 1))

    # SNPs that do not overlap any gene
    intersectBed -a $snps -b $genes -v > $intergenic_snps

    # Closest intergenic SNP per gene; keep genes under 5 kb and extract names
    closestBed -a $genes -b $intergenic_snps -d \
    | awk '($'$distance_field' < 5000){print $9;}' \
    | perl -ne 'm/(?:ID|Name|gene_id)=(.*?);/; print "$1\n"'

    rm $intergenic_snps

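The shell version has to compute the distance column by hand, whereas the
Python version simply reads the last field with `gene[-1]`. The short sketch
below shows why the two agree, assuming the output of `closestBed -d` is the
gene fields, then the SNP fields, then the distance. The helper name and the
tab-delimited line, including its distance value, are made up for
illustration:

.. code-block:: python

    def distance_column(gene_fields, snp_fields):
        """1-based column holding the distance, as computed in the shell script."""
        return gene_fields + snp_fields + 1


    # Made-up output line: 9 GFF fields, 3 BED fields, then the distance.
    gff_part = ["chr1", "src", "gene", "1001", "2000", ".", "+", ".", "ID=geneA;"]
    bed_part = ["chr1", "3000", "3001"]
    line = "\t".join(gff_part + bed_part + ["1000"])

    fields = line.split("\t")
    col = distance_column(gene_fields=9, snp_fields=3)

    # awk's $13 (1-based) and Python's fields[-1] refer to the same value.
    print(col, fields[col - 1], fields[-1])  # prints 13 1000 1000
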
See the `Shell script comparison <http://daler.github.io/pybedtools/sh-comparison.html>`_ in the docs
for more details on this comparison, or keep reading the full documentation at
http://daler.github.io/pybedtools.