comparison README.md @ 0:91438d32ed58

"planemo upload"
author kkonganti
date Wed, 14 Sep 2022 10:39:29 -0400
1 # LexMapr2
2 > Updated [LexMapr](https://github.com/cidgoh/LexMapr) with added functionalities to:
3 > - pull ontology accession ids and definitions from [EMBL-EBI](https://www.ebi.ac.uk/ols/ontologies)
4 > via the API
5 > - group mapped results by parent ontologies
6 > - visualize mapping results
7
8 LexMapr2 will attempt to match short free-form text to terms and exact synonyms already existing in the specified ontologies without much contextualization. It is important to ensure that the chosen ontologies are relevant to the input. For example, 'ground' can match to food (ground):FOODON_00002713 or earth:EOL_0001587. 'Turkey' can match to the country (GAZ_00000558) or the bird (NCBITaxon_9103). 'Pet food for dog' will match Canis lupus familiaris:NCBITaxon_9615.
9
10
11 ## Table of Contents
12 * [Setup](#setup)
13 * [Usage](#usage)
14 * [Customization](#customization)
15 * [Ontology map](#ontology-map)
16 * [Authors](#authors)
17
18
19 ## Setup
20 The code runs as is when retrieved from GitHub, using Python >= 3.7.
21
22 The following Python packages are required: argparse, collections, copy, csv, datetime, dateutil, inflection, itertools, json, logging, matplotlib, nltk, pandas, permutations, pickle, pygraphviz, requests, seaborn, shutil, sqlite3, time, unicodedata. Many of these (for example argparse, collections, csv, json, logging, pickle, shutil, sqlite3) ship with the Python standard library. Any that are missing can be installed with pip.
23
24 LexMapr2 will eventually be uploaded to PyPI as a package.
25
26
27 ## Usage
28 Input and output CSV/TSV formats are the same as in [LexMapr v 0.7](https://github.com/cidgoh/LexMapr). The current version will also generate graphs and a log. A stable connection to the Internet is required.
29
30
31 <pre>usage: lexmapr2.py [-h] [-o] [-a] [-b] [-e] [-f] [-g] [-j] [-r] [-u] [-v] input
32
33 positional arguments:
34   input               input CSV or TSV file; required
35
36 optional arguments:
37   -h, --help          show this help message and exit
38   -o, --output        output TSV file path; default is stdout
39   -a, --no_ancestors  remove ancestral terms from output
40   -b, --bin           classify samples into default bins
41   -e, --embl_ontol    user-defined comma-separated ontology short names
42   -f, --full          full output format
43   -g, --graph         visualize summaries of mapping and binning
44   -j, --graph_only    only perform visualization with LexMapr2 output
45   -r, --remake_cache  remake cached resources
46   -u, --user_bin      path to JSON file with user-defined bins
47   -v, --version       show program's version number and exit</pre>
48
49 The -a, -b, and -g flags may add substantially to the runtime.
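
As a rough illustration of how the flags above combine, the sketch below assembles a command line in Python. The helper name `build_lexmapr2_command` and the file names are hypothetical. It also assumes that '-o' and '-e' each take a value, which the usage text suggests but does not state outright, and that ontology short names are written as on the OLS listing.

```python
import sys


def build_lexmapr2_command(input_path, output_path=None, ontologies=None,
                           use_bins=False, graph=False):
    """Assemble an argument list for lexmapr2.py (hypothetical helper)."""
    cmd = [sys.executable, "lexmapr2.py"]
    if output_path:
        # Assumption: -o is followed by the output TSV path.
        cmd += ["-o", output_path]
    if ontologies:
        # Assumption: -e is followed by comma-separated ontology short names.
        cmd += ["-e", ",".join(ontologies)]
    if use_bins:
        cmd.append("-b")  # classify samples into default bins
    if graph:
        cmd.append("-g")  # visualize summaries of mapping and binning
    cmd.append(input_path)  # the positional input comes last
    return cmd


command = build_lexmapr2_command(
    "samples.csv",                       # hypothetical input file
    output_path="mapped.tsv",            # hypothetical output file
    ontologies=["foodon", "ncbitaxon"],  # spelling expected by the tool is an assumption
    use_bins=True,
)
print(" ".join(command[1:]))
```

The resulting list could then be handed to `subprocess.run(command, check=True)` from the directory that contains lexmapr2.py.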
50
51 ## Customization
52 By default, the FOODON and NCBITaxon ontologies are used. To use others, pass a comma-delimited list of [ontology short names](https://www.ebi.ac.uk/ols/ontologies) with the '-e' flag. Bins categorize matched ontology terms by their parent ontologies. To override the default bins, pass a JSON file with the '-u' flag.
53
54 Example JSON format to use a bin titled 'ncbi_taxon':
55 <pre>
56 {
57     "ncbi_taxon":{
58         "Actinopterygii":"NCBITaxon_7898",
59         "Ecdysozoa":"NCBITaxon_1206794",
60         "Echinodermata":"NCBITaxon_7586",
61         "Fungi":"NCBITaxon_4751",
62         "Mammalia":"NCBITaxon_40674",
63         "Sauropsida":"NCBITaxon_8457",
64         "Spiralia":"NCBITaxon_2697495",
65         "Viridiplantae":"NCBITaxon_33090"
66     }
67 }</pre>
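
A bin file in this format could also be generated from Python. The following is a minimal sketch using only the standard library. The file name `user_bins.json` is hypothetical, and the accession-format check is an illustrative sanity check, not a rule that LexMapr2 is documented to enforce.

```python
import json
import re
import tempfile
from pathlib import Path

# Bin title -> {parent label: ontology accession}, mirroring the example above.
user_bins = {
    "ncbi_taxon": {
        "Fungi": "NCBITaxon_4751",
        "Mammalia": "NCBITaxon_40674",
        "Viridiplantae": "NCBITaxon_33090",
    }
}

# Illustrative check (an assumption, not a documented LexMapr2 rule):
# accessions look like PREFIX_digits, e.g. NCBITaxon_4751.
accession_pattern = re.compile(r"^[A-Za-z]+_\d+$")
for bin_title, parents in user_bins.items():
    for label, accession in parents.items():
        if not accession_pattern.match(accession):
            raise ValueError(
                f"{bin_title}/{label}: unexpected accession {accession!r}"
            )

# Write to a temporary directory; the file name is hypothetical.
bin_path = Path(tempfile.mkdtemp()) / "user_bins.json"
bin_path.write_text(json.dumps(user_bins, indent=4), encoding="utf-8")

# Read the file back to confirm it is valid JSON with the same content.
reloaded = json.loads(bin_path.read_text(encoding="utf-8"))
```

The path held in `bin_path` is what would be supplied with the '-u' option.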
68
69
70 ## Ontology map
71 If the graph flag, '-g', is used, binning results will be mapped as shown in the example below. Yellow nodes are parent bins. Blue nodes are terms that were identified during matching. The map will not be drawn if there are more than 100 nodes or more than 150 edges; in that case, the program will print a list of either the nodes or the edges in the log. No attempt is made at all if there are more than 1000 rows. It is recommended that the user curate the LexMapr2 output file down to the rows of interest and then use the graph_only flag, '-j', with the shortened output file as the input file.
72
73 ![Example screenshot](./img/example_map.png)
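
The size limits described in this section can be restated as a short sketch. The function name `map_decision` and its return strings are hypothetical. This is not the LexMapr2 source, only a paraphrase of the thresholds given above.

```python
MAX_ROWS = 1000   # above this, no attempt to draw is made
MAX_NODES = 100   # above this, the map is not drawn
MAX_EDGES = 150   # above this, the map is not drawn


def map_decision(n_rows, n_nodes, n_edges):
    """Return 'skip', 'refuse', or 'draw' for the given sizes (hypothetical)."""
    if n_rows > MAX_ROWS:
        return "skip"      # too many rows: drawing is not attempted
    if n_nodes > MAX_NODES or n_edges > MAX_EDGES:
        return "refuse"    # too large: nodes or edges are listed in the log instead
    return "draw"
```

A check like this could help decide how far to trim an output file before rerunning with '-j'.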
74
75
76 ## Authors
77 Kayla K. Pennerman,
78 Maria Balkey,
79 Ruth E. Timme