annotate README.md @ 2:328a5cff4d10

"planemo upload"
author kkonganti
date Thu, 15 Sep 2022 12:57:33 -0400
parents 91438d32ed58
children
rev   line source
kkonganti@0 1 # LexMapr2
kkonganti@0 2 > Updated [LexMapr](https://github.com/cidgoh/LexMapr) with added functionalities to:
kkonganti@0 3 > - pull ontology accession ids and definitions from [EMBL-EBI](https://www.ebi.ac.uk/ols/ontologies)
kkonganti@0 4 > via the API
kkonganti@0 5 > - group mapped results by parent ontologies
kkonganti@0 6 > - visualize mapping results
kkonganti@0 7
kkonganti@0 8 LexMapr2 attempts to match short free-form text to terms and exact synonyms that already exist in the specified ontologies, with little contextualization, so it is important to ensure that the chosen ontologies are relevant to the input. For example, 'ground' can match food (ground):FOODON_00002713 or earth:EOL_0001587, and 'Turkey' can match the country (GAZ_00000558) or the bird (NCBITaxon_9103). 'Pet food for dog' will match Canis lupus familiaris:NCBITaxon_9615.
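
The effect of the ontology choice can be illustrated with a toy lookup. This is only a sketch of context-free exact matching, not LexMapr2's actual matching code: the `ONTOLOGY_TERMS` table and the `exact_matches` helper are hypothetical, and only the two 'Turkey' IDs are taken from the paragraph above.

```python
# Toy label tables (hypothetical). A real run would draw labels and exact
# synonyms from the chosen ontologies; the IDs are those quoted above.
ONTOLOGY_TERMS = {
    "GAZ": {"turkey": "GAZ_00000558"},
    "NCBITaxon": {"turkey": "NCBITaxon_9103"},
}


def exact_matches(text, ontologies):
    """Return (ontology, term ID) pairs whose label equals the normalized text."""
    key = text.strip().lower()
    hits = []
    for name in ontologies:
        term_id = ONTOLOGY_TERMS.get(name, {}).get(key)
        if term_id is not None:
            hits.append((name, term_id))
    return hits


# With both ontologies selected, 'Turkey' is ambiguous.
print(exact_matches("Turkey", ["GAZ", "NCBITaxon"]))
# Restricting the ontologies removes the ambiguity.
print(exact_matches("Turkey", ["NCBITaxon"]))
```

Because no context is used, narrowing the ontology list is the main way to steer an ambiguous input toward the intended term.
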
kkonganti@0 9
kkonganti@0 10
kkonganti@0 11 ## Table of Contents
kkonganti@0 12 * [Setup](#setup)
kkonganti@0 13 * [Usage](#usage)
kkonganti@0 14 * [Customization](#customization)
kkonganti@0 15 * [Ontology map](#ontology-map)
kkonganti@0 16 * [Authors](#authors)
kkonganti@0 17
kkonganti@0 18
kkonganti@0 19 ## Setup
kkonganti@0 20 The code will run as is when retrieved from GitHub with Python >= 3.7
kkonganti@0 21
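kkonganti@0 22 The following Python modules are required. Standard-library modules, which ship with Python and need no installation: argparse, collections, copy, csv, datetime, itertools (which provides permutations), json, logging, pickle, shutil, sqlite3, time, unicodedata. Third-party packages: dateutil (distributed on PyPI as python-dateutil), inflection, matplotlib, nltk, pandas, pygraphviz, requests, seaborn. If any third-party package is missing, it can be installed with pip.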
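
A minimal sketch for checking which packages still need to be installed before the first run. It assumes that the eight names in `THIRD_PARTY` are the entries of the list above that are not bundled with Python; `missing_modules` is a hypothetical helper, not part of LexMapr2.

```python
import importlib.util

# Import names assumed to be the third-party entries of the list above.
THIRD_PARTY = [
    "dateutil", "inflection", "matplotlib", "nltk",
    "pandas", "pygraphviz", "requests", "seaborn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(THIRD_PARTY)
if missing:
    # Import names can differ from PyPI names (dateutil is python-dateutil).
    print("Missing modules:", ", ".join(missing))
else:
    print("All third-party modules were found.")
```
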
kkonganti@0 23
kkonganti@0 24 LexMapr2 will eventually be uploaded to PyPI as a package.
kkonganti@0 25
kkonganti@0 26
kkonganti@0 27 ## Usage
kkonganti@0 28 Input and output CSV/TSV formats are the same as in [LexMapr v 0.7](https://github.com/cidgoh/LexMapr). The current version will also generate graphs and a log. A stable connection to the Internet is required.
kkonganti@0 29
kkonganti@0 30
kkonganti@0 31 <pre>usage: lexmapr2.py [-h] [-o] [-a] [-b] [-e] [-f] [-g] [-j] [-r] [-u] [-v] input
kkonganti@0 32
kkonganti@0 33 positional arguments:
kkonganti@0 34   input               input CSV or TSV file; required
kkonganti@0 35
kkonganti@0 36 optional arguments:
kkonganti@0 37   -h, --help          show this help message and exit
kkonganti@0 38   -o, --output        output TSV file path; default is stdout
kkonganti@0 39   -a, --no_ancestors  remove ancestral terms from output
kkonganti@0 40   -b, --bin           classify samples into default bins
kkonganti@0 41   -e, --embl_ontol    user-defined comma-separated ontology short names
kkonganti@0 42   -f, --full          full output format
kkonganti@0 43   -g, --graph         visualize summaries of mapping and binning
kkonganti@0 44   -j, --graph_only    only perform visualization with LexMapr2 output
kkonganti@0 45   -r, --remake_cache  remake cached resources
kkonganti@0 46   -u, --user_bin      path to JSON file with user-defined bins
kkonganti@0 47   -v, --version       show program's version number and exit</pre>
kkonganti@0 48
kkonganti@0 49 Flags -a, -b, -g may substantially add to the runtime.
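
As an illustration, the sketch below assembles a command line from the options listed above. It rests on several assumptions: that '-o' and '-e' take their value as the next argument (the usage line shows no placeholders), that `lexmapr2.py` is in the current directory, and that ontology short names are spelled as shown. `build_command` and the file names are hypothetical.

```python
import shlex
import sys


def build_command(input_path, output_path=None, ontologies=None,
                  bins=False, graph=False):
    """Assemble an argument list for lexmapr2.py (flag values are assumed)."""
    cmd = [sys.executable, "lexmapr2.py"]
    if output_path:
        cmd += ["-o", output_path]            # assumed: path follows the flag
    if ontologies:
        cmd += ["-e", ",".join(ontologies)]   # comma-separated short names
    if bins:
        cmd.append("-b")
    if graph:
        cmd.append("-g")
    cmd.append(input_path)                    # positional input comes last
    return cmd


cmd = build_command("samples.csv", output_path="mapped.tsv",
                    ontologies=["FOODON", "NCBITaxon"], bins=True)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems when file paths contain spaces.
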
kkonganti@0 50
kkonganti@0 51 ## Customization
kkonganti@0 52 By default, the FOODON and NCBITaxon ontologies are used. Users can instead supply a comma-delimited list of [ontology short names](https://www.ebi.ac.uk/ols/ontologies) with the '-e' option. Bins are used to categorize matched terms by their parent ontology terms. Users can override the default bins by passing the path to a JSON file with the '-u' option.
kkonganti@0 53
kkonganti@0 54 Example JSON format to use a bin titled 'ncbi_taxon':
kkonganti@0 55 <pre>
kkonganti@0 56 {
kkonganti@0 57     "ncbi_taxon":{
kkonganti@0 58         "Actinopterygii":"NCBITaxon_7898",
kkonganti@0 59         "Ecdysozoa":"NCBITaxon_1206794",
kkonganti@0 60         "Echinodermata":"NCBITaxon_7586",
kkonganti@0 61         "Fungi":"NCBITaxon_4751",
kkonganti@0 62         "Mammalia":"NCBITaxon_40674",
kkonganti@0 63         "Sauropsida":"NCBITaxon_8457",
kkonganti@0 64         "Spiralia":"NCBITaxon_2697495",
kkonganti@0 65         "Viridiplantae":"NCBITaxon_33090"
kkonganti@0 66     }
kkonganti@0 67 }</pre>
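
A bin file in the format above can be generated and sanity-checked before it is passed with '-u'. The sketch below is illustrative only: `validate_bins` is a hypothetical helper, and the ID pattern (letters, an underscore, then digits) is an assumption inferred from the IDs shown in this README, so LexMapr2 itself may accept other forms.

```python
import json
import re

# Assumed ID shape, e.g. NCBITaxon_40674 or FOODON_00002713.
ID_PATTERN = re.compile(r"^[A-Za-z]+_\d+$")


def validate_bins(bins):
    """Return a list of problems found in a {bin title: {label: ID}} mapping."""
    if not isinstance(bins, dict):
        return ["top level must be a JSON object"]
    problems = []
    for title, members in bins.items():
        if not isinstance(members, dict):
            problems.append(f"{title}: expected an object of label/ID pairs")
            continue
        for label, term_id in members.items():
            if not isinstance(term_id, str) or not ID_PATTERN.match(term_id):
                problems.append(f"{title}/{label}: unexpected ID {term_id!r}")
    return problems


bins = {"ncbi_taxon": {"Fungi": "NCBITaxon_4751", "Mammalia": "NCBITaxon_40674"}}
print(validate_bins(bins))          # an empty list means no problems were found
print(json.dumps(bins, indent=4))   # write this text to a file and pass it with -u
```
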
kkonganti@0 68
kkonganti@0 69
kkonganti@0 70 ## Ontology map
kkonganti@0 71 Binning results will be mapped as shown in the example below if the graph flag, '-g', is used. Yellow nodes are parent bins. Blue nodes are terms that were identified during matching. The map will not be drawn if there are more than 100 nodes or more than 150 edges; in that case, the program prints a list of either the nodes or the edges to the log. No attempt is made at all if there are more than 1000 rows. It is recommended that the user curate the LexMapr2 output file down to the rows of interest and use the graph_only flag, '-j', with the shortened output file as the input file.
kkonganti@0 72
kkonganti@0 73 ![Example screenshot](./img/example_map.png)
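
The size limits described above can be summarized as a small decision function. This is a sketch of the stated rules only, not the program's own code: `map_decision`, the order of the checks, and the returned strings are assumptions, while the three thresholds come from the text.

```python
MAX_ROWS = 1000   # above this, no attempt is made
MAX_NODES = 100   # above this, the map is not drawn
MAX_EDGES = 150   # above this, the map is not drawn


def map_decision(n_rows, n_nodes, n_edges):
    """Say whether an ontology map of the given size would be drawn."""
    if n_rows > MAX_ROWS:
        return "skipped: too many rows"
    if n_nodes > MAX_NODES:
        return "refused: too many nodes"
    if n_edges > MAX_EDGES:
        return "refused: too many edges"
    return "draw"


print(map_decision(n_rows=250, n_nodes=80, n_edges=120))
print(map_decision(n_rows=250, n_nodes=101, n_edges=120))
```

A check like this can help decide how far to trim the output file before rerunning with '-j'.
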
kkonganti@0 74
kkonganti@0 75
kkonganti@0 76 ## Authors
kkonganti@0 77 Kayla K. Pennerman,
kkonganti@0 78 Maria Balkey,
kkonganti@0 79 Ruth E. Timme