author | kkonganti |
---|---|
date | Thu, 15 Sep 2022 14:15:53 -0400 |
# LexMapr2

> Updated [LexMapr](https://github.com/cidgoh/LexMapr) with added functionalities to:
> - pull ontology accession ids and definitions from [EMBL-EBI](https://www.ebi.ac.uk/ols/ontologies)
>   via the API
> - group mapped results by parent ontologies
> - visualize mapping results

LexMapr2 will attempt to match short free-form text to terms and exact synonyms already existing in the specified ontologies without much contextualization. It is important to ensure that the chosen ontologies are relevant to the input. For example, 'ground' can match to food (ground):FOODON_00002713 or earth:EOL_0001587. 'Turkey' can match to the country (GAZ_00000558) or the bird (NCBITaxon_9103). 'Pet food for dog' will match Canis lupus familiaris:NCBITaxon_9615.

## Table of Contents

* [Setup](#setup)
* [Usage](#usage)
* [Customization](#customization)
* [Ontology map](#ontology-map)
* [Authors](#authors)

## Setup

The code will run as is when retrieved from GitHub with Python >= 3.7.

The following Python packages are required: argparse, collections, copy, csv, datetime, dateutil, inflection, itertools, json, logging, matplotlib, nltk, pandas, permutations, pickle, pygraphviz, requests, seaborn, shutil, sqlite3, time, unicodedata.
If any are missing, they can be installed with pip.

LexMapr2 will eventually be uploaded to PyPI as a package.

## Usage

Input and output CSV/TSV formats are the same as in [LexMapr v 0.7](https://github.com/cidgoh/LexMapr). The current version will also generate graphs and a log. A stable connection to the Internet is required.

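Before a first run, it can help to confirm that the third-party modules named under Setup are importable. The following is a minimal sketch and not part of LexMapr2: the `missing_modules` helper and the `THIRD_PARTY` list are hypothetical names, and the list is only a subset of the Setup list chosen for illustration. Note that a module name and its pip name can differ (for example, `dateutil` is installed as `python-dateutil`).

```python
from importlib.util import find_spec

# Illustrative subset of the Setup list: modules that are not part of the
# Python standard library. This list is an assumption, not taken from LexMapr2.
THIRD_PARTY = [
    "dateutil", "inflection", "matplotlib", "nltk",
    "pandas", "pygraphviz", "requests", "seaborn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```
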
<pre>usage: lexmapr2.py [-h] [-o] [-a] [-b] [-e] [-f] [-g] [-j] [-r] [-u] [-v] input

positional arguments:
  input               input CSV or TSV file; required

optional arguments:
  -h, --help          show this help message and exit
  -o, --output        output TSV file path; default is stdout
  -a, --no_ancestors  remove ancestral terms from output
  -b, --bin           classify samples into default bins
  -e, --embl_ontol    user-defined comma-separated ontology short names
  -f, --full          full output format
  -g, --graph         visualize summaries of mapping and binning
  -j, --graph_only    only perform visualization with LexMapr2 output
  -r, --remake_cache  remake cached resources
  -u, --user_bin      path to JSON file with user-defined bins
  -v, --version       show program's version number and exit</pre>

Flags -a, -b, -g may substantially add to the runtime.

## Customization

By default, the FOODON and NCBITaxon ontologies are used. Users can define a comma-delimited list of [ontology short names](https://www.ebi.ac.uk/ols/ontologies) flagged with '-e'.

Bins are used to categorize matched ontologies by their parent ontologies. Users can override the default bins by flagging a JSON file with the '-u' option. Example JSON format to use a bin titled 'ncbi_taxon':

<pre>
{
    "ncbi_taxon":{
        "Actinopterygii":"NCBITaxon_7898",
        "Ecdysozoa":"NCBITaxon_1206794",
        "Echinodermata":"NCBITaxon_7586",
        "Fungi":"NCBITaxon_4751",
        "Mammalia":"NCBITaxon_40674",
        "Sauropsida":"NCBITaxon_8457",
        "Spiralia":"NCBITaxon_2697495",
        "Viridiplantae":"NCBITaxon_33090"
    }
}</pre>

## Ontology map

Binning results will be mapped as shown in the example below if the graph flag, '-g', is used. Yellow nodes are parent bins. Blue nodes are terms that were identified during matching. The map will not be drawn if there are more than 100 nodes or more than 150 edges. If refused, the program will print a list of either nodes or edges in the log. An attempt will not even be made if there are more than 1000 rows.
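
The size limits described above can be summarized in a short sketch. This is an illustration of the stated thresholds only, not LexMapr2's actual code: the function name `map_refusal_reason` and the constant names are hypothetical, and the order in which the real program checks the limits is an assumption.

```python
from typing import Optional

# Limits quoted in the text above; the constant names are hypothetical.
MAX_ROWS = 1000
MAX_NODES = 100
MAX_EDGES = 150


def map_refusal_reason(n_rows: int, n_nodes: int, n_edges: int) -> Optional[str]:
    """Return why the map would be skipped, or None if it is within all limits."""
    # Assumed order: the row limit is checked first, since no attempt
    # is made at all when the input has too many rows.
    if n_rows > MAX_ROWS:
        return f"too many rows ({n_rows} > {MAX_ROWS})"
    if n_nodes > MAX_NODES:
        return f"too many nodes ({n_nodes} > {MAX_NODES})"
    if n_edges > MAX_EDGES:
        return f"too many edges ({n_edges} > {MAX_EDGES})"
    return None
```

A result of `None` means the input is small enough for a map to be attempted; any other result names the first limit that was exceeded.
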
It is recommended that the user curate the LexMapr2 output file to include only the rows of interest, then use the graph_only flag, '-j', with the shortened output file as the input file.

## Authors

Kayla K. Pennerman, Maria Balkey, Ruth E. Timme