# LexMapr2
> Updated [LexMapr](https://github.com/cidgoh/LexMapr) with added functionalities to:
> - pull ontology accession ids and definitions from [EMBL-EBI](https://www.ebi.ac.uk/ols/ontologies)
> via the API
> - group mapped results by parent ontologies
> - visualize mapping results

LexMapr2 attempts to match short free-form text to terms and exact synonyms that already exist in the specified ontologies, with little contextualization. It is therefore important to ensure that the chosen ontologies are relevant to the input. For example, 'ground' can match food (ground):FOODON_00002713 or earth:EOL_0001587, and 'Turkey' can match the country (GAZ_00000558) or the bird (NCBITaxon_9103). Likewise, 'Pet food for dog' will match Canis lupus familiaris:NCBITaxon_9615.
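
The effect of ontology choice on ambiguous input can be illustrated with a toy lookup. This is only a sketch and not LexMapr2's matching code: the `TOY_TERMS` table and the `toy_match` function are hypothetical, the accession IDs are the ones quoted above, and taking the ontology name from the accession prefix is an assumption made for the illustration.

```python
# Toy label table built from the examples above; not a real ontology dump.
TOY_TERMS = [
    ("ground", "FOODON_00002713"),
    ("ground", "EOL_0001587"),
    ("turkey", "GAZ_00000558"),
    ("turkey", "NCBITaxon_9103"),
]


def toy_match(text, ontologies):
    """Return accession IDs whose label equals the text, limited to the chosen ontologies."""
    wanted = {name.upper() for name in ontologies}
    label = text.strip().lower()
    return [
        accession
        for term_label, accession in TOY_TERMS
        # Assumption: the part before the underscore names the ontology.
        if term_label == label and accession.split("_")[0].upper() in wanted
    ]


# With both ontologies selected, 'Turkey' is ambiguous.
print(toy_match("Turkey", ["GAZ", "NCBITaxon"]))
# Restricting the ontologies removes the ambiguity.
print(toy_match("Turkey", ["NCBITaxon"]))
```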


## Table of Contents
* [Setup](#setup)
* [Usage](#usage)
* [Customization](#customization)
* [Ontology map](#ontology-map)
* [Authors](#authors)


## Setup
The code runs as is, once retrieved from GitHub, with Python >= 3.7.

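The following Python modules are required: argparse, collections, copy, csv, datetime, dateutil, inflection, itertools, json, logging, matplotlib, nltk, pandas, permutations, pickle, pygraphviz, requests, seaborn, shutil, sqlite3, time, unicodedata. Many of these (for example argparse, collections, copy, csv, datetime, itertools, json, logging, pickle, shutil, sqlite3, time and unicodedata) ship with Python's standard library and need no installation. Any missing third-party packages can be installed with pip; note that dateutil is published on PyPI as python-dateutil.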
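
One way to find out which of these modules are missing is to ask Python whether each can be located. The sketch below is not part of LexMapr2; the `missing_modules` helper is hypothetical, and the module names are copied verbatim from the list above.

```python
from importlib.util import find_spec

# Module names copied from the requirements list in this README.
REQUIRED_MODULES = [
    "argparse", "collections", "copy", "csv", "datetime", "dateutil",
    "inflection", "itertools", "json", "logging", "matplotlib", "nltk",
    "pandas", "permutations", "pickle", "pygraphviz", "requests", "seaborn",
    "shutil", "sqlite3", "time", "unicodedata",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The result depends on the environment in which the script runs, so no expected output is shown.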

LexMapr2 will eventually be uploaded to PyPI as a package.


## Usage
Input and output CSV/TSV formats are the same as in [LexMapr v 0.7](https://github.com/cidgoh/LexMapr). The current version will also generate graphs and a log. A stable connection to the Internet is required.


<pre>usage: lexmapr2.py [-h] [-o] [-a] [-b] [-e] [-f] [-g] [-j] [-r] [-u] [-v] input

positional arguments:
  input                 input CSV or TSV file; required

optional arguments:
  -h, --help            show this help message and exit
  -o, --output          output TSV file path; default is stdout
  -a, --no_ancestors    remove ancestral terms from output
  -b, --bin             classify samples into default bins
  -e, --embl_ontol      user-defined comma-separated ontology short names
  -f, --full            full output format
  -g, --graph           visualize summaries of mapping and binning
  -j, --graph_only      only perform visualization with LexMapr2 output
  -r, --remake_cache    remake cached resources
  -u, --user_bin        path to JSON file with user-defined bins
  -v, --version         show program's version number and exit</pre>

Flags -a, -b, -g may substantially add to the runtime.
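
As an illustration of how the options combine, the following sketch assembles a command line from Python. It assumes that '-o' and '-e' each take a value immediately after the flag, which the help text implies but does not show explicitly. The `build_command` helper and the file names `samples.csv` and `mapped.tsv` are hypothetical.

```python
import sys


def build_command(input_path, output=None, ontologies=None,
                  no_ancestors=False, bins=False, graph=False):
    """Assemble an argument list for lexmapr2.py from the documented flags."""
    command = [sys.executable, "lexmapr2.py"]
    if output:
        command += ["-o", output]                 # assumed: value follows the flag
    if ontologies:
        command += ["-e", ",".join(ontologies)]   # comma-separated short names
    if no_ancestors:
        command.append("-a")
    if bins:
        command.append("-b")
    if graph:
        command.append("-g")
    command.append(input_path)                    # positional input comes last
    return command


command = build_command(
    "samples.csv",
    output="mapped.tsv",
    ontologies=["FOODON", "NCBITaxon"],
    bins=True,
)
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```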

## Customization
By default, the FOODON and NCBITaxon ontologies are used. Users can instead supply a comma-delimited list of [ontology short names](https://www.ebi.ac.uk/ols/ontologies) with the '-e' option. Bins are used to categorize matched ontologies by their parent ontologies. Users can override the default bins by passing the path to a JSON file with the '-u' option.

Example JSON format to use a bin titled 'ncbi_taxon':
<pre>
{
  "ncbi_taxon":{
                "Actinopterygii":"NCBITaxon_7898",
                "Ecdysozoa":"NCBITaxon_1206794",
                "Echinodermata":"NCBITaxon_7586",
                "Fungi":"NCBITaxon_4751",
                "Mammalia":"NCBITaxon_40674",
                "Sauropsida":"NCBITaxon_8457",
                "Spiralia":"NCBITaxon_2697495",
                "Viridiplantae":"NCBITaxon_33090"
               }
}</pre>
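
Before passing a bin file to '-u', it may help to check that it has the same shape as the example above. The sketch below is not part of LexMapr2: `check_user_bins` is a hypothetical helper, and the accession pattern (letters, an underscore, then digits) is an assumption inferred only from the IDs shown in this README.

```python
import json
import re

# Assumed accession shape, inferred from examples such as NCBITaxon_7898.
ACCESSION_PATTERN = re.compile(r"^[A-Za-z]+_\d+$")


def check_user_bins(bins):
    """Return a list of problems found in a user-bin mapping (empty if none)."""
    if not isinstance(bins, dict):
        return ["top level must be a JSON object"]
    problems = []
    for title, members in bins.items():
        if not isinstance(members, dict):
            problems.append(f"bin '{title}' must map labels to accession IDs")
            continue
        for label, accession in members.items():
            if not isinstance(accession, str) or not ACCESSION_PATTERN.match(accession):
                problems.append(f"{title}/{label}: unexpected accession {accession!r}")
    return problems


# A shortened version of the example bin above.
example = json.loads('{"ncbi_taxon": {"Fungi": "NCBITaxon_4751", "Mammalia": "NCBITaxon_40674"}}')
print(check_user_bins(example))
```

To check a file on disk, load it with `json.load` and pass the result to the same function.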


## Ontology map
If the graph flag, '-g', is used, binning results are mapped as shown in the example below. Yellow nodes are parent bins; blue nodes are terms identified during matching. The map is not drawn if there are more than 100 nodes or more than 150 edges; in that case, the program prints a list of either the nodes or the edges to the log. No attempt is made at all if there are more than 1000 rows. It is recommended that users curate the LexMapr2 output file down to the rows of interest, then use the graph_only flag, '-j', with the shortened output file as the input file.

![Example screenshot](./img/example_map.png)
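
The size limits described in this section can be summarized as a small decision function. This is a sketch of the stated rules only, not LexMapr2's own code; the name `map_decision` and its return values are hypothetical.

```python
# Limits as stated in this section of the README.
MAX_ROWS = 1000
MAX_NODES = 100
MAX_EDGES = 150


def map_decision(n_rows, n_nodes, n_edges):
    """Classify a mapping request as 'skip', 'refuse' or 'draw'."""
    if n_rows > MAX_ROWS:
        return "skip"      # too many rows: no attempt is made
    if n_nodes > MAX_NODES or n_edges > MAX_EDGES:
        return "refuse"    # too large to draw: nodes or edges are logged instead
    return "draw"


print(map_decision(n_rows=250, n_nodes=40, n_edges=60))
```

A 'refuse' or 'skip' result is the cue to shorten the output file and rerun with '-j'.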


## Authors
Kayla K. Pennerman,
Maria Balkey,
Ruth E. Timme
author	kkonganti
date	Thu, 15 Sep 2022 14:15:53 -0400