linguine-python is a Python web server for use in the Linguine natural language processing workbench. The server accepts requests in JSON format and performs the text analysis operations implemented in Python.
The implemented operations can be found in /linguine/ops.
To add a new analysis or cleanup operation to this project:
- Create a new Python file in /linguine/ops.
- Fill the operation in using the template below.
- Import the op in /linguine/operation_builder.py and add the operation to the get_operation_handler function body (a registration sketch follows the templates below).
- Any unit tests should go in /test.
# A sample cleanup operation
# Used to modify the existing text in a corpus set for easier analysis.
# Data will be passed to the op in the form of a collection of corpora.
# The op transforms the contents of each corpus and returns the results.
class FooOp:
    def run(self, data):
        for corpus in data:
            corpus.contents = Bar(corpus.contents)
        return data
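For instance, a hypothetical cleanup op that lowercases each corpus would follow the same shape (LowercaseOp is illustrative only, not an op shipped with the project):

# Hypothetical cleanup op: lowercases the contents of each corpus.
# Follows the template above; not part of the project.
class LowercaseOp:
    def run(self, data):
        for corpus in data:
            corpus.contents = corpus.contents.lower()
        return data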
# A sample analysis operation
# Used to generate meaningful data from a corpus set.
# Data will be passed to the op in the form of a collection of corpora.
# The op runs analysis on each corpus (or the set as a whole).
# It builds a set of results which are then returned in place of corpora.
class FooOp:
    def run(self, data):
        results = []
        for corpus in data:
            results.append({'corpus_id': corpus.id, 'bar': Bar(corpus.contents)})
        return results
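Registration in /linguine/operation_builder.py might then look roughly like the sketch below; the actual structure of get_operation_handler may differ, and the module path, the 'foo' key, and FooOp are placeholders:

# Sketch only: the real get_operation_handler may dispatch differently.
from linguine.ops.foo import FooOp  # placeholder module path

def get_operation_handler(operation):
    # ... existing operations ...
    if operation == 'foo':
        return FooOp()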
HTTP POST '/':
Expects a JSON payload in the following format:
{
"corpora_ids": ["12345"], //Collection of corpora to pipe into analysis
"cleanup": ["stopwords"], //Cleanup steps to add
"operation": "nlp-relation", //Type of analysis to be preformed
"tokenizer": "", //Tokenizer used (if required)
"library": "", //Library associated w/ analysis (if required)
"transaction_id": "", //(Field to be populated by linguine-python)
"analysis_name": "Relation Extraction (Stanford CoreNLP)", //Name to display in text fields
"time_created": 1461342250445, //Used to calculate ETA of analyses
"user_id": "12345" //Unique identifier of user who created analysis
}
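As an illustration, such a request could be sent to a locally running server with the Python standard library; the port and all field values below are placeholders:

# Illustration only: post an analysis request to a locally running server.
import json
import urllib.request

payload = {
    "corpora_ids": ["12345"],
    "cleanup": ["stopwords"],
    "operation": "nlp-relation",
    "tokenizer": "",
    "library": "",
    "transaction_id": "",
    "analysis_name": "Relation Extraction (Stanford CoreNLP)",
    "time_created": 1461342250445,
    "user_id": "12345"
}

request = urllib.request.Request(
    "http://localhost:5555/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))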
Supported analyses:
- Term Frequency
- Part of Speech Tagging
- Sentiment
- Named Entity Recognition
- Relation Extraction
- Coreference Resolution
Requirements:
- Python 3.9.1 or newer (requires an implementation of the Future object)
- MongoDB
- NLTK Punkt model
- Stanford CoreNLP Pywrapper (Installation instructions can be found here).
- Install the Stanford CoreNLP module following the docs here.
To install dependencies:
sudo pip install -r requirements.txt
python -m textblob.download_corpora
To run the web server:
python -m linguine.webserver --port <port> --database <database>
Note:
- For Linguine 1: port 5555, database linguine-development
- For Linguine 2: port 5551, database linguine2-development
To run tests:
sudo pip install -r requirements.txt
pytest test
Note: running the program from a directory other than the linguine-python root directory will cause directory linking errors.
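For reference, a unit test in /test might be sketched roughly as below; FooOp, its import path, and the stand-in corpus are all assumptions and should be adjusted to the op under test:

# Sketch of a test for a hypothetical analysis op; FakeCorpus only mimics
# the fields the op templates above rely on (id and contents).
from linguine.ops.foo import FooOp  # placeholder import path

class FakeCorpus:
    def __init__(self, id, contents):
        self.id = id
        self.contents = contents

def test_foo_op_returns_one_result_per_corpus():
    data = [FakeCorpus('1', 'Some sample text.')]
    results = FooOp().run(data)
    assert len(results) == len(data)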