linguine-python is a Python web server for use in the Linguine natural language processing workbench. The server accepts requests in JSON format and performs the text analysis operations implemented in Python.
The implemented operations can be found in /linguine/ops.
To add a new analysis or cleanup operation to this project:
- Create a new Python file in /linguine/ops.
- Fill the operation in using the template below.
- Import the op in /linguine/operation_builder.py and add the operation to the get_operation_handler function body (a registration sketch follows the templates below).
- Any unit tests should go in /test.
# A sample cleanup operation
# Used to modify the existing text in a corpus set for easier analysis.
# Data will be passed to the op in the form of a collection of corpora.
# The op transforms the contents of each corpus and returns the results.
class FooOp:
    def run(self, data):
        for corpus in data:
            corpus.contents = Bar(corpus.contents)
        return data
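For instance, a hypothetical cleanup op that lowercases each corpus would follow the same shape (LowercaseOp is illustrative only, not an op shipped with the project):

# Hypothetical cleanup op: lowercases the contents of each corpus.
# Follows the template above; not part of the project.
class LowercaseOp:
    def run(self, data):
        for corpus in data:
            corpus.contents = corpus.contents.lower()
        return data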
# A sample analysis operation
# Used to generate meaningful data from a corpus set.
# Data will be passed to the op in the form of a collection of corpora.
# The op runs analysis on each corpus (or the set as a whole).
# It builds a set of results which are then returned in place of corpora.
class FooOp:
    def run(self, data):
        results = []
        for corpus in data:
            results.append({'corpus_id': corpus.id, 'bar': Bar(corpus.contents)})
        return results
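Registration in /linguine/operation_builder.py might then look roughly like the sketch below; the actual structure of get_operation_handler may differ, and the module path, the 'foo' key, and FooOp are placeholders:

# Sketch only: the real get_operation_handler may dispatch differently.
from linguine.ops.foo import FooOp  # placeholder module path

def get_operation_handler(operation):
    # ... existing operations ...
    if operation == 'foo':
        return FooOp()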
HTTP POST '/':
Expects a JSON payload in the following format:
{
"corpora_ids": ["12345"], //Collection of corpora to pipe into analysis
"cleanup": ["stopwords"], //Cleanup steps to add
"operation": "nlp-relation", //Type of analysis to be preformed
"tokenizer": "", //Tokenizer used (if required)
"library": "", //Library associated w/ analysis (if required)
"transaction_id": "", //(Field to be populated by linguine-python)
"analysis_name": "Relation Extraction (Stanford CoreNLP)", //Name to display in text fields
"time_created": 1461342250445, //Used to calculate ETA of analyses
"user_id": "12345" //Unique identifier of user who created analysis
}
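As an illustration, such a request could be sent to a locally running server with the Python standard library; the port and all field values below are placeholders:

# Illustration only: post an analysis request to a locally running server.
import json
import urllib.request

payload = {
    "corpora_ids": ["12345"],
    "cleanup": ["stopwords"],
    "operation": "nlp-relation",
    "tokenizer": "",
    "library": "",
    "transaction_id": "",
    "analysis_name": "Relation Extraction (Stanford CoreNLP)",
    "time_created": 1461342250445,
    "user_id": "12345"
}

request = urllib.request.Request(
    "http://localhost:5555/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))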
Supported analyses:
- Term Frequency
- Part of Speech Tagging
- Sentiment
- Named Entity Recognition
- Relation Extraction
- Coreference Resolution
Requirements:
- Python 3.9.1 or newer (requires an implementation of the Future object)
- MongoDB
- NLTK Punkt model
- Stanford CoreNLP Pywrapper (Installation instructions can be found here).
- Install the Stanford CoreNLP module following the docs here.
To install dependencies:
sudo pip install -r requirements.txt
python -m textblob.download_corpora
To run the web server:
python -m linguine.webserver --port <port> --database <database>
Note:
- For Linguine 1: port 5555, database linguine-development
- For Linguine 2: port 5551, database linguine2-development
To run tests:
sudo pip install -r requirements.txt
pytest test
Note: running the program from a directory other than the linguine-python root directory will cause directory linking errors.
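For reference, a unit test in /test might be sketched roughly as below; FooOp, its import path, and the stand-in corpus are all assumptions and should be adjusted to the op under test:

# Sketch of a test for a hypothetical analysis op; FakeCorpus only mimics
# the fields the op templates above rely on (id and contents).
from linguine.ops.foo import FooOp  # placeholder import path

class FakeCorpus:
    def __init__(self, id, contents):
        self.id = id
        self.contents = contents

def test_foo_op_returns_one_result_per_corpus():
    data = [FakeCorpus('1', 'Some sample text.')]
    results = FooOp().run(data)
    assert len(results) == len(data)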