Some extractors (get_money, get_durations) are simply not working, while get_dates is? help? #76

Open
abrljak opened this issue Oct 24, 2024 · 0 comments

Hi,
My app is a simple REST API endpoint that uses 3 extractors (money, durations and dates) to locate information in a supplied text. I pass the same input text to all 3 extractors:

The amount of 120.000 USD should be paid in 12 equal monthly instalments starting with Jun 16, 2024.

I get the following output:

{
    "dates": [
        "Sun, 16 Jun 2024 00:00:00 GMT"
    ],
    "durations": [],
    "money": []
}

It looks like get_dates() did its job perfectly, but the other 2 extractors returned nothing.
I have tried many different example texts and downloaded various tokenizers via nltk in case I was missing a dependency. I have no idea what might be wrong, and I have a feeling I am missing something really simple.
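
One way to narrow this down is to call the extractors outside Flask, so that any exception they raise becomes visible instead of showing up as an empty list. Below is a minimal sketch of that idea, not a confirmed diagnosis. The helper names `run_extractors` and `lexnlp_extractors` are hypothetical; the only LexNLP calls used are the three already in the app.

```python
from typing import Callable, Dict, Iterable


def run_extractors(
    text: str, extractors: Dict[str, Callable[[str], Iterable]]
) -> Dict[str, object]:
    """Run each extractor on the same text and record its results or its error."""
    results: Dict[str, object] = {}
    for name, extract in extractors.items():
        try:
            # list() forces lazy results (e.g. generators) to be fully evaluated here.
            results[name] = list(extract(text))
        except Exception as exc:
            # Record the failure so it is not mistaken for "nothing found".
            results[name] = f"error: {exc!r}"
    return results


def lexnlp_extractors() -> Dict[str, Callable[[str], Iterable]]:
    """Hypothetical helper: map names to the three extractors used in the app.

    The imports are deferred so run_extractors() can be used without LexNLP.
    """
    import lexnlp.extract.en.dates
    import lexnlp.extract.en.durations
    import lexnlp.extract.en.money

    return {
        "money": lexnlp.extract.en.money.get_money,
        "durations": lexnlp.extract.en.durations.get_durations,
        "dates": lexnlp.extract.en.dates.get_dates,
    }
```

Inside the container, `print(run_extractors(text, lexnlp_extractors()))` with the sample sentence would show whether an extractor fails or simply matches nothing. Trying variants of the sentence (for example a different number format or an explicit "12 months") might then help separate an environment problem from a pattern that does not match.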

Here is my complete code:

from flask import Flask, request, jsonify

import nltk
import lexnlp.extract.en.money
import lexnlp.extract.en.durations
import lexnlp.extract.en.dates

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('maxent_ne_chunker')
nltk.download('words')

app = Flask(__name__)

@app.route('/extract', methods=['POST'])
def extract_info():
    # Get the text from the request body
    data = request.json
    contract_text = data.get('text', '')

    if not contract_text:
        return jsonify({"error": "No text provided"}), 400

    money = list(lexnlp.extract.en.money.get_money(contract_text))
    durations = list(lexnlp.extract.en.durations.get_durations(contract_text))
    dates = list(lexnlp.extract.en.dates.get_dates(contract_text))

    return jsonify({
        "money": money,
        "durations": durations,
        "dates": dates
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

I am running everything inside Docker, so here is the complete Dockerfile as well:

FROM python:3.9-slim
RUN apt-get update && \
    apt-get install -y build-essential git ca-certificates && \
    update-ca-certificates
RUN git --version
RUN pip install spacy numpy dateparser pyahocorasick unidecode quantulum3 regex nltk
RUN python -m spacy download en_core_web_sm
RUN pip install git+https://github.com/LexPredict/[email protected]
RUN pip install Flask
WORKDIR /app
COPY . /app
CMD ["python", "app.py"]
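
Since most of the packages above are installed without pinned versions, it may help to record what pip actually resolved in the image. The following is a small sketch using only the standard library (`importlib.metadata` is available in Python 3.8+). The function name `installed_versions` is hypothetical, and the distribution name `lexnlp` is an assumption about how the LexNLP package registers itself.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the Dockerfile; "lexnlp" is an assumed name.
PACKAGES = [
    "lexnlp", "spacy", "numpy", "dateparser", "pyahocorasick",
    "unidecode", "quantulum3", "regex", "nltk", "Flask",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `print(installed_versions(PACKAGES))` inside the container lists the versions in use, which can be attached to the report.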

Can you please help?

Thank you,
A.
