Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add synset wordnet links to glosses #1382

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Add synset wordnet links to glosses #1382

wants to merge 13 commits into from

Conversation

Jetske
Copy link
Collaborator

@Jetske Jetske commented Nov 8, 2024

Waiting also for approval of @uklomp in #1350

This is a command that adds/updates the wordnet synset information. Note: this only works for NGT. It takes a little while to run it. I wonder if this is the best way, for example, it is also possible to add a button "Update synsets" to the manage datasets page. But I think it's better to run the command for example once a week to update it automatically.

The first time the command is used, it creates all synsets, finds missing wordnet information, and links them to glosses. After that, when updating synsets, it will use those that are already there, create new ones and deletes the ones not used anymore.

To do:
add folder "synsets" to writable
Add your wordnet login credentials to server specific settings
migrate

@Jetske Jetske linked an issue Nov 8, 2024 that may be closed by this pull request
@Jetske Jetske added the DRAFT label Nov 8, 2024
@susanodd
Copy link
Collaborator

@Jetske can you add the verbose names to the fields of the new model? That is used elsewhere, for example in queries if they are included in search fields.

@Jetske
Copy link
Collaborator Author

Jetske commented Nov 28, 2024

@susanodd adding verbose names would be redundant as they would be the same as the field names, or not?

@Jetske Jetske removed the DRAFT label Nov 28, 2024
@Jetske
Copy link
Collaborator Author

Jetske commented Nov 28, 2024

@Woseseltops or @vanlummelhuizen thi is not a draft anymore, can one of you review this?
I'll ask for a signbank wordnet account to add to settings, so we don't have to use mine.

@susanodd
Copy link
Collaborator

susanodd commented Nov 29, 2024

@susanodd adding verbose names would be redundant as they would be the same as the field names, or not?

The verbose names will be put in the translation files. Then if the fields are used in search forms, they will be retrieved there. The query parameters also make use of these.

@Jetske Jetske requested review from vanlummelhuizen and removed request for Woseseltops November 29, 2024 09:49
@vanlummelhuizen
Copy link
Collaborator

Thanks @Jetske

One important thing: nltk and bs4 are not in requierments.txt.

Question: the Synset model now links to Glosses using a ManyToManyField and to Dataset using a ForeignKey. Can Dataset be derived from Glosses, making the ForeignKey to Dataset unnecessary?

You now download the data and save it in a file on disk. Would it be feasible to keep the data in memory and use it from there? It would avoid potential file system problems.

Some minor remarks below.

Some methods in update_wordnet_links.py are quite long and some have quite some branching (for, if). If possible (time wise) you could see if you can make them shorter and simpler.

In lines 57-79 you are doing the same thing twice with a minor difference: 'links' and 'signs'. It would be a candidate to extract a new method from.

In get_links_data there are two nicely separated parts. Getting the signs from the CSV could be done in a separate method.

Note that these remarks may interact with the thing I said about saving data to disk.

@Jetske
Copy link
Collaborator Author

Jetske commented Dec 4, 2024

@vanlummelhuizen Yes, you're right that there is not really a need now to save the files. It's changed!

download_response = session.post(download_url, data=download_payload)
if download_response.status_code == 200:
csv = download_response.content
print("Links CSV downloaded successfully.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print("Links CSV downloaded successfully.") should probably be print("{csv_type} CSV downloaded successfully.")

@vanlummelhuizen
Copy link
Collaborator

Good busy!

I saw you made the variable names for the urls lower case. I thought upper case was a good choice since they are more like settings/constants. If anything, I would probably move them to the top of the file so that anyone reading this code instantly knows we are dealing with external urls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Link Wordnet to Signbank
3 participants