-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add synset wordnet links to glosses #1382
base: master
Are you sure you want to change the base?
Conversation
@Jetske can you add the verbose names to the fields of the new model? That is used elsewhere, for example in queries if they are included in search fields. |
@susanodd adding verbose names would be redundant as they would be the same as the field names, or not? |
@Woseseltops or @vanlummelhuizen thi is not a draft anymore, can one of you review this? |
The verbose names will be put in the translation files. Then if the fields are used in search forms, they will be retrieved there. The query parameters also make use of these. |
Thanks @Jetske One important thing: nltk and bs4 are not in requierments.txt. Question: the Synset model now links to Glosses using a ManyToManyField and to Dataset using a ForeignKey. Can Dataset be derived from Glosses, making the ForeignKey to Dataset unnecessary? You now download the data and save it in a file on disk. Would it be feasible to keep the data in memory and use it from there? It would avoid potential file system problems. Some minor remarks below. Some methods in In lines 57-79 you are doing the same thing twice with a minor difference: 'links' and 'signs'. It would be a candidate to extract a new method from. In Note that these remarks may interact with the thing I said about saving data to disk. |
@vanlummelhuizen Yes, you're right that there is not really a need now to save the files. It's changed! |
download_response = session.post(download_url, data=download_payload) | ||
if download_response.status_code == 200: | ||
csv = download_response.content | ||
print("Links CSV downloaded successfully.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print("Links CSV downloaded successfully.")
should probably be print("{csv_type} CSV downloaded successfully.")
Good busy! I saw you made the variable names for the urls lower case. I thought upper case was a good choice since they are more like settings/constants. If anything, I would probably move them to the top of the file so that anyone reading this code instantly knows we are dealing with external urls. |
Waiting also for approval of @uklomp in #1350
This is a command that adds/updates the wordnet synset information. Note: this only works for NGT. It takes a little while to run it. I wonder if this is the best way, for example, it is also possible to add a button "Update synsets" to the manage datasets page. But I think it's better to run the command for example once a week to update it automatically.
The first time the command is used, it creates all synsets, finds missing wordnet information, and links them to glosses. After that, when updating synsets, it will use those that are already there, create new ones and deletes the ones not used anymore.
To do:
add folder "synsets" to writable
Add your wordnet login credentials to server specific settings
migrate