Link Wordnet to Signbank #1350

Open
Jetske opened this issue Oct 22, 2024 · 7 comments · May be fixed by #1382

@Jetske
Collaborator

Jetske commented Oct 22, 2024

Add "Wordnet synsets" under senses

https://www.sign-lang.uni-hamburg.de/easier/sign-wordnet/index_ngt.html

E.g.
https://www.sign-lang.uni-hamburg.de/easier/sign-wordnet/sign/ngt.1863.html
This page links to Signbank. Do the reverse in Signbank, but link to every synset of the gloss rather than to the gloss itself.

With a list like this:
image

@Jetske
Collaborator Author

Jetske commented Nov 8, 2024

image
@uklomp this looks good, right? :)

Some problems:

  • Many synsets don't have a working link, lemmas or description, presumably because they are new. I decided not to show them at all.
  • For some synsets I can't find the lemmas and description that belong to them (see example above), even though the link works fine. I think this is because the wordnet python package has not been updated. I'll think a little more about a way to work around this.

@Jetske
Collaborator Author

Jetske commented Nov 8, 2024

image

Okay, I fixed the missing lemmas and descriptions by web scraping the URLs that do work. It works, but is probably not ideal: it takes quite long and there's a risk of sending too many requests.
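One way to limit repeated requests is to remember what has already been scraped. The sketch below is only an illustration, not the code from the pull request: `ScrapeCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function downloads and parses a synset page.

```python
from typing import Callable, Dict, Tuple

# (lemmas, description) for one synset; the exact shape is an assumption.
SynsetInfo = Tuple[str, str]


class ScrapeCache:
    """Remember scraped results so each synset URL is fetched at most once."""

    def __init__(self, fetch: Callable[[str], SynsetInfo]) -> None:
        self._fetch = fetch  # hypothetical scraper: url -> (lemmas, description)
        self._results: Dict[str, SynsetInfo] = {}
        self.requests_made = 0  # number of times the scraper was actually called

    def get(self, url: str) -> SynsetInfo:
        # Only call the scraper for URLs that have not been seen before.
        if url not in self._results:
            self.requests_made += 1
            self._results[url] = self._fetch(url)
        return self._results[url]
```

With this, asking for the same URL twice costs one request; the cache here lives in memory only, so persisting it between runs would be a separate step.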

@Woseseltops @vanlummelhuizen if you have ideas let me know

Jetske added a commit that referenced this issue Nov 8, 2024
@Jetske Jetske linked a pull request Nov 8, 2024 that will close this issue
@vanlummelhuizen
Collaborator

@Jetske First, a small technical remark/question about your command. In the method download_links_csv you open a context with `with ...:`

    with requests.Session() as session:
        login_payload = {
            "username": WORDNET_USERNAME,
            "password": WORDNET_PASSWORD
        }
        # Send the login request
        login_response = session.post(LOGIN_URL, data=login_payload)
It seems to me you end it prematurely on line 48.


If I correctly understand what you are trying to do, I think it boils down to two questions:

  1. When should the script run? Once a day, every hour or triggered by an event?
  2. What should be done and what not? Can the script skip parts if they are not necessary at that time, so that it runs faster?

Depending on the answers we could find a better solution.

@Jetske
Collaborator Author

Jetske commented Nov 13, 2024

@vanlummelhuizen Yes, that can probably be done without the 'with'

I would say ideally once a day. Everything should be done, as it is meant to update the synsets according to new versions of the different files that are downloaded. Each of these files may contain changes.

@vanlummelhuizen
Collaborator

> @vanlummelhuizen Yes, that can probably be done without the 'with'

I would add the rest of the code in the method to the with-context simply by indenting it.
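As a rough sketch of that suggestion (not the actual command): every request that relies on the login stays indented inside the `with` block, so the session is closed only after the last one. `csv_url` and `session_factory` are hypothetical additions; `session_factory` exists only so the sketch can be tried with a stand-in session instead of the real site.

```python
def download_links_csv(login_url, csv_url, username, password, session_factory=None):
    """Log in and download the CSV using one session for both requests."""
    if session_factory is None:
        # Third-party dependency, imported lazily so a stand-in session
        # can be passed in without requests being installed.
        import requests
        session_factory = requests.Session

    with session_factory() as session:
        login_payload = {"username": username, "password": password}
        # Send the login request
        login_response = session.post(login_url, data=login_payload)
        login_response.raise_for_status()

        # Still inside the with-block: the download reuses the logged-in session.
        csv_response = session.get(csv_url)
        csv_response.raise_for_status()
        return csv_response.text
```

Returning from inside the block is fine: the context manager still closes the session on the way out.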

> I would say ideally once a day. Everything should be done, as it is meant to update the synsets according to new versions of the different files that are downloaded. Each of these files may contain changes.

Once a day, the whole thing. Then you should probably concentrate on avoiding too many requests. Downside: the script will take even longer.
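A minimal sketch of that trade-off, assuming a fixed minimum pause between requests is acceptable: `Throttle` is a hypothetical helper, and the `clock` and `sleep` parameters exist only so it can be exercised without really waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        # Call this right before each request.
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `throttle.wait()` before each scrape keeps the request rate bounded, at the cost of a longer total run time, which matters less for a job that runs once a day.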

@Jetske
Collaborator Author

Jetske commented Nov 13, 2024

Yes, to prevent too many requests it's probably better to check which links have been updated and only edit the synsets when something changed, rather than what is done now (delete everything and create it again).

@Jetske
Collaborator Author

Jetske commented Nov 20, 2024

@vanlummelhuizen Done. It now deletes only the links between glosses and synsets and creates them again. The synsets themselves are not deleted unless they are no longer used. A web scrape request is made only for new synsets that could not be found in wordnet. That should make it much more efficient.
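The update described above could look roughly like the sketch below. It uses plain dictionaries and sets instead of the real Signbank models, and `update_synsets`, `local_lookup` and `scrape` are hypothetical names: `local_lookup` stands in for the wordnet package and returns `None` when a synset is unknown, `scrape` stands in for the web request.

```python
def update_synsets(stored, links, local_lookup, scrape):
    """Rebuild gloss-synset links while touching as few synsets as possible.

    stored: dict mapping synset id -> (lemmas, description), changed in place.
    links:  iterable of (gloss id, synset id) pairs from the downloaded files.
    Returns the new set of links.
    """
    new_links = set(links)  # links are simply recreated from the fresh data
    used = {synset_id for _, synset_id in new_links}

    # Synsets that no gloss links to anymore are removed.
    for synset_id in set(stored) - used:
        del stored[synset_id]

    # Only synsets that were not stored yet need a lookup.
    for synset_id in sorted(used - set(stored)):
        info = local_lookup(synset_id)
        if info is None:
            # Fall back to a web request only when the local lookup fails.
            info = scrape(synset_id)
        stored[synset_id] = info

    return new_links
```

Existing synsets are left untouched, so the number of web requests per run depends on how many synsets are new and unknown locally, not on the total number of links.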
