Reduce the number of identical get_two_letter_dir functions to one #1390

Open
vanlummelhuizen opened this issue Nov 13, 2024 · 6 comments

@vanlummelhuizen
Collaborator

I noticed there are six identical get_two_letter_dir functions, see details below. This is a big violation of the Don't Repeat Yourself (DRY) principle in software development. We should get rid of five of them and avoid copying+pasting code or even rewriting the same thing in the future.

❯ git grep -A 6 "def get_two_letter"
signbank/dataset_checks.py:def get_two_letter_dir(idgloss):
signbank/dataset_checks.py-    foldername = idgloss[:2]
signbank/dataset_checks.py-
signbank/dataset_checks.py-    if len(foldername) == 1:
signbank/dataset_checks.py-        foldername += '-'
signbank/dataset_checks.py-
signbank/dataset_checks.py-    return foldername
--
signbank/dictionary/management/commands/rename_backup_gloss_videos.py:def get_two_letter_dir(idgloss):
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    foldername = idgloss[:2]
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    if len(foldername) == 1:
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-        foldername += '-'
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    return foldername
--
signbank/dictionary/management/commands/rename_non_mp4_extensions.py:def get_two_letter_dir(idgloss):
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    foldername = idgloss[:2]
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    if len(foldername) == 1:
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-        foldername += '-'
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    return foldername
--
signbank/tools.py:def get_two_letter_dir(idgloss):
signbank/tools.py-    foldername = idgloss[:2]
signbank/tools.py-
signbank/tools.py-    if len(foldername) == 1:
signbank/tools.py-        foldername += '-'
signbank/tools.py-
signbank/tools.py-    return foldername
--
signbank/video/models.py:def get_two_letter_dir(idgloss):
signbank/video/models.py-    foldername = idgloss[:2]
signbank/video/models.py-
signbank/video/models.py-    if len(foldername) == 1:
signbank/video/models.py-        foldername += '-'
signbank/video/models.py-
signbank/video/models.py-    return foldername
--
signbank/zip_interface.py:def get_two_letter_dir(idgloss):
signbank/zip_interface.py-    foldername = idgloss[:2]
signbank/zip_interface.py-
signbank/zip_interface.py-    if len(foldername) == 1:
signbank/zip_interface.py-        foldername += '-'
signbank/zip_interface.py-
signbank/zip_interface.py-    return foldername
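
For reference, a sketch of the one function that all six copies share, as it could look once it lives in a single place. Which module should hold it is left open here; the type hints, docstring, and comments are additions and not part of the existing code.

```python
def get_two_letter_dir(idgloss: str) -> str:
    """Return the folder name built from the first two characters of idgloss."""
    foldername = idgloss[:2]

    # Pad one-character glosses with '-' so the folder name has two characters.
    if len(foldername) == 1:
        foldername += '-'

    return foldername


print(get_two_letter_dir("HOUSE"))     # HO
print(get_two_letter_dir("A"))         # A-
print(repr(get_two_letter_dir("")))    # '' (an empty idgloss gives an empty folder name)
```

Note that an empty idgloss yields an empty folder name, so callers that build paths from the result may want to check for that case.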
@susanodd
Collaborator

susanodd commented Nov 14, 2024

This had something to do with it being needed inside "models" where it can't import from the other files.
As well as circular references because it was already inside "video/models" as well as inside of "tools".
I added it explicitly because the imports were going wrong. One place it's just coded in-line.

The three "commands" are going to be thrown away, basically, once it's all converted. That's why I put them there, to not import other code.

It's also violating other principles because the entire file system is visible inside the models.

@vanlummelhuizen
Collaborator Author

> This had something to do with it being needed inside "models" where it can't import from the other files.
> As well as circular references because it was already inside "video/models" as well as inside of "tools".
> I added it explicitly because the imports were going wrong. One place it's just coded in-line.

To circumvent circular imports, you could do the import within a function, class or method. See
get_two_letter_dir.patch.txt

Can you check whether this works on your side? You can apply it with git apply <patch-file>.
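
The patch itself is not reproduced here. As a generic illustration of the technique, the sketch below writes two throwaway modules that depend on each other and defers the import in one direction into a function body. The module names demo_tools and demo_models and the function video_folder are hypothetical stand-ins, not Signbank files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a "tools" module that imports "models" at module level.
TOOLS_SRC = '''
import demo_models

def get_two_letter_dir(idgloss):
    foldername = idgloss[:2]
    if len(foldername) == 1:
        foldername += '-'
    return foldername
'''

# Hypothetical stand-in for a "models" module that needs the helper from "tools".
MODELS_SRC = '''
def video_folder(idgloss):
    # Deferred import: resolved when the function is called,
    # not while this module is still being loaded.
    from demo_tools import get_two_letter_dir
    return get_two_letter_dir(idgloss)
'''


def load_demo_models():
    """Write both modules to a temporary directory and import demo_models."""
    tmp = tempfile.mkdtemp()
    Path(tmp, "demo_tools.py").write_text(TOOLS_SRC)
    Path(tmp, "demo_models.py").write_text(MODELS_SRC)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    return importlib.import_module("demo_models")


demo_models = load_demo_models()
print(demo_models.video_folder("HOUSE"))  # HO
```

The trade-off is that the dependency is no longer visible at the top of the file, so a short comment at the deferred import helps later readers.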

@susanodd
Collaborator

susanodd commented Nov 14, 2024

Can we make a totally new "file system related functions" file to put it in? The problem is that it also needs to use "idgloss" but that function is full of try-except clauses. So it could be failing internally yet we don't know. The glosses for which videos are being uploaded are supposed to have lemmas, datasets, and lemma translations. So this should not be failing internally. I made a separate issue a few weeks ago about the idgloss property.

Because there were unexplained failures in the API, I was trying to localize the code to be able to debug it.

If the file system related methods can be put in a file where other functions are not imported from that would help.
The problem with "tools" is that it's become full of numerous other functions.

The "dataset_checks" file was made to be independent because it takes a really long time to run. I assumed it would reduce the runtime if it didn't import the function. But it still makes use of idgloss, so that doesn't really solve anything.

I think the idgloss property should fail if the gloss is missing a dataset or lemma or lemma translation. (But that makes for awful code elsewhere.)
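
A minimal sketch of what "failing" could mean, as opposed to swallowing the problem in a try-except. Everything here is hypothetical: GlossStub is a plain stand-in and not the real Gloss model, and its two fields are assumptions made only for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlossStub:
    """Hypothetical stand-in for a gloss; not the real Signbank model."""
    dataset_acronym: Optional[str]
    lemma_translation: Optional[str]


def strict_idgloss(gloss: GlossStub) -> str:
    """Return the idgloss, or raise instead of silently falling back."""
    if not gloss.dataset_acronym:
        raise ValueError("gloss has no dataset")
    if not gloss.lemma_translation:
        raise ValueError("gloss has no lemma translation")
    return gloss.lemma_translation


# The caller decides how to handle an incomplete gloss.
try:
    strict_idgloss(GlossStub(dataset_acronym="NGT", lemma_translation=None))
except ValueError as error:
    print(f"cannot build video path: {error}")
```

The "awful code elsewhere" is the cost of this choice: every caller then needs its own handling like the try-except above.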

The entire video upload system is using this function constantly.
Plus the API for a non-zipped file is now making use of the backup system too. (So there may be new files with bak bak at the end being created if that is not solved.)

I merged the renaming of the backup files because it's in use and it's kind of high priority.

I agree completely about the 2-char function. But it's kind of in combination with idgloss. (And then that gets worse with all the right to left and pictogram characters.)
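
One concrete way the two-character slice interacts with such characters: Python slices strings by code point, not by visible character. The sketch below uses an accented Latin letter as a small example; whether idgloss values are normalized anywhere in Signbank is not stated in this thread, so the NFC step is only an assumption about what could be done.

```python
import unicodedata

# 'e' followed by a combining acute accent, then 'tre'.
decomposed = "e\u0301tre"
# NFC normalization merges 'e' and the accent into a single code point.
composed = unicodedata.normalize("NFC", decomposed)

prefix_decomposed = decomposed[:2]  # 'e' plus the accent: one visible character
prefix_composed = composed[:2]      # accented 'e' plus 't': two visible characters

# Both strings look the same on screen but give different folder prefixes.
print(prefix_decomposed == prefix_composed)  # False
```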

In some of the code (not reviewed) I started making the "desired relative path", in order to compare and see whether something was messed up.
There is a lot of inconsistency in the videofile access: some code uses path, some uses name, and some uses url.
But those are not necessarily available. Plus the original "upload_to" used to just be "glossvideo" without any dataset.
I don't know whether those can be "frozen" inside of old objects. (?)

susanodd pushed a commit that referenced this issue Nov 20, 2024
@susanodd
Collaborator

I removed the three commands that had included the duplicated code.

@vanlummelhuizen
Collaborator Author

Are they not necessary for work on the live server?

@susanodd
Collaborator

> Are they not necessary for work on the live server?

No, they're not necessary. The command remove_unused is very useful. But it's doing a traversal of the file system.

I don't know what to do about the right to left paths. The traversal command works better because it isn't making paths.

I was thinking it would be helpful to have a kind of "relative path" method like for the "idgloss" in order to embed the two-char folder into the path. @Woseseltops was writing the paths with underscore instead of slash. But it turned out that was just for documentation of the API.

Some of the code that uses paths needs to use the string version, not the Python path.
It ends up with type checking problems.
And then if the path is used in the url or to retrieve videos or images, then it's also different.
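
A rough sketch of such a "relative path" helper, showing the three forms side by side. The layout glossvideo/<dataset>/<two-letter folder>/<filename>, the function name relative_video_path, and the example filename are assumptions for illustration, not taken from the code base.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def get_two_letter_dir(idgloss: str) -> str:
    foldername = idgloss[:2]
    if len(foldername) == 1:
        foldername += '-'
    return foldername


def relative_video_path(dataset_acronym: str, idgloss: str, filename: str) -> PurePosixPath:
    """Build the relative path with the two-letter folder embedded (assumed layout)."""
    return PurePosixPath("glossvideo", dataset_acronym, get_two_letter_dir(idgloss), filename)


path = relative_video_path("NGT", "HOUSE", "HOUSE-1234.mp4")

as_path = path                 # path object, for joining and inspecting parts
as_string = str(path)          # plain string, for code that expects str
as_url_part = quote(as_string) # percent-encoded, for use inside a URL

print(as_string)               # glossvideo/NGT/HO/HOUSE-1234.mp4
```

Returning one type from the helper and converting at the call site keeps the type checks in one direction only.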

For the zip code I put the function there explicitly because I wanted the code to be self-contained.
The "idgloss" is used a lot. But it's a method. There are many Gloss model methods for getting paths related to a gloss. And these also end up calling each other. It is very relevant whether a file actually exists. Everything goes wrong with "existence" tests if the "path" ends up just being "glossvideo/NGT" without anything else. That exists. But then there is an error for protected_media because there is no file.
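
The existence problem can be reproduced in a few lines. This is a sketch with a temporary directory standing in for the media root; video_file_exists is a hypothetical helper name, not an existing Signbank function.

```python
import tempfile
from pathlib import Path


def video_file_exists(media_root: Path, relative_path: str) -> bool:
    """True only if relative_path points at a regular file under media_root."""
    # An empty relative path would resolve to media_root itself, so reject it.
    return bool(relative_path) and (media_root / relative_path).is_file()


with tempfile.TemporaryDirectory() as root:
    media_root = Path(root)
    (media_root / "glossvideo" / "NGT").mkdir(parents=True)

    folder_only = "glossvideo/NGT"
    dir_exists = (media_root / folder_only).exists()        # the directory exists
    is_video = video_file_exists(media_root, folder_only)   # but it is not a file

print(dir_exists, is_video)  # True False
```

Testing with is_file() instead of exists() makes a path that stops at the dataset folder count as "no video".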

susanodd added a commit that referenced this issue Dec 3, 2024
Function to retrieve video content type of video file. Added here to avoid duplicating it in multiple files on multiple branches
2 participants