Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to reversion obsolete file names in code #1374

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

susanodd
Copy link
Collaborator

@susanodd susanodd commented Nov 7, 2024

Fixes to bak bak reversion paths FOR NEW OBJECTS, delete video and reversion previous videos. This is on signbank-test. It can be experimented with there.

  • Some of the code was still using (version * ".bak") extensions. That has been repaired.

Other points of interest about video objects and paths

  • This branch includes a revised video command "remove_unused" (from 2022, @Jetske) that shows the video files that do no appear in GlossVideo. The path now displays instead of just the filename. [The found videos can be deleted by the command if Y is chosen when prompted. These found files correspond to videos of deleted glosses. Those may include wrong backup paths because the files are not referenced anymore.]

  • It is recommended to run this command, or to control the folder for a particular gloss on signbank-test before experimenting. Since there are few videos on the development servers, it is advised to make sure there are no gloss video objects for a particular gloss that don't point to any (non-existent) files. The reversion may find really old objects that don't point to anything on the development server. (The development servers get new databases every so often so sometimes there are old files not referenced by the database.)

  • Delete works like an "undo" of the most recently uploaded video. It restores a previous video by renumbering all the vesions of the old video files.

  • I put this code in a separate branch, so it can be reviewed specifically in a small scope. #1356: Patch to prevent 7-char sequence and missing objects #1371 (comment)

  • Some of the code was still using (version * ".bak") extensions. That has been repaired.

  • The reversion has been modfied. Only the deleted video is stored in the history, not all the reverted videos (which were also being stored as deleted in the history, but not actually being deleted)

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 13, 2024

@vanlummelhuizen I am in the process of renaming the bak bak files using a command I made. There have been some glitches, but most of them are fixed.

Here is an example of something weird:

rename_backup_videos move /var/www/writable/glossvideo/ISL/עו/עולה-49544.bak.bak /var/www/writable/glossvideo/ISL/עו/עולה-49544.mp4.bak21688

Why is the original file not in a folder? Why is the filename a number instead of text? It looks like it messed up the "left to right" versus "right to left".

But the left-to-right right-to-left mixup might only be in the "print" log.

You can see the 2-char folders in writeable/glossvideo/ISL
Some actually begin with a space. (That's allowed?)

Can you browse the ISL folder? You wrote the original code that creates the files and folders.

Here's a couple more for ZEI:

rename_backup_videos move  /var/www/writable/glossvideo/ZEI/جو/جوجه-37107.bak.bak /var/www/writable/glossvideo/ZEI/جو/جوجه-37107.mp4.bak11859
rename_backup_videos move  /var/www/writable/glossvideo/ZEI/ما/ماست-37183.bak.bak /var/www/writable/glossvideo/ZEI/ما/ماست-37183.mp4.bak12642

@vanlummelhuizen
Copy link
Collaborator

rename_backup_videos move /var/www/writable/glossvideo/ISL/עו/עולה-49544.bak.bak /var/www/writable/glossvideo/ISL/עו/עולה-49544.mp4.bak21688

This file is in folder /var/www/writable/glossvideo/ISL/עו but because of the right-to-left order of the Hebrew text, that part including the slash is displayed in reverse order from what we are used to.

The same goes for the other file you mention:

root@signbank-new:/var/www/writable/glossvideo# ll /var/www/writable/glossvideo/ZEI/جو/
total 1054
drwxrwxr-x  2 root wwwsignbank      5 Nov 13 09:24 ./
drwxrwxr-x 75 root wwwsignbank     75 Mar 30  2023 ../
-rwxrwxr-x  1 root wwwsignbank 844854 Mar 30  2023 جوجه-37107.mp4*
-rwxrwxr-x  1 root wwwsignbank 112113 Mar 30  2023 جوجه-37107.mp4.bak11859*
-rwxrwxr-x  1 root wwwsignbank  77055 Mar 30  2023 جوجه-37107_small.mp4*

@susanodd

This comment was marked as outdated.

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 14, 2024

It's hard to repeat. On tstMH, I uploaded two new videos. Here you can see that it created bakNNN videos.

(env) root@signbank-new:/var/www/repo# ls -l ../writable/glossvideo/tstMH/KA
total 340
-rw-r--r-- 1 ubuntu wwwsignbank 108086 Nov 14 11:14 KAAS-4611.mp4
-rwxrwxr-x 1 root   wwwsignbank 170483 Mar 30  2023 KAAS-4611.mp4.bak11025
-rwxrwxr-x 1 root   wwwsignbank  48143 Mar 30  2023 KAAS-4611_small.mp4
(env) root@signbank-new:/var/www/repo# ls -l ../writable/glossvideo/tstMH/KA
total 450
-rw-r--r-- 1 ubuntu wwwsignbank 120308 Nov 14 11:19 KAAS-4611.mp4
-rwxrwxr-x 1 root   wwwsignbank 170483 Mar 30  2023 KAAS-4611.mp4.bak11025
-rw-r--r-- 1 ubuntu wwwsignbank 108086 Nov 14 11:14 KAAS-4611.mp4.bak24544
-rwxrwxr-x 1 root   wwwsignbank  48143 Mar 30  2023 KAAS-4611_small.mp4
Then I deleted a video. One of the bak's was used to replace the normal video, but the other bak disappeared.

(env) root@signbank-new:/var/www/repo# ls -l ../writable/glossvideo/tstMH/KA
total 101
-rw-r--r-- 1 ubuntu wwwsignbank 108086 Nov 14 11:14 KAAS-4611.mp4

In the management page, there are now two version 0 videos:

Glosses with Multiple Version 0 Videos

1 results found

Lemma (Nederlands) Glos-ID Annotatie (Nederlands) Annotatie (Engels) Video Paths
KAAS 4611 michatest1 michatest1 0: glossvideo/tstMH/KA/KAAS-4611.mp4, 0: glossvideo/tstMH/KA/KAAS-4611.mp4

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 14, 2024

Fix to ZEI

(env) root@signbank-new:/var/www/repo# ls -l ../writable/glossvideo/ZEI/گو
total 1108
-rwxrwxr-x 1 root wwwsignbank 151823 Mar 30  2023 گوسفند-37110.mp4
-rwxrwxr-x 1 root wwwsignbank  52114 Mar 30  2023 گوسفند-37110_small.m4v
-rwxrwxr-x 1 root wwwsignbank 792630 Mar 30  2023 گوشت-37160.mp4
-rwxrwxr-x 1 root wwwsignbank  76261 Mar 30  2023 گوشت-37160_small.mp4

Want to replace the 37110 mp4 file with a correct format 37110 mp4 file. After uploading in Gloss Detail:

(env) root@signbank-new:/var/www/repo# ls -l ../writable/glossvideo/ZEI/گو
total 1108
-rw-r--r-- 1 ubuntu wwwsignbank 159112 Nov 14 12:06 گوسفند-37110.mp4
-rwxrwxr-x 1 root   wwwsignbank  52114 Mar 30  2023 گوسفند-37110_small.m4v
-rwxrwxr-x 1 root   wwwsignbank 792630 Mar 30  2023 گوشت-37160.mp4
-rwxrwxr-x 1 root   wwwsignbank  76261 Mar 30  2023 گوشت-37160_small.mp4

There's no backup file, but there is a backup GlossVideo object with the wrong path and no actual file:

wrong-path-backfile-no-file

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 14, 2024

[This was actually about a different issue, not this branch]
RESOLVED]

I saved all of the ZEI files locally that were in m4v that I had changed to mp4. They are still in the file system, just the extension is wrong. The one in the example above I used ffmpeg to convert it from m4v to mp4.

@susanodd

This comment was marked as outdated.

@susanodd

This comment was marked as outdated.

@susanodd susanodd requested a review from Jetske November 18, 2024 13:36
@susanodd susanodd changed the title #1373, 1356: Separate branch for fixes to bak bak reversion #1373, 1356: Fxes to reversion Nov 18, 2024
@vanlummelhuizen
Copy link
Collaborator

@susanodd It is very hard for me to judge whether your code will work correctly. I suggest you create a test environment that is (almost) the same as the live server. Perhaps the test server could serve that purpose.

Also, make a plan of how to apply the changes and run the commands that are necessary to make things correct. Please show us that plan. (Perhaps that also helps me to see what is involved.)

Finally, when doing all this on the live server, I think it is wise to backup all media files explicitly.

@susanodd
Copy link
Collaborator Author

@susanodd It is very hard for me to judge whether your code will work correctly. I suggest you create a test environment that is (almost) the same as the live server. Perhaps the test server could serve that purpose.

Also, make a plan of how to apply the changes and run the commands that are necessary to make things correct. Please show us that plan. (Perhaps that also helps me to see what is involved.)

Finally, when doing all this on the live server, I think it is wise to backup all media files explicitly.

This is a bit annoying. I did not write the original reversion code. I'm trying to fix very old code that I did not write.

I have tested it on signbank-test for new videos that get uploaded and then deleted.

The command remove_unused can be used and just choose N to not do anything. This can be safely run on the production server to examine what is there.

@Jetske and I discussed this off-line. What we don't know is whether it is necessary to keep all of these backup files?

@uklomp said they don't need backups like this. Their users keep their own videos and can upload them again, if needed.

@uklomp
Copy link
Collaborator

uklomp commented Nov 20, 2024 via email

@vanlummelhuizen
Copy link
Collaborator

@susanodd My point here is that we need to be careful, because, other than for NGT, people may not have backups themselves.

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 20, 2024

@susanodd My point here is that we need to be careful, because, other than for NGT, people may not have backups themselves.

I tried to summarize the important aspects of this issue.

Details to answer @vanlummelhuizen / describes what the reversion is doing and what happens to the files
@Woseseltops might want to read this since it's relevant to how the code works.

I did not know what the code was actually doing, so time was spent figuring out what it was doing.

It looks like when a gloss is deleted, the video of the gloss remains on the file system, but there is no GlossVideo object pointing to it anymore (because the gloss does not exist). This ends up stored in DeletedGlossOrMedia, where you can see a filename and the former gloss ID saved there.

I suppose it would be possible to restore a video that way. But we have zero implementation of this. Moreover, the phonology etc. was not stored. That's why we introduced Archival.

I was totally unaware that there were videos of deleted glosses on the file system. But they are unaccessible.
It's also nearly impossible to write general code that accounts for switching between left to right and right to left.
(The Python paths are not necessarily correct for this.)

I agree we need to be careful. But it is also difficult to tell how much of the "backup and deleted media" is outdated.

I prefer to have Admin pages that summarize what is there. For the DeletedGlossOrMedia, it would be possible to add a column to check for the existence of the file on the file system.

The way the bakNNN reversion is implemented, it's actually just modifying version numbers (it used to be adding more bak's to it.) It modified the filenames stored in the objects regardless or not of whether there is actually a file. (I saw it was doing this locally.)

[For example, if the API does numerous unsuccessful uploads of a video and it is creating GlossVideo objects, but without any video file, then the system will keep making more GlossVideo objects with bakNNN filenames for all the previous unsuccessful uploads, but there are still no video files.) (To force it to do this, one would need a bad video file that is corrupt or something so the file system won't save it and just passes.)

So there are lots of weird things going on with the code.

I focussed on making it work for new uploads and deletes.

But it is also dangerous if it's about really old glosses. They are hard to find in the Admin because the newest is first (depending which table you search, they are ordered differently). It searches on the annotation, but the glosses are stored by the lemma. So it's possible you do not see an older object because there are thousands of objects. This is mainly relevant for the test dataset, since we've made some really weird lemma names and they don't look at all like the annotations.

Probably some additional information is needed in the Admin.
It's most useful for checking if a file exists for a GlossVideo object.
The same would be helpful for the DeletedGlossOrMedia.
The Gloss Video History as misleading. It's also showing unsuccessful attempts to upload videos. So repeated attempts to upload show up. (The objects are created but the file may not have been saved.)

@susanodd

This comment was marked as outdated.

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 25, 2024

@vanlummelhuizen

Preface to bug:

I browse the videos in the ADMIN to make sure objects are not being created without any video files. There is one for AANDOEN-46565 (you can search on it, but it only shows up in the lemma since it's been archived. The GlossVideo object does not have a video file.

Continued: So I was looking in the folder glossvideo/NGT/AA to see if there is a file for it with some other name.

What caught my eye in that folder is the folllowing (see details below)

(bug)

It seems the API is making these. (The API uses the same add_video method as Gloss Edit) The version of video upload API that uses a zip file overwrites an already existing video. The new version of @Woseseltops uses reversion.

QUESTION: What's going on here? The file system is going to get full.

Note: the files below are on MASTER. The code in this branch is not on master yet.

all of these videos have version 1.

-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 15:15 AARDPLAAT-47555.mp4.bak101101
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 15:15 AARDPLAAT-47555.mp4.bak101104
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 18:17 AARDPLAAT-47555.mp4.bak109771
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 18:17 AARDPLAAT-47555.mp4.bak109774
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 21:18 AARDPLAAT-47555.mp4.bak118441
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 21:18 AARDPLAAT-47555.mp4.bak118444
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 00:20 AARDPLAAT-47555.mp4.bak127114
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 00:20 AARDPLAAT-47555.mp4.bak127117
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 03:21 AARDPLAAT-47555.mp4.bak135784
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 03:21 AARDPLAAT-47555.mp4.bak135787
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 06:15 AARDPLAAT-47555.mp4.bak144458
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 06:16 AARDPLAAT-47555.mp4.bak144461
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 10:14 AARDPLAAT-47555.mp4.bak153773
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 10:14 AARDPLAAT-47555.mp4.bak153776
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 14:14 AARDPLAAT-47555.mp4.bak163133
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 14:14 AARDPLAAT-47555.mp4.bak163136
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 18:14 AARDPLAAT-47555.mp4.bak172496
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 18:14 AARDPLAAT-47555.mp4.bak172499
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 24 22:14 AARDPLAAT-47555.mp4.bak181856
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 24 22:14 AARDPLAAT-47555.mp4.bak181859
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 25 02:14 AARDPLAAT-47555.mp4.bak191222
-rw-r--r-- 1 ubuntu www-data    1342762 Sep 23 09:43 AARDPLAAT-47555.mp4.bak20010
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 22 14:39 AARDPLAAT-47555.mp4.bak31774
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 22 14:39 AARDPLAAT-47555.mp4.bak31777
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 22 18:15 AARDPLAAT-47555.mp4.bak40441
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 22 18:16 AARDPLAAT-47555.mp4.bak40444
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 22 21:15 AARDPLAAT-47555.mp4.bak49111
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 22 21:16 AARDPLAAT-47555.mp4.bak49114
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 00:16 AARDPLAAT-47555.mp4.bak57781
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 00:16 AARDPLAAT-47555.mp4.bak57784
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 03:16 AARDPLAAT-47555.mp4.bak66451
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 03:16 AARDPLAAT-47555.mp4.bak66454
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 06:13 AARDPLAAT-47555.mp4.bak75121
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 06:14 AARDPLAAT-47555.mp4.bak75124
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 09:15 AARDPLAAT-47555.mp4.bak83770
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 09:15 AARDPLAAT-47555.mp4.bak83773
-rw-r--r-- 1 ubuntu wwwsignbank 1581584 Nov 23 12:15 AARDPLAAT-47555.mp4.bak92434
-rw-r--r-- 1 ubuntu wwwsignbank 1342762 Nov 23 12:15 AARDPLAAT-47555.mp4.bak92437

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 25, 2024

This works correctly on signbank-test. The bak video versions are numbered correctly.

I tried both in Gloss Edit (uploaded 3 videos in succession.)
And in the API.
All the gloss video bakNNN objects have different version numbers.

@susanodd
Copy link
Collaborator Author

UPDATE @Woseseltops

On signbank-test you can

  • Use the Swagger API to upload videos
  • To confirm that the version numbers are being incremented (the bug on master/production sets them to 1)
  • Use Manage Datasets -> Manage Video Storage to inspect the paths of the gloss video objects

@susanodd
Copy link
Collaborator Author

susanodd commented Nov 29, 2024

@susanodd It is very hard for me to judge whether your code will work correctly. I suggest you create a test environment that is (almost) the same as the live server. Perhaps the test server could serve that purpose.

Also, make a plan of how to apply the changes and run the commands that are necessary to make things correct. Please show us that plan. (Perhaps that also helps me to see what is involved.)

Finally, when doing all this on the live server, I think it is wise to backup all media files explicitly.

This comment is actually not about what this branch is doing..

It is fixing the code for NEW backup videos. Some code on master uses (version * ".bak")

I included @Jetske command here to be able to inspect what files are in the file system but do not have any objects. I added to relative path to the print statements, so we can see in what folder the files it finds sit. (That function is old code that already exists.)

Since this bak bak code still exists, it's not clear whether it is being run. The idea is to FIX the bak bak code and find out whether there are hidden files that are not visible in the admin.

In the time it's taking to get this branch reviewed, I have implemented functions for the admin as well as the Manage Video Storage page for viewing the objects and paths. The bug still exists on master.

@susanodd susanodd changed the title #1373, 1356: Fxes to reversion Fixes to reversion obsolete file names in code Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants