-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes to reversion obsolete file names in code #1374
base: master
Are you sure you want to change the base?
Conversation
paths that are wrong form
@vanlummelhuizen I am in the process of renaming the bak bak files using a command I made. There have been some glitches, but most of them are fixed. Here is an example of something weird:
Why is the original file not in a folder? Why is the filename a number instead of text? It looks like it messed up the "left to right" versus "right to left". But the left-to-right right-to-left mixup might only be in the "print" log. You can see the 2-char folders in writeable/glossvideo/ISL Can you browse the ISL folder? You wrote the original code that creates the files and folders. Here's a couple more for ZEI:
|
This file is in folder The same goes for the other file you mention: root@signbank-new:/var/www/writable/glossvideo# ll /var/www/writable/glossvideo/ZEI/جو/
total 1054
drwxrwxr-x 2 root wwwsignbank 5 Nov 13 09:24 ./
drwxrwxr-x 75 root wwwsignbank 75 Mar 30 2023 ../
-rwxrwxr-x 1 root wwwsignbank 844854 Mar 30 2023 جوجه-37107.mp4*
-rwxrwxr-x 1 root wwwsignbank 112113 Mar 30 2023 جوجه-37107.mp4.bak11859*
-rwxrwxr-x 1 root wwwsignbank 77055 Mar 30 2023 جوجه-37107_small.mp4* |
This comment was marked as outdated.
This comment was marked as outdated.
It's hard to repeat. On tstMH, I uploaded two new videos. Here you can see that it created bakNNN videos.
In the management page, there are now two version 0 videos: Glosses with Multiple Version 0 Videos1 results found
|
Fix to ZEI
Want to replace the 37110 mp4 file with a correct format 37110 mp4 file. After uploading in Gloss Detail:
There's no backup file, but there is a backup GlossVideo object with the wrong path and no actual file: |
[This was actually about a different issue, not this branch] I saved all of the ZEI files locally that were in m4v that I had changed to mp4. They are still in the file system, just the extension is wrong. The one in the example above I used ffmpeg to convert it from m4v to mp4. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
do not include mp4 in reverted videos.
@susanodd It is very hard for me to judge whether your code will work correctly. I suggest you create a test environment that is (almost) the same as the live server. Perhaps the test server could serve that purpose. Also, make a plan of how to apply the changes and run the commands that are necessary to make things correct. Please show us that plan. (Perhaps that also helps me to see what is involved.) Finally, when doing all this on the live server, I think it is wise to backup all media files explicitly. |
This is a bit annoying. I did not write the original reversion code. I'm trying to fix very old code that I did not write. I have tested it on signbank-test for new videos that get uploaded and then deleted. The command @Jetske and I discussed this off-line. What we don't know is whether it is necessary to keep all of these backup files? @uklomp said they don't need backups like this. Their users keep their own videos and can upload them again, if needed. |
Yes, I did say that, but that might only be true for the NGT data.
|
@susanodd My point here is that we need to be careful, because, other than for NGT, people may not have backups themselves. |
I tried to summarize the important aspects of this issue. Details to answer @vanlummelhuizen / describes what the reversion is doing and what happens to the files I did not know what the code was actually doing, so time was spent figuring out what it was doing. It looks like when a gloss is deleted, the video of the gloss remains on the file system, but there is no GlossVideo object pointing to it anymore (because the gloss does not exist). This ends up stored in DeletedGlossOrMedia, where you can see a filename and the former gloss ID saved there. I suppose it would be possible to restore a video that way. But we have zero implementation of this. Moreover, the phonology etc. was not stored. That's why we introduced Archival. I was totally unaware that there were videos of deleted glosses on the file system. But they are unaccessible. I agree we need to be careful. But it is also difficult to tell how much of the "backup and deleted media" is outdated. I prefer to have Admin pages that summarize what is there. For the DeletedGlossOrMedia, it would be possible to add a column to check for the existence of the file on the file system. The way the bakNNN reversion is implemented, it's actually just modifying version numbers (it used to be adding more bak's to it.) It modified the filenames stored in the objects regardless or not of whether there is actually a file. (I saw it was doing this locally.) [For example, if the API does numerous unsuccessful uploads of a video and it is creating GlossVideo objects, but without any video file, then the system will keep making more GlossVideo objects with bakNNN filenames for all the previous unsuccessful uploads, but there are still no video files.) (To force it to do this, one would need a bad video file that is corrupt or something so the file system won't save it and just passes.) So there are lots of weird things going on with the code. I focussed on making it work for new uploads and deletes. But it is also dangerous if it's about really old glosses. They are hard to find in the Admin because the newest is first (depending which table you search, they are ordered differently). It searches on the annotation, but the glosses are stored by the lemma. So it's possible you do not see an older object because there are thousands of objects. This is mainly relevant for the test dataset, since we've made some really weird lemma names and they don't look at all like the annotations. Probably some additional information is needed in the Admin. |
This comment was marked as outdated.
This comment was marked as outdated.
Preface to bug:
I browse the videos in the ADMIN to make sure objects are not being created without any video files.
There is one for AANDOEN-46565 (you can search on it, but it only shows up in the lemma since it's been archived.
The GlossVideo object does not have a video file.
Continued: So I was looking in the folder What caught my eye in that folder is the folllowing (see details below) (bug) It seems the API is making these. (The API uses the same add_video method as Gloss Edit) The version of video upload API that uses a zip file overwrites an already existing video. The new version of @Woseseltops uses reversion. QUESTION: What's going on here? The file system is going to get full. Note: the files below are on MASTER. The code in this branch is not on master yet. all of these videos have version 1.
|
This works correctly on signbank-test. The bak video versions are numbered correctly. I tried both in Gloss Edit (uploaded 3 videos in succession.) |
UPDATE @Woseseltops On signbank-test you can
|
This comment is actually not about what this branch is doing.. It is fixing the code for NEW backup videos. Some code on master uses I included @Jetske command here to be able to inspect what files are in the file system but do not have any objects. I added to relative path to the print statements, so we can see in what folder the files it finds sit. (That function is old code that already exists.) Since this bak bak code still exists, it's not clear whether it is being run. The idea is to FIX the bak bak code and find out whether there are hidden files that are not visible in the admin. In the time it's taking to get this branch reviewed, I have implemented functions for the admin as well as the Manage Video Storage page for viewing the objects and paths. The bug still exists on master. |
Fixes to bak bak reversion paths FOR NEW OBJECTS, delete video and reversion previous videos. This is on signbank-test. It can be experimented with there.
(version * ".bak")
extensions. That has been repaired.Other points of interest about video objects and paths
This branch includes a revised video command "remove_unused" (from 2022, @Jetske) that shows the video files that do no appear in GlossVideo. The path now displays instead of just the filename. [The found videos can be deleted by the command if Y is chosen when prompted. These found files correspond to videos of deleted glosses. Those may include wrong backup paths because the files are not referenced anymore.]
It is recommended to run this command, or to control the folder for a particular gloss on signbank-test before experimenting. Since there are few videos on the development servers, it is advised to make sure there are no gloss video objects for a particular gloss that don't point to any (non-existent) files. The reversion may find really old objects that don't point to anything on the development server. (The development servers get new databases every so often so sometimes there are old files not referenced by the database.)
Delete works like an "undo" of the most recently uploaded video. It restores a previous video by renumbering all the vesions of the old video files.
I put this code in a separate branch, so it can be reviewed specifically in a small scope. #1356: Patch to prevent 7-char sequence and missing objects #1371 (comment)
Some of the code was still using
(version * ".bak")
extensions. That has been repaired.The reversion has been modfied. Only the deleted video is stored in the history, not all the reverted videos (which were also being stored as deleted in the history, but not actually being deleted)