Redo the API video upload #1332
Comments
I agree the check for h264 is excessive. It was more a check that the user didn't put, say, PDF files in with the videos. The zip uploads them to a different location because the normal upload of a video creates an object at the same time. My opinion is that we should turn off reversion. (It will create a mess if it stays, because it does this every time you upload a video and then moves files around.) For the perspective videos, I made it so it just "overwrites" an existing 'left' or 'right' file and leaves the "GlossVideo object" alone. (So its link just points to a replaced file.) |
Summary of @susanodd : there are some extra things we should think about if we want to do this right. |
I would rather go this way:
This would definitely make my life (and that of other developers) easier, as we are going to upload at most 6 videos per gloss plus 1 animation file. |
This information is already maintained and is in the admin. We can make this visible via the dataset manager page. That's where "reversion" comes in. Reversion adds an extra ".bak" to the end of filenames, so it becomes ".bak.bak.bak" depending on how many deletes there have been. Or it's a version number in the object model (0, 1, 2, ...), and those all get modified whenever a new video file is uploaded. (The code is inconsistent in doing this. Some of this is automated.) I don't like the code that does this at all. Moreover, if the user keeps retrying to upload a video, it keeps making more of these files...
Where it is uploaded to is a problem. Objects need to be created. At the moment there is also "reversion" as described above. There is also a problem with transactions versus file system operations. If the file system hasn't finished writing an upload before the object is saved something goes wrong. Django writes to temp storage first. All the operations work this way. |
[IMPLEMENTATION] Maybe it would be possible to split the video upload into two parts, so the "gloss video object" and "gloss video history" objects are created, but only dummy things in the file system, so it can be done fast. Then couple a cron job that puts all the actual files to replace the dummies. |
In #1329 the bug seems to be with symbolic links. I fixed this. I rewrote the code to delete first then upload, instead of replace. Hopefully this issue can remove the reversion code. |
A few responses to @rem0g :
For speed, but I wouldn't mind doing the raw video file instead.
Can you rephrase this? I'm unsure what you mean exactly here... how is this different from my original proposal?
I might be mistaken, but I think after the initial upload there are no other processing steps, now that the only check has been removed? That is, it's either uploaded to a gloss, or the upload failed?
How about an endpoint |
Videos are preprocessed already, but zipping video files slows down and complicates the process. Signbank has to download the provided URL and then unpack it; this can all be skipped by using a simple video upload via a POST form.
That is exactly what your proposal is about, but it should be:
then the video is uploaded and bound at once, because it retrieves the video file in binary + the associated POST forms.
SB does post the status of other glosses in the response form when I upload an entirely different gloss video.
Ok |
Django uploads files to a temporary storage location before they are put in the proper locations for the dataset/glosses. |
Because reversion is being used, Django potentially moves a bunch of video files whenever a new video is uploaded. It also creates objects for each upload. A video upload is never just one operation. |
@Woseseltops there are two branches/pull requests now with migrations on the underlying video models. |
For restoring videos from the video history a better backup system is needed. My opinion is that reversion needs to be removed. It makes a mess of stuff. Even if we try to program around it, it is still there sometimes. The video history has the operations but there are no videos in it. And if you keep doing operations they keep showing up in the history because it also has unsuccessful (historical) operations stored there. (That is something to do with the file system, because if something failed the objects have already been stored in the database, but perhaps not the file system, and vice versa. Django does not wait on the file system. Is there a solution to this that is specific to Django?) |
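On the ordering problem described above (objects saved in the database while the file system is not ready, or the other way around): one general pattern, not specific to Signbank, is to write the upload to a temporary file, move it into place atomically, and only then record it in the database, removing the file again if the database write fails. Below is a minimal sketch using only the standard library. `store_video`, the `video` table and its columns are hypothetical names for illustration, and the sketch only covers new files, not overwrites. On the Django side, the `transaction.on_commit` hook may be worth a look for the opposite ordering (file work only after the commit).

```python
import os
import sqlite3
import tempfile


def store_video(conn: sqlite3.Connection, dest_path: str, data: bytes) -> None:
    """Write the file first, then record it; undo the file if the record fails."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)

    # Write to a temporary file in the same directory so the final
    # rename stays on one file system and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Only now touch the database. "with conn" commits on success and
    # rolls back on error (hypothetical table: video(path, size)).
    try:
        with conn:
            conn.execute(
                "INSERT INTO video (path, size) VALUES (?, ?)",
                (dest_path, len(data)),
            )
    except sqlite3.Error:
        os.remove(dest_path)  # keep file system and database in step
        raise
```

With this ordering a failed upload leaves neither a database row nor a stray file behind, so nothing needs to be cleaned out of a history afterwards.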
That's fine, but it's our requirement to upload video via POST form instead of zipping it. I still fail to see any reason to zip the video before uploading it, sorry. |
The zip allows the special characters to remain inside the zip because you can give a simple ascii-based filename. In #1341 you have glosses with special characters in them. We are working on locating the bugs. |
No, every video should be named with signbank id to prevent issues with
gloss renaming.
Gomer
…On Fri, Oct 11, 2024 at 06:42 susanodd ***@***.***> wrote:
That's fine, but it's our requirement to upload video via POST form
instead of zipping it. I still fail to see any reason to zip the video
before uploading it, sorry.
The zip allows the special characters to remain inside the zip because you
can give a simple ascii-based filename.
Signbank has video files with Chinese characters in them, for example.
|
Yes, of course. But the zip file itself is not the same as the video file inside that includes the annotation in the filename. The intent of using a zip file is to shield the "information exchange over internet" from the special characters. Of course, we still need to figure out what is wrong with #1341. |
There probably needs to be totally different new code for the API videos. I report here all the buggy things that are messed up with videos. |
I agree the video upload needs a complete rewrite, ideally based on the Signbank ID as the video name. This also goes for NMM and Voorbeeldzinnen (example sentence) videos; they all need IDs. That way we have a better idea of what's going on, instead of investigating problems in old code, which needs more people. |
Okay I got the external part working, I can now upload video to a Signbank server like this:
import requests
# Placeholder: fill in your own API token
BEARER_TOKEN = '<your API token>'
# The URL where the video file will be sent
url = 'http://35.159.203.216/dictionary/gloss/12/video'
# The path to your video file
video_file_path = 'glossvideo_ASL_FO_FORCE-834.mov'
# Auth
headers = {'Authorization': f'Bearer {BEARER_TOKEN}'}
# Open the video file in binary mode and send it in a POST request
with open(video_file_path, 'rb') as video_file:
    files = {'file': video_file}
    response = requests.post(url, files=files, headers=headers)
# Print the server's response
print(response.status_code)
print(response.text)
The internal processing of the file will probably follow next week. Let me know what I can do to make it more convenient for you @rem0g ! From what you write, I'm thinking perhaps a generic video upload end point, so you're not forced to put the numeric ID of the gloss in the url? |
@Woseseltops can you take a look at the other issues about the videos? #1331 #1341 #673 The problem is that the database gets locked from the traffic. |
I will make this be an option for the existing zip archive so you can use merely the ID as the filename. (The Manage Media page also uses the same format for the zip archive.) You ought to put like 20 or 50 videos in the zip file. Not just one video. This will reduce traffic to the server. The code makes a streamed-json result to include all the results for all the videos that are in the same zip file. Based on the ID, it is already possible to just lookup the gloss to retrieve its default annotation and its lemma (first two characters) in order to construct the path. |
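To illustrate the "ID as filename" option for the zip archive: a minimal client-side sketch that packs several videos into one in-memory zip, naming each entry by its gloss ID. `build_video_zip` is a hypothetical helper, and the flat layout and `.mp4` extension are assumptions; the archive format Signbank actually expects may differ.

```python
import io
import zipfile


def build_video_zip(videos: dict[int, bytes], extension: str = ".mp4") -> bytes:
    """Pack videos into an in-memory zip, naming each entry by gloss ID."""
    buffer = io.BytesIO()
    # ZIP_STORED: video data is already compressed, so recompressing
    # it would cost time without saving much space.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for gloss_id, data in sorted(videos.items()):
            # ASCII-only entry name, independent of the gloss annotation
            archive.writestr(f"{gloss_id}{extension}", data)
    return buffer.getvalue()
```

Because the entry names are plain numeric IDs, special characters in annotations never appear in the archive, and renaming a gloss does not affect the upload.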
I posted many excerpts from the logs in the other issues. We can modify the existing zip file upload request to put the file inside as a POST instead of as a URL parameter. |
@rem0g I added api creation with video upload of voorbeeldzinnen here: https://signbank.github.io/Global-signbank/#/Adding%20Signbank%20data/post_dictionary_api_create_annotated_sentence__datasetid__ |
What @Woseseltops proposes is that the API uses a file name that is the path to the file's destination in the Signbank file system, with underscores in place of slashes. #1347 (there are actually bugs in creating folders if they don't exist yet) |
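A small sketch of that naming scheme, to make the ambiguity visible: the annotation itself can contain underscores, so only a fixed number of leading underscores can be treated as path separators. The three-level layout (root folder, dataset, two-letter prefix) is an assumption read off the example filename `glossvideo_ASL_FO_FORCE-834.mov` in this thread, and both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed layout: <root>/<dataset>/<two-letter prefix>/<file>
PATH_DEPTH = 3  # number of directory levels encoded in the file name


def filename_to_path(filename: str) -> PurePosixPath:
    """Turn 'root_dataset_prefix_file' into 'root/dataset/prefix/file'."""
    # Split on the first PATH_DEPTH underscores only, so underscores
    # inside the actual file name are left alone.
    parts = filename.split("_", PATH_DEPTH)
    if len(parts) != PATH_DEPTH + 1 or not all(parts):
        raise ValueError(f"cannot decode a path from {filename!r}")
    return PurePosixPath(*parts)


def path_to_filename(path: PurePosixPath) -> str:
    """Inverse direction: join the path components with underscores."""
    return "_".join(path.parts)
```

Note that the scheme stops being reversible as soon as a folder name contains an underscore, which is one argument for ID-based names instead.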
@Woseseltops the zip file contains an archive of many videos. Not just one video. I am working on making the zip archive inside the POST instead of as a URL. |
Response to @rem0g :
I'm not sure if you're responding to my Python example or my (alternative) proposal below it, but to make sure I don't slow things down even further I have finished the end point as in the example and put it up for review. Let me know if you're also interested in having a generic endpoint that doesn't force you to put the gloss ID in the URL!
This is example output from using the script above, does that answer your question?
In my test cases it is relatively quick! There are worries on our side that adding this end point might generate a lot of parallel requests, which will lock our database (which was not designed for mass usage). We could implement rate limiting, but instead it's probably easier to ask you to put a few seconds between each request if possible :) |
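For the "few seconds between each request" suggestion, a minimal client-side sketch: uploads run strictly one after another, with a pause in between, so the server never sees parallel requests from the script. `upload_sequentially` and the `send` callable are hypothetical; `send` stands in for whatever performs the actual POST (for example a wrapper around the `requests.post` call shown earlier in this thread), and the 3 second default is only a guess.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def upload_sequentially(
    items: Iterable[Tuple[int, str]],
    send: Callable[[int, str], int],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, int]:
    """Upload (gloss_id, path) pairs one at a time, pausing between requests.

    Returns a mapping from gloss ID to whatever `send` returned
    (for instance the HTTP status code).
    """
    results: Dict[int, int] = {}
    for index, (gloss_id, path) in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        results[gloss_id] = send(gloss_id, path)
    return results
```

Collecting the result per gloss ID makes it easy to retry only the uploads that failed, rather than resending everything.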
Yes perfect. I will take a look at it next week.
…On Thu, Oct 17, 2024 at 14:35 Wessel Stoop ***@***.***> wrote:
Response to @rem0g <https://github.com/rem0g> :
Yes perfect
I'm not sure if you're responding to my Python example or my (alternative)
proposal below it, but to make sure I don't slow things down even further I
have finished the end point as in the example and put it up for review. Let
me know if you're also interested in having a generic endpoint that doesn't
force you to put the gloss ID in the URL!
Will there also be a callback if the video upload has succeeded and been processed
in the database directly from the URL?
This is example output from using the script above, does that answer your
question?
400
{"error": "No file uploaded"}
200
{"message": "Uploaded video of size 156919 bytes to dataset NGT."}
If that will cause a long wait time I can adjust my scripts.
In my test cases it is relatively quick! There are worries on our side
that adding this end point might generate a lot of parallel requests, which
will lock our database (which was not designed for mass usage). We could
implement rate limiting, but instead it's probably easier to ask you to put
a few seconds between each request if possible :)
|
Okay I think the only remaining item here is this idea
I'm not sure what we have in terms of video history at all, so perhaps before I start: @rem0g , given approaching deadlines, do you still think it would make sense for me to work on this? |
|
The video upload now makes use of the add_video command, which makes use of reversion. This can go wrong. See #1374 and the other discussions trying to repair it. It may be necessary to convert a video to mp4 before calling the gloss method add_video, and to fail there if it cannot be converted. There was something going on with frame rates when it was failing. If you look in the admin, there are 8 glosses that do not have video files for (NGT) GlossVideo objects. Some other videos with format "webm" are showing up in the Manage Storage page. That format is not visible/supported on Ubuntu or MacOS. The files are not being converted anymore because UvA did not want that for the API.
Gloss Video Files with non-mp4 Extensions (3 results found)
Lemma (Dutch) | Gloss ID | Annotation (Dutch) | Annotation (English) | Video Paths
Files that don't exist for the GlossVideo objects
|
@rem0g suggested (in #1331) we redesign the video upload. I think the current workflow is:
/dictionary/upload_zipped_videos_folder_json/{datasetid}/
and provide this url
/dictionary/upload_videos_to_glosses/{datasetid}
to actually link them to the glosses.
I want to replace this with just one step, to be run for each gloss individually:
/dictionary/gloss/{gloss_id}/zippedvideo
with just the zipped file in the payload.
I don't think uploading the videos one by one has any real drawbacks in terms of overload, and it does make the whole process easier to follow: you upload a video to a gloss, and if it fails somewhere in the process, it can just return what went wrong for that one gloss.
Also, @rem0g suggests to remove the h264 check: