Crash due to locking not being handled properly when multiple zm_detect.py processes are run at once with the edgetpu #43
Comments
I found where auto locking is supposed to be disabled when only one model type is used: the device is locked in the detect_stream def in detect_sequence.py, but the "auto_lock" false option is not being passed down to the detect def. Passing it down at least fixes the issue when using a single model, but if you add any other models you will run into the same issue because the auto lock gets enabled again. To fix the above issue, I added the following in pyzm/ml/detect_sequence.py - def _load_models:
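The exact snippet didn't survive in this copy; below is only a minimal sketch of the idea, forcing auto_lock off in each per-model options dict as _load_models builds the detectors. The loop structure and variable names are assumptions, not the actual patch.

# Sketch only: force auto_lock off for every per-model options dict that
# _load_models hands to the individual detectors, so detect() never
# re-acquires/releases the device lock per frame.
for seq in sequences:  # 'object', 'face', 'alpr', ...
    for model_opts in self.ml_options.get(seq, {}).get('sequence', []):
        model_opts['auto_lock'] = False  # the caller keeps holding the lock instead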
I believe the solution for this is to just pass the lock(s) from object to face instead of locking/unlocking, since we just want to hold the device for the current process.
When I get more time I can attempt to work on this. I don't know Python worth a flip, so it's slow going.
Great job on tracking down the root issue!
Event Server version
6.1.28
Hooks version (if you are using Object Detection)
app:6.1.28, pyzm:0.3.56
The version of ZoneMinder you are using:
v1.37.25
Details
Getting Error: HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback] when multiple zm_detect.py processes are run at once with the edgetpu.
tpu_max_processes is set to 1.
I am executing zm_detect.py from a custom script that dynamically generates the list of frames to check for objects after the event is complete. That list is mostly all of the alarmed frames, so it can be up to 30 frames.
For this scenario, I have object enabled with the edge TPU only.
If this happens too many times the TPU will completely lock up and I have to unplug it and plug it back in to get it working again. I believe the same thing happens with the GPU/Yolo, which handles it OK but will run out of memory periodically. The TPU only allows one process to access it, so it just dies every time.
From what I see there are a couple of ways to solve this:
We could lock the devices that we are going to be using when the process starts, but we don't know exactly what we need before processing the images, so devices may be locked when they are not needed.
We could lock the device when the model is loaded and not unlock it until the current process is done processing images with that model. The process could load additional models under the same lock that is already held for the GPU, etc. (not sure whether the same process can load multiple models on the TPU or not). A rough sketch of this approach is below.
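For illustration only, and not pyzm's actual locking code (which I haven't traced): hold an exclusive lock for the device from model load until the process exits, e.g. with a plain flock file lock. The lock file path and function names here are made up.

import atexit
import fcntl

# Hypothetical lock file for the TPU; this just shows the
# "acquire at model load, release at process exit" pattern.
_tpu_lock_file = open('/tmp/edgetpu.lock', 'w')

def acquire_tpu_for_process():
    fcntl.flock(_tpu_lock_file, fcntl.LOCK_EX)   # blocks until no other process holds the TPU
    atexit.register(release_tpu)                 # only release when this process exits

def release_tpu():
    fcntl.flock(_tpu_lock_file, fcntl.LOCK_UN)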
As a side note, an option to run the models sequentially in post-processing scenarios such as mine would be nice, because there's really no reason (for me), after the event is complete, to process each frame through every model. It would be better to just run each model on all the frames: if nothing is found, move to model 2, and so on (roughly the flow sketched below). The current sequence makes complete sense for real-time detection/events.
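Something like this hypothetical flow, where the detect() call and the model list are placeholders rather than pyzm's real API:

# Proposed post-event order: run one model across the whole frame list before
# moving to the next model, instead of running every model on every frame.
def detect_post_event(frames, models):
    for model in models:                                  # e.g. [tpu_object, yolo_object, face]
        matches = [f for f in frames if model.detect(f)]  # placeholder detect() call
        if matches:
            return model, matches                         # stop at the first model that finds anything
    return None, []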
My current workaround is to comment out the lock release in /pyzm/ml/coral_edgetpu.py so the TPU lock is not released after each frame, which means it is only released once the script is done. But this limits me to only one model, so I can't do the face detection:
    if self.options.get('auto_lock', True):
        #self.release_lock()                     # commented out: keep holding the TPU lock
        g.logger.Debug(2, 'Not Releasing Lock')
except:
    if self.options.get('auto_lock', True):
        self.release_lock()                      # exception path left as-is, still releases the lock
Debug Logs