Running Batch Jobs #38
I also see the segfault on SLURM. Possibly related to this.
TL;DR: in my case this is no longer a problem with the latest libdrmaa for SLURM.

I tried to debug the code to figure out which call triggered the segfault. The problem seems to be in the call to drmaa_get_next_job_id. I then compiled the latest libdrmaa for SLURM, and with it I could no longer reproduce the segfault. This appears to be an upstream problem that only affects some releases when a certain combination of conditions is met. The code used to debug this issue:
On a malfunctioning system the message "Submitting..." is shown, immediately followed by "Segmentation fault". When the system is working properly you should instead see "Job submitted with...".
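The debug script itself was elided above; a minimal sketch along those lines might look like the following. This assumes drmaa-python with SLURM's libdrmaa available; the remote command and task indices are illustrative, and the function degrades to a no-op when the library is absent.

```python
# Sketch of a minimal reproduction, assuming drmaa-python and a SLURM
# libdrmaa on DRMAA_LIBRARY_PATH. Command and indices are illustrative.
try:
    import drmaa
except ImportError:
    drmaa = None  # drmaa-python not installed; sketch only

def submit_test_jobs():
    """Submit a tiny bulk job and report its ids, or None without drmaa."""
    if drmaa is None:
        return None
    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = "/bin/sleep"
        jt.args = ["10"]
        print("Submitting...")
        # On affected libdrmaa releases, collecting the returned ids (which
        # calls drmaa_get_next_job_id internally) segfaults before returning.
        job_ids = session.runBulkJobs(jt, 1, 2, 1)
        print("Job submitted with ids %s" % job_ids)
        session.deleteJobTemplate(jt)
        return job_ids

if __name__ == "__main__":
    submit_test_jobs()
```

On an affected system the script dies right after "Submitting..."; on a healthy one it prints the submitted ids.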
For anyone who lands on this post after dealing with segmentation fault errors on SLURM: you might want to ask your cluster administrator to install a newer, patched slurm-drmaa build. It's far from perfect and will still segfault if certain options are present.
Perhaps worth adding a note in the docs and closing, as this is a DRMAA implementation issue.
Just adding that natefoo/slurm-drmaa@7b5991e solves this issue. |
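If the system-wide libdrmaa cannot be replaced, drmaa-python also honors the DRMAA_LIBRARY_PATH environment variable, so a locally built, patched library can be used instead. The path below is purely illustrative; adjust it to wherever your build installs the shared object.

```shell
# Point drmaa-python at a locally built, patched libdrmaa before
# starting Python. The path is illustrative.
export DRMAA_LIBRARY_PATH="$HOME/slurm-drmaa/lib/libdrmaa.so.1"
echo "Using libdrmaa at: $DRMAA_LIBRARY_PATH"
```

With this set, `import drmaa` loads the patched library instead of the system one.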
Hi, I'm trying to run a batch job using the --array SLURM option, and I'm wondering whether this is possible with drmaa-python. I know there is runBulkJobs(...), but it doesn't seem to run an array of jobs: there is no $SLURM_ARRAY_TASK_ID (or the like) in the run environment.
When I try to run this I get a segmentation fault.
OUTPUT
A gdb backtrace gives the following result:
Aside: I'm also having trouble with it throwing an OutOfMemoryException. I'm therefore forced to assume the job was aborted due to memory (not preferable), so advice on what's happening there would be great.
Thanks!
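For reference, bulk submission through drmaa-python takes begin/end/step indices rather than a native --array flag. A sketch follows; it assumes drmaa-python is installed, the command is illustrative, and whether the backend turns this into a real SLURM array job (and thus exports SLURM_ARRAY_TASK_ID) depends on the libdrmaa implementation in use.

```python
# Sketch of bulk submission with drmaa-python. The DRMAA-portable task
# index placeholder is drmaa.JobTemplate.PARAMETRIC_INDEX; whether
# SLURM_ARRAY_TASK_ID is also set depends on the libdrmaa backend.
try:
    import drmaa
except ImportError:
    drmaa = None  # drmaa-python not installed; sketch only

def submit_bulk(begin=1, end=10, step=1):
    """Submit one job per index in [begin, end] and return their ids."""
    if drmaa is None:
        return []
    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        jt.remoteCommand = "/bin/echo"
        # PARAMETRIC_INDEX expands to the task index for each bulk job.
        jt.args = ["task", drmaa.JobTemplate.PARAMETRIC_INDEX]
        job_ids = session.runBulkJobs(jt, begin, end, step)
        session.deleteJobTemplate(jt)
        return job_ids
```

Each submitted job sees its own index via the placeholder, which is the portable DRMAA analogue of an array task id.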