Unusually high memory usage with gene_quanti #56
Comments
Hello! If it helps, I reran gene_quanti with --unique_only and this time it did complete, in about 430 minutes using 37.5 GB of memory; CPU usage was about half that of READemption's align step.
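For reference, a minimal sketch of such a rerun (the process count, feature list, and project path are assumptions taken from the command reported in this issue):

```bash
# Re-run gene quantification counting only uniquely aligned reads
reademption gene_quanti -p 4 --unique_only \
    --features "gene,cds,region,exon" \
    --project_path READemption_project
```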
We aim to reduce memory consumption in the future. In the meantime there is no solution other than increasing memory. I had a data set of 15 libraries with about 30 million reads each and needed around 400 GB of memory. If you need a machine with more power, you could try out the de.NBI cloud.
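(For rough scaling: 400 GB across 15 such libraries works out to about 27 GB per 30-million-read library, which suggests memory demand grows roughly linearly with the number of libraries; this is an extrapolation from a single data point, not a documented requirement.)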
Is it normal that the command runs for several days on 12 paired-end samples with fastq files of around 12 GB each for the forward and reverse reads, on a server with 70 CPUs?
Yes, it is possible. I implemented printing timestamps for intermediate steps. If timestamps and intermediate steps are still being added from time to time, everything is running as intended. You can also post the output of the current command and I will have a look at it.
Which READemption version are you using? Can you check your memory usage? Do you still have memory left?
I am running version 2.0.4. Output of watch -n 5 free -m (columns: total, used, free, shared, buff/cache, available, in MB):
Mem: 806288 16460 67087 13 722740 784735
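While waiting, it can also help to keep a record of memory usage over time. A minimal sketch, assuming the output is appended to a log file (the 300-second interval and file name are arbitrary choices):

```bash
# Print a memory snapshot every 300 seconds and append it to a log file
free -m -s 300 >> memory_usage.log &
```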
all right, there is still some memory left. I would give it a couple more days. |
Okay, but it has already been running for 8 days. Is this still normal behaviour?
Still running... I fear it is stuck somewhere. I have restarted it now with the --no_count_split_by_alignment_no option. Or would it be better to also use --no_count_splitting_by_gene_no?
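For anyone following along, a restart with both splitting options disabled might look roughly like this (a sketch; the process count, feature list, and project path are assumptions carried over from the command quoted later in this thread):

```bash
# Skip both count-splitting steps to reduce the counting workload
reademption gene_quanti -p 4 \
    --no_count_split_by_alignment_no \
    --no_count_splitting_by_gene_no \
    --features "gene,cds,region,exon" \
    --project_path READemption_project
```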
Hi, which --processes value makes sense for gene_quanti? I gave the job 140, but I can see that for my 24 samples only 24 processes are actually used. So number of processes = number of samples?

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

Thanks in advance.
Is it somehow possible to give more processes to READemption? At the moment only 20% of our total CPU capacity is being used. Thanks in advance :)
Exactly, the maximum number of parallel processes is the number of samples. Is the gene quantification step still running?
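A quick way to check how many READemption worker processes are actually running (a sketch; the pattern may need adjusting to your process names):

```bash
# Count running READemption processes; the [r] bracket trick stops
# grep from matching its own command line in the ps output.
ps aux | grep -c "[r]eademption"
```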
Hi, yes, the step has been running since I started it on 16.04.2024 with the --no_count_split_by_alignment_no and --no_count_splitting_by_gene_no options. It is a paired-end analysis with two different bacterial species and 12 samples. Since there are two species, 24 processes are used, right?
Hello,
I noticed that gene_quanti seems to be using a strangely high amount of memory.
The command I'm running is:
reademption gene_quanti -p 4 --features 'gene,cds,region,exon' --project_path READemption_project
I'm using this container:
container "tillsauerwein/reademption:2.0.2"
The command runs perfectly fine on the tutorial data (fastq file size 1.5M-1.6M, bam file size), with a standard memory footprint. However, I'm now using a second set of test data with fastq files around 1.8-2.6 GB. The reference genomes and gff files are much smaller. Yet this step keeps crashing due to out-of-memory errors; so far 200 GB hasn't been enough for it. The alignment ran fine, and I'm not sure why this step would be such a memory hog. I'm currently running htseq-count and featureCounts via the command line to see how they perform in comparison.
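For the comparison runs, the counting commands would look roughly like this (a sketch; file names are placeholders, and the featureCounts attribute settings may need adjusting for GFF3 input):

```bash
# htseq-count: count reads per gene from a position-sorted BAM
htseq-count -f bam -r pos -s no aligned.bam annotation.gff > htseq_counts.txt

# featureCounts: count fragments for paired-end data
# (newer subread versions may also require --countReadPairs)
featureCounts -p -a annotation.gff -o featurecounts_results.txt aligned.bam
```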
Is this expected behavior? Could this be the result of a bug?