How to enable multi-threaded igzip runs to improve compression throughput; -T parameter does not work #250
Comments
Threading is not enabled by default in the igzip utility. You need to add HAVE_THREADS to the build options. Please try the following.
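A rough sketch of such a rebuild, assuming make.inc's D= hook for extra preprocessor defines (the exact variable names are an assumption, so check your copy of make.inc):

```sh
# Rebuild the igzip CLI with the HAVE_THREADS compile-time switch enabled.
# D= as the hook for extra defines is an assumption based on make.inc; adjust it
# if your ISA-L version names the variable differently.
make -f Makefile.unx clean
make -f Makefile.unx progs D="HAVE_THREADS"

# For an autotools build (./configure && make), the rough equivalent is to pass
# the define through the compiler flags, e.g. ./configure CFLAGS="-O2 -DHAVE_THREADS"
```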
Thank you for your guidance. I tried it the way you said and ran into new problems. In the makefile (make.inc) I can see that the thread library is linked, but I still get compile errors around the rule progs: $(bin_PROGRAMS). Also, expressions such as "have_threads ?= ..." are hard for me to understand.
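As an aside on the `?=` syntax: in GNU Make it is a conditional assignment, i.e. the variable keeps a value supplied via the environment or the command line and only falls back to the right-hand side otherwise. So a default like `have_threads ?= ...` can be overridden at build time; a sketch (the accepted value is an assumption, the checks in make.inc show what it really expects):

```sh
# Command-line assignments win over '?=' defaults in make.inc, so the threading
# switch can be flipped without editing the file. The value 'y' is a guess here.
make -f Makefile.unx progs have_threads=y
```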
@gbtucker One more thing to check is whether decompression supports multithreading. As the help information shows, the -T parameter applies to compression only.
As deflate blocks are interdependent, multithreaded decompression is impossible. Pigz technically supports it, but it really only runs the decompression and the checksumming in different threads. The result is a marginally quicker wall-clock time (10%) for a moderate increase in CPU resources (30%). In my opinion it is not a very worthwhile endeavour to put into ISA-L, though it is not my call, as I am just a user of the project, not a developer. I only managed to get threading built using …
Rapidgzip is able to fully parallelize decompression. See also the accompanying paper. (Disclaimer: I am the developer of rapidgzip). |
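Basic usage looks roughly like the following; the -P flag for the number of decoder threads is taken from rapidgzip's documentation as I recall it, so treat the exact option names and paths as assumptions:

```sh
# Parallel decompression of a single gzip file; -P sets the decoder thread count.
rapidgzip -d -P 16 -o /raid0/data/fq/SRR9613620_1.fq /raid0/data/fq/SRR9613620_1.fq.gz
```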
Very interesting! It needs 4 cores, though, to be on par with igzip. That is quite a cost, unless wall-clock time is the only thing to worry about. If you have 200 files to decompress, it is 4 times more efficient to process them in parallel rather than using rapidgzip.
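To make the many-files comparison concrete, this is the kind of file-level parallelism meant here: one single-threaded igzip process per file (paths and core count are illustrative):

```sh
# Keep all cores busy by decompressing many files concurrently instead of
# parallelizing within a single file.
find /raid0/data/fq -name '*.gz' -print0 | xargs -0 -n 1 -P "$(nproc)" igzip -d
```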
Yes, that would be ideal if I could integrate ISA-L for that. That's also why I was looking into the ISA-L source code more deeply. My thought was that I might be able to leverage only the Huffman decoder as a first step, in the hope that it is easier to integrate with my custom-written inflate implementation, and under the assumption that most of the performance is tied to the Huffman decoder anyway. I already integrated ISA-L for decompression when an index is known (when --import-index is used or when decompressing bgzip-generated files). This use case might not be very common when using rapidgzip as a command-line tool, but it is very common when using rapidgzip as a library in ratarmount.
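For readers unfamiliar with that workflow, the index path looks roughly like this; --export-index as the counterpart to --import-index is an assumption from rapidgzip's documentation, and the file names are made up:

```sh
# First decompression: build and save a seek index as a side product.
rapidgzip -d --export-index data.fq.gz.index -o /dev/null data.fq.gz

# Subsequent decompressions: reuse the index so blocks can be decoded in parallel
# from known offsets (this is where the ISA-L fast path mentioned above is used).
rapidgzip -d --import-index data.fq.gz.index -o data.fq data.fq.gz
```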
Awesome project! I regularly decompress very big gzip compressed files (an occupational hazard), which is why I frequently roam this part of the internet. I wish you all the best! |
I have integrated the ISA-L Huffman decoder, and it sped up the Silesia benchmark by ~40% thanks to the LUT also including the backreference length, but the base64 test case is only ~5% faster. Both are still only about half as fast as igzip. It seems that most of the remaining performance gains lie in the handcrafted assembler routine for …
@mxmlnkn Random base64 is not really worth optimizing for, though; it is not representative of real-world data that gets gzipped. I would suggest you test on data that matters to you, your company, and your projects. That way your optimizations will at least make someone happy. I personally test on big compressed FASTQ data (millions of short DNA fragments).
I found that make -f Makefile.unx …
I downloaded the ISA-L source code. After compiling, I opened the directory programs/.libs, found two binaries, igzip and lt-igzip, and ran the following:
first: time ./igzip -z /raid0/data/fq/SRR9613620_1.fq -o /raid0/data/fq/9613620_00.gz
real 0m41.708s
user 0m36.434s
sys 0m5.177s
second: time ./igzip -T 10 -z /raid0/data/fq/SRR9613620_1.fq -o /raid0/data/fq/9613620_000.gz
real 0m40.680s
user 0m36.335s
sys 0m4.249s
The first and second runs take nearly the same time, which means the -T parameter does not take effect.
Thank you for your help.
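One quick sanity check in this situation is whether the binary being timed was built with threading at all, for example (assuming the CLI prints its usage via --help):

```sh
# If HAVE_THREADS was not defined at build time, the threading option will be
# missing or inert, so the help text is the fastest way to tell the builds apart.
./igzip --help 2>&1 | grep -i thread
```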