KMC 3 stops during stage 2 when using BFC-corrected reads #42
Hello, thanks for reporting this issue.
Or do you mean some other "stats table" (which one)? I am asking because this table should be printed after stage 2 finishes.
Hi Marek, The input files are not publicly available. The input includes 211,646,643 interleaved read pairs, and the size of the gzipped FASTQ file is approximately 36 GB. Yes, I meant the table printed after finishing stage 2. I used this command
I usually use the default setting for memory ( |
Oh, OK, you mean 0s as zeros, not as zero seconds :)
I tried with both KMC v2 installed with Conda and your KMC v3 pre-compiled version. I don't understand what might be the problem. KMC works fine when I use the raw FASTQ files; that is, without prior error correction with BFC. I will send you an e-mail and attach a few thousand reads.
Hi, I am not sure if we should allow tabs; what do you think? On the other hand, at first look it seems that only a small change is required in the KMC code, so maybe I will do it in the next couple of days. Anyway, thanks for reporting that bug and for using KMC.
Thank you for your help! I also think that FASTQ headers are not supposed to have tabs. At least in Illumina reads, a space should precede the read number element. I have read that the format of reads corrected with BFC might cause parsing problems in other tools, such as SGA and khmer. However, BFC-corrected reads have been used with various genome assemblers, and I have used them without any prior reformatting in assemblies with SPAdes. Anyway, I will replace tabs with spaces and try again.
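The workaround mentioned above (replacing tabs with spaces in the read headers) could be sketched roughly as follows. This is only an illustration, not a command taken from the thread: the function name `fix_fastq_headers` and the output file name are hypothetical, and the sketch assumes strict four-line FASTQ records, so that every header sits on lines 1, 5, 9, and so on.

```shell
# Replace tabs with spaces in FASTQ header lines only.
# Assumption: each record is exactly four lines (header, sequence, '+', quality),
# so header lines are those where the line number modulo 4 equals 1.
fix_fastq_headers() {
    awk 'NR % 4 == 1 { gsub(/\t/, " ") } { print }'
}

# Hypothetical usage with the file name from this report:
# gzip -dc bfc-corrected.fastq.gz | fix_fastq_headers | gzip -1 > bfc-corrected.notabs.fastq.gz
```

Restricting the substitution to header lines leaves sequence and quality lines untouched, so the read content itself is unchanged.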
Hello,
I'm trying to use KMC v3 with reads previously corrected with BFC. However, KMC stops during stage 2, there is no warning or error message, and the stats table shows only 0s. I ran Jellyfish v2 with the same corrected reads without a problem. Below are the commands that I'm using.
Correct reads
bash -c "bfc -s 200m -k33 -t 16 <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) | gzip -1 > bfc-corrected.fastq.gz"
Count k-mers
kmc -k21 -ci2 -m100 -t12 -v bfc-corrected.fastq.gz bfc-corrected_kmc3 ./tmp
This is an example of a read pair after BFC correction:
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:104_0_3:0_0
aTAACATATAATGTTTTTAAATAAATTTTAATTTAATTGGAATACTTATTTATTCAATAAAATTATTAACAATAATTTACCTCTATTTTGGTTTCAATTAAATAAATTTATAgAGAAATAaTAAATAAATAAAGCTTCTAACTTTATAATA
+
&???????????????????????????????????????+??????+??????++??+++???+???????????????+???+?????+????++??+++?+++??????%++++???%++?????+?+???+????+?++????++??
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:103_0_3:0_0
aTATATTTTTGTTTATTATTTTAAGTATAGGTTAATTGAAGAATTATTTAATTTATTAAAATTAGATTATTTTGTTTATTATAAAATATTTTATTTTTTTTTTATAATTATAATTTTTTATTATTTTTTATTTgATTAAAATaTATGAATA
+
&?????????????????????????????????????????????????????++????????????????????+??????????+?????????????????++?++++++?????++????????++??%+??++++?#???+++?+
I would really appreciate any help.
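Whitespace in a header is hard to tell apart by eye, so a quick check like the one below may help confirm whether the headers contain tab characters, which is the cause discussed in the comments. This is a minimal sketch: the function name `count_tab_headers` is hypothetical, and it assumes four-line FASTQ records.

```shell
# Count FASTQ header lines that contain at least one tab character.
# Assumption: each record is exactly four lines, so headers are lines 1, 5, 9, ...
count_tab_headers() {
    awk 'NR % 4 == 1 && /\t/ { n++ } END { print n + 0 }'
}

# Hypothetical usage with the file name from this report:
# gzip -dc bfc-corrected.fastq.gz | count_tab_headers
```

A result greater than zero would indicate that at least some headers use a tab as a separator.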