Can't count kmer on fastq file #137

Open
chequochuu opened this issue Sep 10, 2019 · 11 comments

Comments

@chequochuu

chequochuu commented Sep 10, 2019

I am using the latest KMC code, but I can't count k-mers in a FASTQ file. It works on FASTA.

$ cat r1_test.fq
@0|Chromosome|4051100|4051286/2 BX:Z:CGACACGGTTTGGGCC
AAACCCAACCAC
+
FFFFFFFFFFFF

$ kmc -fq -m5 -ci1 -k3 r1_test.fq res ./tmp/
Stage 1: 100%
1st stage: 0.000393s
2nd stage: 6.4e-05s
Total    : 0.000457s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            0

@marekkokot
Contributor

Hi,

I am not able to reproduce this bug.

By the latest KMC code, do you mean that you compiled commit 85ad769?

Could you rerun it with the -v switch and send me your output?

Could you try to rerun it with -t1 and check if it still does not work?

@chequochuu
Author

Yes, I used that commit.
I still get the error.


Info: Small k optimization on!

******* configuration for small k mode: *******
No. of input files           : 1
Output file name             : res
Input format                 : FASTQ

k-mer length                 : 3
Max. k-mer length            : 256
Min. count threshold         : 1
Max. count threshold         : 1000000000
Max. counter value           : 255
Both strands                 : true
Input buffer size            : 33554432

No. of readers               : 1
No. of splitters             : 1

Max. mem. size               :  5000MB

Max. mem. for PMM (FASTQ)    :  3294MB
Part. mem. for PMM (FASTQ)   :    33MB
Max. mem. for PMM (reads)    :     1MB
Part. mem. for PMM (reads)   :     0MB
Max. mem. for PMM (b. reader):   402MB
Part. mem. for PMM (b. reader):   134MB

Stage 1: 100%
1st stage: 0.000247s
2nd stage: 6.3e-05s
Total    : 0.00031s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            0
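
For comparison, a brute-force count of the test read gives a rough idea of what a non-zero result would look like. This is only a minimal sketch, not KMC's algorithm: the helper names `canonical` and `count_canonical_kmers` are hypothetical, and it assumes canonical counting (the log shows "Both strands: true") on uppercase A/C/G/T input.

```python
from collections import Counter

# Complement table for DNA bases (assumption: uppercase A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in one sequence with a simple sliding window."""
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))


counts = count_canonical_kmers("AAACCCAACCAC", 3)
# Total k-mers and number of distinct canonical k-mers.
print(sum(counts.values()), len(counts))  # prints: 10 7
```

Under these assumptions the 12-base read should yield 10 k-mers in total, so a total of 0 points at the read being skipped rather than at the counting itself.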

@chequochuu
Author

chequochuu commented Sep 10, 2019

It seems that it doesn't work when the read name includes a barcode. When I remove the barcode:

@0|Chromosome|4051100|4051286/2
AAACCCAACCAC
+
FFFFFFFFFFFF

It works like a charm!

@marekkokot
Contributor

Hmmm,
it is still weird that it worked on my machine. Maybe I prepared an input file that differs from yours. Could you send me your file r1_test.fq?

@chequochuu
Author

This is the entire content of my r1_test.fq:

@0|Chromosome|4051100|4051286/2 BX:Z:CGACACGGTTTGGGCC
AAACCCAACCAC
+
FFFFFFFFFFFF

@marekkokot
Contributor

Hi,
I meant that you should send me the file itself, not its content, because GitHub may remove something when you copy and paste. It seems unlikely, but currently I cannot imagine another reason why it works on my machine.

You may also copy what you have pasted here to a new file and check if KMC still produces wrong results on your machine.

@chequochuu
Author

I have found out that the character between the ID and the barcode is \t instead of a space. Sorry, my bad.
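
A tab and a space look identical once pasted into a comment, so it can help to check the header lines programmatically. The following is a minimal sketch, assuming strict four-line FASTQ records with no wrapped sequences; `headers_with_tabs` is a hypothetical helper, not part of KMC.

```python
from typing import Iterable, List


def headers_with_tabs(lines: Iterable[str]) -> List[str]:
    """Return FASTQ header lines that contain a tab character.

    Assumes every record is exactly four lines, so headers sit at
    indices 0, 4, 8, ... of the input.
    """
    return [
        line.rstrip("\n")
        for index, line in enumerate(lines)
        if index % 4 == 0 and "\t" in line
    ]


# The record from this thread, with the separator written explicitly as a tab.
record = [
    "@0|Chromosome|4051100|4051286/2\tBX:Z:CGACACGGTTTGGGCC\n",
    "AAACCCAACCAC\n",
    "+\n",
    "FFFFFFFFFFFF\n",
]
for header in headers_with_tabs(record):
    print(repr(header))  # repr() shows the tab as \t
```

To check a real file, pass an open file handle instead of the list, for example `headers_with_tabs(open("r1_test.fq"))`.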

@marekkokot
Contributor

marekkokot commented Sep 13, 2019

OK, thanks for the info. It seems to be the same bug as #42, so I will keep this issue open as a reminder to add '\t' support. Anyway, thanks for reporting the issue and thanks for using KMC.

@taprs

taprs commented Dec 5, 2023

Bump! I ran into the same issue as of today. Would be cool to have it fixed, especially given that many linked-read pipelines produce tabbed headers by default.

@richardstoeckl

The new versions of Nanopore's Dorado and related tools also produce tabbed headers in their fastq files, so I would also appreciate a fix :)

@esdpoort

I also ran into this issue with fastq files generated by Dorado v0.7.0 which have tabs in the headers. For now I used seqkit replace to change tabs into spaces as a workaround but it would be nice if kmc could handle tabbed fastq headers.
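
For anyone without seqkit at hand, the same kind of workaround can be sketched in a few lines of Python. This is not the seqkit command used above, only an illustrative stand-in: `normalize_headers` and `normalize_file` are hypothetical names, and the sketch assumes strict four-line FASTQ records.

```python
from typing import Iterable, Iterator


def normalize_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with tabs in header lines replaced by spaces.

    Only every fourth line starting from the first (the header) is changed;
    sequence, separator and quality lines pass through untouched.
    """
    for index, line in enumerate(lines):
        yield line.replace("\t", " ") if index % 4 == 0 else line


def normalize_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with tab-free header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(normalize_headers(src))


fixed = list(normalize_headers([
    "@read1\tBX:Z:CGACACGGTTTGGGCC\n",
    "AAACCCAACCAC\n",
    "+\n",
    "FFFFFFFFFFFF\n",
]))
print(fixed[0], end="")  # header now uses a space instead of a tab
```

The converted file can then be passed to kmc in place of the original.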
