-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clarification] Why are alignments fractional? #54
Comments
Per default each alignment of a read is normalized by the total number of alignments the given read has. Additionally each alignment is normalized by the number of features it overlaps with. Both normalization methods are turned on per default an will result in fractions. You can turn off these options via: We'll change the documentation to make this point more clear! |
Ah, thank you so much for explaining! My ultimate goal is to use the output of the gene quantification step (ideally) and feed it into a DESeq2-based downstream pipeline - we have an unusual experimental design and I'd like to try altering the design matrix beyond what we'd be able to do in READemption. Is the split-multimapped-read count table "gene_wise_quantifications_combined.tsv" also the count matrix that is fed into DESeq2? (The source of my confusion was that I thought DESeq2 only accepted matrix integer inputs.) And am I correct in understanding that --unique_only would count only the reads that are not multimapping? Said another way, setting no_count_split_by_alignment_no (and no_count_splitting_by_gene_no) to true would mean that reads are double-counted (e.g. a read mapping to multiple genes is +1 for the count of each gene, or +0?), right? Not that they are thrown out? Thank you very much! |
"And am I correct in understanding that --unique_only would count only the reads that are not multimapping?" |
Hello! I had a follow-up to this -- when I ran this command: |
The fractions likely result from normalizing alignments by the number of features they overlap with. So if a read aligns to a single position, but the position is part of two features (e.g. gene and exon) each feature gets a count of 0.5. |
Hello - thank you for engineering such a helpful tool.
I'm running through the tutorial and examining the file "gene_wise_quantifications_combined.tsv", which I understand contains genewise/transcript counts along with raw counts.
I notice that the count data is between 0 and 76.83, with plenty of fractional counts. In this pre-normalized data, why do fractional counts occur? What do they represent? I was under the impression that the gene/transcript-wise quantification at this step was simply counting (i.e. integer quantification) of segemehl alignments to a region.
Thank you for your help!
The text was updated successfully, but these errors were encountered: