[Clarification] Why are alignments fractional? #54

kmuench · 2023-09-11T14:35:34Z

Hello - thank you for engineering such a helpful tool.

I'm running through the tutorial and examining the file "gene_wise_quantifications_combined.tsv", which I understand contains genewise/transcript counts along with raw counts.

I notice that the count data is between 0 and 76.83, with plenty of fractional counts. In this pre-normalized data, why do fractional counts occur? What do they represent? I was under the impression that the gene/transcript-wise quantification at this step was simply counting (i.e. integer quantification) of segemehl alignments to a region.

Thank you for your help!

Tillsa · 2023-09-12T07:54:23Z

Per default each alignment of a read is normalized by the total number of alignments the given read has. Additionally each alignment is normalized by the number of features it overlaps with. Both normalization methods are turned on per default an will result in fractions. You can turn off these options via:
--no_count_split_by_alignment_no
--no_count_splitting_by_gene_no
Further information can be found in the gene quantification documentation

We'll change the documentation to make this point more clear!

kmuench · 2023-09-12T12:37:31Z

Ah, thank you so much for explaining!

My ultimate goal is to use the output of the gene quantification step (ideally) and feed it into a DESeq2-based downstream pipeline - we have an unusual experimental design and I'd like to try altering the design matrix beyond what we'd be able to do in READemption.

Is the split-multimapped-read count table "gene_wise_quantifications_combined.tsv" also the count matrix that is fed into DESeq2? (The source of my confusion was that I thought DESeq2 only accepted matrix integer inputs.)

And am I correct in understanding that --unique_only would count only the reads that are not multimapping?

Said another way, setting no_count_split_by_alignment_no (and no_count_splitting_by_gene_no) to true would mean that reads are double-counted (e.g. a read mapping to multiple genes is +1 for the count of each gene, or +0?), right? Not that they are thrown out?

Thank you very much!

Tillsa · 2023-09-13T11:54:15Z

"And am I correct in understanding that --unique_only would count only the reads that are not multimapping?"
Yes!
"Said another way, setting no_count_split_by_alignment_no (and no_count_splitting_by_gene_no) to true would mean that reads are double-counted (e.g. a read mapping to multiple genes is +1 for the count of each gene, or +0?), right? Not that they are thrown out?"
Yes, you are right!
"The source of my confusion was that I thought DESeq2 only accepted matrix integer inputs"
I believe it also accepts floats as inputs, but maybe they are rounded by DESeq2

kmuench · 2023-09-28T09:12:17Z

Hello! I had a follow-up to this -- when I ran this command: reademption gene_quanti --unique_only -p 4 --features gene,cds,region,exon --project_path READemption_analysis , I still produced an output with fractional reads (e.g., 547.33, 258.5). My understanding is that there should not be any fractional reads because the --unique_only option would eliminate multimapping reads, and therefore there are no reads to be split to produce the multimapping fractional counts. Could you help me understand what is happening here? (I am looking at the file "gene_wise_quantifications_combined.csv")

Tillsa · 2023-10-10T14:05:17Z

The fractions likely result from normalizing alignments by the number of features they overlap with. So if a read aligns to a single position, but the position is part of two features (e.g. gene and exon) each feature gets a count of 0.5.
You can turn this option off by using the parameter --no_count_splitting_by_gene_no.

kmuench changed the title ~~[Clarification] Why are fractional a~~ [Clarification] Why are alignments fractional? Sep 19, 2023

kmuench closed this as completed Sep 19, 2023

kmuench reopened this Sep 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Clarification] Why are alignments fractional? #54

[Clarification] Why are alignments fractional? #54

kmuench commented Sep 11, 2023

Tillsa commented Sep 12, 2023 •

edited

Loading

kmuench commented Sep 12, 2023

Tillsa commented Sep 13, 2023

kmuench commented Sep 28, 2023 •

edited

Loading

Tillsa commented Oct 10, 2023

[Clarification] Why are alignments fractional? #54

[Clarification] Why are alignments fractional? #54

Comments

kmuench commented Sep 11, 2023

Tillsa commented Sep 12, 2023 • edited Loading

kmuench commented Sep 12, 2023

Tillsa commented Sep 13, 2023

kmuench commented Sep 28, 2023 • edited Loading

Tillsa commented Oct 10, 2023

Tillsa commented Sep 12, 2023 •

edited

Loading

kmuench commented Sep 28, 2023 •

edited

Loading