
Fix: categorical association dataframe output #46

Open
aron0093 opened this issue Sep 9, 2024 · 3 comments
Labels: bug (Something isn't working), dashboard, enhancement (New feature or request), evaluations

Comments

@aron0093 (Collaborator) commented Sep 9, 2024

  1. The results dataframe should contain a "program_name" column. Currently the program names are stored in the index, which is inconsistent with the output of the other evaluations.
  2. The p-values are automatically rounded off; explicitly setting the dtype should fix this. A sketch of both fixes follows below.
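
A minimal sketch of both fixes, assuming the results live in a pandas DataFrame named `results` with program names in the index and a `pval` column (both names are assumptions):

```python
import pandas as pd

# Toy stand-in for the evaluation output (names are assumptions).
results = pd.DataFrame({"pval": ["0.0031", "0.0450"]}, index=["prog_0", "prog_1"])

# Fix 1: surface the program names as an explicit "program_name" column
# instead of leaving them in the index, matching the other evaluations.
results = results.rename_axis("program_name").reset_index()

# Fix 2: force a float64 dtype so p-values keep full precision.
results["pval"] = results["pval"].astype("float64")
print(results)
```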
@aron0093 aron0093 self-assigned this Sep 9, 2024
@aron0093 aron0093 added the bug Something isn't working label Sep 9, 2024
@adamklie (Collaborator)

After discussion, we are thinking of moving the categorical association evaluation to a different strategy than the one currently implemented.

Evaluation side

By default, we will treat each level of the category separately and run the following procedure (sketched in code below):

  1. Binarize the category across cells (1 if category == level, 0 if category != level) --> can probably use Narges's function here
  2. For each program, calculate a Pearson correlation between this binary vector and the program scores. Adjust the p-values using https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.fdrcorrection.html
  3. For each program, calculate the log2FC of (mean program score where category == level) / (mean program score where category != level)
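
A sketch of the per-level procedure, assuming program scores arrive as a cells x programs DataFrame and the category as a per-cell Series; applying the FDR correction across programs within a level (rather than across all levels) is also an assumption:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from statsmodels.stats.multitest import fdrcorrection

def associate_level(program_scores, categories, level, key="batch"):
    """Per-level association: binarize, correlate, FDR-adjust, log2FC.

    program_scores: cells x programs DataFrame (assumed non-negative, e.g. cNMF usages)
    categories:     per-cell Series holding the categorical covariate
    """
    # Step 1: binarize the category across cells (1 if category == level, else 0).
    binary = (categories == level).astype(float).to_numpy()

    stats, pvals, log2fcs = [], [], []
    for prog in program_scores.columns:
        scores = program_scores[prog].to_numpy()
        # Step 2: Pearson correlation between the binary vector and program scores.
        r, p = pearsonr(binary, scores)
        # Step 3: log2FC of mean score inside vs. outside the level.
        log2fc = np.log2(scores[binary == 1].mean() / scores[binary == 0].mean())
        stats.append(r)
        pvals.append(p)
        log2fcs.append(log2fc)

    # Benjamini-Hochberg adjustment (here: across programs within this level).
    _, adj_pvals = fdrcorrection(pvals)

    prefix = f"{key}_{level}_pearsonr"
    return pd.DataFrame(
        {
            f"{prefix}_stat": stats,
            f"{prefix}_pval": pvals,
            f"{prefix}_adj_pval": adj_pvals,
            f"{prefix}_log2FC": log2fcs,
        },
        index=pd.Index(program_scores.columns, name="program_name"),
    )
```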

We will then combine these results across levels into a single wide dataframe:

| program_name | batch_level_1_pearsonr_stat | batch_level_1_pearsonr_pval | batch_level_1_pearsonr_adj_pval | batch_level_1_log2FC | batch_level_2_pearsonr_stat | batch_level_2_pearsonr_pval | batch_level_2_pearsonr_adj_pval | batch_level_2_log2FC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.82 | 0.004 | 0.012 | 1.4 | 0.75 | 0.008 | 0.02 | 1.3 |
| 1 | 0.65 | 0.03 | 0.09 | 0.7 | 0.58 | 0.035 | 0.12 | 0.8 |
| 2 | 0.90 | 0.002 | 0.006 | 1.9 | 0.81 | 0.005 | 0.01 | 1.7 |

We will then save this as a txt file for each categorical_key in the config, with the following file name: {prog_key}_{categorical_key}_association_results.txt (a sketch of the combine-and-save step follows the list below)

  • prog_key comes from the config
  • categorical_key comes from the config
  • the file name must end in the association_results.txt suffix
    Example: cNMF_batch_association_results.txt
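
Continuing from the sketch above, combining the per-level frames and writing the file could look like this (the tab separator and example config values are assumptions; `associate_level` is the hypothetical helper defined earlier):

```python
# Example config values (assumptions for illustration).
prog_key, categorical_key = "cNMF", "batch"

# One wide frame: per-level result blocks concatenated column-wise.
wide = pd.concat(
    [
        associate_level(program_scores, categories, level, key=categorical_key)
        for level in categories.unique()
    ],
    axis=1,
).reset_index()  # surfaces program_name as a column, per the fix above

# Required name pattern: {prog_key}_{categorical_key}_association_results.txt
outfile = f"{prog_key}_{categorical_key}_association_results.txt"
wide.to_csv(outfile, sep="\t", index=False)
```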

Dashapp side

  1. The dashapp will parse the output directory for all files with this suffix and will extract the appropriate categorical_key (which it can match against the eval config it is also passed); this logic is sketched below
  2. It will then plot a level x program heatmap of the correlations (update to plotly)
  3. It will also compute an aggregate statistic across levels (e.g. max) and generate a plot that shows the distribution of that statistic (without a p-value for now*)
  • *This is due to the inflation of p-values seen when running a test on this statistic. We also haven't fully decided whether to rank-order this or not.
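
A rough sketch of that dashapp logic (the directory name, the tab separator, the column-name matching, and the max aggregate are all assumptions):

```python
import glob
import os
import pandas as pd
import plotly.express as px

# 1. Parse the output directory for files ending in the required suffix.
outdir = "evaluations"  # hypothetical; the real path comes from the config
for path in glob.glob(os.path.join(outdir, "*_association_results.txt")):
    df = pd.read_csv(path, sep="\t")
    # The categorical_key can be recovered by matching the file name
    # against the keys in the eval config passed to the dashapp.

    # 2. Level x program heatmap of the correlation statistics (plotly).
    stat_cols = [c for c in df.columns if c.endswith("_pearsonr_stat")]
    heat = df.set_index("program_name")[stat_cols]
    px.imshow(heat.T, labels=dict(x="program", y="level", color="pearson r")).show()

    # 3. Aggregate across levels (e.g. max) and plot the distribution,
    # without a p-value for now.
    agg = heat.max(axis=1)  # one summary value per program
    px.histogram(x=agg).show()
```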

@adamklie (Collaborator)

I came up with a really hacky solution to this. I really couldn't figure out an easy way to implement this as an option within the current codebase. Basically, it looks for the combination of test="pearsonr" and mode="one_vs_all" and runs a separate code path for that one case, along the lines of the sketch below.
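
Something like the following hypothetical dispatch (the function and helper names are invented for illustration, not the actual codebase):

```python
def run_categorical_association(adata, test="pearsonr", mode="one_vs_all", **kwargs):
    # Special-cased path: the one combination the hack recognizes.
    if test == "pearsonr" and mode == "one_vs_all":
        return _pearsonr_one_vs_all(adata, **kwargs)  # hypothetical helper
    # Everything else falls through to the pre-existing implementation.
    return _legacy_categorical_association(adata, test=test, mode=mode, **kwargs)
```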

Obviously this won't do for the future, but it was the only way I could manage to get it to work while still preserving most of the old functionality.

TL;DR hopefully this works, but we will likely need to revisit

@adamklie (Collaborator) commented Sep 11, 2024

  • Hacked the implementation
  • Update the eval pipeline
  • Reconfigure the dashapp
