
Fix: categorical association dataframe output #46

Open
aron0093 opened this issue Sep 9, 2024 · 3 comments
Labels: bug (Something isn't working), dashboard, enhancement (New feature or request), evaluations

Comments

@aron0093 (Collaborator) commented Sep 9, 2024

  1. The results dataframe should contain a "program_name" column. Currently the program names are stored in the index, which is inconsistent with the output of the other evaluations.
  2. The p-values are automatically rounded off; explicitly setting the dtype should fix this. A sketch of both fixes follows below.
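
A minimal sketch of both fixes, assuming the results live in a pandas DataFrame named `results` with program names in the index and a `pval` column (both names are assumptions):

```python
import pandas as pd

# Toy stand-in for the evaluation output (names are assumptions).
results = pd.DataFrame({"pval": ["0.0031", "0.0450"]}, index=["prog_0", "prog_1"])

# Fix 1: surface the program names as an explicit "program_name" column
# instead of leaving them in the index, matching the other evaluations.
results = results.rename_axis("program_name").reset_index()

# Fix 2: force a float64 dtype so p-values keep full precision.
results["pval"] = results["pval"].astype("float64")
print(results)
```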
@aron0093 aron0093 self-assigned this Sep 9, 2024
@aron0093 aron0093 added the bug Something isn't working label Sep 9, 2024
@adamklie (Collaborator)

After discussion, we are thinking of moving the categorical association evaluation to a different strategy than the one currently implemented.

Evaluation side

By default, we will treat each level of the category separately and run the following procedure (sketched in code below):

  1. Binarize the category across cells (1 if category == level, 0 if category != level) --> can probably use Narges's function here
  2. For each program, calculate a Pearson correlation between this binary vector and the program scores. Adjust the p-values using https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.fdrcorrection.html
  3. For each program, calculate the log2FC of (mean program score where category == level) / (mean program score where category != level)
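
A sketch of the per-level procedure, assuming program scores arrive as a cells x programs DataFrame and the category as a per-cell Series; applying the FDR correction across programs within a level (rather than across all levels) is also an assumption:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from statsmodels.stats.multitest import fdrcorrection

def associate_level(program_scores, categories, level, key="batch"):
    """Per-level association: binarize, correlate, FDR-adjust, log2FC.

    program_scores: cells x programs DataFrame (assumed non-negative, e.g. cNMF usages)
    categories:     per-cell Series holding the categorical covariate
    """
    # Step 1: binarize the category across cells (1 if category == level, else 0).
    binary = (categories == level).astype(float).to_numpy()

    stats, pvals, log2fcs = [], [], []
    for prog in program_scores.columns:
        scores = program_scores[prog].to_numpy()
        # Step 2: Pearson correlation between the binary vector and program scores.
        r, p = pearsonr(binary, scores)
        # Step 3: log2FC of mean score inside vs. outside the level.
        log2fc = np.log2(scores[binary == 1].mean() / scores[binary == 0].mean())
        stats.append(r)
        pvals.append(p)
        log2fcs.append(log2fc)

    # Benjamini-Hochberg adjustment (here: across programs within this level).
    _, adj_pvals = fdrcorrection(pvals)

    prefix = f"{key}_{level}_pearsonr"
    return pd.DataFrame(
        {
            f"{prefix}_stat": stats,
            f"{prefix}_pval": pvals,
            f"{prefix}_adj_pval": adj_pvals,
            f"{prefix}_log2FC": log2fcs,
        },
        index=pd.Index(program_scores.columns, name="program_name"),
    )
```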

We will then combine these results across levels into a single wide dataframe:

| program_name | batch_level_1_pearsonr_stat | batch_level_1_pearsonr_pval | batch_level_1_pearsonr_adj_pval | batch_level_1_log2FC | batch_level_2_pearsonr_stat | batch_level_2_pearsonr_pval | batch_level_2_pearsonr_adj_pval | batch_level_2_log2FC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.82 | 0.004 | 0.012 | 1.4 | 0.75 | 0.008 | 0.02 | 1.3 |
| 1 | 0.65 | 0.03 | 0.09 | 0.7 | 0.58 | 0.035 | 0.12 | 0.8 |
| 2 | 0.90 | 0.002 | 0.006 | 1.9 | 0.81 | 0.005 | 0.01 | 1.7 |

We will then save this as a txt file for each categorical_key in the config, with the following file name: {prog_key}_{categorical_key}_association_results.txt (a sketch of the combine-and-save step follows the list below)

  • prog_key comes from the config
  • categorical_key comes from the config
  • the file name must end in the association_results.txt suffix
    Example: cNMF_batch_association_results.txt
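
Continuing from the sketch above, combining the per-level frames and writing the file could look like this (the tab separator and example config values are assumptions; `associate_level` is the hypothetical helper defined earlier):

```python
# Example config values (assumptions for illustration).
prog_key, categorical_key = "cNMF", "batch"

# One wide frame: per-level result blocks concatenated column-wise.
wide = pd.concat(
    [
        associate_level(program_scores, categories, level, key=categorical_key)
        for level in categories.unique()
    ],
    axis=1,
).reset_index()  # surfaces program_name as a column, per the fix above

# Required name pattern: {prog_key}_{categorical_key}_association_results.txt
outfile = f"{prog_key}_{categorical_key}_association_results.txt"
wide.to_csv(outfile, sep="\t", index=False)
```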

Dashapp side

  1. The dashapp will parse the output directory for all files with this suffix and will extract the appropriate categorical_key (which it can match against the eval config it is also passed); this logic is sketched below
  2. It will then plot a level x program heatmap of the correlations (update to plotly)
  3. It will also compute an aggregate statistic across levels (e.g. max) and generate a plot that shows the distribution of that statistic (without a p-value for now*)
  • *This is due to the inflation of p-values seen when running a test on this statistic. We also haven't fully decided whether to rank-order this or not.
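
A rough sketch of that dashapp logic (the directory name, the tab separator, the column-name matching, and the max aggregate are all assumptions):

```python
import glob
import os
import pandas as pd
import plotly.express as px

# 1. Parse the output directory for files ending in the required suffix.
outdir = "evaluations"  # hypothetical; the real path comes from the config
for path in glob.glob(os.path.join(outdir, "*_association_results.txt")):
    df = pd.read_csv(path, sep="\t")
    # The categorical_key can be recovered by matching the file name
    # against the keys in the eval config passed to the dashapp.

    # 2. Level x program heatmap of the correlation statistics (plotly).
    stat_cols = [c for c in df.columns if c.endswith("_pearsonr_stat")]
    heat = df.set_index("program_name")[stat_cols]
    px.imshow(heat.T, labels=dict(x="program", y="level", color="pearson r")).show()

    # 3. Aggregate across levels (e.g. max) and plot the distribution,
    # without a p-value for now.
    agg = heat.max(axis=1)  # one summary value per program
    px.histogram(x=agg).show()
```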

@adamklie (Collaborator)

I came up with a really hacky solution to this. I really couldn't figure out an easy way to implement this as an option within the current codebase. Basically, it looks for the combination of test="pearsonr" and mode="one_vs_all" and runs a separate code path for that one case, along the lines of the sketch below.
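
Something like the following hypothetical dispatch (the function and helper names are invented for illustration, not the actual codebase):

```python
def run_categorical_association(adata, test="pearsonr", mode="one_vs_all", **kwargs):
    # Special-cased path: the one combination the hack recognizes.
    if test == "pearsonr" and mode == "one_vs_all":
        return _pearsonr_one_vs_all(adata, **kwargs)  # hypothetical helper
    # Everything else falls through to the pre-existing implementation.
    return _legacy_categorical_association(adata, test=test, mode=mode, **kwargs)
```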

Obviously this won't do for the future, but it was the only way I could manage to get it to work while still preserving most of the old functionality.

TL;DR hopefully this works, but we will likely need to revisit

@adamklie (Collaborator) commented Sep 11, 2024

  • Hacked the implementation
  • Update the eval pipeline
  • Reconfigure the dashapp
