MetaProViz::Pool_Estimation - Data normality #63

Open
ChristinaSchmidt1 opened this issue Sep 15, 2023 · 5 comments
Labels
documentation (Improvements or additions to documentation), Intermediate priority (Implementation needs to be prioritised)

Comments

@ChristinaSchmidt1
Member

As I was looking into data normality and SD in a different context, I realised that this might be something we need to discuss with regard to the CV calculation of the pool samples.

Since the CV depends on the SD, we should ensure that the data is normally distributed and otherwise either return a warning, use something else like the interquartile range, or try to enforce data normality by log transformation (which wouldn't be my favorite choice).
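
To make the dependence on the SD concrete, here is a minimal sketch comparing the CV with an IQR-based spread measure on made-up pool intensities. It is written in Python purely for illustration (MetaProViz itself is R). The function names `cv` and `iqr_over_median` and the example values are assumptions, not part of the package.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def iqr_over_median(values):
    """Robust spread: interquartile range divided by the median.

    Swapped in here as one possible IQR-based alternative to the CV.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / statistics.median(values)


# Hypothetical intensities of one metabolite across eight pool samples.
pool = [97, 98, 99, 100, 100, 101, 102, 103]
# Same metabolite with a single outlying injection.
pool_outlier = [97, 98, 99, 100, 100, 101, 102, 150]

for label, values in [("clean", pool), ("with outlier", pool_outlier)]:
    print(f"{label}: CV={cv(values):.3f}, IQR/median={iqr_over_median(values):.3f}")
```

In this toy case the single outlier inflates the CV several-fold, while IQR/median is unchanged, which is why a skewed or heavy-tailed pool distribution makes the CV hard to interpret.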

I personally would use the Shapiro test on the pool samples. Here we will only have one condition (="Pool") and perform the test for each metabolite. We can return a warning/message about the data distribution as in the DMA function and let the user know why this matters for the CV calculation. We can even consider adding the results to the output DF. Given that this is the same code as in the DMA function, I would make the Shapiro test a helper function, so that we can use it in both DMA and Pool_Estimation.

For the time being, I will add a comment to the vignette, so that the user is informed about the importance of data normality.

@ChristinaSchmidt1 ChristinaSchmidt1 self-assigned this Sep 15, 2023
@ChristinaSchmidt1 ChristinaSchmidt1 added the documentation Improvements or additions to documentation label Sep 15, 2023
@ChristinaSchmidt1 ChristinaSchmidt1 added the Intermediate priority Implementation needs to be prioritised label Sep 27, 2023
@dprymidis
Contributor

Here we said to make the Shapiro test a helper function in preprocessing, add a qqplots=T/F parameter, and call it in DMA with qqplots=F.

@ChristinaSchmidt1
Member Author

Hi Dimitrios,
I am just going through the open issues and wanted to check whether this was completely fixed with the helper function or whether something else needs to be done.

@dprymidis
Contributor

Hello!
This is partially done. The Shapiro test is a separate function, but it's still in the DMA script. The qqplots=T/F parameter is added, but there are still some parameters that need to be adjusted (like STAT_pval) for using it in preprocessing vs DMA.

@ChristinaSchmidt1
Member Author

Thanks for the quick response :)

Ok, what I thought initially was just to check data normality with Shapiro in the pool estimation and, if a metabolite is not normally distributed, to flag it, as the interpretation of its CV will be impacted.

I did not plan to produce any plots, but rather add an additional column and the message/warning.
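
A rough sketch of that plan, in Python for illustration only (the package itself is R): compute the CV per metabolite, add a column saying whether the pool values pass a normality test, and warn about the ones that do not. The function name `flag_pool_metabolites`, the column names, and the 0.05 cut-off are assumptions. The normality test is passed in as a callable returning a p-value, so a Shapiro-Wilk implementation can be plugged in.

```python
import statistics
import warnings


def flag_pool_metabolites(pool_data, normality_pvalue, alpha=0.05):
    """Return one row per metabolite with its CV and a normality flag.

    pool_data: dict mapping metabolite name -> list of pool intensities.
    normality_pvalue: callable(values) -> p-value of a normality test
        (placeholder; e.g. a Shapiro-Wilk test in practice).
    alpha: hypothetical significance cut-off.
    """
    rows = []
    for metabolite, values in pool_data.items():
        p = normality_pvalue(values)
        rows.append({
            "Metabolite": metabolite,
            "CV": statistics.stdev(values) / statistics.mean(values),
            "Normality_pval": p,
            # p < alpha rejects normality, so the CV needs careful reading.
            "NotNormal": p < alpha,
        })

    flagged = [r["Metabolite"] for r in rows if r["NotNormal"]]
    if flagged:
        warnings.warn(
            "Pool values not normally distributed for: "
            + ", ".join(flagged)
            + ". Interpret their CV with caution."
        )
    return rows
```

No plots are produced here. The only outputs are the extra columns and a single warning listing the flagged metabolites.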

Would you have done something additional or would that be fine (Just checking so I do not miss anything)?

@dprymidis
Contributor

No, what you have in mind is correct. I just mentioned what info I have on the matter. The qqplot functionality is there, but it's not necessary for preprocessing, as it would produce many plots that no one would really check.
