MetaProViz::Pool_Estimation - Data normality #63
Comments
Here we said we would turn the Shapiro test into a helper function in preprocessing, add a qqplots=T/F argument, and call it in DMA with qqplots=F.
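For reference, this is roughly what a qqplots=T option would draw. A normal Q-Q plot pairs the sorted intensities with standard-normal quantiles at the plotting positions that R's ppoints()/qqnorm() use. The sketch below is an illustration in Python, not MetaProViz's R code; `ppoints` mirrors R's function of that name and `qq_points` is a hypothetical helper. With qqplots=F, a helper would simply skip this step.

```python
from statistics import NormalDist


def ppoints(n):
    """Plotting positions as in R's ppoints(): (i - a) / (n + 1 - 2a), a = 3/8 if n <= 10 else 1/2."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def qq_points(values):
    """(theoretical quantile, sample value) pairs of a normal Q-Q plot, as drawn by R's qqnorm()."""
    theoretical = [NormalDist().inv_cdf(p) for p in ppoints(len(values))]
    return list(zip(theoretical, sorted(values)))
```

Points lying close to a straight line are consistent with normality. Curvature, or isolated points at either end, hints at skew or outliers.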
Hi Dimitrios,
Hello!
Thanks for the quick response :) OK, what I initially had in mind was simply to check data normality with the Shapiro test in the pool estimation and, if a metabolite is not normally distributed, to flag it, since this affects the interpretation of its CV. I did not plan to produce any plots, but rather to add an additional column and a message/warning. Would you have done anything additional, or would that be fine? (Just checking so I do not miss anything.)
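The flag-plus-warning idea could look roughly like the sketch below. It takes per-metabolite Shapiro-Wilk p-values that are assumed to have been computed elsewhere. The output table is modelled as a list of dicts, and the column names `Shapiro_p` and `Normality`, the `Metabolite` key, the 0.05 cut-off and the name `add_normality_flag` are all hypothetical choices, not MetaProViz's actual output.

```python
import warnings


def add_normality_flag(cv_rows, shapiro_p, alpha=0.05):
    """Return copies of per-metabolite CV rows with Shapiro-Wilk columns added.

    cv_rows:   list of dicts, one per metabolite, each with a "Metabolite" key.
    shapiro_p: {metabolite: Shapiro-Wilk p-value computed on the pool samples}.
    """
    out, flagged = [], []
    for row in cv_rows:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        p = shapiro_p[row["Metabolite"]]
        new_row["Shapiro_p"] = p
        new_row["Normality"] = "normal" if p >= alpha else "not normal"
        if p < alpha:
            flagged.append(row["Metabolite"])
        out.append(new_row)
    if flagged:
        # One summary warning instead of one warning per metabolite.
        warnings.warn(
            f"{len(flagged)} metabolite(s) are not normally distributed in the pool "
            f"samples (Shapiro-Wilk p < {alpha}): {', '.join(flagged)}. "
            "Their CV is based on the SD and should be interpreted with caution."
        )
    return out
```

Emitting a single summary warning keeps the console readable when many metabolites fail the test, while the extra column keeps the per-metabolite detail in the output table.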
No, what you have in mind is correct; I just mentioned the information I have on the matter. The qqplot functionality is there, but it's not necessary for preprocessing, as it would produce many plots that no one would really check.
As I was looking into data normality and SD in a different context, I realised that this might be something we need to discuss with regard to the CV calculation of the pool samples.
Since the CV is calculated from the SD (CV = SD/mean), we should ensure that the data is normally distributed and otherwise either return a warning, use a different measure such as the interquartile range, or try to enforce data normality by log transformation (which wouldn't be my favorite choice).
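To illustrate why the SD-based CV is sensitive to non-normal pool data, here is a small Python comparison of the classical CV with an IQR-based alternative, one of the options mentioned above. Dividing the IQR by about 1.349 (the IQR of a standard normal distribution) so that it estimates the SD under normality is an assumption of this sketch, as are the function names and the example intensities.

```python
import statistics

# Converts an IQR into an SD estimate under normality: 2 * z(0.75), about 1.349.
IQR_TO_SD = 2 * statistics.NormalDist().inv_cdf(0.75)


def cv_percent(values):
    """Classical CV in %: 100 * sample SD (n - 1 denominator) / mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def robust_cv_percent(values):
    """IQR-based alternative in %: 100 * (IQR / 1.349) / median.

    Scaling the IQR to the SD puts the result on roughly the same scale as
    cv_percent() for normally distributed data. method="inclusive" gives the
    same quartiles as R's default quantile(type = 7).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 100 * (q3 - q1) / IQR_TO_SD / statistics.median(values)


# Hypothetical pool intensities for one metabolite, without and with one outlier.
pool = [100.0, 102.0, 98.0, 101.0, 99.0]
pool_outlier = [100.0, 102.0, 98.0, 101.0, 160.0]
for label, data in (("clean", pool), ("outlier", pool_outlier)):
    print(label, round(cv_percent(data), 2), round(robust_cv_percent(data), 2))
```

With the single outlier, the classical CV rises by more than an order of magnitude, while the IQR-based value stays almost unchanged.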
I personally would use the Shapiro test on the pool samples. Here we will only have one condition (="Pool") and perform the test for each metabolite. We can return a warning/message about the data distribution, as in the DMA function, and let the user know why this matters for the CV calculation. We could even consider adding the results to the output DF. Given that this is the same code as in the DMA function, I would turn the Shapiro test into a helper function, so that we can use it in both DMA and Pool_Estimation.
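A shared helper along these lines is sketched below in Python for illustration (MetaProViz itself is an R package). It follows Royston's approximation, which is also what R's shapiro.test() is based on, and applies it to each metabolite's pool intensities. `shapiro_wilk` and `shapiro_per_metabolite` are hypothetical names; this is not the package's implementation, and its output should be cross-checked against shapiro.test() before relying on it.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _poly(coefs, x):
    """Evaluate coefs[0] + coefs[1]*x + coefs[2]*x**2 + ..."""
    return sum(c * x**i for i, c in enumerate(coefs))


def shapiro_wilk(values):
    """Shapiro-Wilk W and p-value via Royston's approximation (intended for 3 <= n <= 5000)."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("the Shapiro-Wilk test needs at least 3 values")
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    if ss == 0:
        raise ValueError("all values are identical")

    # Coefficients a_i: antisymmetric, with sum of squares equal to 1.
    if n == 3:
        a = [-math.sqrt(0.5), 0.0, math.sqrt(0.5)]
    else:
        m = [_STD_NORMAL.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
        mm = sum(v * v for v in m)
        u = 1 / math.sqrt(n)
        a_n = m[-1] / math.sqrt(mm) + _poly(
            [0, 0.221157, -0.147981, -2.071190, 4.434685, -2.706056], u)
        if n > 5:
            a_n1 = m[-2] / math.sqrt(mm) + _poly(
                [0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633], u)
            phi = (mm - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_n**2 - 2 * a_n1**2)
            a = [v / math.sqrt(phi) for v in m]
            a[0], a[1], a[-2], a[-1] = -a_n, -a_n1, a_n1, a_n
        else:
            phi = (mm - 2 * m[-1] ** 2) / (1 - 2 * a_n**2)
            a = [v / math.sqrt(phi) for v in m]
            a[0], a[-1] = -a_n, a_n

    w = min(sum(ai * xi for ai, xi in zip(a, x)) ** 2 / ss, 1.0)

    # p-value: upper tail of a normalising transformation of 1 - W.
    if n == 3:
        p = 6 / math.pi * (math.asin(math.sqrt(w)) - math.asin(math.sqrt(0.75)))
        return w, max(p, 0.0)
    if w >= 1.0:
        return w, 1.0
    y = math.log(1 - w)
    if n <= 11:
        gamma = -2.273 + 0.459 * n
        if y >= gamma:  # W so small that the approximation breaks down: p is about 0
            return w, 0.0
        y = -math.log(gamma - y)
        mu = _poly([0.5440, -0.39978, 0.025054, -0.0006714], n)
        sigma = math.exp(_poly([1.3822, -0.77857, 0.062767, -0.0020322], n))
    else:
        ln_n = math.log(n)
        mu = _poly([-1.5861, -0.31082, -0.083751, 0.0038915], ln_n)
        sigma = math.exp(_poly([-0.4803, -0.082676, 0.0030302], ln_n))
    return w, 1 - _STD_NORMAL.cdf((y - mu) / sigma)


def shapiro_per_metabolite(pool_by_metabolite):
    """Hypothetical shared helper: {metabolite: pool intensities} -> {metabolite: (W, p)}."""
    return {name: shapiro_wilk(vals) for name, vals in pool_by_metabolite.items()}
```

In Pool_Estimation the helper would be called once, on the single "Pool" condition; DMA could call the same function for each condition.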
For the time being, I will add a comment to the vignette so that users are informed about the importance of data normality.