MetaProViz::Pool_Estimation - Data normality #63

ChristinaSchmidt1 · 2023-09-15T08:14:50Z

While looking into data normality and SD in a different context, I realised that this might be something we need to discuss regarding the CV calculation of the pool samples.

Since the CV depends on the SD, we should ensure that the data is normally distributed. Otherwise we should either return a warning, use something else such as the interquartile range, or try to enforce data normality by log transformation (which wouldn't be my favorite choice).
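To make the SD dependence concrete, here is a minimal sketch in base R that compares the classical CV with an IQR-based alternative for one metabolite's pool measurements. The function names `cv_percent` and `robust_cv_percent` and the example values are made up for illustration and are not part of MetaProViz. The divisor 1.349 rescales the IQR so that it matches the SD for normally distributed data.

```r
# Classical coefficient of variation in percent: SD relative to the mean.
cv_percent <- function(x) {
  x <- x[!is.na(x)]
  100 * stats::sd(x) / mean(x)
}

# IQR-based alternative (hypothetical helper): IQR / 1.349 approximates the SD
# for normally distributed data, and the median replaces the mean.
robust_cv_percent <- function(x) {
  x <- x[!is.na(x)]
  100 * (stats::IQR(x) / 1.349) / stats::median(x)
}

# Illustrative pool intensities for one metabolite; the last value is an outlier.
pool <- c(100, 102, 98, 101, 99, 160)

cv_percent(pool)         # inflated by the single outlier
robust_cv_percent(pool)  # much less affected by it
```

With skewed or outlier-heavy pool data the two values can differ considerably, which is why a CV cut-off is hard to interpret when the normality assumption does not hold.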

I personally would use the Shapiro test on the pool samples. Here we will only have one condition (="Pool") and perform the test for each metabolite. We can return a warning/message about the data distribution, as in the DMA function, and let the user know why this matters for the CV calculation. We could even consider adding the results to the output DF. Given that this is the same code as in the DMA function, I would turn the Shapiro test into a helper function, so that we can use it in both DMA and Pool_Estimation.
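A helper along these lines could look roughly like the following sketch. It is not the existing MetaProViz code: the function name `shapiro_per_metabolite`, the column names, the `alpha` argument and the example data are all assumptions for illustration. It runs base R's `shapiro.test()` per metabolite on the pool samples, adds a flag column and raises a warning for metabolites that deviate from normality.

```r
# Hypothetical helper: Shapiro-Wilk test per metabolite (column) of a numeric
# data frame that holds only the pool samples (rows = samples).
shapiro_per_metabolite <- function(pool_data, alpha = 0.05) {
  p_values <- vapply(pool_data, function(x) {
    x <- x[!is.na(x)]
    # shapiro.test() needs 3 to 5000 non-missing values that are not all identical
    if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
      return(NA_real_)
    }
    stats::shapiro.test(x)$p.value
  }, numeric(1))

  result <- data.frame(
    Metabolite = names(p_values),
    Shapiro_pval = unname(p_values),
    Normal = unname(p_values) >= alpha  # NA when the test could not be run
  )

  # Warn about metabolites whose CV should be interpreted with care.
  flagged <- result$Metabolite[!is.na(result$Normal) & !result$Normal]
  if (length(flagged) > 0) {
    warning("CV may be unreliable (non-normal pool data) for: ",
            paste(flagged, collapse = ", "))
  }
  result
}

# Illustrative data: 6 pool samples, 2 metabolites (values are made up).
pool_data <- data.frame(
  MetA = c(100, 102, 98, 101, 99, 103),
  MetB = c(1, 1, 1, 1, 1, 50)
)
res <- suppressWarnings(shapiro_per_metabolite(pool_data))
```

The returned data frame could be merged into the output DF by metabolite name. Note that with only a handful of pool samples the Shapiro test has little power, so a non-significant result is weak evidence of normality.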

For the time being, I will add a comment to the vignette, so that the user is informed about the importance of data normality.

dprymidis · 2023-09-28T08:56:01Z

Here we agreed to make the Shapiro test a helper function in preprocessing, add a qqplots=T/F parameter, and call it in DMA with qqplots=F.

ChristinaSchmidt1 · 2024-01-30T10:09:11Z

Hi Dimitrios,
I am just going through the open issues and wanted to check whether the helper function fixed this completely or whether something else still needs to be done.

dprymidis · 2024-01-30T11:10:42Z

Hello!
This is partially done. The Shapiro test is a separate function, but it's still in the DMA script. The qqplots=T/F parameter is added, but some parameters (like STAT_pval) still need to be adjusted for using it in preprocessing vs DMA.

ChristinaSchmidt1 · 2024-01-30T12:07:18Z

Thanks for the quick response :)

OK, what I initially had in mind was just to check data normality with Shapiro in the pool estimation and, if a metabolite is not normally distributed, to flag it, as the interpretation of its CV will be impacted.

I did not plan to produce any plots, but rather add an additional column and the message/warning.

Would you have done anything additional, or would that be fine? (Just checking so I do not miss anything.)

dprymidis · 2024-01-30T13:04:37Z

No, what you have in mind is correct. I just mentioned what info I have on the matter. The qqplot functionality is there, but it's not necessary for preprocessing, as it would produce many plots that no one would really check.

ChristinaSchmidt1 self-assigned this Sep 15, 2023

ChristinaSchmidt1 added the "documentation" label (Improvements or additions to documentation) Sep 15, 2023

ChristinaSchmidt1 added a commit that referenced this issue Sep 15, 2023

Added comment for CV as mentioned in the issue #63

b762ddb

ChristinaSchmidt1 added the "Intermediate priority" label (Implementation needs to be prioritised) Sep 27, 2023
