You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Flaky tests, such as test_conditional_prob_inf_given_vl_dist, are due to their non-deterministic nature. Monte Carlo simulations, such as those used in the baseline_exposure_model fixture and test_conditional_prob_inf_given_vl_dist, fail because they're probabilistic. For modelling this is great - but for unit tests its not 😅 It's a nice feeling to have a 🟢 CI.
All you'd have to do to fix this test is set the random seed, like np.random.seed(42). However, this is misleading if the goal (as I suspect) is to ensure the models accuracy within a certain tolerance.
If the goal is to measure the models accuracy, what if you removed the 3x retry and instead ran the model 10 times, gathering the results each time? Then you could do a statistical analysis against the mean/median/standard deviation/percentiles etc.
Another alternative is to change the fixed 0.002 tolerance. What if you calculated the tolerance based on the number of runs? Could an absolute tolerance of 0.002 be wishful thinking?
It could also help to log model deviations and store them with a timestamp. Then you can monitor deviations over time.
I'm happy to fix this test (and get that CI ✅). Its a cool project. Just let me know what the expected behavior is & how I can help out.
The text was updated successfully, but these errors were encountered:
Flaky tests, such as test_conditional_prob_inf_given_vl_dist, are due to their non-deterministic nature. Monte Carlo simulations, such as those used in the
baseline_exposure_model
fixture andtest_conditional_prob_inf_given_vl_dist
, fail because they're probabilistic. For modelling this is great - but for unit tests its not 😅 It's a nice feeling to have a 🟢 CI.All you'd have to do to fix this test is set the random seed, like
np.random.seed(42)
. However, this is misleading if the goal (as I suspect) is to ensure the models accuracy within a certain tolerance.If the goal is to measure the models accuracy, what if you removed the 3x retry and instead ran the model 10 times, gathering the results each time? Then you could do a statistical analysis against the mean/median/standard deviation/percentiles etc.
Another alternative is to change the fixed
0.002
tolerance. What if you calculated the tolerance based on the number of runs? Could an absolute tolerance of0.002
be wishful thinking?It could also help to log model deviations and store them with a timestamp. Then you can monitor deviations over time.
I'm happy to fix this test (and get that CI ✅). Its a cool project. Just let me know what the expected behavior is & how I can help out.
The text was updated successfully, but these errors were encountered: