Replies: 5 comments 7 replies
-
Hey Nick. I fully support the idea of adding uncertainty to the model, but my belief is that the stochasticity of the model is potentially more important than parameter variation. I saw that you removed the Poisson-sampling-based code from the codebase and closed #182, which I was a bit disappointed by. I welcome you to model it yourself, but there is ample evidence that modeling the heterogeneity and stochasticity of networks leads to completely different emergent behavior than the deterministic, homogeneous variants. I've done a bit of lit review here: https://github.com/understand-covid/proposal/blob/master/model/Prior%20Art.md

If you end up agreeing, my argument is that adding parameter uncertainty actually creates the potential to use the model more irresponsibly (purporting that something is a worst-case scenario, for example). I'd urge that, if you do end up implementing it, some additional disclaimer language be added to the modal.

If you end up going forward with some sort of sensitivity analysis, I think the bigger implementation problem is interactions between variables; you'll likely have to start doing MCMC, stochastic calculus, or approach things from a Bayesian perspective, all of which will be computationally expensive in the browser. From a UI implementation perspective, I think a toggle that changes each parameter to have two inputs (for a min and a max) would be reasonable UX.
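For concreteness, here is a minimal sketch of what such a min/max toggle could feed into, assuming a simple independent uniform draw per parameter. The names (`ParamRange`, `runModel`, `sensitivityEnsemble`) are hypothetical, not identifiers from the actual codebase, and the independent sampling is exactly where the interaction problem mentioned above shows up, since correlations between parameters are ignored:

```typescript
// Hypothetical sketch only; ParamRange, runModel, and sensitivityEnsemble
// are illustrative names, not identifiers from the actual codebase.

interface ParamRange {
  min: number;
  max: number;
}

type ParamRanges = Record<string, ParamRange>;

// Stand-in for the existing deterministic model run, returning a trajectory.
function runModel(params: Record<string, number>): number[] {
  // ... integrate the ODE system with these point estimates ...
  return [];
}

// One independent uniform draw from a [min, max] range.
function uniformSample(range: ParamRange): number {
  return range.min + Math.random() * (range.max - range.min);
}

// Draw nRuns parameter sets and run the model for each. The draws are
// independent across parameters, so parameter interactions are not captured.
function sensitivityEnsemble(ranges: ParamRanges, nRuns: number): number[][] {
  const trajectories: number[][] = [];
  for (let i = 0; i < nRuns; i++) {
    const sample: Record<string, number> = {};
    for (const [name, range] of Object.entries(ranges)) {
      sample[name] = uniformSample(range);
    }
    trajectories.push(runModel(sample));
  }
  return trajectories;
}
```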
-
Hey John, I fully agree that heterogeneity of the underlying dynamics is an important effect that the model completely ignores right now. However, if one wants to capture the correct effects of stochasticity here, adding a Langevin term (i.e. Poisson sampling) isn't sufficient and will inaccurately capture the statistics of the stochastic process you are trying to simulate. Things one needs to worry about at this level of granularity are spatial processes, social structure, etc. Parameterizing everything becomes difficult, so we opted, for the time being, for an ODE approach that hopefully approximates the mean of such a process.

As you allude to in your comment, doing this properly requires either an MCMC or an agent-based model. I would love to, in the future, have this type of model running in the background on a back-end for users. However, within the context of the model as it is now, I don't think adding Poisson error bars captures the relevant stochastic dynamics at all. So think of this as a future implementation as time permits.
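To make the distinction concrete, here is a toy sketch (illustrative only, not the removed code) of a single S -> I transition over one time step: the deterministic step uses the expected count, while a Poisson ("tau-leaping") step draws a realized count. The mean matches the ODE, but the variance and the low-count behavior (e.g. stochastic extinction) do not:

```typescript
// Toy sketch of the deterministic vs. Poisson-sampled step, not the removed code.

// Knuth's algorithm; adequate for the small expected counts of a single step.
function samplePoisson(lambda: number): number {
  const L = Math.exp(-lambda);
  let k = 0;
  let p = 1;
  do {
    k += 1;
    p *= Math.random();
  } while (p > L);
  return k - 1;
}

// Deterministic (ODE) step: the expected number of new infections in dt.
function expectedNewInfections(beta: number, S: number, I: number, N: number, dt: number): number {
  return beta * S * (I / N) * dt;
}

// Poisson ("tau-leaping") step: draw the realized count with that mean.
// Same mean as the ODE step, different variance and extinction behavior.
function sampledNewInfections(beta: number, S: number, I: number, N: number, dt: number): number {
  return samplePoisson(expectedNewInfections(beta, S, I, N, dt));
}
```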
-
Nick, thanks for the reply. Apologies that I didn't communicate clearly; I guess that is the danger of very late-night posts. I was not trying to suggest that using a Poisson prior was the right approach, but instead trying to suggest that moving in the direction of treating things as an SDE was the right idea. I think that Poisson priors are likely a bad prior, given that most of the empirical data has different mean and variance.

However, let's do a thought experiment for a moment. If you agree that an SDE is the more accurate way of modeling the phenomena at hand, how do we define the current formulation as an SDE? The answer lies pretty clearly in the codebase already: when you removed the Poisson sampling, you left the

While I don't like the idea of doing Ito calculus over breakfast, the truth is that almost every system that we model is more appropriately modeled with a set of SDEs (and in most non-trivial systems, those happen to be stochastic convolutional equations). ODEs can be seen simply as a more compact way of representing SDEs with extremely restrictive priors on the random variables (a delta function) so as to free up computational resources. In places where variance is very low with respect to the mean, this can be a reasonable approximation. In places where that is not true, my belief is that relaxing to even a weakly informative prior is always an improvement.

That being said, my understanding of your proposal is that you would add stochasticity to the initial conditions, and then keep those variables fixed within each run; essentially you'd be swapping the delta function out for another distribution at time t0 only. If the belief is that the variables are in fact near-zero variance, and we simply don't know the exact value, then this is a reasonable way of modeling things. But it sounds like you (like me) believe that social structure and individual-level variation are crucial to understanding the emergent dynamics of this highly heterogeneous system.

I believe that injecting stochasticity into the input parameters at t0 is certainly an improvement to the model. However, my concern is that adding error bars is potentially misleading; someone could interpret it as a fully stochastic model and read the upper bound as an actual 'worst case' scenario, when the stochasticity of network dynamics may completely change the bounds. At least with the deterministic ODE solution, that interpretation is not possible (though it has plenty of opportunity to be misunderstood in other ways). Even more concerning is the fact that SDEs often have different emergent dynamics than their ODE counterparts, so even the shape of the curve could be radically different, not just the bounds.

Let's take a more concrete example of how 'fat tails' emerge in reality. A 'hub'-based network graph, where a few nodes have extremely high connectivity but their child nodes (that are not also hubs) have extremely low degree and form a long serial chain of connections, will have very different dynamics than a homogeneous network in terms of how an infection (especially one with high latency) will spread. In the firewall world, this is why most network topologies have moved to a multi-tier model. In the epidemiological world, this is why viruses often take longer to reach rural areas, but can spread quickly within those rural areas once the virus makes its way there, because of the high degree within that subgraph.

This is the reason why, with COVID-19, one of the first enacted policies was to ban large gatherings. Changing the network topology through policy dramatically changes the system dynamics. As such, my belief is that moving into the SDE world (with at least weakly informative priors, and then moving toward the empirical distributions as data become available) is a worthwhile investment even in the short term. I understand, however, that it may not be in the scope of this project and that there may be higher priorities on your end.
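To make the t0-only versus full-SDE distinction above concrete, here is a toy sketch on scalar dynamics dx = r x dt + sigma x dW. The names (`randn`, `runWithParameterUncertainty`, `runAsSde`) and the toy dynamics are purely illustrative: (a) draws the parameter once at t0 and integrates deterministically, while (b) injects noise at every step via Euler-Maruyama:

```typescript
// Toy scalar dynamics, illustrative only; names are made up for this sketch.

// Standard normal draw via the Box-Muller transform.
function randn(): number {
  const u = 1 - Math.random(); // avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// (a) Uncertainty injected at t0 only: draw r once, then integrate deterministically.
function runWithParameterUncertainty(rMean: number, rStd: number, x0: number, dt: number, steps: number): number[] {
  const r = rMean + rStd * randn(); // stochastic only here
  const xs = [x0];
  for (let i = 0; i < steps; i++) {
    xs.push(xs[i] + r * xs[i] * dt); // deterministic Euler step
  }
  return xs;
}

// (b) Full SDE, dx = r x dt + sigma x dW, via Euler-Maruyama: noise enters every step.
function runAsSde(r: number, sigma: number, x0: number, dt: number, steps: number): number[] {
  const xs = [x0];
  for (let i = 0; i < steps; i++) {
    const dW = Math.sqrt(dt) * randn();
    xs.push(xs[i] + r * xs[i] * dt + sigma * xs[i] * dW);
  }
  return xs;
}
```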
-
So, given that the model now gives upper and lower bounds, can you describe what they represent?
-
Hey all,
This is a topic that needs to be addressed ASAP. As our model is starting to see more usage, the fully deterministic approach to forecasting is inadequate at best. We need to start forecasting a range of scenarios given distributions of input parameters and reporting the mean and standard deviation of the results (or confidence intervals). While there is also variance associated with the fundamental stochasticity of the dynamics, I think this is less important for now.
We need a way to input our degree of certainty for a given input, e.g. R0 is known to be 3.2 with a 5% error on it. This is a potential UI nightmare if every parameter has another box associated with an error bar. Can we brainstorm a better way for the user to input this -> change to a drag bar for an interval of possible values? I view this as an incredibly important next step for the website.
The alternative is that we don't allow error/uncertainty to be toggled by the user and instead put error bars on the parameters ourselves to report confidence intervals.
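As a rough sketch of the ensemble version of this, assuming R0 is sampled from Normal(3.2, 5% relative error) and `runScenario` stands in for the real model entry point (both names and the distribution are placeholders, not the actual implementation):

```typescript
// Sketch only: sample R0, run the existing deterministic model per draw,
// and report the pointwise mean and standard deviation across runs.

// Standard normal draw via the Box-Muller transform.
function randn(): number {
  const u = 1 - Math.random();
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Stand-in for the existing deterministic run, parameterized by R0.
function runScenario(r0: number): number[] {
  // ... deterministic ODE integration ...
  return [];
}

function ensembleStats(r0Mean: number, relError: number, nRuns: number): { mean: number[]; std: number[] } {
  const runs: number[][] = [];
  for (let i = 0; i < nRuns; i++) {
    const r0 = r0Mean * (1 + relError * randn());
    runs.push(runScenario(r0));
  }
  const length = runs[0]?.length ?? 0;
  const mean: number[] = new Array(length).fill(0);
  const std: number[] = new Array(length).fill(0);
  for (let t = 0; t < length; t++) {
    const values = runs.map((run) => run[t]);
    const m = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, b) => a + (b - m) ** 2, 0) / values.length;
    mean[t] = m;
    std[t] = Math.sqrt(variance);
  }
  return { mean, std };
}

// Usage, e.g.: const { mean, std } = ensembleStats(3.2, 0.05, 200);
```

The same aggregation would work whether the ranges come from user input or from error bars we set ourselves; only the sampling distribution changes.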