
Don't crash if /metadata takes too long #581

Open
cjllanwarne opened this issue Mar 8, 2019 · 6 comments

Comments

@cjllanwarne
Collaborator

cjllanwarne commented Mar 8, 2019

Sometimes fetching workflow metadata will simply take too long. We have some control over that, but inevitably the request will sometimes time out.

It would be great if Job Manager did not look as though the UI itself had crashed when the backend metadata request times out, and instead showed a simple error page stating something like "Unable to fetch metadata in time for this workflow".

I can foresee this happening fairly frequently when people try to load metadata for very large workflows, and a meaningful error page would be a much friendlier user experience than the generic error page.
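As a rough illustration of the kind of handling being requested, here is a minimal sketch of catching a slow metadata request in an Angular service and mapping it to a friendly message instead of letting the generic error page appear. This is not the actual Job Manager code: the service name, `WorkflowDetails` type, endpoint path, and 30-second cutoff are all assumptions for the example.

```typescript
// Hedged sketch only: wrap the metadata request with a client-side timeout and
// convert failures into a friendly message the component can render, rather
// than crashing. MetadataService, WorkflowDetails, and the endpoint path are
// hypothetical names for illustration.
import { HttpClient } from '@angular/common/http';
import { Injectable } from '@angular/core';
import { Observable, of } from 'rxjs';
import { catchError, map, timeout } from 'rxjs/operators';

interface WorkflowDetails {
  ok: boolean;
  data?: object;
  errorMessage?: string;
}

@Injectable()
export class MetadataService {
  constructor(private http: HttpClient) {}

  getWorkflowDetails(workflowId: string): Observable<WorkflowDetails> {
    return this.http.get<object>(`/api/v1/jobs/${workflowId}`).pipe(
      // Give up after 30 seconds instead of hanging until the backend fails.
      timeout(30000),
      // Wrap the successful payload so the component can render either state.
      map(data => ({ ok: true, data })),
      // On timeout (or any other error), return a friendly message rather
      // than propagating an unhandled error to the UI.
      catchError(() =>
        of({ ok: false, errorMessage: 'Unable to fetch workflow details in time.' })
      )
    );
  }
}
```

A component consuming this observable could then show the `errorMessage` on a simple error page when `ok` is false, which is the behavior described above.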

@ruchim

ruchim commented Mar 8, 2019

Suggestion:
Instead of using terminology like "metadata", we could just use something like "workflow details":

Unable to fetch workflow details in time.

It would also help to add an actionable component to this message, like:

Please file this issue [here](https://github.com/DataBiosphere/job-manager/issues/new) and include the URL that failed to load in time, so we can address it.

@rsasch
Contributor

rsasch commented Mar 8, 2019

@ruchim But if the issue is Cromwell not being able to return data, how is that a Job Manager issue?

@ruchim

ruchim commented Mar 8, 2019

The spirit of this issue is to improve the error and to get reports of when this happens; eventually the underlying problem will be evaluated and addressed by the Cromwell team. The point is that people shouldn't have to know about Cromwell, or go to its repo, when dealing with this failure, because they shouldn't have to care about any JM backends.

@cjllanwarne
Collaborator Author

I suspect we could "triage" 90% of these reports by moving them directly across to the Cromwell repo and working on them from there.

@cjllanwarne
Collaborator Author

Although as I'm typing that... to follow this argument through to its conclusion, maybe we should really be asking people to file Terra tickets (wherever those go) and let the triage happen another level up.

@ruchim

ruchim commented Mar 8, 2019

@cjllanwarne What about people running this outside of Terra? I guess the question is, do we want to know how many times this happens in general, or only in Terra? I'd like to know in general, but if you and @rsasch would prefer to focus on Terra requests, we can modify the issue tracker accordingly.
