Socket hang up with long running request to service #228

Open
codedge opened this issue Nov 13, 2023 · 5 comments

Contributor

codedge commented Nov 13, 2023

Hey!

I am seeing odd behaviour with long-running requests to a service connected to the gateway.
I run a PHP-based service behind the gateway that sometimes needs up to 45s to return a response. In about 90% of those cases the gateway does not return the response, and I get a Socket hang up instead.

I have already enabled the limits plugin with this configuration:

{
  "name": "limits",
  "config": {
    "max-response-time": "120s",
    "max-request-bytes": 1000000
  }
}

This removes the initial reached timeout error message, but the connection between the gateway and the service still gets lost somehow.

I am 100% sure that the response is correct and that the backend service returns it properly. Calling the GraphQL endpoint of the backend service directly works without any issue.

Does that sound familiar to you, or do you have any hint where to look?

Thanks!

codedge changed the title from "Long running request to service" to "Socket hang up with long running request to service" on Nov 13, 2023
Member

pkqk commented Nov 13, 2023

Hi @codedge, can you post a copy of the response you're getting? The string Socket hang up doesn't seem to show up when I search the Go stdlib, and you mention later that it's a reached timeout.

Does it log the request when it fails?

Contributor Author

codedge commented Nov 13, 2023

Sorry for the confusion.

1. Resolving the reached timeout

At first I got a reached timeout error. This error was directly visible in the Bramble logs. I figured out that using the limits plugin with the configuration mentioned above gets around this error.

This is solved ✔️

2. The Socket hang up problem

This error is reported by curl, Postman, or any other GraphQL client. There is no other response.

Error in curl

[screenshot: 2023-11-13_224641]

Error in Postman

[screenshot: 2023-11-13_225222]

I suspect this is a keep-alive/idle timeout problem.

I also found this link, which talks about the WriteTimeout field of net/http's Server.

Bramble logs the request to the backend service, but it does not log the response coming back.
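
The suspected mechanism can be reproduced outside Bramble. The following is a minimal sketch using only the Go standard library, not Bramble's code: a test server whose handler takes longer than the server's WriteTimeout. The helper name slowRequest and the durations are made up for illustration. When the handler outlives the write deadline, the server cannot send the response and closes the connection, which the client sees as an error (typically EOF) rather than an HTTP status.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowRequest starts a throwaway server whose handler sleeps for
// handlerDelay before answering, sends one GET to it, and returns the
// error seen by the client (nil if a full response arrived).
// Hypothetical helper for illustration; not part of Bramble.
func slowRequest(writeTimeout, handlerDelay time.Duration) error {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(handlerDelay) // stands in for the slow backend call
		fmt.Fprint(w, `{"data":{}}`)
	})

	ts := httptest.NewUnstartedServer(handler)
	ts.Config.WriteTimeout = writeTimeout // the setting under suspicion
	ts.Start()
	defer ts.Close()

	resp, err := ts.Client().Get(ts.URL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Handler slower than WriteTimeout: the client is expected to get an error.
	fmt.Println("short WriteTimeout:", slowRequest(100*time.Millisecond, 300*time.Millisecond))
	// Handler faster than WriteTimeout: the client is expected to get a response.
	fmt.Println("long WriteTimeout: ", slowRequest(time.Second, 300*time.Millisecond))
}
```

This would also explain why the request to the backend is logged while no response ever reaches the client: the handler runs to completion, but the write back to the client fails.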

Contributor Author

codedge commented Nov 13, 2023

... and I can confirm that after changing the WriteTimeout to, for example, 60 seconds

func runHandler(ctx context.Context, wg *sync.WaitGroup, name, addr string, handler http.Handler) {
	srv := &http.Server{
		Addr:         addr,
		Handler:      handler,
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 60 * time.Second,
		IdleTimeout:  120 * time.Second,
	}
	// ...
}

everything works flawlessly.

Do you think you can make this configurable via the limits plugin?

Update

I can see that there are three server instances running: public, private, and metrics. I guess in my case only the public one is relevant.

Ideally the user is able to configure this for each of these three.

I can create a PR if you don't have time.
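
One possible shape for such a per-listener setting is sketched below. Everything in it is an assumption for illustration: the ListenerTimeouts type, the JSON keys (read-timeout, write-timeout, idle-timeout), and the helpers parseTimeouts and newServer are hypothetical and do not describe Bramble's actual configuration. The fallback values are the ones mentioned in this thread (5s read, 10s write, 120s idle).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Duration wraps time.Duration so it can be written as "60s" in JSON.
type Duration time.Duration

func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	v, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// ListenerTimeouts is a hypothetical per-listener config section.
// A zero value means "use the default".
type ListenerTimeouts struct {
	Read  Duration `json:"read-timeout"`
	Write Duration `json:"write-timeout"`
	Idle  Duration `json:"idle-timeout"`
}

// parseTimeouts decodes a map keyed by listener name (public, private, metrics).
func parseTimeouts(raw []byte) (map[string]ListenerTimeouts, error) {
	var cfg map[string]ListenerTimeouts
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func orDefault(d Duration, def time.Duration) time.Duration {
	if d > 0 {
		return time.Duration(d)
	}
	return def
}

// newServer builds an http.Server, falling back to the defaults
// mentioned in this thread when a timeout is not configured.
func newServer(addr string, h http.Handler, t ListenerTimeouts) *http.Server {
	return &http.Server{
		Addr:         addr,
		Handler:      h,
		ReadTimeout:  orDefault(t.Read, 5*time.Second),
		WriteTimeout: orDefault(t.Write, 10*time.Second),
		IdleTimeout:  orDefault(t.Idle, 120*time.Second),
	}
}

func main() {
	cfg, err := parseTimeouts([]byte(`{"public": {"write-timeout": "60s"}}`))
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"public", "private", "metrics"} {
		// A missing map key yields the zero value, so defaults apply.
		srv := newServer(":0", http.NotFoundHandler(), cfg[name])
		fmt.Println(name, srv.ReadTimeout, srv.WriteTimeout, srv.IdleTimeout)
	}
}
```

Keeping the zero value as "not set" means existing configurations without these keys would keep their current behaviour.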

Member

pkqk commented Nov 15, 2023

Thanks for doing the debugging, @codedge. That makes sense: if the write timeout is set to 10s by default, the socket will be closed before your service has responded.

It would be useful to have Bramble craft a timeout response in that situation, but we can make the socket settings tuneable as well.
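
As a rough illustration of a crafted timeout response, the standard library's http.TimeoutHandler can wrap a handler and answer with a 503 and a fixed body once a deadline passes. This is only a sketch of the idea, not how Bramble does or will do it; slowHandler, probe, and the error body are hypothetical names chosen for the example. The handler timeout would need to be shorter than the server's WriteTimeout for the response to still be writable.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeoutBody is a hypothetical GraphQL-style error payload.
const timeoutBody = `{"errors":[{"message":"request timed out"}]}`

// slowHandler simulates a backend call that takes delay and gives up
// early when the request context is cancelled (as TimeoutHandler does).
func slowHandler(delay time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay):
			fmt.Fprint(w, `{"data":{}}`)
		case <-r.Context().Done():
		}
	})
}

// probe runs one request through the wrapped handler and reports the
// status code and body the client would receive.
func probe(timeout, delay time.Duration) (int, string) {
	h := http.TimeoutHandler(slowHandler(delay), timeout, timeoutBody)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/query", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(50*time.Millisecond, 200*time.Millisecond)
	fmt.Println(code, body) // slow backend: 503 with the crafted body
	code, body = probe(200*time.Millisecond, 10*time.Millisecond)
	fmt.Println(code, body) // fast backend: the normal response
}
```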

The public and private muxes are there so that plugins can apply different middleware to a published endpoint and an internal endpoint. For example, we have auth on the public mux, which is exposed via ingress to our webapp, while the private mux serves backend services inside our VPC.
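
The general pattern of two muxes with different middleware can be sketched with plain net/http. This is an illustration only and not Bramble's plugin API: requireToken, queryHandler, newMuxes, and status are hypothetical names, and the "auth" here just checks that an Authorization header is present.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireToken is a stand-in auth middleware: it rejects requests
// that carry no Authorization header.
func requireToken(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// queryHandler stands in for the shared GraphQL endpoint.
func queryHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, `{"data":{}}`)
}

// newMuxes returns the same endpoint twice: wrapped in auth for the
// public listener, and unwrapped for the private one.
func newMuxes() (public, private http.Handler) {
	pub := http.NewServeMux()
	pub.HandleFunc("/query", queryHandler)
	priv := http.NewServeMux()
	priv.HandleFunc("/query", queryHandler)
	return requireToken(pub), priv
}

// status sends one request to h and returns the response status code.
func status(h http.Handler, authHeader string) int {
	req := httptest.NewRequest(http.MethodPost, "/query", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	public, private := newMuxes()
	fmt.Println(status(public, ""), status(public, "Bearer example"), status(private, ""))
}
```

Each handler would then be served by its own http.Server, which is also where per-listener timeouts would apply.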

Contributor Author

codedge commented Nov 28, 2023

Is there a release planned to include the new configuration?
