
[Bug]: Internal Server Error When Fetching More Than 60 Traces in Jaeger Using OpenSearch Backend #5825

raman-goel opened this issue Aug 11, 2024
### What happened?

I’m experiencing an issue when querying Jaeger for traces with an OpenSearch backend. When the query is limited to 60 traces, everything works as expected, but when I try to fetch more than 60 traces I receive an "Internal Server Error." Interestingly, when I manually hit the OpenSearch `_msearch` API with a 500-trace request, it returned a 200 status, indicating that OpenSearch itself can handle larger queries. This suggests the issue lies in how Jaeger interacts with OpenSearch.
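
For reference, this is roughly how I verified the `_msearch` call directly. This is a minimal sketch, not Jaeger's actual client code; the host, index pattern, and query body are placeholders for my environment:

```go
// Sends a large multi-search directly to OpenSearch to confirm it accepts
// requests of this size. Jaeger issues one search per trace, so N traces
// corresponds to roughly N header/body pairs in the NDJSON payload.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// _msearch takes newline-delimited JSON: a header line, then a query line.
	var buf bytes.Buffer
	for i := 0; i < 500; i++ {
		buf.WriteString(`{"index":"jaeger-span-*"}` + "\n")
		buf.WriteString(`{"size":1,"query":{"match_all":{}}}` + "\n")
	}

	resp, err := http.Post(
		"http://localhost:9200/_msearch?rest_total_hits_as_int=true",
		"application/x-ndjson",
		&buf,
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body)) // returns 200 OK when sent directly
}
```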

### Steps to reproduce

  1. Deploy Jaeger with an OpenSearch backend.
  2. Query for traces with a limit of 60. The query succeeds.
  3. Increase the limit to more than 60 traces (see the probe sketch after this list).
  4. Observe the "Internal Server Error" response.
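
A quick probe like the following narrows down the limit at which the failure starts. The host and service name are placeholders; `/api/traces` is the internal HTTP endpoint the Jaeger UI itself queries:

```go
// Queries the Jaeger Query HTTP API with increasing limits and prints the
// status for each, to locate the threshold where the 500 begins.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	for _, limit := range []int{60, 61, 100, 500} {
		url := fmt.Sprintf(
			"http://localhost:16686/api/traces?service=my-service&lookback=1h&limit=%d",
			limit)
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println(limit, err)
			continue
		}
		resp.Body.Close()
		fmt.Println(limit, resp.Status)
	}
}
```

With this, a limit of 60 prints `200 OK` and anything above prints `500 Internal Server Error`, matching the behavior described above.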

### Expected behavior

Jaeger should successfully return more than 60 traces without encountering an internal server error.

### Relevant log output

Jaeger logs:

```
2024-08-11T06:52:02.699771596Z stderr F {"level":"error","ts":1723359122.6995814,"caller":"app/http_handler.go:505","msg":"HTTP handler, Internal Server Error","error":"elastic: Error 502 (Bad Gateway)","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:505\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).search\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:260\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:549\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.WithRouteTag.func1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:256\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:218\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:74\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.additionalHeadersHandler.func4\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\tgithub.com/gorilla/[email protected]/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/[email protected]/recovery.go:80\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:3142\nnet/http.(*conn).serve\n\tnet/http/server.go:2044"}
```

OpenSearch logs:

```
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks, channel closed]
    at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:671)
    at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:344)
    at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:78)
    at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97)
    at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112)
    at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99)
    at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
    at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:607)
    at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:180)
    at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:203)
    at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:188)
    at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:124)
    at org.opensearch.core.action.ActionListener$5.onFailure(ActionListener.java:277)
    at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:797)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:770)
    at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:127)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
```

Envoy logs:

```
2024-08-11T06:52:10.495976131Z stdout F [2024-08-11T06:51:59.504Z] "GET /_msearch?rest_total_hits_as_int=true HTTP/1.1" 502 UPE 165089 87 3194 - "-" "elastic/6.2.37 (linux-amd64)" "0b9806da-fc57-4572-b860-3c31a31b922a"
```

Note: `UPE` is Envoy's upstream-protocol-error response flag, so the 502 that Jaeger's Elasticsearch client reports appears to be generated by the Envoy proxy in front of OpenSearch rather than by OpenSearch itself.


### Screenshot

_No response_

### Additional context

_No response_

### Jaeger backend version

v1.58.0

### SDK

_No response_

### Pipeline

_No response_

### Storage backend

OpenSearch 2.15

### Operating system

_No response_

### Deployment model

_No response_

### Deployment configs

_No response_
raman-goel added the bug label Aug 11, 2024