What happened?
I’m experiencing an issue when querying Jaeger for traces with an OpenSearch backend. When the query is limited to 60 traces, everything works as expected. However, when trying to fetch more than 60 traces, I receive an "Internal Server Error." Interestingly, when I manually hit the OpenSearch _msearch API with a request for 500 traces, it returned a 200 status, indicating that OpenSearch itself can handle larger queries. This suggests the issue lies in how Jaeger interacts with OpenSearch.
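For reference, the manual _msearch check looked roughly like this. This is a minimal sketch in Go; the OpenSearch host, index name, and query body are placeholders and not the exact request Jaeger builds:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// _msearch takes newline-delimited JSON: a header line followed by a
	// query line per sub-search. Repeat a trivial query 500 times to
	// mirror the manual test described above.
	var body strings.Builder
	for i := 0; i < 500; i++ {
		body.WriteString(`{"index":"jaeger-span-2024-08-11"}` + "\n") // placeholder index
		body.WriteString(`{"query":{"match_all":{}},"size":1}` + "\n")
	}

	resp, err := http.Post(
		"http://opensearch:9200/_msearch?rest_total_hits_as_int=true", // placeholder host
		"application/x-ndjson",
		strings.NewReader(body.String()),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode) // returned 200 when sent directly to OpenSearch
}
```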
Steps to reproduce
1. Deploy Jaeger with an OpenSearch backend.
2. Query for traces with a limit of 60. The query succeeds.
3. Increase the limit to more than 60 traces.
4. Observe the "Internal Server Error" response (a minimal reproduction sketch follows these steps).
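A minimal sketch of steps 2–3 against the Jaeger query HTTP API; the jaeger-query host and service name are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Query the Jaeger HTTP API with two limits: 60 (works) and 61 (fails).
	for _, limit := range []int{60, 61} {
		url := fmt.Sprintf(
			"http://jaeger-query:16686/api/traces?service=my-service&limit=%d", // placeholders
			limit,
		)
		resp, err := http.Get(url)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		// Observed behavior: limit=60 -> 200, limit>60 -> 500 Internal Server Error.
		fmt.Printf("limit=%d -> %d\n", limit, resp.StatusCode)
	}
}
```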
Expected behavior
Jaeger should successfully return more than 60 traces without encountering an internal server error.
Relevant log output
Jaeger logs:
2024-08-11T06:52:02.699771596Z stderr F {"level":"error","ts":1723359122.6995814,"caller":"app/http_handler.go:505","msg":"HTTP handler, Internal Server Error","error":"elastic: Error 502 (Bad Gateway)","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:505\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).search\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:260\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:549\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.WithRouteTag.func1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:256\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:218\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:74\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.additionalHeadersHandler.func4\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\tgithub.com/gorilla/[email protected]/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/[email protected]/recovery.go:80\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:3142\nnet/http.(*conn).serve\n\tnet/http/server.go:2044"}
OpenSearch logs:
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks, channel closed]
	at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:671)
	at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:344)
	at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:78)
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97)
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112)
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99)
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
	at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:607)
	at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:180)
	at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:203)
	at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:188)
	at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:124)
	at org.opensearch.core.action.ActionListener$5.onFailure(ActionListener.java:277)
	at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:797)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:770)
	at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:127)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Envoy logs:
2024-08-11T06:52:10.495976131Z stdout F [2024-08-11T06:51:59.504Z] "GET /_msearch?rest_total_hits_as_int=true HTTP/1.1" 502 UPE 165089 87 3194 - "-" "elastic/6.2.37 (linux-amd64)" "0b9806da-fc57-4572-b860-3c31a31b922a"