NetworkPolicy xDS stream stalls for ~20 seconds before finishing updates on all worker threads #958
Full agent logs: logs-cilium-x8hcj-cilium-agent-20241003-071420.log
The affected worker thread
It is curious that the policy update starts progressing in the exact same millisecond as a closing event on a connection.
…pdate policies Add a warning log if worker threads take longer than 100ms to update their network policy maps. This is to help identify cases like #958, where worker thread [20] took almost 20 seconds to start running the update. Related: #958 Signed-off-by: Jarno Rajahalme <[email protected]>
…pdate policies [ upstream commit 769db97 ] Add a warning log if worker threads take longer than 100ms to update their network policy maps. This is to help identify cases like #958, where worker thread [20] took almost 20 seconds to start running the update. Related: #958 Signed-off-by: Jarno Rajahalme <[email protected]>
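To make the failure mode concrete, here is a minimal Python sketch under stated assumptions: each worker thread drains its own queue of posted callbacks one at a time, the policy update is posted to every worker, and the ACK can only go out once all workers have applied it. `Worker`, `update_policy_on_all_workers`, and `SLOW_UPDATE_THRESHOLD_S` are hypothetical names, not the proxy's actual code; only the 100 ms warning threshold comes from the commit message above. The latency is measured from the moment the update is posted, so a worker that is busy with something else and only starts the update late is flagged.

```python
import logging
import queue
import threading
import time
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("npds-sketch")

# Threshold from the commit message: warn when a worker needs > 100 ms.
SLOW_UPDATE_THRESHOLD_S = 0.100


class Worker:
    """Hypothetical stand-in for a proxy worker thread: it runs posted
    callbacks one at a time, so a long-running callback delays later ones."""

    def __init__(self, worker_id: int) -> None:
        self.worker_id = worker_id
        self._tasks: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def post(self, task: Callable[[], None]) -> None:
        self._tasks.put(task)

    def stop(self) -> None:
        self._tasks.put(None)  # sentinel ends the loop
        self._thread.join()

    def _loop(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:
                return
            task()


def update_policy_on_all_workers(workers: List[Worker], version: int) -> Dict[int, float]:
    """Post the update for `version` to every worker, wait until all have run
    it, and return per-worker latency in seconds from post to completion."""
    posted_at = time.monotonic()
    elapsed: Dict[int, float] = {}
    lock = threading.Lock()
    finished = {w.worker_id: threading.Event() for w in workers}

    def make_task(worker_id: int) -> Callable[[], None]:
        def task() -> None:
            # A real proxy would install the new policy map for `version` here.
            took = time.monotonic() - posted_at
            with lock:
                elapsed[worker_id] = took
            if took > SLOW_UPDATE_THRESHOLD_S:
                log.warning("worker [%d] took %.0f ms to update network policy version %d",
                            worker_id, took * 1000, version)
            finished[worker_id].set()
        return task

    for w in workers:
        w.post(make_task(w.worker_id))
    for ev in finished.values():
        ev.wait()  # assumed: the ACK is only sent after every worker is done
    return elapsed


if __name__ == "__main__":
    workers = [Worker(i) for i in range(4)]
    workers[2].post(lambda: time.sleep(0.3))  # simulate a worker busy elsewhere
    latencies = update_policy_on_all_workers(workers, version=247)
    ack_timeout_s = 0.100  # the agent-side timeout mentioned in the issue
    print("ACK in time:", max(latencies.values()) <= ack_timeout_s)
    for w in workers:
        w.stop()
```

In this model a single busy worker holds up the whole update, which is why a per-worker warning can point at the specific thread (like worker [20] here) rather than only surfacing as a missed ACK on the agent side.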
NPDS version 247 is received at 07:14:02. Some worker threads apply it immediately, but others only at 07:14:22, so the Cilium Agent cannot get ACKs for its policy updates in time (the 100ms timeout is far shorter than 20 seconds):
From logs-cilium-envoy-kgzx9-cilium-envoy-20241003-071420.log:
Agent logs (logs-cilium-x8hcj-cilium-agent-20241003-071420.log):