-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NETOBSERV-1994: remove unneeded bpf map update calls #466
base: main
Are you sure you want to change the base?
Conversation
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=9868e0f make set-agent-image |
bpf/flows.c
Outdated
@@ -152,7 +139,8 @@ static inline int flow_monitor(struct __sk_buff *skb, u8 direction) { | |||
if (trace_messages) { | |||
bpf_printk("error adding flow %d\n", ret); | |||
} | |||
|
|||
// Update global counter for hashmap update errors | |||
increase_counter(HASHMAP_FLOWS_DROPPED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't report drops here, since the flows are still sent via RB, that would be misleading.
if the intent is to track hashmap update errors, we can already know that by looking at the RB usage metric that already exists (ie. if RB is used it's because there was a failed update)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yeah wanted to know when hmap not used and we do rb but I agree its duplicate I will remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(I didn't find noticeable improvement in CPU though ...) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #466 +/- ##
=======================================
Coverage 29.56% 29.57%
=======================================
Files 50 50
Lines 4867 4866 -1
=======================================
Hits 1439 1439
+ Misses 3322 3321 -1
Partials 106 106
Flags with carried forward coverage won't be shown. Click here to find out more.
|
a3250fd
to
873f357
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
873f357
to
3e7f125
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but also a more general comment: you're not handling failure of bpf_map_update(...,BPF_NOEXIST), which means that if two threads end up doing this concurrently, one of the creation attempts will be lost.
This is no worse than before the patch, though (before the patch, one attempt would silently overwrite the other, now one will just fail). But, well, it's possible to do better, so I would suggest handling the errors where possible :)
3e7f125
to
ab8e306
Compare
are u referring to DNS update or this is more of general comment and u need better handling for map update errors ? can u expand more if that is the case ? |
ab8e306
to
d88371b
Compare
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=2920e5b make set-agent-image |
d88371b
to
7c1855e
Compare
/ok-to-test |
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=4a20325 make set-agent-image |
7c1855e
to
27be1cb
Compare
/ok-to-test |
(flow_metrics *)bpf_map_lookup_elem(&aggregated_flows, &id); | ||
if (aggregate_flow != NULL) { | ||
update_existing_flow(aggregate_flow, &pkt, dns_errno, len); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could keep the HASHMAP_FLOWS_DROPPED counter and update it in an "else" branch here. I would expect this to basically never happen, but just to be on the safe side, and since you already have that counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that is why I felt its less useful but I can reuse it when eexist fail to lkup
// In this case, we send the single-packet flow via ringbuffer as in the worst case we can have | ||
// a repeated INTERSECTION of flows (different flows aggregating different packets), | ||
// which can be re-aggregated at userpace. | ||
// other possible values https://chromium.googlesource.com/chromiumos/docs/+/master/constants/errnos.md | ||
if (trace_messages) { | ||
bpf_printk("error adding flow %d\n", ret); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not personally consider EEXIST an error that needs logging (here, or in other places you have trace logging). Only if the subsequent lookup then fails to return an entry (see comment below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the thing is map update can fail for many reasons ebusy or e2big so i need to capture those as well , now if eexist error will be happening alot I can filter out the trace for exist error and with counter back we will know when we drop flows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so that's basically what I meant: Use the trace message only for errors other than EEXIST (just move the log statement a bit further down where you're handling those anyway)
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=6042039 make set-agent-image |
Signed-off-by: Mohamed Mahmoud <[email protected]>
27be1cb
to
58cbf78
Compare
New image: It will expire after two weeks. To deploy this build, run from the operator repo, assuming the operator is running: USER=netobserv VERSION=b156a29 make set-agent-image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I tested the first version which worked well, will need to retest when you tell me it's stable
@msherif1234: This pull request references NETOBSERV-1994 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
// Update global counter for hashmap update errors | ||
increase_counter(HASHMAP_FLOWS_DROPPED); | ||
} | ||
update_existing_flow(aggregate_flow, &pkt, dns_errno, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must swap len and dns_errno
update_existing_flow(aggregate_flow, &pkt, dns_errno, len); | |
update_existing_flow(aggregate_flow, &pkt, len, dns_errno); |
flow_metrics *aggregate_flow = | ||
(flow_metrics *)bpf_map_lookup_elem(&aggregated_flows, &id); | ||
if (aggregate_flow != NULL) { | ||
update_existing_flow(aggregate_flow, &pkt, dns_errno, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here as well
update_existing_flow(aggregate_flow, &pkt, dns_errno, len); | |
update_existing_flow(aggregate_flow, &pkt, len, dns_errno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing it was showing unexpected low Bps on my workload test ... that's because of wrong argument ordering
Description
bpf code was doing unnecessary calls to
bpf_map_update_elem
while the map is already been updated in place. which also added unneeded cpu loadDependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.