Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dynamic parameter extraction #3143

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ScriptSathi
Copy link
Contributor

@ScriptSathi ScriptSathi commented Nov 19, 2024

The discussion for this PR can be found here #3142

This is currently a draft.
I wanted to start the discussion before submitting the final code, as I think, it is a big enough PR.

Take this PR in the today state as a proof of concept for dynamic parameter extraction. I will continue to work on this PR until the below checks are done.

At the current state, the PR is able to

  • Extract parameters from LSM programs
  • Extract parameters from kprobes programs
  • Handle exception for uprobes as it is not part of BTF
  • Have unit tests for all cases
  • Have written documentation

Description

This PR introduce the dynamic parameter extraction

Comments

  • I do not like the use of OverwriteType parameter. But since the function argSelectorType does not receive EventConfig, it is not possible to search for the type using directly BTF types. So I suggest doing another PR before this one is merged to do so if possible. Then I'll remove the parameter. It is also not possible to add an if condition for uprobes in this function. So if the user defines the parameter, it will overwrite the type at this moment.
  • I choose to use u8 to store the offset, as the verifier does not allow me to use u16. At this moment, I don't know if it is possible to found offset > 255 in BTF structures. For my uses cases, using u8 was enough.
  • I would appreciate seeing a closer relationship between BTF and the existing types. This mean that we could directly call it from YAML instead of hard-coding it. At least doing so for primitives types, and hard-code only the complex structures

Test the PR

You can use the following config

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "lsm"
spec:
  lsmhooks:
  - hook: "bprm_check_security"
    args:
      - index: 0
        type: "linux_binprm"
        extractParam: "file.f_path.dentry.d_name.name"
        overwriteType: "string"
    selectors:
      - matchArgs:
        - index: 0
          operator: "Postfix"
          values:
            - "ls"
            - "sh"
            - "bash"

If you want to test it with more arguments, you can use bprm_creds_from_file hook. It has struct linux_binprm and struct file which are supported.

@ScriptSathi ScriptSathi requested a review from a team as a code owner November 19, 2024 16:27
@ScriptSathi ScriptSathi marked this pull request as draft November 19, 2024 16:28
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from 9a9d21f to ce33ec7 Compare November 20, 2024 10:48
Copy link

netlify bot commented Nov 20, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 1fb7dbc
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/674216bc2d27200008186dd3
😎 Deploy Preview https://deploy-preview-3143--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from ce33ec7 to 5ebb3e0 Compare November 20, 2024 16:00
@mtardy mtardy linked an issue Nov 21, 2024 that may be closed by this pull request
2 tasks
@mtardy mtardy added the release-note/major This PR introduces major new functionality label Nov 21, 2024
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch 2 times, most recently from e3117d7 to ba048b3 Compare November 21, 2024 10:59
Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, left some comments

thanks for splitting the code in multiple logical changes, it's rare ;-)

we were originally thinking of using C expression parsing (there's some go lib for that) but I think that can be done later if it's ever needed, the basic parsing you did should be fine

please add tests, would be great to have some framework where we could easily add various 'ExtractParam' expressions for testing

pkg/sensors/tracing/genericuprobe.go Show resolved Hide resolved
@@ -150,6 +150,19 @@ struct selector_arg_filters {
__u32 argoff[5];
} __attribute__((packed));

struct config_btf_arg_child {
__u8 offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u8 is not enough, is it? you can have bigger offset than 256

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was doing the POC for this I could use u32 and the verifier does not complain, but when i put this in tetragon it does, so I changed it to u8. Though I've seen this function args_off that format the int to 14 bits. So maybe the verifier let do math pointer until 1600 ?
I will make more tests with u16 to push the verifier to the limits to try.

i int,
) (*btf.Type, error) {
switch t := currentType.(type) {
case *btf.Struct:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add here btf.Union as well

pkg/btf/btf.go Outdated
if childOff > 255 {
return nil, fmt.Errorf("Unable to reach type %v at offset %d", currentType.TypeName(), childOff)
}
(*btfArgChilds)[i].Offset = uint8(childOff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offset should be uint32 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verifier does not allow more than u8. But I should test if it is possible. This is why I put an exception if offset > 255.

if (index <= 4) {
btf_config = (&config->btf_arg_child0)[index];
struct extract_arg_data extract_data = {
.config = config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop extract_arg_data and just pass btf_config and arg directly to extract_arg_depth ?
I don't see struct extract_arg_data being used elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop(MAX_BTF_ARG_CHILD_DEPTH, extract_arg_depth, &extract_data, 0);
loop takes only 4 params. I do not see how we could do it ?
Another solution could be nested callback, but not sure it is better
Example :
loop(MAX_BTF_ARG_CHILD_DEPTH, extract_arg_depth(btf_config), (void **)&a, 0);
with extract_arg_depth that take bpf_config and return a function compatible with loop. But this look ugly as well

bpf/process/types/basic.h Outdated Show resolved Hide resolved
@@ -89,6 +101,25 @@ generic_process_event(void *ctx, struct bpf_map_def *heap_map,
a = (&e->a0)[index];
total = e->common.size;

if (index <= 4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the new extra processing should be optional, now we do that always.. maybe extra bool in event_config?

#else
loop(MAX_BTF_ARG_CHILD_DEPTH, extract_arg_depth, &extract_data, 0);
#endif /* __V61_BPF_PROG */
probe_read((void *)&a, sizeof(char *), (char *)a);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because pointer are dereferenced on the next iteration when called, not when we do math ptr.
To give more context, let's assume we have linux_binprm and want to follow this linux_binprm.file.my_var with my_var as string and file as struct file *. The is_pointer check in the loop is not done at file iteration but on the next with my_var. In order to get this pointer delay back, we make sure my_var is safe ptr.

In the tetragon logs it look like this

With this final probe_read

{
  "process_lsm": {
    "args": [
      {
        "string_arg": "ls"
      }
    ],
  },
}

Without it

    {
  "process_lsm": {
    "args": [
      {
        "string_arg": "xB�O�=�O�ls"
      }
    ],
  },
}

argType = gt.GenericTypeFromBTF(*lastChild)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put this in function and use it from all the generic probes config code, now we just duplicate the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ! I was thinking about this because it is very ugly this way. But after that, I though maybe there are things that can be mutualized between generickprobe.go and genericlsm.go. So I put the ugly code like this and later we do refacto.
But if you mention this, I'll start just for this haha

This commit introduce the `struct config_btf_arg_depth`. It append
`btf_argN` to `struct event_config` as an array of children to dig
in any structure.

For more details, any `btf_argN` can have a list of childrens as
follow.
For example, file->f_path is 152 bytes, so the array will look like
[{ offset: 152, is_pointer: 0 }, { offset: 0, is_pointer: 0 } ...].

The max value `MAX_BTF_ARG_DEPTH` as been set
arbitrary as the verifier need a fixed size.

Signed-off-by: Tristan d'Audibert <[email protected]>
The aim of this commit is to enable the loop on the current hook arguments
and move the buffer at config->btf_argN[i]->offset defined in
userspace. The new offset is then written in the original variable to
keep the same behaviour.
The result of this is the rewrite of the current argument buffer with the
required data.

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit make a closer relationship between the defined
`GenericStringToType` arguments and the existing BTF types.

For instance:
`struct file` with btf is called `file` and also exists in
`GenericStringToType` with the same alias.
But `struct qstr d_name->name` has a returned type `unsigned char` wich
does not exist yet in `GenericStringToType`.
The same thing happened with `linux_binprm->filename` has the type is
`char`

Signed-off-by: Tristan d'Audibert <[email protected]>
The function recursively search in the structure for childrens matching
`pathToFound`. The variable `i` is used here as the depth in the
`pathToFound` array.

For instance, if the search is in the linux_binprm structure and the
path is as follow `file.f_path.dentry.d_name.name`, the following
actions will be done.
- Look for the variable name `file` inside `linux_binprm`.
- If the matches, it stores the offset from linux_binprm where the `file`
  variable can be found.
- Then it take the btf type `file` and search for parameter named
  `f_path`.

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit add 2 parameters to give the ability to the user to search
for a specific variable following a "path" as follow:
```yml
...
    args:
      - index: 0
        type: "linux_binprm"
        extractParam: "file.f_path.dentry.d_name.name"
        overwriteType: "string"
...
```
The above config can be used to extract a specific parameter from the
structure at index 0.

Signed-off-by: Tristan d'Audibert <[email protected]>
The OverwriteType parameter should be deleted if `argSelectorType` can use
the `EventConfig` structure to search for the correct Type.

Signed-off-by: Tristan d'Audibert <[email protected]>
Search if every types defined by the user with ExtractParam exists as BTF
types and store their offsets in ConfigBtfArg.

This function do a basic split on `ExtractParam` to obtain the "path"
to the required data. Then, the array is gave to `btf.FindNextBTFType` to
find the offsets of each elements until we reach the required data.
The output is stored in EventConfig to keep the normal behaviour.

For example, if the arg 0 is `struct linux_binprm` and ExtractParam is
set to `file.f_path.dentry.d_name.name`, the output will give an array
of all the offsets from there parents as such
[{ offset: 96, is_pointer: 0 }, { offset: 152, is_pointer: 1 }, ...]
From input = "linux_binprm -> file.f_path.dentry.d_name.name"

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit update `addLsm` function to use the parameters `ExtractParam`
and `OverwriteType` in order to look for the child members in BTF
structure.

Signed-off-by: Tristan d'Audibert <[email protected]>
…probes

Uprobe offsets can't be found in BTF file. Thus, if the user defines
these specific parameters, they are ignored and a warning is displayed

Signed-off-by: Tristan d'Audibert <[email protected]>
Add very similar code as in `genericlsm.go` file to handle those params.
In a future commit, a refactoring should be acheived to avoid code
redundant code with `genericlsm.go`

Signed-off-by: Tristan d'Audibert <[email protected]>
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from ba048b3 to 1fb7dbc Compare November 23, 2024 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/major This PR introduces major new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dynamic parameter extration for specific use cases
3 participants