Making checkov between 70% to 25% faster 🚀 #6740
Comments
If you feel like the examples I used for the benchmarks are not very representative, let me know and I can run a few more.
I have also started working on a similar pull request. See tpvasconcelos#3 for more details. Updating the results table above with this new improvement shows the following results on my M3 Mac:
If I run the pre-commit hook defined in the Motivation section above in a private repository of mine, it only takes 1.38 seconds when running from this branch, while it takes 2.77 seconds when running against the latest
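To put the two timings quoted above (2.77 seconds before, 1.38 seconds after) into relative terms, here is a small worked calculation. The helper names are made up for this illustration; only the two numbers come from the comment above.

```python
def runtime_reduction(before_s: float, after_s: float) -> float:
    """Fraction of the original runtime that was saved (0.5 means 50% less time)."""
    return (before_s - after_s) / before_s


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new runtime is compared to the old one."""
    return before_s / after_s


# Timings quoted in the comment above (pre-commit hook in a private repository).
before, after = 2.77, 1.38

print(f"{runtime_reduction(before, after):.1%} less time")  # 50.2% less time
print(f"{speedup(before, after):.2f}x speedup")             # 2.01x speedup
```

Note that "X% less time" and "X% faster" are not the same quantity, so it helps to state which one a benchmark table reports.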
Hey @gruebel @tsmithv11! Apologies for pinging you directly here but I wanted to check whether this is something you'd be interested in for the checkov project. In short, the patch I'm suggesting makes checkov between 70% faster (when executing simple commands like If this is something that interests your team, I'd be happy to provide more context where needed, run additional benchmarks, and make the PR ready for review. Thanks in advance! 🚀
@tpvasconcelos First of all, thank you for this amazing suggestion 💯 However, I'm hesitant about your specific PR, as it introduces a custom lazy-loading mechanism that relies on Python internals (such as references to stack frames).
Instead, I think you raised an important point: we don't need to import all of the code of the checks for a lot of usages in checkov. @tpvasconcelos Is that something you are willing to consider/try to implement?
Hey @bo156 thanks for your response! Just letting you know that I did not forget about this. I simply haven't been able to prioritise going over my PR to check whether I can properly address your concerns. I will try to do this in the coming days/weeks. In the meantime, do you think you could take a look at:
Thanks in advance!
Description
checkov always eagerly loads all runners and checks on every CLI invocation, regardless of whether they are needed. I guess this was not always an issue, but it currently adds quite a bit of overhead. For instance, a simple `checkov --version` or `checkov --help` on a 4-core/8Gb-memory Gitpod instance takes just over 2 seconds. Most of this time is spent importing hundreds of Python modules and checks.
If you agree that this would be a welcome improvement, I've done some digging into how this could be addressed and am proposing an incremental solution in this pull request. My results show reduced runtimes for:
checkov --version
checkov --framework=ansible -d .
The current changes in the pull request already pass all existing unit tests, but if this is a desired feature, I'll need to go over it more carefully to make sure it is ready for a full review.
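As a rough illustration of the general idea (import a runner's code only when that runner is actually requested), here is a minimal sketch. It is not the mechanism used in the pull request and not checkov's actual registry: `LazyRunnerRegistry` is a hypothetical name, and the standard-library modules `json` and `csv` merely stand in for heavy runner modules.

```python
from __future__ import annotations

import importlib
from types import ModuleType


class LazyRunnerRegistry:
    """Hypothetical registry that maps a framework name to a module path
    and imports the module only on first use."""

    def __init__(self, module_paths: dict[str, str]) -> None:
        self._module_paths = module_paths
        self._loaded: dict[str, ModuleType] = {}

    def get(self, framework: str) -> ModuleType:
        # The import cost is paid here, on demand, not at CLI start-up.
        if framework not in self._loaded:
            path = self._module_paths[framework]
            self._loaded[framework] = importlib.import_module(path)
        return self._loaded[framework]

    @property
    def loaded(self) -> list[str]:
        """Names of the frameworks whose modules have been imported so far."""
        return sorted(self._loaded)


# Stand-in module paths; a real tool would point at its own runner modules.
registry = LazyRunnerRegistry({"ansible": "json", "terraform": "csv"})

print(registry.loaded)        # nothing imported through the registry yet
registry.get("ansible")       # triggers the import for this framework only
print(registry.loaded)
```

With a layout like this, a command such as `--version` never touches the registry and therefore never pays for any runner import.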
Motivation
Firstly, it would be nice to not have to wait over 2 seconds for the output of `checkov --help`.
More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, mainly when using checkov with pre-commit. Consider the following pre-commit config for example:
Benchmarking
The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8Gb-memory Gitpod instance. The results can vary quite a bit between Gitpod workspace instances, even when requesting the same resources. For this reason, the benchmarks below were all run on the same workspace instance.
I used hyperfine to help me gather the benchmark statistics and pandas to correlate and render them in HTML.
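For readers who want to post-process hyperfine output themselves, here is a minimal sketch that reads the JSON produced by `hyperfine --export-json` and ranks commands by mean runtime. It uses only the standard library rather than pandas, and the sample payload below contains placeholder commands and values for illustration, not measured results.

```python
import json


def summarise(hyperfine_json: str) -> list[tuple[str, float, float]]:
    """Return (command, mean seconds, stddev seconds) rows, fastest first."""
    data = json.loads(hyperfine_json)
    rows = [(r["command"], r["mean"], r["stddev"]) for r in data["results"]]
    return sorted(rows, key=lambda row: row[1])


# Placeholder payload in the shape of hyperfine's JSON export.
# The commands and numbers are made up for this illustration.
sample = json.dumps(
    {
        "results": [
            {"command": "baseline-cli --version", "mean": 2.0, "stddev": 0.1},
            {"command": "patched-cli --version", "mean": 1.0, "stddev": 0.05},
        ]
    }
)

for command, mean, stddev in summarise(sample):
    print(f"{command:<26} {mean:.3f} s ± {stddev:.3f} s")
```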
Requirements
After booting the Gitpod instance, I ran the following commands:
On my local machine (macOS) I also had to install hyperfine and pandas with:
Isolating the effect of the change
The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov Runners. The effects are only present before the runs are triggered. You can verify this yourself by running some local tests.
To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched `BaseRunner.run()` to simply return an empty `Report` right away.
This can be achieved by creating a new entry point under `checkov/main_patched.py` with the following code:
Test that it works by running:
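The patching idea described above (replace `run()` with a function that returns an empty report, so that only start-up cost is measured) can be sketched with stand-in classes. The `Report` and `BaseRunner` classes below are simplified placeholders written for this example and are not checkov's actual classes or signatures.

```python
class Report:
    """Stand-in report type; NOT checkov's actual Report class."""

    def __init__(self, check_type: str) -> None:
        self.check_type = check_type
        self.failed_checks: list = []


class BaseRunner:
    """Stand-in runner whose run() represents the expensive scanning step."""

    check_type = "example"

    def run(self, root_folder: str) -> Report:
        report = Report(self.check_type)
        report.failed_checks.append(f"finding from scanning {root_folder}")
        return report


def _run_noop(self, *args, **kwargs) -> Report:
    # Skip all scanning work and hand back an empty report immediately.
    return Report(self.check_type)


# Monkeypatch: every BaseRunner instance now uses the no-op run().
BaseRunner.run = _run_noop

report = BaseRunner().run("tests/")
print(report.failed_checks)  # []
```

In this sketch, a subclass that defines its own `run()` would bypass the patch, so such subclasses would need to be patched as well.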
Running the benchmark
I executed the following script to generate the results displayed in the next section:
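If hyperfine is not available, a much cruder measurement can be made with the standard library alone. The sketch below is not the script used for the results in this issue; it simply times repeated invocations of a command, with `python -c pass` as a stand-in for the CLI under test.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list, runs: int = 5) -> list:
    """Run argv `runs` times and return the wall-clock duration of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command: starting a bare Python interpreter that does nothing.
timings = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"mean={statistics.mean(timings):.3f}s min={min(timings):.3f}s")
```

Unlike hyperfine, this does no warm-up runs or outlier detection, so it is only suitable for a quick sanity check.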
Results
M3 - macOS
--version
--list
--framework=openapi -d tests/openapi/
--framework=ansible -d tests/ansible/examples/
--framework=terraform -d tests/terraform/checks/data
-d tests/
4-core/8Gb-memory Gitpod instance
--version
--list
--framework=openapi -d tests/openapi/
--framework=ansible -d tests/ansible/examples/
--framework=terraform -d tests/terraform/checks/data
-d tests/