How to programmatically get the output for timeit() or bench_func()? #167

Closed
rayluo opened this issue Aug 3, 2023 · 9 comments

@rayluo

rayluo commented Aug 3, 2023

Hi @vstinner, thanks for the PyPerf project. Presumably because of its sophisticated architecture, a command line like pyperf timeit -s "...define my_func..." "my_func()" prints human-readable output to the terminal, such as "Mean +- std dev: 38.3 us +- 2.8 us". I love that its std dev is noticeably smaller than that of some other benchmark tools, and that its mean is more consistent.

Now, how do I programmatically get that reliable mean value? I tried the following experiments, but could not get what I wanted.

  • Intuitively/pythonically, I expected PyPerf's timeit() to mimic Python's same-named timeit() function and return the elapsed time, preferably a mean, but that is not the case: PyPerf's timeit() returns None.
  • The alternative, bench_func(), returns a benchmark object, but the following attempt does not work.
import pyperf
runner = pyperf.Runner()
return_value = runner.timeit("Times a function", stmt="locals()")
print(return_value)  # This is always None

benchmark = runner.bench_func("bench_func", locals)
print(benchmark)
if benchmark:  # This check is somehow necessary, probably due to the multiprocess architecture
    print(benchmark.get_values())
    # It is still unclear how to get benchmark.mean()
    # It throws exception: statistics.StatisticsError: mean requires at least one data point

BTW, I suspect #165 was about the same use case.

@rayluo
Author

rayluo commented Aug 4, 2023

I did more experimenting, which got me further, but I still ended up at a dead end.

import pyperf

runner = pyperf.Runner()  # "Only one instance of Runner must be created. Use the same instance to run all benchmarks."


def timeit(stmt, *args):
    """It will spawn 20+ subprocesses. The main process returns (time, stdev).
    Otherwise subprocesses return None.  Do NOT run any workload on None code path.

    :param Callable stmt: stmt can be a callable.
    :param args: Positional arguments for the stmt callable.
    """
    # TODO: The str() could end up with different addresses for the same
    # function in different subprocesses, though.
    name = getattr(stmt, "__name__", str(stmt))
    benchmark = runner.bench_func(name + str(args), stmt, *args)
    if benchmark and benchmark.get_nrun() > 1:  # Then all sub-processes finished
        # PyPerf will already show the mean and stdev on stdout
        return benchmark.median()  # Or we could return mean()
    # Unfortunately, sub-processes still return None here. The caller needs to ignore those.


if __name__ == "__main__":
    print("Expensive setup")
    result = timeit(globals)
    if result:
        print(result)

In the snippet above, I can get the time for the test subject (globals() in my case).

But the line "Expensive setup" is also printed 20+ times. This makes the approach unusable in a bigger project that needs an expensive setup.

@vstinner, is PyPerf meant to support those programmatic use cases?

@bluenote10

I was wondering the same and found a solution.

Note that the pyperf architecture seems to be based on re-spawning the script multiple times for the worker processes. This can be seen by printing sys.argv in the script: the whole script gets executed many times, and the --worker and --worker-task=<index> arguments are how pyperf decides what exactly to do in each invocation.

For this reason I would avoid trying to do anything inside the benchmark script itself, because every side effect (like printing) will be executed many times. Instead I would run the entire benchmark via a subprocess.call and pass in -o some_dump_path. This lets you load the written dump file in your main process, which itself isn't subject to re-running.
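
For example, a minimal way to observe this at the top of a benchmark script (just illustrating the point above, not a pyperf API):

import sys

# Every worker invocation re-runs the same script with extra flags such as
# --worker and --worker-task=<index>, so printing sys.argv makes the
# re-spawning visible.
print("invoked as:", sys.argv)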

To illustrate, I'm using something like this in my actual benchmark suite:

import subprocess
from pathlib import Path

import pyperf


def main():
    # In my case I have a bunch of benchmark "script snippets" in another folder,
    # which contain the actual benchmark code, e.g., some call like:
    # pyperf.Runner().bench_time_func(name, func)
    bench_files = [p for p in (Path(__file__).parent / "benchmarks").glob("*.py")]

    for bench_file in bench_files:
        name = bench_file.stem
        print(f"Benchmarking: {name}")

        dump_path = Path(f"/tmp/bench_results/{name}.json")
        dump_path.parent.mkdir(exist_ok=True, parents=True)
        dump_path.unlink(missing_ok=True)

        subprocess.check_call(
            ["python", bench_file, "-o", dump_path], cwd=bench_file.parent
        )

        with dump_path.open() as f:
            benchmarks = pyperf.BenchmarkSuite.load(f).get_benchmarks()

        # now you can programmatically read the benchmark results here...


if __name__ == "__main__":
    main()

@vstinner
Member

vstinner commented Oct 13, 2023

Once you have a BenchmarkSuite object, you can use its documented API.

Would you mind elaborating on your question?

Examples of code loading JSON files: https://pyperf.readthedocs.io/en/latest/examples.html#hist-scipy-script
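
For example, a minimal sketch of reading the statistics back from such a JSON dump (result.json is just a placeholder path):

import pyperf

# Load a file produced with e.g. `python bench_script.py -o result.json`
suite = pyperf.BenchmarkSuite.load("result.json")
for bench in suite.get_benchmarks():
    # mean()/stdev() are the statistics shown in pyperf's terminal output
    print(bench.get_name(), bench.mean(), bench.stdev())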

@rayluo
Author

rayluo commented Oct 14, 2023

Would you mind elaborating on your question?

It seems that PyPerf's multi-process architecture dictates that PyPerf's usage pattern is command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script. Kudos to @bluenote10, who found a feasible approach to organizing such a multi-benchmark project with one main driver script. Overall, this seems difficult to incorporate into an existing pytest-powered test suite.

Shameless plug: I ended up developing perf_baseline, a thin wrapper built on top of Python's timeit, whose accuracy is adequate. I also added some handy behaviors that I needed for my "perf regression detection" project, and it fits my needs well.

@vstinner
Member

I have proposed multiple times to add an option to disable fork. Results may be less reliable, but apparently using fork causes trouble and pyperf cannot be used in some cases. But so far, nobody has really asked for that feature, so it wasn't implemented.

It seems that PyPerf's multi-process architecture dictates that PyPerf's usage pattern is command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script.

You can write a second script which runs the benchmark suite and analyzes the results.

@rayluo
Author

rayluo commented Oct 14, 2023

Fair enough. Closing this issue, because we have a workaround (and I have an alternative).

rayluo closed this as completed Oct 14, 2023
@vstinner
Member

As I wrote, I would be fine with an option to not spawn worker processes, but run all benchmarks in a single process.

The main process that runs all the benchmark worker processes gets Benchmark objects; that's already part of the API ;-)
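
For example, a minimal sketch of that, reusing the get_nrun() guard from the snippet earlier in this thread (only the main process reaches the print):

import pyperf

runner = pyperf.Runner()
benchmark = runner.bench_func("bench_locals", locals)
# Worker invocations may return None or a Benchmark without aggregated runs;
# only the main process ends up with all the values.
if benchmark is not None and benchmark.get_nrun() > 1:
    print(benchmark.get_name(), benchmark.mean(), benchmark.stdev())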

@rayluo
Author

rayluo commented Oct 14, 2023

I have proposed multiple times to add an option to disable fork. Results may be less reliable, but apparently using fork causes trouble and pyperf cannot be used in some cases. But so far, nobody has really asked for that feature, so it wasn't implemented.

To your point, I suppose we do not need to change PyPerf's multi-process (i.e. fork) nature, especially when that architecture is considered the reason it is more reliable.

What some people needed, at least initially, was an old-school, function-style, timeit-like API, such as output = func(input). So perhaps PyPerf could provide a higher-level API that wraps all those subprocess.call() invocations and returns the content of the JSON output.
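
A rough sketch of what such a wrapper might look like (the run_benchmark helper is hypothetical, not an existing PyPerf API), building on @bluenote10's subprocess approach above:

import subprocess
import sys
import tempfile
from pathlib import Path

import pyperf


def run_benchmark(script):
    """Hypothetical helper: run a pyperf benchmark script in its own process
    and return the Benchmark objects loaded from its JSON dump."""
    with tempfile.TemporaryDirectory() as tmp:
        dump_path = Path(tmp) / "result.json"
        subprocess.check_call([sys.executable, str(script), "-o", str(dump_path)])
        return pyperf.BenchmarkSuite.load(str(dump_path)).get_benchmarks()


# Usage: means = [b.mean() for b in run_benchmark("benchmarks/my_bench.py")]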

@vstinner
Member

To your point, I suppose we do not need to change PyPerf's multi-process (i.e. fork) nature, especially when that architecture is considered the reason it is more reliable.

In terms of API, maybe pyperf can provide an API which spawns a process that acts as the main process, and that one spawns the worker processes. The API should just return objects directly, and so hide the inner complexity.

But here I'm talking about an API which does everything in a single process.

I'm not sure if it always matters to spawn worker processes. "It depends" :-) That's the beauty of benchmarking.
