-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[L0 v2] implement native handle API for event #2354
base: main
Are you sure you want to change the base?
Conversation
5551986
to
ddf647c
Compare
9cf5653
to
1e60df6
Compare
Compute Benchmarks level_zero_v2 run (with params: --compare baseline-v2): |
Compute Benchmarks level_zero_v2 run (--compare baseline-v2): SummaryNo diffs to calculate performance change (result is better) Performance change in benchmark groupsRelative perf in group api (6): cannot calculate
Relative perf in group memory (3): cannot calculate
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (8): cannot calculate
Relative perf in group Velocity-Bench (6): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
DetailsBenchmark details - environment, command, output...api_overhead_benchmark_sycl SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100 Output:TestCase,Mean,Median,StdDev,Min,Max,Type memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024 Output:TestCase,Mean,Median,StdDev,Min,Max,Type miscellaneous_benchmark_sycl VectorSumEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1Environment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel out of orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type api_overhead_benchmark_ur SubmitKernel in orderEnvironment Variables:Command:/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 Output:TestCase,Mean,Median,StdDev,Min,Max,Type Velocity-Bench HashtableEnvironment Variables:Command:/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify Output:hashtable - total time for whole calculation: 0.351387 s Velocity-Bench BitcrackerEnvironment Variables:Command:/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000 Output:---------> BitCracker: BitLocker password cracking tool <--------- ==================================
|
- Implement native handle API for events - Always allocate events as multi-device when using provider_normal, as all events can be used on multiple devices
…ueue Wait should be done on a differnt queue as the name suggests.
No description provided.