QCircuitNet is the first benchmark dataset designed to evaluate the capabilities of AI in designing and implementing quantum algorithms from the perspective of programming languages.
Key features of QCircuitNet include:
- Comprehensive Framework: A general framework that formulates the key features of the quantum algorithm design task for Large Language Models, including problem descriptions, quantum circuit code, classical post-processing, and verification functions.
- Wide Range of Quantum Algorithms: Implementations of a wide range of quantum algorithms, from basic primitives to advanced applications, with easy extension to further algorithms.
- Validation and Verification Functions: Automatic validation and verification functions, allowing for iterative evaluation and interactive reasoning without human inspection.
- Training Potential: Potential as a training dataset, as indicated by preliminary fine-tuning results.
The workflow of our benchmark is illustrated as follows:
Below is an illustration of the directory structure of QCircuitNet.
QCircuitNet/
│
├── Oracle Construction/
│   ├── Quantum Logic Synthesis/
│   │   └── Contains textbook-level and advanced oracles.
│   └── Problem Encoding/
│       └── Oracles encoding application scenarios.
│
├── Algorithm Design/
│   ├── Quantum Computing/
│   │   └── Contains universal quantum computing algorithms.
│   └── Quantum Information/
│       └── Includes tasks related to quantum information protocols.
│
└── Random Circuits/
    ├── Clifford/
    │   └── Random circuits with Clifford gate set.
    └── Universal/
        └── Random circuits with universal gate set.
Each subdirectory contains algorithm-specific data. For instance, the directory structure for Simon's algorithm under "Algorithm Design" is as follows:
Algorithm Design/
└── Quantum Computing/
    └── simon/                            # All data for the Simon's Problem
        ├── simon-dataset.py              # Dataset creation script
        ├── simon-generation.py           # Qiskit generation code
        ├── simon-post-processing.py      # Post-processing function
        ├── simon-utils.py                # Utility functions for verification
        ├── simon-verification.py         # Verification function
        ├── simon-description.txt         # Problem description
        ├── simon-verification.txt        # Verification results of the data points
        ├── full-circuit/                 # Raw data of quantum circuits
        │   ├── simon-n2/
        │   │   └── simon-n2-s11-k11.qasm # Full circuit for a concrete setting
        │   └── simon-n3/
        │       ├── simon-n3-s011-k001.qasm
        │       ├── simon-n3-s011-k101.qasm
        │       └── ...
        ├── test-oracle/                  # Extracted oracle definitions
        │   ├── n2/
        │   │   └── trial1/
        │   │       ├── oracle.inc        # Oracle definition as a .inc file
        │   │       └── oracle-info.txt   # Oracle information (such as key strings)
        │   └── n3/
        │       ├── trial1/
        │       │   ├── oracle.inc
        │       │   └── oracle-info.txt
        │       ├── trial2/
        │       │   ├── oracle.inc
        │       │   └── oracle-info.txt
        │       └── ...
        ├── simon-n2.qasm                 # Algorithm circuit for model output
        ├── simon-n3.qasm
        └── ...
This subsection provides concrete examples of the different components of QCircuitNet. We use Simon's Problem throughout for consistency. For further details, please check the code repository.
The problem description is provided in a text file named {algorithm_name}_description.txt. Below is an example template for Simon's Problem:
Given a black box function f: {0,1}^n → {0,1}^n. The function is guaranteed to be a
two-to-one mapping according to a secret string s ∈ {0, 1}^n, s ≠ 0^n, where given
x_1 ≠ x_2, f(x_1) = f(x_2) iff x_1 ⊕ x_2 = s.
Design a quantum algorithm to find the secret string s. The function is provided as a
black-box oracle gate named "Oracle" in the oracle.inc file, operating as:
O_f |x⟩|y⟩ = |x⟩ |y ⊕ f(x)⟩
The input qubits |x⟩ are indexed from 0 to n-1, and the output qubits |f(x)⟩ are
indexed from n to 2n-1. For the algorithm, provide:
1. The quantum circuit implementation using QASM or Qiskit.
2. The post-processing code run_and_analyze(circuit, aer_sim) in Python, which simulates
the circuit with AerSimulator and returns the secret string s based on the simulation results.
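To make the promise in the description concrete, the following minimal sketch builds a classical two-to-one function for a chosen secret string and checks the collision condition by brute force. The helper names `make_simon_function` and `satisfies_promise` are illustrative only and do not appear in the repository; bit strings are encoded as Python integers for brevity.

```python
def make_simon_function(n, s):
    """Build a two-to-one table f with f[x] == f[x ^ s] (bit strings as ints)."""
    if not 0 < s < 2 ** n:
        raise ValueError("s must be a nonzero n-bit string")
    table = {}
    next_label = 0
    for x in range(2 ** n):
        if x not in table:
            # x and x XOR s share one fresh output label.
            table[x] = table[x ^ s] = next_label
            next_label += 1
    return table


def satisfies_promise(table, s):
    """Check: for x1 != x2, f(x1) == f(x2) iff x1 XOR x2 == s."""
    return all(
        (table[x1] == table[x2]) == ((x1 ^ x2) == s)
        for x1 in table
        for x2 in table
        if x1 != x2
    )


f = make_simon_function(3, 0b011)
print(satisfies_promise(f, 0b011))  # True
print(satisfies_promise(f, 0b101))  # False
```

A classical search of this kind needs to inspect many inputs before it finds a collision, which is the cost the quantum algorithm requested in the description is meant to avoid.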
The following is the Qiskit code to generate a quantum circuit for Simon's Problem. The file is named {algorithm_name}_generation.py.
from qiskit import QuantumCircuit


def simon_algorithm(n, oracle):
    """Generates a Simon algorithm circuit.

    Parameters:
    - n (int): number of qubits
    - oracle: the oracle function

    Returns:
    - QuantumCircuit: the Simon algorithm circuit
    """
    # Create a quantum circuit on 2n qubits
    simon_circuit = QuantumCircuit(2 * n, n)
    # Initialize the first register to the |+> state
    simon_circuit.h(range(n))
    # Append the Simon's oracle
    simon_circuit.append(oracle, range(2 * n))
    # Apply a H-gate to the first register
    simon_circuit.h(range(n))
    # Measure the first register
    simon_circuit.measure(range(n), range(n))
    return simon_circuit
The OpenQASM 3.0 file stores the quantum circuit for specific settings. Below is an example for Simon's algorithm with n = 3. This file is named {algorithm_name}_n{qubit_number}.qasm.
OPENQASM 3.0;
include "stdgates.inc";
include "oracle.inc";
bit[3] c;
qubit[6] q;
h q[0];
h q[1];
h q[2];
Oracle q[0], q[1], q[2], q[3], q[4], q[5];
h q[0];
h q[1];
h q[2];
c[0] = measure q[0];
c[1] = measure q[1];
c[2] = measure q[2];
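The listing above follows a regular pattern in n, which the following sketch reproduces with plain string templating. This is only an illustration of the file layout: the repository generates these files from Qiskit, and the function name `simon_qasm` is hypothetical.

```python
def simon_qasm(n):
    """Return OpenQASM 3.0 text for the Simon circuit on 2n qubits."""
    all_qubits = ", ".join(f"q[{i}]" for i in range(2 * n))
    lines = [
        "OPENQASM 3.0;",
        'include "stdgates.inc";',
        'include "oracle.inc";',
        f"bit[{n}] c;",
        f"qubit[{2 * n}] q;",
    ]
    # Hadamards on the input register, the oracle, then Hadamards again.
    lines += [f"h q[{i}];" for i in range(n)]
    lines.append(f"Oracle {all_qubits};")
    lines += [f"h q[{i}];" for i in range(n)]
    # Only the input register is measured.
    lines += [f"c[{i}] = measure q[{i}];" for i in range(n)]
    return "\n".join(lines) + "\n"


print(simon_qasm(3))
```

For n = 3 the output matches the 15-line listing above line by line.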
This Python function simulates the circuit and derives the final answer to Simon's Problem. The file is named {algorithm_name}_post_processing.py.
from sympy import Matrix
import numpy as np
from qiskit import transpile


def mod2(x):
    # Reduce a (possibly rational) sympy entry to its numerator mod 2.
    return x.as_numer_denom()[0] % 2


def solve_equation(string_list):
    """Solve A^T * X = 0 mod 2 for the null space."""
    M = Matrix(string_list).T
    # Augment with the identity so the row operations are recorded next to M.
    M_I = Matrix(np.hstack([M, np.eye(M.shape[0], dtype=int)]))
    # Row-reduce, treating even entries as zero.
    M_I_rref = M_I.rref(iszerofunc=lambda x: x % 2 == 0)
    M_I_final = M_I_rref[0].applyfunc(mod2)
    # If the M part of the last row vanishes, the identity part is a null-space vector.
    if all(value == 0 for value in M_I_final[-1, :M.shape[1]]):
        result_s = "".join(str(c) for c in M_I_final[-1, M.shape[1]:])
    else:
        result_s = "0" * M.shape[0]
    return result_s


def run_and_analyze(circuit, aer_sim):
    # The circuit acts on 2n qubits; n is the length of the secret string.
    n = circuit.num_qubits // 2
    circ = transpile(circuit, aer_sim)
    results = aer_sim.run(circ, shots=n).result()
    counts = results.get_counts()
    # Each measured bitstring is one linear equation; drop the all-zero outcome.
    equations = [list(map(int, result)) for result in counts if result != "0" * n]
    prediction = solve_equation(equations) if len(equations) > 0 else "0" * n
    return prediction
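The linear-algebra step can also be seen without sympy. The sketch below is a plain-Python Gaussian elimination over GF(2), swapped in here for the rref-based approach above purely for illustration; `gf2_nullspace_vector` is a hypothetical name. Each equation y is a list of bits, and the function looks for a nonzero s with y · s = 0 (mod 2) for every y.

```python
def gf2_nullspace_vector(equations, n):
    """Return a nonzero bit list s with y . s = 0 (mod 2) for all y, or None."""
    rows = [list(eq) for eq in equations]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot here: this column is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        return None  # only the all-zero solution exists
    # Set the first free variable to 1, the others to 0, and read off the pivots.
    free = free_cols[0]
    s = [0] * n
    s[free] = 1
    for row_idx, col in enumerate(pivot_cols):
        s[col] = rows[row_idx][free]
    return s


# Three outcomes that are all orthogonal to s = 110.
print(gf2_nullspace_vector([[1, 1, 0], [0, 0, 1], [1, 1, 1]], 3))  # [1, 1, 0]
```

When the equations leave more than one free variable, the returned vector is only one of several solutions, which is why enough independent measurement outcomes are needed to pin down s.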
The oracle is defined in the file oracle.inc, which is used for algorithm design tasks.
gate Oracle q[0], q[1], q[2], q[3], q[4], q[5] {
  cx q[0], q[3];
  cx q[1], q[4];
  cx q[2], q[5];
  cx q[2], q[5];
  x q[3];
}
The oracle_info.txt file stores additional metadata for the oracle, such as the secret and key strings. Below is an example for Simon's Problem with n = 3 and test case 2:
Secret string: 100
Key string: 001
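Because the oracle listed earlier uses only CX and X gates, it can be evaluated classically on basis states. The sketch below does this for the gate sequence shown in the oracle.inc example, transcribed by hand (an assumption of this sketch), and collects the XOR of every colliding input pair. Whether that listing and the metadata above belong to the same trial is not stated; under Qiskit's little-endian bit ordering, the mask found here reads 100, which agrees with the secret string shown.

```python
from itertools import product

# Gates transcribed by hand from the oracle.inc listing (an assumption of this sketch).
GATES = [("cx", 0, 3), ("cx", 1, 4), ("cx", 2, 5), ("cx", 2, 5), ("x", 3)]


def evaluate_oracle(x_bits):
    """Apply the gates to |x>|0...0> classically and return the output register."""
    n = len(x_bits)
    state = list(x_bits) + [0] * n
    for name, *qubits in GATES:
        if name == "cx":
            control, target = qubits
            state[target] ^= state[control]
        elif name == "x":
            state[qubits[0]] ^= 1
    return tuple(state[n:])


def collision_masks(n):
    """Collect x1 XOR x2 over all pairs x1 != x2 with f(x1) == f(x2)."""
    inputs = list(product((0, 1), repeat=n))
    return {
        tuple(a ^ b for a, b in zip(x1, x2))
        for x1 in inputs
        for x2 in inputs
        if x1 < x2 and evaluate_oracle(x1) == evaluate_oracle(x2)
    }


masks = collision_masks(3)
print(masks)  # {(0, 0, 1)}, indexed as (q[0], q[1], q[2])
print("".join(str(b) for b in reversed(next(iter(masks)))))  # 100
```

The two identical CX gates on q[2] and q[5] cancel, so the output ignores q[2]; that is what makes inputs differing only in q[2] collide.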
The following function checks the correctness of the generated model. The file is named {algorithm_name}_verification.py.
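import re

from qiskit import transpile
from qiskit_aer import AerSimulator

from simon_utils import *


def check_model(qasm_string, code_string, n):
    """Check the Simon model."""
    # Verify the QASM syntax once, using the oracle of the first test case.
    with open(f"test_oracle/n{n}/trial1/oracle.inc", "r") as file:
        oracle_def = file.read()
    full_qasm = plug_in_oracle(qasm_string, oracle_def)
    circuit = verify_qasm_syntax(full_qasm)
    if circuit is None:
        return -1
    try:
        # Define run_and_analyze from the model-generated post-processing code.
        exec(code_string, globals())
        aer_sim = AerSimulator()
        total_success, total_fail = 0, 0
        shots = 10
        for t in range(1, 11):
            trial_dir = f"test_oracle/n{n}/trial{t}"
            # Load this trial's oracle so the circuit matches its secret string.
            with open(f"{trial_dir}/oracle.inc", "r") as file:
                oracle_def = file.read()
            circuit = verify_qasm_syntax(plug_in_oracle(qasm_string, oracle_def))
            with open(f"{trial_dir}/oracle_info.txt", "r") as file:
                secret_string = re.search(r"Secret string: ([01]+)", file.read()).group(1)
            circuit = transpile(circuit, aer_sim)
            prediction = run_and_analyze(circuit.copy(), aer_sim)
            if prediction == secret_string:
                total_success += 1
            else:
                total_fail += 1
        # Score is the fraction of test cases solved; -1 signals an error.
        return total_success / (total_success + total_fail)
    except Exception as e:
        print(f"Error: {e}")
        return -1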
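The verification function relies on a helper, plug_in_oracle, that lives in simon_utils and is not shown here. One plausible reading of its role is that it substitutes the oracle definition for the include line of the circuit file. The sketch below illustrates that idea only; `plug_in_oracle_sketch` is a hypothetical stand-in, not the repository's implementation.

```python
def plug_in_oracle_sketch(qasm_string, oracle_def):
    """Replace the oracle include line with the oracle definition text."""
    include_line = 'include "oracle.inc";'
    if include_line not in qasm_string:
        raise ValueError("circuit does not include oracle.inc")
    # Substitute only the first occurrence; the rest of the circuit is untouched.
    return qasm_string.replace(include_line, oracle_def.strip(), 1)


circuit_text = (
    "OPENQASM 3.0;\n"
    'include "stdgates.inc";\n'
    'include "oracle.inc";\n'
    "qubit[2] q;\n"
    "Oracle q[0], q[1];\n"
)
oracle_text = "gate Oracle a, b {\n  cx a, b;\n}\n"
print(plug_in_oracle_sketch(circuit_text, oracle_text))
```

Inlining the definition this way yields a single self-contained QASM string, which is convenient when the same model output has to be paired with a different oracle for each test case.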
This script creates the dataset from scratch, generating the circuits, extracting gate definitions, and ensuring dataset validity. The file is named {algorithm_name}_dataset.py.
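import argparse

# generate_circuit_qasm, generate_dataset_json, extract_gate_definition and
# check_dataset are assumed to be defined elsewhere in this script.


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-f", "--func",
        choices=["qasm", "json", "gate", "check"],
        help="Function to call: generate qasm, json, extract gate, or check dataset."
    )
    args = parser.parse_args()
    # Dispatch to the requested dataset-building step.
    if args.func == "qasm":
        generate_circuit_qasm()
    elif args.func == "json":
        generate_dataset_json()
    elif args.func == "gate":
        extract_gate_definition()
    elif args.func == "check":
        check_dataset()


if __name__ == "__main__":
    main()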
The following charts provide an overview of the proportion of different data types within the QCircuitNet dataset.
To provide users with an intuitive view of the dataset content, facilitate display on GitHub pages, and comply with GitHub's policies regarding file number and size, we performed a hierarchical sampling of the dataset. Random sampling was employed to minimize bias inherent in manual selection, thereby maintaining the dataset's diversity and representativeness.
The sampling process is implemented via a function named hierarchical_sample, which processes the dataset's root directory and handles subdirectories based on their specific structures. The sampling methodology for each main directory is detailed below:
- Algorithm_Design
  This directory contains two subdirectories: Quantum_Computing and Quantum_Information.
  a. Quantum_Computing
  The Quantum_Computing folder includes the following subdirectories:
  - bernstein_vazirani
  - deutsch_jozsa
  - generalized_simon_multi
  - generalized_simon_ternary
  - grover
  - phase_estimation
  - quantum_fourier_transformation
  - simon
  Each of these subdirectories contains files named algorithmname_n{}.qasm (e.g., bernstein_vazirani_n{}.qasm), where algorithmname matches the folder name. From each subdirectory, we randomly sampled 10 files to retain.
  Additionally, each subdirectory contains a test_oracle folder, which further contains n{} subfolders (with varying counts). For each test_oracle folder, we randomly sampled 10 n{} subfolders to retain. Within each retained n{} folder, if the number of trial{} subfolders exceeds 5, we randomly sampled 5 trial{} subfolders to retain and deleted the rest.
  b. Quantum_Information
  The Quantum_Information folder includes the following subdirectories:
  - ghz_state
  - quantum_key_distribution
  - quantum_teleportation
  - random_number_generator
  - superdense_coding
  - swap_test
  - w_state
  Sampling per subdirectory:
  - ghz_state: Contains files named ghz_state_n{}.qasm (for n from 2 to 133). We randomly sampled 10 files to retain.
  - quantum_key_distribution: Contains subdirectories named qkd_n{} (for n from 20 to 50). We randomly sampled 10 qkd_n{} subdirectories to retain. Each qkd_n{} folder contains files named qkd_n{}_trial{}_{True/False} with corresponding .qasm and .txt files (e.g., qkd_n47_trial1_False.qasm and qkd_n47_trial1_False.txt). For each retained qkd_n{} folder, we randomly sampled 5 trials to retain, where each trial consists of four files:
    - qkd_n{}_trial{}_True.qasm
    - qkd_n{}_trial{}_True.txt
    - qkd_n{}_trial{}_False.qasm
    - qkd_n{}_trial{}_False.txt
  - quantum_teleportation: Contains a test_oracle folder with trial{} subfolders. We randomly sampled 5 trial{} subfolders to retain.
  - random_number_generator: Contains files named qrng_n{}.qasm (for n from 1 to 133). We randomly sampled 10 files to retain.
  - swap_test: Contains files named swap_test_n{}.qasm (for n from 1 to 20). We randomly sampled 10 files to retain. Additionally, there is a test_oracle folder containing n{} subfolders (number varies). We randomly sampled 10 n{} subfolders to retain. Within each retained n{} folder, if the number of trial{} subfolders exceeds 5, we randomly sampled 5 trial{} subfolders to retain and deleted the rest.
  - w_state: Contains files named w_state_n{}.qasm (for n from 2 to 133). We randomly sampled 10 files to retain. Additionally, there is a gate_circuit folder containing files named w_state_n{}.qasm (for n from 2 to 133). We randomly sampled 10 files from this folder to retain.
- Oracle_Construction
  This directory contains two subdirectories: Problem_Encoding and Quantum_Logic_Synthesis.
  a. Problem_Encoding
  No sampling was performed on this folder; all contents are retained.
  b. Quantum_Logic_Synthesis
  This folder contains the following subdirectories:
  - bernstein_vazirani
  - deutsch_jozsa
  - diffusion_operator
  - generalized_simon_multi
  - generalized_simon_ternary
  - grover
  - simon
  Each subdirectory contains subfolders named algorithmname_n{} (e.g., bernstein_vazirani_n{} for n from 2 to 14). For each subdirectory:
  - If the number of algorithmname_n{} subfolders exceeds 5, we randomly sampled 5 subfolders to retain.
  - Within each retained algorithmname_n{} subfolder, if the number of items (files or subfolders) exceeds 5, we randomly sampled 5 items to retain.
  - If the contents of an algorithmname_n{} subfolder are themselves subfolders, we applied the same rule recursively: if a subfolder contains more than 5 items, we randomly sampled 5 items to retain.
- Random_Circuits
  This directory contains two subdirectories: clifford and universal.
  a. clifford
  Contains subdirectories named clifford_n{}. We randomly sampled 10 clifford_n{} subfolders to retain.
  - Within each retained clifford_n{} folder, there are multiple I{} subfolders. We randomly sampled 5 I{} subfolders to retain.
  - Within each retained I{} subfolder, we randomly sampled 5 pairs of .qasm and .txt files to retain. Each pair consists of the corresponding .qasm and .txt files (e.g., clifford_n6_I100_4.qasm and clifford_n6_I100_4.txt).
  b. universal
  Contains subdirectories named universal_n{}. We randomly sampled 10 universal_n{} subfolders to retain.
  - Within each retained universal_n{} folder, there are multiple I{} subfolders. We randomly sampled 5 I{} subfolders to retain.
  - Within each retained I{} subfolder, we randomly sampled 5 pairs of .qasm and .txt files to retain. Each pair consists of the corresponding .qasm and .txt files (e.g., universal_n6_I100_4.qasm and universal_n6_I100_4.txt).
By applying this hierarchical sampling procedure, we significantly reduced the dataset size while preserving its diversity and representativeness. This makes the dataset more accessible and manageable for users and ensures compliance with GitHub's file size and number policies.
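The two rules that recur throughout this procedure, "retain at most k entries" and "keep paired .qasm and .txt files together", can be sketched as follows. The implementation of hierarchical_sample itself is not shown in this document, so the helper names `sample_keep` and `sample_pairs`, and the use of a seeded generator, are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_keep(names, k, rng):
    """Return the names to retain: all of them if there are at most k, else k at random."""
    names = sorted(names)
    if len(names) <= k:
        return names
    return sorted(rng.sample(names, k))


def sample_pairs(filenames, k, rng):
    """Sample k file stems so that files sharing a stem are kept or dropped together."""
    by_stem = defaultdict(list)
    for name in filenames:
        stem, _, _ext = name.rpartition(".")
        by_stem[stem].append(name)
    kept_stems = sample_keep(by_stem, k, rng)
    return sorted(f for stem in kept_stems for f in by_stem[stem])


rng = random.Random(0)  # fixed seed only to make this sketch reproducible
trials = [f"trial{i}" for i in range(1, 9)]
print(sample_keep(trials, 5, rng))

files = [f"clifford_n6_I100_{i}.{ext}" for i in range(8) for ext in ("qasm", "txt")]
print(sample_pairs(files, 5, rng))
```

Applying such a helper level by level (n{} folders, then trial{} or I{} folders, then files) gives the hierarchical behaviour described above; entries not returned would be the ones deleted.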
We host a demo version of the dataset on this GitHub repository to illustrate its structure and provide sample data. For the complete dataset, please download it from Google Drive.