A RISC-V emulator built with ClickHouse SQL.
This emulator makes ClickHouse truly Turing complete. We are one step closer to running ClickHouse in ClickHouse.
This project/repository isn't dev-friendly yet; I'm just uploading it here as a backup in case my PC catches fire.
The system will react to the following insert command:
INSERT INTO clickv.clock (_) VALUES ()
This command will trigger a large set of branched materialized views and Null tables that filter out the program's instructions to simulate reading/writing from registers and memory.
External host machine access works via a single UDF with a custom binary format that gets read/written as an Array(UInt8).
The program is able to perform any logic. Printing to a console table and drawing are built-in. It can also open/close/read/write/seek files and sockets via the ClickOS UDF.
For more details, see the architecture section.
I tried to use every optimization trick in the book to get this to run fast. Unfortunately, there is a MAJOR bottleneck to the performance of this emulator due to a bug in ClickHouse's KVStorage logic. Because ClickHouse doesn't have an internal KV-type storage engine, I use Redis for registers/memory. But there is a bug with allow_experimental_analyzer=1 where, instead of doing a single MGET, it will SCAN all keys and then MGET multiple times.
I haven't submitted a bug report yet, but I did investigate it. More notes are commented in the file /sql/click-v.sql:11.
As it is now, the CPU runs at around 17 Hz, but during early development this was significantly higher. It can perform better, but when almost every instruction depends on a register read, it kills performance quickly. It gets worse with more memory allocated in the emulator.
Steps:
- Set up a ClickHouse v24 image
- Set up a Redis-like server for registers/memory access (plain Redis works fine, Dragonfly was slower, there's also a built-in server in /system/mem)
- Run all SQL statements in /sql/click-v.sql (confirm your Redis host is correct; right now it points to host.docker.internal:6379)
- Load your own RISC-V 32i program with INSERT INTO clickv.load_program (hex) VALUES ('FFFFFFFF') (make sure your hex instructions are in the correct direction)
- Either clock the system via INSERT INTO clickv.clock (_) VALUES (), or use the auto-clock in /system/clock
You can now monitor the program with the following commands:
- Show program instructions + current instruction:
SELECT * FROM clickv.display_program;
- Show all 32 registers:
SELECT * FROM clickv.display_registers;
- Show memory (with o parameter for offset):
SELECT * FROM clickv.display_memory(o=1024);
- Show console:
SELECT * FROM clickv.display_console FORMAT TSV;
- Set up live view (optional):
SET allow_experimental_live_view = 1;
- (After frame setup) Show current drawn frame:
SELECT * FROM clickv.display_frame FORMAT RawBLOB;
- (After frame setup) Show live-updating frame:
WATCH clickv.display_frame FORMAT RawBLOB;
For more help/commands, see the bottom of the /sql/click-v.sql file.
ROM/RAM/Graphics Memory is configurable.
Depends on ClickHouse v24. No other setup is required for the basic emulator. Handling syscalls requires setting up the ClickOS UDF, but this is optional.
path: /system/cmd/clock
This program simply runs the clock for you, as fast as possible. It outputs the clock speed and total cycles to the console.
path: /system/cmd/clickos-server
path: /system/cmd/clickos-client
Optional program to give the emulated program access to the host system/network.
This is a client/server application. The client runs as a ClickHouse executable UDF, and then forwards requests to the server. The server will then handle all syscalls (such as reading/writing to a file, opening a UDP socket, etc.)
You will need to set up the UDF in your ClickHouse server. The easiest way is to make two Docker volume binds: one to the UDF XML, and the other to the built binary (you must run go build for your Docker env/arch).
Run the server to listen/handle syscalls. File paths are relative to the working directory of the ClickOS server process.
path: /rs-demo
This is a demo Rust program that can be compiled to run in the emulator.
I have some boilerplate for syscalls, with some OS abstractions for read, write, seek, socket, open, close, etc.
I also have some code that handles drawing to the screen.
To get the program hex, I made a script called gethex.sh.
You can copy/paste this directly into the program input for the emulator.
This program contains a linker script that defines the memory ranges for ROM, RAM, Stack size, and VRAM.
path: /system/cmd/mem
This program will store the registers/memory for the emulator. Dragonfly was slow for this use case, Redis was faster, but this program is optimized to use exact amounts of memory + sequential reads.
Note: there is a bug with ClickHouse where ALL queries use SCAN, even direct k=1 queries.
This is a huge hit to performance, and will require a patch to ClickHouse to fix.
path: /system/test/instruction_test.go
How do we know any of these instructions do what they're supposed to do? To answer this, I made a unit test for each instruction. It is now much easier to see if the instructions are compliant with the specification when isolated.
This file will run a test for each instruction, some with different test cases. It also prints out the performance of each instruction. You'll notice some instructions are more costly than others.
I will simplify this into several components:
- Clock
- Program Counter (PC)
- Memory
- Registers
- Instructions
- Syscalls
Schema: no schema
As the name suggests, this is the clock for the emulated CPU.
This is implemented as a Null table. When you insert into this, it will cascade down a set of materialized views.
Schema: value UInt32
This is a Memory table with limits to store exactly 1 row.
It stores a single UInt32, which represents the current instruction.
Schema: address UInt32, value UInt8
Memory contains the program instructions (ROM), as well as RAM and VRAM (for the display).
While I originally had this implemented as a Memory table, it was clear that this would not work for larger programs.
When writing to memory, it would push out the oldest row.
It would also require adding a timestamp field of some kind to each row, since it could contain duplicates. ReplacingMergeTree was also considered, but this writes to disk, and would have duplicates before the parts are processed (which is likely in a high-speed emulator environment).
It can be done, but it would require having a lot of duplicated rows, with enough space so that old memory would have a low probability of falling out of the table. That is too much memory usage.
So I then switched to a Redis table engine. This is the optimal structure, since it operates as a fast in-memory KV store with no duplicates.
This works perfectly, except for how the newer version of ClickHouse ALWAYS runs a full SCAN with multiple MGET calls.
Memory can be read via a JOIN or sub-query, even in multiple bytes.
Memory can be written in multiple bytes using arrayJoin into the memory table.
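Since memory is stored one byte per address, anything wider than a byte has to be split up on write and reassembled on read. Here is a minimal Go sketch of that idea, outside of ClickHouse. The Memory type and its methods are hypothetical names made up for illustration, and a plain map stands in for the Redis-backed table; the only hard fact it relies on is that RISC-V stores multi-byte values little-endian.

```go
package main

import "fmt"

// Memory is a hypothetical stand-in for the emulator's memory table:
// one UInt8 value per UInt32 address. In this sketch, addresses that
// were never written read back as zero (Go's map default).
type Memory map[uint32]uint8

// WriteWord stores a 32-bit value as four consecutive bytes,
// least significant byte first (little-endian).
func (m Memory) WriteWord(addr, value uint32) {
	for i := uint32(0); i < 4; i++ {
		m[addr+i] = uint8(value >> (8 * i))
	}
}

// ReadWord reassembles four consecutive bytes into a 32-bit value.
func (m Memory) ReadWord(addr uint32) uint32 {
	var value uint32
	for i := uint32(0); i < 4; i++ {
		value |= uint32(m[addr+i]) << (8 * i)
	}
	return value
}

func main() {
	mem := Memory{}
	mem.WriteWord(1024, 0xDEADBEEF)
	fmt.Printf("byte at 1024: 0x%02X\n", mem[1024])
	fmt.Printf("word at 1024: 0x%08X\n", mem.ReadWord(1024))
}
```

In the emulator itself the same split/reassemble happens in SQL (arrayJoin on write, JOIN or sub-query on read), so treat this only as a picture of the byte layout.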
Schema: address UInt8, value UInt32
Registers are implemented the same as memory, but with 32 fixed registers.
The first materialized view hit by the clock table is get_next_instruction.
This will parse the pc, instruction, opcode, and funct3 and send it to the next layer of materialized views. The idea with these layers is to reduce the number of function calls and queries for parsing the instruction.
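For reference, this is roughly what pulling those fields out of a 32-bit instruction looks like, written as a small Go sketch rather than SQL. The bit positions come from the RV32I base instruction formats; the Fields struct and Decode function are hypothetical names for illustration and are not part of this repository.

```go
package main

import "fmt"

// Fields holds the fixed-position fields of an RV32I instruction.
// Not every format uses every field (for example, I-type instructions
// reuse the rs2/funct7 bits for the immediate).
type Fields struct {
	Opcode uint32 // bits 6:0
	Rd     uint32 // bits 11:7
	Funct3 uint32 // bits 14:12
	Rs1    uint32 // bits 19:15
	Rs2    uint32 // bits 24:20
	Funct7 uint32 // bits 31:25
}

// Decode extracts each field with a shift and a mask.
func Decode(instruction uint32) Fields {
	return Fields{
		Opcode: instruction & 0x7F,
		Rd:     (instruction >> 7) & 0x1F,
		Funct3: (instruction >> 12) & 0x07,
		Rs1:    (instruction >> 15) & 0x1F,
		Rs2:    (instruction >> 20) & 0x1F,
		Funct7: instruction >> 25,
	}
}

func main() {
	// 0x00A00093 encodes "addi x1, x0, 10".
	f := Decode(0x00A00093)
	fmt.Printf("opcode=0x%02X rd=%d funct3=%d rs1=%d\n", f.Opcode, f.Rd, f.Funct3, f.Rs1)
}
```

Doing this once at the top layer means the later views only compare small integers instead of re-parsing the instruction.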
The next layer will then split by instruction type. For example: R-type, I-type, S-type, jump, ecall, etc.
These views have a WHERE condition that blocks them from inserting into the next layer of Null tables, which again reduces the number of queries/function calls.
Within each of these types (such as R-type) are the materialized views for the individual instructions. At this point it will do the final check to see which instruction it is, and then forward to another Null table for executing the instruction. By this point, there's no other path for that instruction, and all the expensive queries can be made.
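The two filtering layers can be pictured as a coarse switch on the opcode followed by a fine lookup on funct3/funct7. Below is a Go sketch of that shape, assuming the standard RV32I opcode values. The group names, Classify, Dispatch, and the rTypeNames table are hypothetical; they only mirror the idea and are not how the views are actually named.

```go
package main

import "fmt"

// Classify is the coarse first layer: it only looks at the 7-bit opcode.
func Classify(opcode uint32) string {
	switch opcode {
	case 0x33:
		return "r-type"
	case 0x13:
		return "i-type"
	case 0x03:
		return "load"
	case 0x23:
		return "s-type"
	case 0x63:
		return "branch"
	case 0x6F, 0x67: // jal, jalr
		return "jump"
	case 0x37, 0x17: // lui, auipc
		return "u-type"
	case 0x73: // ecall and friends
		return "system"
	default:
		return "unknown"
	}
}

// rTypeNames is the fine second layer for R-type: {funct3, funct7} -> name.
var rTypeNames = map[[2]uint32]string{
	{0, 0x00}: "add", {0, 0x20}: "sub", {1, 0x00}: "sll", {2, 0x00}: "slt",
	{3, 0x00}: "sltu", {4, 0x00}: "xor", {5, 0x00}: "srl", {5, 0x20}: "sra",
	{6, 0x00}: "or", {7, 0x00}: "and",
}

// Dispatch returns the instruction name for R-type instructions and the
// group name otherwise (other groups would get their own second layer).
func Dispatch(instruction uint32) string {
	group := Classify(instruction & 0x7F)
	if group != "r-type" {
		return group
	}
	key := [2]uint32{(instruction >> 12) & 0x07, instruction >> 25}
	if name, ok := rTypeNames[key]; ok {
		return name
	}
	return "unknown"
}

func main() {
	fmt.Println(Dispatch(0x002081B3)) // add x3, x1, x2
	fmt.Println(Dispatch(0x40208133)) // sub x2, x1, x2
	fmt.Println(Dispatch(0x00000073)) // ecall
}
```

The point of the split is the same as in the views: most instructions are rejected by the cheap opcode check and never reach the per-instruction comparisons.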
Each instruction (with the exception of jumps and branches) will have another materialized view at the end that increments the pc by 4. Materialized views are executed in the order they are created, so this works flawlessly for executing sequential logic.
Depending on the instruction, the output will either write to the main registers or memory table. Instructions can also read from these tables via a JOIN.
With the layers of filtering, it keeps the execution path short for the ClickHouse server.
This also offers an easy way to measure performance per-instruction, since the original clock insert will not return until the last materialized view is finished.
RISC-V has a special instruction for returning control to the operating system: ecall.
The Click-V emulator is able to make use of this special instruction for 3 major features:
- writing to a print table, to replicate stdout
- writing to a frame table, which triggers rendering the data within VRAM into a terminal-displayed frame
- making external calls to the host system via ClickOS (read/write files, communicate over UDP socket, anything else you can imagine)
ecall is implemented the same way as the other instructions, but due to the expensive nature of these calls, they are hidden behind another layer of materialized views to prevent unnecessary sub-queries from being triggered.
The syscall number is read from register a7, and the arguments are passed in the other aX registers. Depending on the call, the result/status code will be returned back in a0.
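As a rough picture of that calling convention, here is a Go sketch of an ecall dispatcher. The register indices are the standard RISC-V ABI ones (a0 is x10, a7 is x17). Everything else is an assumption for illustration: the syscall number sysPrint, the Handler type, the three-argument limit, and returning -1 for an unknown syscall are all made up, so check the SQL and rs-demo for the real numbers and status codes.

```go
package main

import "fmt"

// Standard RISC-V ABI register indices.
const (
	regA0 = 10 // a0 = x10: first argument and return value
	regA1 = 11 // a1 = x11
	regA2 = 12 // a2 = x12
	regA7 = 17 // a7 = x17: syscall number
)

// sysPrint is a HYPOTHETICAL syscall number used only in this sketch.
const sysPrint = 1

// Handler receives the first three argument registers and returns the
// value to place in a0.
type Handler func(a0, a1, a2 uint32) uint32

// Ecall looks up the handler selected by a7 and writes its result to a0.
func Ecall(regs *[32]uint32, handlers map[uint32]Handler) {
	h, ok := handlers[regs[regA7]]
	if !ok {
		regs[regA0] = ^uint32(0) // assumed error status: -1
		return
	}
	regs[regA0] = h(regs[regA0], regs[regA1], regs[regA2])
}

func main() {
	var regs [32]uint32
	handlers := map[uint32]Handler{
		// Assumed behavior: report the pointer/length and return the length.
		sysPrint: func(textPtr, textLen, _ uint32) uint32 {
			fmt.Printf("print: ptr=%d len=%d\n", textPtr, textLen)
			return textLen
		},
	}
	regs[regA7] = sysPrint
	regs[regA0] = 1024 // text_ptr
	regs[regA1] = 5    // text_len
	Ecall(&regs, handlers)
	fmt.Println("a0 after ecall:", regs[regA0])
}
```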
All syscalls have been implemented in the rs-demo program.
This call is really simple: it just reads from memory using text_ptr and text_len, and then inserts the result into the print table.
This call will read from video memory and split up the bytes into a terminal-based image with ANSI colors. You can use the LIVE VIEW / WATCH API to get this to update in real time.
External system access is managed by ClickOS. These calls are able to read/write to/from emulator memory in order to implement file descriptors for interacting with the host system.
Access to the host system is implemented via a ClickHouse executable UDF. The memory gets inserted/returned as an Array(UInt8).
With a similar API to the Linux kernel, these usually rely on a buffer_ptr and buffer_len for exposing program memory.