As a SQLite extension #3
Thanks @asg017 for the detailed feedback and suggestions!
Yes this is the intended end-goal but we had to start somewhere!
Actually, PostgresML uses both scalar functions ( But I do want to go the virtual-table way! That would get rid of the convoluted JSON output of the functions mentioned above. FYI, I tried to use
The named virtual-table syntax is kinda convoluted; couldn't we get by with eponymous virtual tables, since all the work needs to be done in shadow tables behind the scenes anyway? I need to think about it, as there is an experiment tracker behind the scenes.
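For readers unfamiliar with the term: an eponymous virtual table can be queried directly, without a prior CREATE VIRTUAL TABLE statement. SQLite's built-in pragma table-valued functions behave this way, so they make a convenient stand-in. A minimal sketch using Python's sqlite3 module (the `samples` table is made up for illustration and is not part of sqlite-ml):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, feature REAL, label TEXT)"
)

# pragma_table_info is queried like a table-valued function:
# no CREATE VIRTUAL TABLE statement has to be run first.
columns = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM pragma_table_info('samples') ORDER BY cid"
    )
]
print(columns)
conn.close()
```

An ML module exposed this way would presumably take its inputs as function-style arguments, which is the trade-off being weighed against named virtual tables here.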
Wow! I did not know we could have generated columns like this in SQLite! This library will never cease to amaze me...
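As a point of reference, here is a minimal sketch of a generated column on an ordinary table, assuming SQLite 3.31 or newer. The `measurements` table and its columns are hypothetical and only illustrate the syntax; they are not the schema suggested in the issue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        width  REAL,
        height REAL,
        -- computed on read from the other two columns, never stored
        area   REAL GENERATED ALWAYS AS (width * height) VIRTUAL
    )
    """
)
# Only the base columns are inserted; area is derived by SQLite.
conn.execute("INSERT INTO measurements (width, height) VALUES (2.0, 3.5)")
area = conn.execute("SELECT area FROM measurements").fetchone()[0]
print(area)
conn.close()
```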
Yes this makes sense as an argument of the vtable, as the current
This is currently my biggest roadblock with
Concerning the availability of ML algorithms: I think we should stick to the Python route for now, using
The serialization mechanism is already implemented in
To give you some more background, the current implementation is heavily influenced by PostgresML and MLFlow:
Here is my train of thought to start working on this:
I will dig up and clean up my
I've extracted the
@asg017 I've pushed my initial experimentation building a native
I'm going to add you as a collaborator on the repo, feel free to experiment from there!
Closing this issue in favor of rclement/sqlite-ml#1
I love the idea of doing ML tasks in SQLite, and would love to see this work as a SQLite extension! That way it's portable between programming languages, and usable outside of Datasette.
Though finding the right SQL API to use will be a challenge: a PostgresML-inspired option would involve a lot of special one-off scalar functions like your sqml_load_dataset() or sqml_train(), but I think we can take advantage of SQLite's virtual table mechanism to get a nicer API that plays well with Datasette.
For example, instead of:
We could instead have a virtual table module like ml_classification, that makes virtual tables like so:
So instead of making predictors with scalar functions, they're instead created with virtual tables.
Though looking at the above, maybe it's nicer to use a JSON array as input to the table function predictor:
Some other random thoughts:
- sqlite-loadable-rs would be great here, but it's a bit lacking: there's no good shadow table support there yet, and I'd imagine we'd need that here. Also not sure how many ML algorithms are available in Rust. Could also do it in C++, which probably has all the ML algorithms we need, and it's easier to use shadow tables there
- sqlite-vss: the Faiss library has an API for serializing an index to a blob, which I store in a shadow table so it can live on across reconnects
Would love to hear your thoughts! Also happy to do this work in a separate repository (with your guidance!), since it'll be a significant amount of non-Python code and I don't wanna override your work
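The "serialize to a blob and keep it in a table so it survives reconnects" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: an ordinary table named `demo_models` stands in for a real shadow table, a plain dict of parameters stands in for a trained model, and JSON stands in for whatever serialization format a real extension would use. None of these names come from sqlite-ml or sqlite-vss.

```python
import json
import os
import sqlite3
import tempfile


def save_model(conn, name, params):
    """Serialize params to bytes and store them as a BLOB under name."""
    blob = json.dumps(params).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO demo_models (name, payload) VALUES (?, ?)",
        (name, blob),
    )
    conn.commit()


def load_model(conn, name):
    """Read the BLOB back and deserialize it; None if the name is unknown."""
    row = conn.execute(
        "SELECT payload FROM demo_models WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else None


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection: create the stand-in table and persist the "model".
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS demo_models (name TEXT PRIMARY KEY, payload BLOB)"
)
save_model(conn, "iris", {"weights": [0.5, -1.25], "bias": 0.1})
conn.close()

# Second connection: the serialized model is still there after reconnecting.
conn = sqlite3.connect(db_path)
restored = load_model(conn, "iris")
conn.close()
print(restored)
```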