Support remote/httpfs URLs in the from field #567
Comments
Right now Mosaic assumes you first create a named table or view (as an exec call), which helps with reuse and ensures it is only loaded once. HTTPFS is of course supported for this.
It still seems like we are unnecessarily and incorrectly splitting the string at the . here because we assume the user meant to access a table in a schema. The workaround is to create a view, but we can probably be a bit more robust here so that it's not required. IIRC DuckDB does cache the file reads.
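To illustrate the point above, here is a minimal sketch of what splitting on "." does to a URL, plus the view workaround. The helper names (`splitRelation`, `quoteParts`, `createViewSql`) and the example URL are hypothetical and are not Mosaic's actual code; the quoting shown is only an assumption about how a schema-qualified reference might be rendered.

```typescript
// Hypothetical helpers for illustration only; not taken from mosaic-sql.

// Treat the input as a (possibly schema-qualified) relation reference.
function splitRelation(ref: string): string[] {
  return ref.split(".");
}

// Render each part as a double-quoted SQL identifier.
function quoteParts(parts: string[]): string {
  return parts.map((p) => `"${p.replace(/"/g, '""')}"`).join(".");
}

// Workaround: wrap the remote file in a named view that can be used in "from".
function createViewSql(name: string, url: string): string {
  const escapedUrl = url.replace(/'/g, "''"); // escape single quotes in the literal
  const escapedName = name.replace(/"/g, '""'); // escape double quotes in the identifier
  return `CREATE VIEW IF NOT EXISTS "${escapedName}" AS SELECT * FROM read_parquet('${escapedUrl}')`;
}

// Assumed example URL, not from the original issue.
const exampleUrl = "https://example.com/data/flights.parquet";

console.log(quoteParts(splitRelation("main.flights")));
// "main"."flights"
console.log(quoteParts(splitRelation(exampleUrl)));
// "https://example"."com/data/flights"."parquet"
console.log(createViewSql("flights", exampleUrl));
```

The second output shows why a URL cannot pass through the relation-reference path unchanged, while the view gives Mosaic an ordinary table name to work with.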
It's not "incorrect", in the sense that the method's proper argument is a relation reference (possibly including database and schema names). It's not intended to take an arbitrary expression. However, this is certainly something we can revisit. When my schedule allows I'm hoping to make a major revision of the mosaic-sql helper library, at which point I should be able to reconsider table references more generally.
@jheer @domoritz I'll test out the view workaround, thank you! @jheer I'm excited for those mosaic-sql updates, but I'm wondering if there's a way to make a smaller change for HTTPFS.
Would it break any other parts of Mosaic if mosaic-sql started generating queries that referenced HTTPFS URLs? If not, would you be open to a PR to update
Another option could be to change the mosaic-spec PlotData type so it supports SQL expressions similar to those in encoding channels using the
mosaic/packages/spec/src/spec/PlotFrom.ts
Lines 10 to 20 in 56756b0
But the additional complexity might not be worth adding.
DuckDB's HTTPFS feature, which can read Parquet, CSV, JSON, and other files on HTTP servers or cloud object storage, is an incredibly powerful tool. It allows the query engine to use range reads to push queries down onto Parquet files (and use their built-in statistics), which limits the amount of data transferred over the network. This helps DuckDB run queries really quickly, even over files that might be too large to load into DuckDB WASM's memory.
When I tried this spec in Mosaic Playground:
Mosaic created this query:
And when I changed it to remove the read_parquet function I got
It would be great to add some logic to detect https:// and http:// strings (and maybe s3:// and hf://, which are also supported by the httpfs extension) in the from field, and output them directly into the output SQL.
mosaic/packages/sql/src/Query.js
Lines 158 to 185 in 56756b0
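As a rough sketch of the detection I have in mind (not the actual contents of Query.js): `REMOTE_SCHEMES`, `isRemoteUrl`, and `fromClause` are hypothetical names, and the fallback branch is only an assumption about how a plain table reference might be quoted. The URL branch relies on DuckDB accepting a single-quoted file path directly in a FROM clause.

```typescript
// Hypothetical sketch; names and structure are assumptions, not mosaic-sql's API.

// URL schemes mentioned above as handled by the httpfs extension.
const REMOTE_SCHEMES: string[] = ["https://", "http://", "s3://", "hf://"];

// True when the "from" value looks like a remote file URL rather than a table name.
function isRemoteUrl(from: string): boolean {
  const s = from.trim().toLowerCase();
  return REMOTE_SCHEMES.some((scheme) => s.startsWith(scheme));
}

// Build the text that follows FROM in the generated SQL.
function fromClause(from: string): string {
  if (isRemoteUrl(from)) {
    // Emit the URL as a single-quoted string literal, escaping embedded quotes.
    return `'${from.trim().replace(/'/g, "''")}'`;
  }
  // Otherwise treat the value as a (possibly schema-qualified) relation reference.
  return from
    .split(".")
    .map((part) => `"${part.replace(/"/g, '""')}"`)
    .join(".");
}

// Assumed example inputs, not from the original issue.
console.log(`SELECT * FROM ${fromClause("https://example.com/data/flights.parquet")}`);
console.log(`SELECT * FROM ${fromClause("main.flights")}`);
```

A prefix check keeps ordinary table names that happen to contain dots on the existing path, so only values starting with one of the listed schemes would change behavior.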
And to add docs/examples for mosaic-sql, vgplot, and mosaic-spec.