-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema description endpoint #459
base: master
Are you sure you want to change the base?
Conversation
The ADS copilot will need a SQL description of the schema in the form of a series of CREATE TABLE statements. This begins the creation of a endpoint -- tentatively named metadata/sql -- to provide that info. This commit creates the endpoint to return a hardcoded string. My code is cargo-culted from the surrounding patterns and feels unnecessarily repetitive. To test it manually, send this payload Content-Length: 96 { "jsonrpc": "2.0", "id": 1, "method": "metadata/sql", "params": { "owner_uri": "foo" } } to stdin of `python3 ossdbtoolsservice/ossdbtoolsservice_main.py` The result is: Content-Length: 66 {"id": 1, "jsonrpc": "2.0", "result": {"metadata": "Hello world"}}
Make a simple database call in the endpoint rather than returning a hardcoded string. Attempt to test the result. TODO: * Make test pass * Create proper response in endpoint
TODO: make realistic passing integration test
The metadata/schema endpoint now describes the schema in natural language (for use by LLMs) with the payload { "description": "..." }
f110ad7
to
735a162
Compare
Thanks @nasc17, I updated the handler mock check as you suggested. Any other suggestions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we trying to achieve here eventually? What kind of user interaction will be added to use this feature?
# Get current connection | ||
connection_service = self._service_provider[constants.CONNECTION_SERVICE_NAME] | ||
connection: ServerConnection = connection_service.get_connection(owner_uri, ConnectionType.DEFAULT) | ||
schema = SchemaMetadata(connection.cursor(), 'public') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why do we only support 'public' schema? In the SchemaMetadata, init set param schema's default value as public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we're using the public schema for now, but I'll let the default argument take care of that rather than specifying it again here.
) | ||
tables = self.cur.fetchall() | ||
cur_table = None | ||
for table, indx in tables: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's less likely to happen here but in case of large schema with 1000+ tables, it's always better to use list append + join than string concatenation. (applies to other two methods as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave that a try in dc8d5d9 -- see what you think. I'm not sure whether that particular implementation really will go faster with all the list comprehensions and dictionary lookups, but curious what your opinion is.
With all the list comprehensions and dictionary lookups, it's questionable whether the new code goes faster, but it's an idea.
Upon connection to the database, our front-end tool will request a schema description from this endpoint. The front-end will include the description in copilot completion requests to improve the relevance of generated code. |
Add a
metadata/schema
endpoint which outputs a terse natural-language description of the public schema.