Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Execute query in one shot to improve query execution performance #482

Draft
wants to merge 9 commits into
base: master_GA_2.0.0
Choose a base branch
from

Conversation

samir-puranik
Copy link
Collaborator

@samir-puranik samir-puranik commented Oct 3, 2023

This is a WIP PR to begin the overhaul of improve query execution performance.

  1. Instead of making separate execute calls on a cursor we would like to make a single execute call and iterate over the result sets from the cursor using Cursor.nextset from psycopg
  2. Because we are executing in one shot, parsing to retrieve the location in the query of each separate SQL statement we are executing is unnecessary and can be removed. Although selection data cannot be computed, absence of parsing also improves performance significantly.

Below is an overall summary of the work that has been done.
  
Main Changes:

  • Execute query in one shot to improve performance
  • Follow pgAdmin4 by using a single execute call (however, we are using the synchronous psycopg execute methods)
  • No longer uses sqlparse for any parsing, significantly increasing execution speed

Caveats:

  • Only shows results from first 100 statements. Showing all statement results reduces performance in loading the results and can slow down ADS performance. pgAdmin only shows very last result, the last result's status message, and any notices throughout the entire query.
  • Every batch says "Started executing query on Line 1" because selection data cannot be null. Default to selecting the entire query
  • All notices show up in the last batch result message (e.g. "schema already exists, skipping", "schema does not exist, skipping", etc.)

Todo:

  • Remove sqlparse from requirements.txt
  • Improve documentation in code
  • Style check
  • Tests. Fix broken tests and write new ones to reflect these changes. We should be disregarding assertions for selection data for example.
  • Show LAST 100 results instead of first 100 results
  • Give indicator for when the connection is in an active transaction state (enhancement)
  • Allow for autocommit to be turned off (indicator for active transaction must be there for this to be useful to customers) (enhancement)
  • If possible to do without parsing, figure out correct selection data. May not be possible (enhancement)

Resources:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant