Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Migrate parser library in Spanner support from spansql to memefish #1194

Open
apstndb opened this issue Nov 12, 2024 · 4 comments · May be fixed by #1197
Open

Proposal: Migrate parser library in Spanner support from spansql to memefish #1194

apstndb opened this issue Nov 12, 2024 · 4 comments · May be fixed by #1197

Comments

@apstndb
Copy link

apstndb commented Nov 12, 2024

Is your feature request related to a problem? Please describe.

It seems that spansql package which is used in Spanner support is not actively developed.
googleapis/google-cloud-go#11115

Many of the issues opened due to spansql's lack of support for Spanner SQL syntax.

(Although I think some issues are already resolved in latest spansql.)

Describe the solution you'd like

Migrating to another Spanner SQL parser implementation is a solution.
I recommend to consider cloudspannerecosystem/memefish.
Whose Spanner SQL coverage is already superset of spansql.

Describe alternatives you've considered

Current usage of spansql is only splitting DDL statements and strip comments.

func cleanStatements(migration []byte) ([]string, error) {
// The Spanner GCP backend does not yet support comments for the UpdateDatabaseDdl RPC
// (see https://issuetracker.google.com/issues/159730604) we use
// spansql to parse the DDL and output valid stamements without comments
ddl, err := spansql.ParseDDL("", string(migration))

I think it can be implemented without full syntax parsing using only a lexer.

@apstndb apstndb changed the title Proposal: Migrate parser library in Spanner from spansql to memefish Proposal: Migrate parser library in Spanner support from spansql to memefish Nov 12, 2024
@jtwatson
Copy link

I was working on this issue when I noticed your proposal.

This issue came up again when I tried to use the TOKENLIST type, which is not supported via spansql.

I agree that full syntax parsing is overkill and is like playing whack-a-mole with spanner features. Forever chasing them and often lagging behind. This is ultimately why spansql is struggling.

I think we need the following features for more robust migration support for Spanner:

  1. DML support
  2. Ability to put multiple statements in a single migration script
  3. Ability to include comments in the migration scripts

I was planning to implement the following:

  1. Strip SQL comments from the migration script (Spanner API does not accept comments)
  2. Split migration script into statements (split on semi-colons)
  3. Detect Statement Type for each Statement (DML vs DDL)
  4. Run the statement with the appropriate UpdateDatabaseDdl() or ReadWriteTransaction()

The memefish lexer may be a good solution. I am going to explore using it.

@apstndb
Copy link
Author

apstndb commented Nov 28, 2024

Good!

  1. Strip SQL comments from the migration script (Spanner API does not accept comments)
  2. Split migration script into statements (split on semi-colons)
  3. Detect Statement Type for each Statement (DML vs DDL)
  4. Run the statement with the appropriate UpdateDatabaseDdl() or ReadWriteTransaction()

Actually, I have done similar thing to migrate another OSS, wrench.

It needs all of 1, 2, 3, 4 so I have implemented some helper functions using memefish/lexer in apstndb/gsqlutils (This code may be too much abusing Go 1.23 iterators, but it would be helpful).

@jtwatson
Copy link

jtwatson commented Nov 28, 2024

I have a question regarding backward compatibility. Looking for opinions.

The current state without x-clean-statements option:

  • DDL support ONLY
  • Single DDL statement per file
  • No SQL comments allowed

The current state with x-clean-statements option:

  • DDL support ONLY
  • Multiple DDL statements per file
  • SQL comments are allowed
  • Depends on spansql library, which has numerous shortcomings

I am considering changing the default action to always parse the migration scripts since this is required for DML. There is a narrow use case where retaining the ability to process migration scripts in their raw format would prevent a backward compatibility issue if the memefish lexer failed to parse a valid migration script. There is also an argument that the typical user would expect support for both DML and DDL without having to add the x-clean-statements option. For the average user, this would be a better experience. The caveat is when memefish failed to lex a valid SQL Migration script.

We have the following options:

  1. Retain the x-clean-statements option as it is today
  2. Remove the x-clean-statements option and fully depend on memefish
  3. Change the default to x-clean-statements=true and allow it to be disabled

In my opinion, the current functionality with x-clean-statements=false is not very useful and isn't a good experience for new users.

But potentially causing a backward compatibility issue with no way to disable the new functionality isn't very good either.

I am leaning towards 3. above, with the acknowledgment that this could break some existing code, but the break could be rolled back with a config change. This would also provide a better user experience for new users.

Thoughts?

@apstndb
Copy link
Author

apstndb commented Nov 28, 2024

It is a golang-migrate design issue and I would be not a right person to answer it, but I agree x-clean-statements=true can achieve both of useful default and safety fallback.

FYI: As far as I know, memefish lexer won't fail to tokenize all current-valid GoogleSQL queries because it knows all tokens of ZetaSQL(GoogleSQL frontend).

@jtwatson jtwatson linked a pull request Nov 29, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants