
PHOENIX-7434 Extend atomic support to single row delete with condition on non-pk columns #2008

Status: Open — wants to merge 5 commits into master
Conversation

virajjasani (Contributor):
Jira: PHOENIX-7434

PHOENIX-7411 introduces atomic delete support for single row deletion. The server returns the result (the deleted row) to the client (via a new API) only if the client-initiated Delete operation actually deleted a live row. In the context of atomicity, this means the delete operation is, and remains, idempotent; however, only the client that successfully deletes the row by taking the lock on the live row (a row without a delete marker) can receive the old row result.
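The semantics above can be sketched in miniature. This is a hedged, simplified stand-in — RowStore, atomicDelete, and the String row payload are hypothetical illustrations, not Phoenix APIs — showing that the delete stays idempotent while only the caller that removes a live row gets the old row back.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the atomic delete semantics described above:
// the delete stays idempotent, but only the caller that removes a live
// row (one without a delete marker) receives the old row. RowStore and
// atomicDelete are hypothetical illustrations, not Phoenix APIs.
class RowStore {
    private final Map<String, String> rows = new HashMap<>();

    void put(String key, String value) {
        rows.put(key, value);
    }

    // Returns the old row only if this call actually deleted a live row;
    // a repeated delete of the same key is a no-op that returns null.
    synchronized String atomicDelete(String key) {
        return rows.remove(key);
    }
}
```

A second delete of the same key returns null, mirroring how only the first successful deleter can be handed the old row result.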

PHOENIX-7411 covers only the case where the WHERE clause contains exclusively pk columns, i.e. a single-row point lookup plan. However, a user can also add non-pk columns to the WHERE clause in addition to all pk columns. This still makes the Delete a single-row delete, but because of the non-pk columns, the server-side delete plan is used with auto-commit connections.

The server-side delete plan works as follows: DeleteCompiler initiates the Scan with the attribute "_DeleteAgg", which allows UngroupedAggregateRegionObserver to execute Delete mutations on the rows read by the UngroupedAggregateRegionScanner. This mutation is then used by IndexRegionObserver to perform the mutation on the server side.
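The attribute-driven handoff above can be sketched as follows. This is a minimal illustration, not Phoenix code: the Map stands in for the attribute store of an HBase Scan, "_DeleteAgg" is the attribute name quoted in the text, and the method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the attribute-driven flow described above: the
// compiler tags the Scan with a marker attribute, and the server-side
// observer checks for it before executing deletes. The Map stands in
// for HBase Scan attributes; only "_DeleteAgg" comes from the text,
// everything else is illustrative.
class DeleteScanFlow {
    static final String DELETE_AGG = "_DeleteAgg";

    // Compiler side: tag the scan so the server observer runs the delete.
    static void markAsServerSideDelete(Map<String, byte[]> scanAttributes) {
        scanAttributes.put(DELETE_AGG, "TRUE".getBytes(StandardCharsets.UTF_8));
    }

    // Server side: the observer checks for the marker before deleting.
    static boolean isServerSideDelete(Map<String, byte[]> scanAttributes) {
        return scanAttributes.containsKey(DELETE_AGG);
    }
}
```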

Given that IndexRegionObserver already supports atomic single-row delete as part of PHOENIX-7411, the proposal for this Jira is to extend that support to single-row deletes with conditions on non-pk columns, so that the atomic delete result can be produced by UngroupedAggregateRegionObserver, sent back to DeleteCompiler, and later consumed by the PhoenixStatement API, similar to PHOENIX-7411.

    isUpsert = stmt instanceof ExecutableUpsertStatement;
    isDelete = stmt instanceof ExecutableDeleteStatement;
    if (isDelete && connection.getAutoCommit() &&
Contributor:
The whole purpose of this is to identify whether it is a ServerSelectDeleteMutation plan and then set an additional scan attribute. You can do this later too, along with the other checks for singleRowUpdate. You have the plan, which has the StatementContext, which has the Scan. That way you don't have to introduce this specialized compilePlan method.

Contributor:

Sorry, scratch that comment.

Comment on lines 759 to 771
    private Result annotateCommitAndReturnResult(
            UngroupedAggregateRegionObserver.MutationList mutations) throws IOException {
        annotateDataMutations(mutations, scan);
        if (isDelete || isUpsert) {
            annotateDataMutationsWithExternalSchemaId(mutations, scan);
        }
        Result result = ungroupedAggregateRegionObserver.commitWithResultReturned(mutations,
                indexUUID, indexMaintainersPtr, txState, targetHTable,
                useIndexProto, clientVersionBytes);
        mutations.clear();
        return result;
    }

Contributor:

I didn't like how the code has been duplicated with the regular path just to handle the specialized single row delete case.

virajjasani (Author):

The duplicated portion is very small:

        annotateDataMutations(mutations, scan);
        if (isDelete || isUpsert) {
            annotateDataMutationsWithExternalSchemaId(mutations, scan);
        }

Hence, I wanted to avoid creating a small method, but it makes sense to have a common utility that applies the annotations to the list of mutations.
Done.
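The shared helper agreed on above might look roughly like this. This is a hedged sketch, not the actual patch: mutations are modeled as plain attribute maps, and every name here is an illustrative stand-in for the Phoenix Mutation/annotation code.

```java
import java.util.List;
import java.util.Map;

// Sketch of the common utility discussed above: one helper applies the
// shared annotation to every mutation, plus the external-schema-id
// annotation only on the delete/upsert paths. Mutations are modeled as
// attribute maps; all names are illustrative stand-ins, not Phoenix APIs.
class MutationAnnotator {
    static void annotate(List<Map<String, String>> mutations, boolean isDeleteOrUpsert) {
        for (Map<String, String> mutation : mutations) {
            mutation.put("annotated", "true");           // annotateDataMutations stand-in
            if (isDeleteOrUpsert) {
                mutation.put("externalSchemaId", "set"); // schema-id annotation stand-in
            }
        }
    }
}
```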

Comment on lines +855 to +858
    if (isSingleRowDelete) {
        return deleteRowAndGetMutationState(table);
    } else {
        Tuple row = iterator.next();
Contributor:

Why are we going out of the way to do a separate scan? We already know it is a singleRowDelete. We can take different actions based on that information on the tuple that is returned by the server.

virajjasani (Author):

The client expects only a COUNT(*) result returned in the Tuple at this point. Now that we need to return the row result, we need to break this mandatory translation to a count result anyway, right?

virajjasani (Author):

To be more specific, for the single-row atomic delete returning-result case, the server is going to return the entire row in non-aggregate CF:CQ format rather than a count value. Hence, Tuple row = iterator.next() simply does not work: it throws an error because iterator#next expects the Cell result to contain only the non-aggregate CF:CQ (s:s).
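The client-side branching described above can be sketched like this. ServerDeleteResult and its fields are hypothetical simplifications, not Phoenix classes: the point is only that the client must pick a parser based on the compiled plan instead of always expecting the aggregate count cell.

```java
// Sketch of the client-side branching discussed above: the ordinary
// server-side delete returns a single aggregate cell carrying the row
// count, while the single-row atomic delete returns the full deleted
// row. ServerDeleteResult and its fields are hypothetical
// simplifications, not Phoenix classes.
class ServerDeleteResult {
    final boolean isSingleRowAtomicDelete;
    final long rowCount;   // meaningful only on the aggregate (count) path
    final String fullRow;  // meaningful only on the atomic path

    ServerDeleteResult(boolean atomic, long count, String row) {
        this.isSingleRowAtomicDelete = atomic;
        this.rowCount = count;
        this.fullRow = row;
    }

    // The client branches on the compiled plan instead of always running
    // the aggregate count parser on the returned cell.
    String describe() {
        return isSingleRowAtomicDelete
                ? "deleted row: " + fullRow
                : "rows deleted: " + rowCount;
    }
}
```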

tkhurana (Contributor):

@virajjasani The changes look functionally correct, but I didn't like how we are creating a specialized path in the code just for this use case. I would like it better if we can implement it in a much cleaner way without creating so many if/elses. Maybe @kadirozde has some ideas.

virajjasani (Author) commented Nov 16, 2024:

> @virajjasani The changes look functionally correct but I didn't like how we are creating a specialized path in the code just for this use case. I would like it better if we can implement it in a much cleaner way without creating so many if/else's. Maybe @kadirozde has some ideas.

I agree that having a separate path makes it difficult to comprehend, but when I looked at how things are implemented for the server-side delete workflow, I didn't feel we could do much, because this server-side delete path is all about the scanner performing the delete on the server side and then returning the count of rows deleted. This itself looks a bit hacky to me in the first place, because the client expects the result as a single cell containing the count value with the non-aggregate CF:CQ as s:s.

Now, our goals are:

  • We should not change the above hacky behavior, because we need to maintain compatibility with old clients. Moreover, we might cause a regression if we try to change the client aggregation interface to handle both the "count(*)" and "entire row" return cases.
  • We need the server to return the full row as the result for the atomically deleted row, and the client to parse that result correctly and return it to the user via the new PhoenixStatement API.

Therefore, we need a special code path that changes both client and server for the specific "single row atomic delete" case. I don't think there is any other way we can achieve both of the above goals without the risk of causing a regression with old and new clients.
