Retry producing on next partition if possible when a partition is not… #887
Motivation
When a topic is partitioned and one of its partitions is not available for producing, the Pulsar client currently keeps trying to produce messages to the unavailable partition. In certain cases it does not need to: the client can simply pick another partition and produce there.
Partition Unavailable
A partition can become unavailable for many reasons. The most prominent one is that the partition is moving from one broker to another. Until every actor (the producers, the old broker, and the new broker) agrees on which broker owns the partition, the partition is unavailable for producing.
Client Behavior
A typical produce call looks like this:
producer.sendAsync(payLoad.getBytes(StandardCharsets.UTF_8));
When sendAsync is called, the message is enqueued in a queue (the pending message queue) and a future is returned.
The future completes only after the message has been picked from the queue, sent to the broker asynchronously, and acknowledged by the broker, also asynchronously. The maximum size of the pending message queue is controlled by the producer config maxPendingMessages.
When the pending message queue is full, the application starts getting publish failures. The pending message queue provides a cushion against unavailable partitions, but only up to that limit.
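The queueing behavior described above can be illustrated with a toy model. This is only a sketch, not the Pulsar client's code: the class `PendingQueueSketch`, its `ackOldest` method, and the use of `IllegalStateException` for the "queue full" failure are all hypothetical names chosen for illustration. Only `maxPendingMessages` comes from the description above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Toy model of a producer's pending message queue (hypothetical, not Pulsar code).
class PendingQueueSketch {
    private final int maxPendingMessages;
    private final Queue<CompletableFuture<Void>> pending = new ArrayDeque<>();

    PendingQueueSketch(int maxPendingMessages) {
        this.maxPendingMessages = maxPendingMessages;
    }

    // Enqueue a message and return a future that completes once the broker acks it.
    // If the queue is already full, the returned future fails immediately.
    synchronized CompletableFuture<Void> sendAsync(byte[] payload) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        if (pending.size() >= maxPendingMessages) {
            future.completeExceptionally(
                    new IllegalStateException("pending message queue is full"));
            return future;
        }
        pending.add(future);
        return future;
    }

    // Simulate the broker acknowledging the oldest pending message.
    // Returns false when there was nothing to acknowledge.
    synchronized boolean ackOldest() {
        CompletableFuture<Void> future = pending.poll();
        if (future == null) {
            return false;
        }
        future.complete(null);
        return true;
    }
}
```

While a partition is unavailable, nothing calls `ackOldest`, so the queue only grows. Once it holds `maxPendingMessages` entries, every further `sendAsync` fails, which is the publish failure the application sees.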
When another partition can be picked
When the message is not keyed, that is, the message does not need to be ordered by key.
When the routing mode is round-robin, which means a message can be produced to any of the partitions. If a partition is unavailable, the client can try another partition, chosen with the same round-robin algorithm.
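The two conditions above can be sketched as a small partition picker. This is a minimal illustration under stated assumptions, not the change made in this pull request: `RoundRobinPartitionPicker`, `choosePartition`, and the `isAvailable` predicate are hypothetical names, and mapping a key to a partition with a plain modulo of its hash is a simplification.

```java
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical round-robin picker that skips unavailable partitions.
class RoundRobinPartitionPicker {
    private final int numPartitions;
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinPartitionPicker(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Keyed messages always map to the key's partition, so per-key ordering is kept
    // even if that partition is currently unavailable.
    // Unkeyed messages walk the round-robin sequence and take the first available
    // partition, trying each partition at most once.
    OptionalInt choosePartition(boolean hasKey, int keyHash, IntPredicate isAvailable) {
        if (hasKey) {
            return OptionalInt.of(Math.floorMod(keyHash, numPartitions));
        }
        for (int attempt = 0; attempt < numPartitions; attempt++) {
            int candidate = Math.floorMod(counter.getAndIncrement(), numPartitions);
            if (isAvailable.test(candidate)) {
                return OptionalInt.of(candidate);
            }
        }
        // Every partition is unavailable; the caller decides how to handle this.
        return OptionalInt.empty();
    }
}
```

Bounding the loop at `numPartitions` attempts keeps the picker from spinning when no partition is available. In that case the caller could fall back to the existing behavior of queueing on the originally chosen partition.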