
Streaming transformer's batch emit times should be more flexible #1198

Open
istreeter opened this issue Mar 1, 2023 · 0 comments
istreeter commented Mar 1, 2023

Currently, if the streaming transformer is configured with 5 minute windows, then it emits batches at exactly 12:00, 12:05, 12:10 etc. If there are, say, 50 instances of the streaming transformer running in parallel, then we get 50 batches all emitted at exactly the same time. This creates a backlog for the loader, which it slowly works through over the course of a few minutes.

It would be slightly better if the 50 instances emit batches at slight offsets to each other. For example, instance 1 emits batches at 12:01, 12:06, 12:11, and instance 2 emits batches at 12:02, 12:07, 12:12. This way, the loader receives a more steady stream of batches to load, and it could reduce the overall latency of events reaching the warehouse.

This is best implemented by letting the transformer randomly choose the start time of its first window when it first starts up. Subsequent windows then keep the same offset, so emits stay evenly spaced but desynchronised across instances.
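A minimal sketch of the idea, in Python for illustration (the names `make_emit_schedule`, `window_seconds`, and `startup_time` are hypothetical, not the transformer's actual API):

```python
import random

def make_emit_schedule(window_seconds: float, startup_time: float, rng=None):
    """Return a function mapping window index n to that window's emit time.

    Each instance picks its own random offset in [0, window_seconds) once,
    at startup. Every subsequent window reuses that offset, so emits stay
    evenly spaced within an instance but staggered across instances.
    """
    rng = rng or random.Random()
    offset = rng.uniform(0, window_seconds)
    first_emit = startup_time + offset

    def emit_time(n: int) -> float:
        # Window n emits exactly n window-lengths after the first emit.
        return first_emit + n * window_seconds

    return emit_time
```

With 5 minute windows (`window_seconds = 300`), two instances started at the same moment would each land on a different random offset, so the loader receives their batches at different times instead of all at once.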

See also #1197, which is the main reason we're going to need flexible emit times.
