ClickHouse Target High Memory Usage #312
Comments
Hi, I'm not sure I understand the problem.
Are you facing OOM errors? Is the process being killed? The way sling works is by inserting everything into a temp table inside a transaction, then inserting into the final table from the temp table. So only when the transaction closes would you be able to see the data. |
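For reference, a minimal sketch of that temp-table pattern (not Sling's actual code; the table names, columns, and use of database/sql are illustrative):

```go
package sketch

import "database/sql"

// loadViaTempTable inserts all rows into a _tmp table inside one
// transaction, then copies them into the final table. Until Commit,
// the final table shows no data, matching the behavior described above.
func loadViaTempTable(db *sql.DB, rows [][]any) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	// Hypothetical temp table and columns, for illustration only.
	stmt, err := tx.Prepare("INSERT INTO my_table_tmp (a, b) VALUES (?, ?)")
	if err != nil {
		return err
	}
	defer stmt.Close()
	for _, r := range rows {
		if _, err := stmt.Exec(r[0], r[1]); err != nil {
			return err
		}
	}

	// Move everything into the final table in one statement.
	if _, err := tx.Exec("INSERT INTO my_table SELECT * FROM my_table_tmp"); err != nil {
		return err
	}
	return tx.Commit()
}
```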
I killed the sling process before OOM. |
Ah yes, I opened an issue here for this: ClickHouse/clickhouse-go#1293. This is a problem in the third-party ClickHouse driver that Sling uses. |
What you could do is use |
For ClickHouse, sling does not use multiple inserts, but writes the whole result at the end. I'm talking about the _tmp table, not the final one. |
@flarco I can see inserts now
|
Try again with env var |
@max-yan how is the memory usage? did it improve? |
@flarco |
Great. @alisman FYI, looks like batching the inserts works. |
I used batch.Limit = 2000000. |
@max-yan yes agreed, will add. This was to test. |
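To illustrate why a batch limit like this bounds memory, here is a hedged sketch (the `Row` type and `sendBatch` callback are stand-ins, not clickhouse-go's or Sling's real API):

```go
package sketch

// Row is a single record; sendBatch stands in for a driver-level
// batch insert (illustrative only).
type Row []any

// insertBatched flushes every `limit` rows instead of accumulating
// the whole stream in one batch, so memory stays proportional to limit.
func insertBatched(rows <-chan Row, limit int, sendBatch func([]Row) error) error {
	buf := make([]Row, 0, limit)
	for r := range rows {
		buf = append(buf, r)
		if len(buf) >= limit {
			if err := sendBatch(buf); err != nil {
				return err
			}
			buf = buf[:0] // reuse the buffer between flushes
		}
	}
	if len(buf) > 0 {
		return sendBatch(buf) // flush the final partial batch
	}
	return nil
}
```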
Excellent! Thanks all!
|
Added target_options.batch_limit. |
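With that option, a replication config could cap the batch size like this (a sketch, assuming sling's usual replication YAML layout; connection and stream names are placeholders):

```yaml
source: MYSQL_DB
target: CLICKHOUSE_DB

defaults:
  target_options:
    batch_limit: 2000000   # flush inserts every 2M rows instead of buffering everything

streams:
  my_schema.my_table:
```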
Issue Description
With --tgt-conn clickhouse, all rows are inserted at the last moment.
Memory consumption makes it impossible to load large tables.
While the load is in progress the _tmp table is empty, so it's not a memory leak.
Same result with: --src-conn mysql, --src-conn postgres.
Works as expected: --src-conn postgres --tgt-conn mysql (any use_bulk option).
I'm not a Go developer but I tried to find the problem. In ClickhouseConn::BulkImportStream only one element of ds.BatchChan is consumed (see the sketch below), and I don't understand how --tgt-conn mysql works if BatchChan is filled independently of the target connection.
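To illustrate the suspected difference, a simplified sketch (not the actual Sling source; the `Batch` type and `write` callback are hypothetical):

```go
package sketch

// Batch stands in for sling's datastream batch type (hypothetical).
type Batch struct{ Rows [][]any }

// What the report suggests happens for ClickHouse: only the first
// batch is received from the channel.
func consumeOnce(ch <-chan *Batch, write func(*Batch) error) error {
	return write(<-ch)
}

// What a streaming target needs: range over the channel so every
// batch is written as it arrives, until the producer closes it.
func consumeAll(ch <-chan *Batch, write func(*Batch) error) error {
	for b := range ch {
		if err := write(b); err != nil {
			return err
		}
	}
	return nil
}
```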
Sling version (sling --version): 1.2.10
Operating System (linux, mac, windows): linux
Log Output (please run command with -d):