
Clickhouse Target High Memory Usage #312

Closed
max-yan opened this issue May 29, 2024 · 15 comments
Labels
external Depends on external issue or requirement

Comments

@max-yan

max-yan commented May 29, 2024

Issue Description

With --tgt-conn clickhouse, all rows are inserted at the last moment.
Memory consumption does not allow loading large tables.
While the run is in progress, the _tmp table stays empty; it's not a memory leak.

Same result with: --src-conn mysql, --src-conn postgres
Works as expected: --src-conn postgres --tgt-conn mysql (with any use_bulk option)

I'm not a Go developer, but I tried to find the problem. In ClickhouseConn::BulkImportStream only one element of ds.BatchChan is obtained, and I don't understand how --tgt-conn mysql works if BatchChan is filled independently of the target connection.

  • Sling version (sling --version): 1.2.10

  • Operating System (linux, mac, windows): linux

  • Log Output (please run command with -d):

/usr/local/bin/sling run --src-conn mysql --src-stream "SELECT * FROM src_table_202404" --tgt-conn clickhouse --tgt-object dst_table_20240430 --mode full-refresh -d
2024-05-29 13:07:44 DBG Sling version: 1.2.10 (linux amd64)
2024-05-29 13:07:44 DBG type is db-db
2024-05-29 13:07:44 DBG using source options: {"empty_as_null":false,"null_if":"NULL","datetime_format":"AUTO","max_decimals":11}
2024-05-29 13:07:44 DBG using target options: {"datetime_format":"auto","file_max_rows":0,"max_decimals":11,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-05-29 13:07:44 INF connecting to source database (mysql)
2024-05-29 13:07:44 DBG opened "mysql" connection (conn-mysql-DGF)
2024-05-29 13:07:44 INF connecting to target database (clickhouse)
2024-05-29 13:07:44 DBG opened "clickhouse" connection (conn-clickhouse-Jo4)
2024-05-29 13:07:44 INF reading from source database
2024-05-29 13:07:44 DBG SELECT * FROM src_table_202404
2024-05-29 13:07:44 INF writing to target database [mode: full-refresh]
2024-05-29 13:07:44 DBG drop table if exists `db1`.`dst_table_20240430_tmp`
2024-05-29 13:07:44 DBG table `db1`.`dst_table_20240430_tmp` dropped
2024-05-29 13:07:44 DBG create table `db1`.`dst_table_20240430_tmp` (`link_key` Nullable(Int64),
`src` Nullable(Int32),
`address` Nullable(String),
`load_date` Nullable(DateTime64(6))) engine=MergeTree  ORDER BY tuple()
2024-05-29 13:07:44 INF streaming data
2024-05-29 13:07:44 DBG use `db1`
9s 1,119,157 124952 r/s 137 MB | 19% MEM | 7% CPU 2024-05-29 13:07:54 DBG could not select execution from local .sling.db. record not found
19s 2,387,592 126099 r/s 292 MB | 19% MEM | 4% CPU 2024-05-29 13:08:04 DBG could not select execution from local .sling.db. record not found
29s 3,534,503 119697 r/s 434 MB | 20% MEM | 4% CPU 2024-05-29 13:08:14 DBG could not select execution from local .sling.db. record not found
39s 4,870,173 126203 r/s 608 MB | 20% MEM | 6% CPU 2024-05-29 13:08:24 DBG could not select execution from local .sling.db. record not found
49s 6,256,819 132619 r/s 789 MB | 21% MEM | 4% CPU 2024-05-29 13:08:34 DBG could not select execution from local .sling.db. record not found
59s 7,434,366 123818 r/s 942 MB | 21% MEM | 4% CPU 2024-05-29 13:08:44 DBG could not select execution from local .sling.db. record not found
1m9s 8,741,617 126764 r/s 1.1 GB | 22% MEM | 6% CPU 2024-05-29 13:08:54 DBG could not select execution from local .sling.db. record not found
1m19s 10,164,877 133411 r/s 1.3 GB | 22% MEM | 4% CPU 2024-05-29 13:09:04 DBG could not select execution from local .sling.db. record not found
1m29s 11,405,509 129408 r/s 1.4 GB | 22% MEM | 4% CPU 2024-05-29 13:09:14 DBG could not select execution from local .sling.db. record not found
1m39s 12,797,446 134755 r/s 1.6 GB | 23% MEM | 6% CPU 2024-05-29 13:09:24 DBG could not select execution from local .sling.db. record not found
1m49s 13,976,436 127018 r/s 1.7 GB | 23% MEM | 4% CPU 2024-05-29 13:09:34 DBG could not select execution from local .sling.db. record not found
1m59s 15,333,729 131470 r/s 1.9 GB | 24% MEM | 4% CPU 2024-05-29 13:09:44 DBG could not select execution from local .sling.db. record not found
2m9s 16,588,465 128459 r/s 2.1 GB | 24% MEM | 5% CPU 2024-05-29 13:09:54 DBG could not select execution from local .sling.db. record not found
2m19s 17,923,016 131190 r/s 2.2 GB | 24% MEM | 5% CPU 2024-05-29 13:10:04 DBG could not select execution from local .sling.db. record not found
2m29s 19,279,551 131652 r/s 2.4 GB | 25% MEM | 6% CPU 2024-05-29 13:10:14 DBG could not select execution from local .sling.db. record not found
2m39s 20,664,185 134413 r/s 2.6 GB | 25% MEM | 4% CPU 2024-05-29 13:10:24 DBG could not select execution from local .sling.db. record not found
2m49s 21,480,548 102907 r/s 2.7 GB | 28% MEM | 5% CPU 2024-05-29 13:10:34 DBG could not select execution from local .sling.db. record not found
2m58s 21,480,548 56464 r/s 2.7 GB | 30% MEM | 2% CPU 2024-05-29 13:10:43 DBG COPY 21480548 ROWS
2m58s 21,480,548 121167 r/s 2.7 GB | 28% MEM | 2% CPU
2024-05-29 13:10:43 DBG select count(*) cnt from `db1`.`dst_table_20240430_tmp`
2024-05-29 13:10:43 DBG drop table if exists `db1`.`dst_table_20240430`
2024-05-29 13:10:43 DBG table `db1`.`dst_table_20240430` dropped
2024-05-29 13:10:43 DBG create table `db1`.`dst_table_20240430` (`link_key` Nullable(Int64),
`src` Nullable(Int32),
`address` Nullable(String),
`load_date` Nullable(DateTime64(6))) engine=MergeTree  ORDER BY tuple()
2024-05-29 13:10:43 INF created table `db1`.`dst_table_20240430`
2024-05-29 13:10:43 DBG insert into `db1`.`dst_table_20240430` (`link_key`, `src`, `address`, `load_date`) select `link_key`, `src`, `address`, `load_date` from `db1`.`dst_table_20240430_tmp`
2024-05-29 13:10:52 DBG inserted rows into db1.`dst_table_20240430` from temp table `db1`.`dst_table_20240430_tmp`
2024-05-29 13:10:52 INF inserted 21480548 rows into db1.`dst_table_20240430` in 188 secs [114,187 r/s] [2.7 GB]
2024-05-29 13:10:52 DBG drop table if exists `db1`.`dst_table_20240430_tmp`
2024-05-29 13:10:52 DBG table `db1`.`dst_table_20240430_tmp` dropped
2024-05-29 13:10:52 DBG closed "clickhouse" connection (conn-clickhouse-Jo4)
2024-05-29 13:10:52 DBG closed "mysql" connection (conn-mysql-DGF)
2024-05-29 13:10:52 DBG could not select execution from local .sling.db. record not found
2024-05-29 13:10:52 INF execution succeeded
2024-05-29 13:10:52 DBG could not select execution from local .sling.db. record not found
@flarco
Collaborator

flarco commented May 31, 2024

Hi, I'm not sure I understand the problem.

Memory consumption does not allow to load large tables

Are you facing OOM errors? Is the process being killed?

The way Sling works is by inserting everything into a temp table inside a transaction, then inserting into the final table from the temp table. So only when the transaction closes would you be able to see the data.

@max-yan
Author

max-yan commented May 31, 2024

Are you facing OOM errors? is the process being killed?

I killed the sling process before OOM.
Memory usage was more than 100 GB.
The _tmp table was empty.

@flarco
Collaborator

flarco commented May 31, 2024

Ah yes, I opened an issue here for this: ClickHouse/clickhouse-go#1293

This is a problem in the third-party ClickHouse driver that Sling uses.

@flarco flarco changed the title Clickhouse tgt always one batch Clickhouse Target High Memory Usage May 31, 2024
@flarco flarco added the external Depends on external issue or requirement label May 31, 2024
@flarco
Collaborator

flarco commented May 31, 2024

What you could do is use source_options.limit, or the CLI flag --limit with --incremental.
I'll see if there is a good way to flush the records so that memory is released.

@max-yan
Author

max-yan commented May 31, 2024

For ClickHouse, Sling does not try to use multiple inserts, but writes the whole result at the end. I'm talking about the _tmp table, not the final one.

@flarco
Collaborator

flarco commented May 31, 2024

@max-yan can you compile the binary from branch v1.2.11 and try?
I added this line: cc6d22b

@max-yan
Author

max-yan commented May 31, 2024

@flarco I can see inserts now,
but a SIGSEGV too:

[signal SIGSEGV: segmentation violation code=0x2 addr=0xc05d800000 pc=0x561872]

goroutine 162 gp=0xc001c816c0 m=32 mp=0xc000d2a808 [running]:
runtime.throw({0x346e1ba?, 0x4c4?})
        /usr/local/go/src/runtime/panic.go:1023 +0x5c fp=0xc0000f0d60 sp=0xc0000f0d30 pc=0x4478bc
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:895 +0x285 fp=0xc0000f0dc0 sp=0xc0000f0d60 pc=0x460445
strings.IndexAny({0xc05151d304?, 0xc05151d303?}, {0x346c985?, 0x0?})
        /usr/local/go/src/strings/strings.go:161 +0x172 fp=0xc0000f0e38 sp=0xc0000f0dc0 pc=0x561872
github.com/flarco/g/csv.(*Writer).Write(0xc00220fa70, {0xc0048be360?, 0x9, 0xc0048be360?})
        /home/maxim/go/pkg/mod/github.com/flarco/[email protected]/csv/writer.go:84 +0x334 fp=0xc0000f0ee0 sp=0xc0000f0e38 pc=0xa640b4
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).writeBwCsv(0xc00194a680, {0xc0048be360?, 0x9?, 0x10?})
        /home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:282 +0x27 fp=0xc0000f0f10 sp=0xc0000f0ee0 pc=0x1187407
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows.func1()
        /home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:210 +0xc5 fp=0xc0000f0f80 sp=0xc0000f0f10 pc=0x1186be5
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).processBwRows(0xc00194a680)
        /home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:219 +0x91 fp=0xc0000f0fc8 sp=0xc0000f0f80 pc=0x1186af1
github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start.gowrap1()
        /home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:774 +0x25 fp=0xc0000f0fe0 sp=0xc0000f0fc8 pc=0x118cb05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000f0fe8 sp=0xc0000f0fe0 pc=0x482a41
created by github.com/slingdata-io/sling-cli/core/dbio/iop.(*Datastream).Start in goroutine 151
        /home/maxim/soft_src/sling-cli/core/dbio/iop/datastream.go:774 +0x121c

@flarco
Collaborator

flarco commented May 31, 2024

Try again with the env var SLING_PROCESS_BW=false? I have researched but don't know why the segmentation violation occurs... It happens randomly, non-deterministically.

@flarco
Collaborator

flarco commented May 31, 2024

@max-yan how is the memory usage? Did it improve?

@max-yan
Author

max-yan commented May 31, 2024

@flarco
The SIGSEGV is not repeatable.
I was able to load without SLING_PROCESS_BW=false.
I don't see memory leaks with a 20 GB CSV file (238,576,508 rows).

@flarco
Collaborator

flarco commented May 31, 2024

Great. @alisman FYI, looks like batching the inserts works.
This will be fixed in the next release, v1.2.11.

@max-yan
Author

max-yan commented May 31, 2024

I used batch.Limit = 2000000.
An option for this is needed.

@flarco
Collaborator

flarco commented May 31, 2024

@max-yan yes, agreed, will add. That value was just for testing.

@alisman

alisman commented May 31, 2024 via email

@flarco
Collaborator

flarco commented Jun 1, 2024

Added target_options.batch_limit.
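For example, in a replication config it could look like this (a sketch only — the batch_limit key is the new option; the surrounding YAML layout and stream names are illustrative):

```yaml
source: mysql
target: clickhouse

defaults:
  mode: full-refresh
  target_options:
    batch_limit: 2000000  # flush/insert batches of up to 2M rows

streams:
  src_table_202404:
    object: db1.dst_table_20240430
```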
Closing.

@flarco flarco closed this as completed Jun 1, 2024