Description of the issue:
When exporting an MSSQL table to Parquet, I get a Parquet file that DuckDB rejects with a string encoding error.
Running "select * from output.parquet" with the latest DuckDB results in: Invalid Input Error: Invalid string encoding found in Parquet file: value "\x00\x00\x00\x00\xA8Y\xE2w" is not valid UTF8!
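To see why that particular value is rejected, the bytes quoted in the error message can be checked in isolation. The snippet below is only a minimal sketch: the helper name `first_invalid_utf8_offset` is hypothetical, and it says nothing about how Sling or DuckDB validate strings internally. It simply asks Python's strict UTF-8 decoder where decoding fails.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks strict UTF-8
    decoding, or None if the whole value is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The value quoted in the DuckDB error message, as raw bytes.
value = b"\x00\x00\x00\x00\xa8Y\xe2w"

print(len(value))                        # length of the value in bytes
print(first_invalid_utf8_offset(value))  # offset of the first invalid byte
```

The four leading NUL bytes are valid UTF-8 on their own; decoding fails at `\xA8`, which is a continuation byte with no lead byte before it. That suggests the column holds raw binary data written out as a string, though which source column it comes from is not visible from the error alone.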
Sling version (sling --version): 1.2.22
Operating System (linux, mac, windows): Ubuntu 20.04
Replication Configuration:
export MSSQL='sqlserver://user:pw@server:1433?database=mytable'
sling run --src-conn MSSQL --src-stream \"SELECT * FROM dbo.[ConfigurationItem]\" --tgt-object 'file:///runner/project/output.parquet' -d
Log Output (please run command with -d):
2024-11-06 15:01:15 DBG Sling version: 1.2.22 (linux amd64)
2024-11-06 15:01:15 DBG type is db-file
2024-11-06 15:01:15 DBG using: {"columns":null,"mode":"full-refresh","transforms":null}
2024-11-06 15:01:15 DBG using source options: {"empty_as_null":false,"null_if":"NULL","datetime_format":"AUTO","max_decimals":-1}
2024-11-06 15:01:15 DBG using target options: {"header":true,"compression":"auto","concurrency":7,"datetime_format":"auto","delimiter":",","file_max_rows":0,"file_max_bytes":0,"max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-11-06 15:01:15 DBG opened "sqlserver" connection (conn-sqlserver-nU9)
2024-11-06 15:01:15 INF connecting to source database (sqlserver)
2024-11-06 15:01:15 INF reading from source database
2024-11-06 15:01:15 DBG SELECT * FROM dbo.[ConfigurationItem]
2024-11-06 15:01:16 INF writing to target file system (file)
2024-11-06 15:01:16 DBG opened "file" connection (conn-file-DLa)
2024-11-06 15:01:16 DBG writing to file:///runner/project/output.parquet [fileRowLimit=0 fileBytesLimit=0 compression=auto concurrency=7 useBufferedStream=false fileFormat=parquet singleFile=true]
2024-11-06 15:05:47 DBG wrote 138 MB: 467182 rows [1,714 r/s]
4m29s 466,602 1737 r/s 1.4 GB | 58% MEM | 86% CPU
2024-11-06 15:05:47 INF wrote 467182 rows [1,714 r/s] to file:///runner/project/output.parquet
2024-11-06 15:05:47 DBG closed "sqlserver" connection (conn-sqlserver-nU9)
2024-11-06 15:05:47 INF execution succeeded
Yes, Sling will actually soon use DuckDB under the hood to read/write Parquet files.
The Go driver (github.com/apache/arrow/go) is unfortunately not of great quality and has caused many issues. Stay tuned.