Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single option not working properly with Snowflake #436

Open
TimPossiblee opened this issue Nov 8, 2024 · 1 comment
Open

Single option not working properly with Snowflake #436

TimPossiblee opened this issue Nov 8, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@TimPossiblee
Copy link

Issue Description

  • Description of the issue:

    Loading multiple CSVs (Local to Snowflake) from a directory using single: true always produces an error because it tries to upload the same temporary file multiple times, while also deleting it.

  • Sling version (sling --version): 1.2.20

  • Operating System (linux, mac, windows): linux

  • Replication Configuration:

source: local
target: snowflake

env:
  SLING_STREAM_URL_COLUMN: true
  SAMPLE_SIZE: 0

defaults:
  mode: full-refresh
  single: true

  source_options:
    format: csv
    header: true
    # delimiter: '\t'
    columns:
      "*": string

  target_options:
    column_casing: target

streams:
  "file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/*stock_vehicle_*.txt":
    object: CE_STG.PROG_STOCK_VEHICLE
  • Log Output (please run command with -d):
2024-11-08 07:49:10 DBG opened "file" connection (conn-file-4d5)
2024-11-08 07:49:10 INF Sling Replication [1 streams] | local -> snowflake

2024-11-08 07:49:10 INF [1 / 1] running stream file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/*stock_vehicle_*.txt
2024-11-08 07:49:10 DBG Sling version: 1.2.20 (linux amd64)
2024-11-08 07:49:10 DBG type is file-db
2024-11-08 07:49:10 DBG using: {"columns":[{"name":"*","type":"string"}],"mode":"full-refresh","transforms":null}
2024-11-08 07:49:10 DBG using source options: {"trim_space":false,"empty_as_null":true,"header":true,"fields_per_rec":-1,"compression":"auto","format":"csv","null_if":"NULL","datetime_format":"AUTO","skip_blank_lines":false,"max_decimals":-1}
2024-11-08 07:49:10 DBG using target options: {"datetime_format":"auto","file_max_rows":0,"file_max_bytes":0,"max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"target"}
2024-11-08 07:49:10 DBG opened "snowflake" connection (conn-snowflake-QsU)
2024-11-08 07:49:10 INF connecting to target database (snowflake)
2024-11-08 07:49:10 INF reading from source file system (file)
2024-11-08 07:49:10 DBG opened "file" connection (conn-file-EuF)
2024-11-08 07:49:10 DBG reading single datastream from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/*stock_vehicle_*.txt [format=csv]
2024-11-08 07:49:10 DBG merging csv readers of 6 files [concurrency=3] from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/*stock_vehicle_*.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/CAI/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/EFI/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/MMI/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/FAI/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG processing reader from file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/PAI/stock_vehicle_20241107043759.txt
2024-11-08 07:49:10 DBG delimiter auto-detected: "\t"
2024-11-08 07:49:10 DBG casting column 'monthdayyear' as 'string'
2024-11-08 07:49:10 DBG casting column 'company' as 'string'
2024-11-08 07:49:10 DBG casting column 'company_name' as 'string'
2024-11-08 07:49:10 DBG casting column 'modelcode' as 'string'
2024-11-08 07:49:10 DBG casting column 'vehicletype' as 'string'
2024-11-08 07:49:10 DBG casting column 'vehiclegroup' as 'string'
2024-11-08 07:49:10 DBG casting column 'currency' as 'string'
2024-11-08 07:49:10 DBG casting column 'stockunits' as 'string'
2024-11-08 07:49:10 DBG casting column 'stockvalue' as 'string'
2024-11-08 07:49:10 DBG casting column 'openorders' as 'string'
2024-11-08 07:49:10 DBG casting column 'opencontract' as 'string'
2024-11-08 07:49:10 DBG casting column 'purchasedate' as 'string'
2024-11-08 07:49:10 DBG casting column 'demostock' as 'string'
2024-11-08 07:49:10 DBG casting column 'commissionnumber' as 'string'
2024-11-08 07:49:10 DBG casting column 'chassisnumber' as 'string'
2024-11-08 07:49:10 DBG casting column 'description' as 'string'
2024-11-08 07:49:10 DBG casting column 'brand' as 'string'
2024-11-08 07:49:10 DBG casting column 'fuel_type' as 'string'
2024-11-08 07:49:10 DBG casting column 'col' as 'string'
2024-11-08 07:49:10 INF writing to target database [mode: full-refresh]
2024-11-08 07:49:10 DBG drop table if exists "CE_STG"."PROG_STOCK_VEHICLE_TMP"
2024-11-08 07:49:10 DBG table "CE_STG"."PROG_STOCK_VEHICLE_TMP" dropped
2024-11-08 07:49:10 DBG create transient table "CE_STG"."PROG_STOCK_VEHICLE_TMP" ("MONTHDAYYEAR" varchar,
"COMPANY" varchar,
"COMPANY_NAME" varchar,
"MODELCODE" varchar,
"VEHICLETYPE" varchar,
"VEHICLEGROUP" varchar,
"CURRENCY" varchar,
"STOCKUNITS" varchar,
"STOCKVALUE" varchar,
"OPENORDERS" varchar,
"OPENCONTRACT" varchar,
"PURCHASEDATE" varchar,
"DEMOSTOCK" varchar,
"COMMISSIONNUMBER" varchar,
"CHASSISNUMBER" varchar,
"DESCRIPTION" varchar,
"BRAND" varchar,
"FUEL_TYPE" varchar,
"COL" varchar,
"_SLING_LOADED_AT" timestamp_tz,
"_SLING_STREAM_URL" varchar)
2024-11-08 07:49:11 INF streaming data
2024-11-08 07:49:11 DBG USE SCHEMA CE_STG
2024-11-08 07:49:11 DBG opened "file" connection (conn-file-jAJ)
2024-11-08 07:49:11 DBG writing to file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178 [fileRowLimit=0 fileBytesLimit=0 compression=zstd concurrency=7 useBufferedStream=false fileFormat=csv singleFile=true]
2024-11-08 07:49:11 DBG REMOVE @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179
2024-11-08 07:49:11 DBG adding new column: custorderdate
2024-11-08 07:49:11 DBG alter table "CE_STG"."PROG_STOCK_VEHICLE_TMP" add column "CUSTORDERDATE" varchar
2024-11-08 07:49:12 DBG adding new column: dealerorderdate
2024-11-08 07:49:12 DBG alter table "CE_STG"."PROG_STOCK_VEHICLE_TMP" add column "DEALERORDERDATE" varchar
2024-11-08 07:49:12 DBG adding new column: dealerstock
2024-11-08 07:49:12 DBG alter table "CE_STG"."PROG_STOCK_VEHICLE_TMP" add column "DEALERSTOCK" varchar
2024-11-08 07:49:12 DBG PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
2024-11-08 07:49:12 DBG PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
2024-11-08 07:49:12 DBG REMOVE @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179
2024-11-08 07:49:12 DBG REMOVE @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179
2024-11-08 07:49:12 DBG drop table if exists "CE_STG"."PROG_STOCK_VEHICLE_TMP"
2024-11-08 07:49:13 DBG table "CE_STG"."PROG_STOCK_VEHICLE_TMP" dropped
2024-11-08 07:49:13 DBG closed "snowflake" connection (conn-snowflake-QsU)
2024-11-08 07:49:13 INF execution failed


--- database_snowflake.go:792 func5 ---
--- database_snowflake.go:780 func4 ---
~ Error copying to Snowflake Stage: sling_staging
--- database_snowflake.go:931 StagePUT ---
~ could not PUT file file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst
--- database.go:1166 Query ---
~ Error with StreamRows
--- database.go:1179 QueryContext ---
~ SQL Error for:
PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
--- database.go:814 StreamRowsContext ---
~ Error executing query
--- transaction.go:115 QueryContext ---
264006: file does not exist: [/home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst]

context canceled

--- task_run.go:97 func1 ---
--- task_run.go:403 runFileToDB ---
--- task_run_write.go:331 WriteToDb ---
~ could not write to database
--- database_snowflake.go:459 BulkImportFlow ---
~ could not insert into "CE_STG"."PROG_STOCK_VEHICLE_TMP".
--- database_snowflake.go:887 CopyViaStage ---

--- database_snowflake.go:792 func5 ---
--- database_snowflake.go:780 func4 ---
~ Error copying to Snowflake Stage: sling_staging
--- database_snowflake.go:931 StagePUT ---
~ could not PUT file file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst
--- database.go:1166 Query ---
~ Error with StreamRows
--- database.go:1179 QueryContext ---
~ SQL Error for:
PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
--- database.go:814 StreamRowsContext ---
~ Error executing query
--- transaction.go:115 QueryContext ---
264006: file does not exist: [/home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst]

context canceled

2024-11-08 07:49:13 INF Sling Replication Completed in 2s | local -> snowflake | 0 Successes | 1 Failures

fatal:
--- proc.go:271 main ---
--- sling_cli.go:458 main ---
--- sling_cli.go:494 cliInit ---
--- cli.go:286 CliProcess ---
~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)
--- sling_run.go:202 processRun ---

--------------------------- file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/progression/*stock_vehicle_*.txt ---------------------------
--- proc.go:271 main ---
--- sling_cli.go:458 main ---
--- sling_cli.go:494 cliInit ---
--- cli.go:286 CliProcess ---
--- sling_run.go:200 processRun ---
--- sling_run.go:494 runReplication ---
--- sling_run.go:396 runTask ---
--- task_run.go:138 Execute ---


--- database_snowflake.go:792 func5 ---
--- database_snowflake.go:780 func4 ---
~ Error copying to Snowflake Stage: sling_staging
--- database_snowflake.go:931 StagePUT ---
~ could not PUT file file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst
--- database.go:1166 Query ---
~ Error with StreamRows
--- database.go:1179 QueryContext ---
~ SQL Error for:
PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
--- database.go:814 StreamRowsContext ---
~ Error executing query
--- transaction.go:115 QueryContext ---
264006: file does not exist: [/home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst]

--- task_run.go:97 func1 ---
--- task_run.go:403 runFileToDB ---
--- task_run_write.go:331 WriteToDb ---
~ could not write to database
--- database_snowflake.go:459 BulkImportFlow ---
~ could not insert into "CE_STG"."PROG_STOCK_VEHICLE_TMP".
--- database_snowflake.go:887 CopyViaStage ---

--- database_snowflake.go:792 func5 ---
--- database_snowflake.go:780 func4 ---
~ Error copying to Snowflake Stage: sling_staging
--- database_snowflake.go:931 StagePUT ---
~ could not PUT file file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst
--- database.go:1166 Query ---
~ Error with StreamRows
--- database.go:1179 QueryContext ---
~ SQL Error for:
PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
--- database.go:814 StreamRowsContext ---
~ Error executing query
--- transaction.go:115 QueryContext ---
264006: file does not exist: [/home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst]
@flarco flarco added the bug Something isn't working label Nov 18, 2024
@flarco
Copy link
Collaborator

flarco commented Nov 24, 2024

I could not reproduce using this config:

source: local
target: snowflake

defaults:
  mode: full-refresh
  single: true

streams:
  file://sling/tests/files/test1*.csv:
    object: public.test111

Looking at your log, the issue is the duplicate PUT call:

2024-11-08 07:49:12 DBG PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE
2024-11-08 07:49:12 DBG PUT 'file:///home/tim/gitbox/eds-warehouse.git/v2/orchestrate/sling/tmp/c07fbe89/snowflake/put/2024-11-08T074911.178.zst' @CE_STG.sling_staging/"CE_STG"."PROG_STOCK_VEHICLE_TMP"/2024-11-08T074911.179 PARALLEL=8 AUTO_COMPRESS=FALSE

The first PUT call deletes the local temp file right away, so the second call errors.
I am not seeing why it's calling PUT twice, it should only call once since it's one file

I wonder if it has to do with the fact that you're adding new columns...
A work around is to put env var SLING_KEEP_TEMP=true, the temp file will not be deleted, and be found. But you shouldn't have to, it should not be calling PUT twice for the same file 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants