ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

tokorhon · 2020-10-17T22:39:28Z

Is your feature request related to a problem?

Dropping table when if_exists=‘replace’ will fail if the table has any objects, like a view, depending on it. Also, some databases like Oracle will end a transaction implicitly when a DLL statement like 'drop table' is issued.

Describe the solution you'd like

Better alternative would be to issue 'truncate table' command instead.

API breaking implications

Should not have any changes on API

Describe alternatives you've considered

As above or a new flag if_exists=‘truncate’

ukarroum · 2020-10-18T12:40:50Z

One of the problems i can see with using truncate instead of drop is that this add the limitation the the user's dataframe need to have the same columns as the table (which may not always be the case, depending on the usecase).

erfannariman · 2020-10-18T14:41:54Z

Yes I agree with @ukarroum , also the type of a column can change over time. But then again OP is suggesting to add a new method if_exists='truncate' which is a bit confusing, because the title states otherwise.

I can see the benefit of truncate over drop, also since the first will be more efficient.

tokorhon · 2020-10-18T15:49:23Z

There are pros and cons of truncate vs. drop. So maybe new method is the answer. Call it if_exists='delete' or something else. If truncate table is not supported by DBMS then this method should lead to 'delete from table' (which is slower than truncate).

fiendish · 2021-09-22T15:16:52Z

Could it use truncate if the columns are the same and drop otherwise?

mancunian1792 · 2022-01-03T06:40:43Z

One of the problems i can see with using truncate instead of drop is that this add the limitation the the user's dataframe need to have the same columns as the table (which may not always be the case, depending on the usecase).

I would assume that if the structure of the table changes a lot, then its not a right use case for a relational table in the first place ? Although, i can understand the datatype changing over time would cause an issue. I feel truncate use case is more common and would benefit more people including me.

igonro · 2022-03-24T13:39:29Z

This is a great idea. I think is better to have both options "replace" and "truncate", sometimes you will need "replace" when you want to delete and insert new columns. But "truncate" is a needed option when you need to keep some columns options in the SQL database. For example, with replace the autoincrement option in the "id" column will disappear.

TLDR:

replace: when your db table columns change
truncate: when you need to keep your db table columns

igonro · 2022-03-24T13:45:13Z

Also @lam-juice in #8673 pointed out a behaviour that I was also having in my SQL database:

Is it possible to make pandas delete all the rows instead of dropping the table for if_exists='replace'?
Having to drop the table results in deadlocks when there's a simultaneous SELECT going on, while a simple deletion would avoid such deadlocks.

So why not add truncate option, it solves some problems (deadlocks, column information loss), it preserves the current behaviour of replace, and I don't think it the implementation would be hard.

kylemcmearty · 2023-03-09T16:31:20Z

+1 for @igonro 's solution.

I'm having issues with my table view object because replace drops the table.

Having truncate and replace options would be great.

I'm gonna try out this solution for now and see if it works: https://stackoverflow.com/a/67235797

igonro · 2023-03-10T09:25:55Z

Yes, @kmcmearty; that's an easy workaround (truncating manually and then doing the to_sql with append).

gmcrocetti · 2024-08-23T17:18:05Z

I recently created a pull request adding the truncate option.

erfannariman · 2024-10-12T19:56:30Z

@pandas-dev/pandas-core if anyone or multiple persons have time, could you reply if you're +1 or -1 on this addition. I am +1 since I our team would use this quite often.

@gmcrocetti thank you for taking the time to create the PR. I think it's better that we get some replies from the core team before we spend time on the PR.

gmcrocetti · 2024-10-12T21:37:39Z

@pandas-dev/pandas-core if anyone or multiple persons have time, could you reply if you're +1 or -1 on this addition. I am +1 since I our team would use this quite often.

@gmcrocetti thank you for taking the time to create the PR. I think it's better that we get some replies from the core team before we spend time on the PR.

Thanks a ton, much appreciated !

rhshadrach · 2024-10-13T20:15:39Z

could you reply if you're +1 or -1 on this addition.

There are several proposals here, could you clarify which you're referring to? I think it's the addition of truncate rather than the original request in the OP.

gmcrocetti · 2024-10-13T21:45:30Z

Hello @rhshadrach , yes you got it right. It is the addition of truncate indeed.

WillAyd · 2024-10-14T01:48:31Z

if_exists="truncate" reads a little strange, since it actually replaces (at least from what I understand with the proposal).

What's the precedent from other libraries for using truncate? I definitely understand that there are performance implications and that for some databases this can affect auto-increment behavior, but I'm also under the impression that the standard replace with drop was chosen with intention.

Is there a prior art for this suggestion in sqlalchemy? What databases support TRUNCATE within a transaction and which don't?

shortywz · 2024-10-14T03:57:58Z

PostgreSQL and its derivatives support TRUNCATE within a transaction - as an end user looking forward to this feature, the drop implementation requires additional handling to restore permissions, and is not usable on tables with dependencies - TRUNCATE solves both of these problems.

gmcrocetti · 2024-10-14T12:15:37Z

Hi @WillAyd o/.

The proposal is preserve the behavior of replace as is today - no changes here. On the other hand adding a new option named truncate that instead of replacing a table will wipe out the data of the table and "restart" it. I don't think performance is the only pain point. Most of times you don't want to (or can't) recreate the "structure" of the table: indexes, permissions, triggers, etc. All one want is to remove all data and restart from fresh.

AFAIK there's no public API available on sqlalchemy to truncate tables despite this functionally being part of the SQL standard since 2008. There might be a reason for that but I don't know.

Dr-Irv · 2024-10-14T14:30:20Z

if_exists="truncate" reads a little strange, since it actually replaces (at least from what I understand with the proposal).

I agree with you @WillAyd , but it does seem that TRUNCATE is used in SQL to represent the operation of deleting all the rows and then adding new rows to "replace" the ones that are deleted.

I'm +1 on the proposal to add the if_exists="truncate" option.

WillAyd · 2024-10-14T15:05:53Z

I am not sure how relevant this is given it was written 15 years ago, but here is what I see upstream for SQLAlchemy:

sqlalchemy/sqlalchemy#1474

The trap here is going to be assuming what works for some databases (i.e. postgres and MSSQL) is going to work across all databases. AFAIU, we have very little (if any) hard-coded statements in our implementation that are RDBMS-specific, and we don't have a very good CI setup to cover non-standard behavior.

So generally I'd say I'm -1 on this, unless it were to be implemented upstream in SQLAlchemy or the DBAPI first. In the interim, users can manage the atomicity of the behavior as needed without too much extra effort, i.e. if you wanted an atomic truncate in postgres I think you could do:

engine = ...
with engine.connect() as conn:
    conn.execute("TRUNCATE table foo")
    df.to_sql("foo", con=conn, if_exists="append")

Dr-Irv · 2024-10-14T16:38:49Z

The trap here is going to be assuming what works for some databases (i.e. postgres and MSSQL) is going to work across all databases. AFAIU, we have very little (if any) hard-coded statements in our implementation that are RDBMS-specific, and we don't have a very good CI setup to cover non-standard behavior.

Thanks for this analysis. Given that TRUNCATE is not supported for all DB's, and there is an alternate solution, I switch my vote to -1.

gmcrocetti · 2024-10-14T17:12:54Z

Sorry @WillAyd , could you expand on what you meant by that ?

The trap here is going to be assuming what works for some databases (i.e. postgres and MSSQL) is going to work across all databases. AFAIU, we have very little (if any) hard-coded statements in our implementation that are RDBMS-specific, and we don't have a very good CI setup to cover non-standard behavior.

I'm not sure I followed but we don't need to add RDBMS-specific statements. The proposed implementation is using sqlalchemy behind the scenes (.execute(f"TRUNCATE TABLE {table}")). If the specific database doesn't support it then an exception is raised.

If you're talking about the specifics of how each RDBMS implements the execution of TRUNCATE then shouldn't pandas delegate this responsibility to the user - of course providing proper documentation ?

WillAyd · 2024-10-14T18:38:00Z

I'm not sure I followed but we don't need to add RDBMS-specific statements. The proposed implementation is using sqlalchemy behind the scenes (.execute(f"TRUNCATE TABLE {table}")).

Although vaguely worded, I wasn't referring to the TRUNCATE statement itself as much as any accompanying statements that we would have to implement to try and make this behavior work consistently across all RDBMS'es, if that is even possible (not sure it is)

If you're talking about the specifics of how each RDBMS implements the execution of TRUNCATE then shouldn't pandas delegate this responsibility to the user - of course providing proper documentation ?

Yes it should, which is why I think adding the truncate argument as proposed has some flaws. The pandas API is supposed to be atomic and avoid any implementation-defined behavior. Unfortunately with truncate, an expression of the form:

df.to_sql("table", con, if_exists="truncate")

Will yield implementation-defined behavior in case of failure during append. For postgres, it would treat that as an atomic operation and rollback the truncate. But for a DB like Oracle my guess is it would commit the truncation of "table" and leave it without any records.

Since that is tucked away within our API, that puts the onus on us to either bridge those differences or convince people that certain parts of the API are good for some databases yet not for others.

gmcrocetti · 2024-10-14T19:20:31Z

Got it, thanks for taking the time to reply.

Yeap, I do have to recognized you have a fair point. The proposed feature can lead to potential unexpected behaviors on databases where TRUNCATE is a DDL instead of a DML (as in postgres and its derivatives). This burden shouldn't be added to pandas as it delegates to SQLalchemy.

I'm gonna wait for the voting to end but in case it is rejected do you believe we could add the outcome of this discussion somewhere into the documentation ? Or going one step further, would be possible to extend the behavior of this function via a third-party (away for good reasons of the core) ?

WillAyd · 2024-10-14T19:47:57Z

Absolutely +1 to documenting this, either in the I/O tools guide, the to_sql docstrings, or both.

As far as a third-party package goes, are you thinking of an entirely new package just for this feature or are you asking if there's a way to integrate that behavior into our existing to_sql call from a third party? The latter does not exist, but I know I/O plugins has historically been an interest of @datapythonista

nickolay · 2024-10-15T09:53:01Z

The pandas API is supposed to be atomic and avoid any implementation-defined behavior. Unfortunately with truncate, an expression of the form: df.to_sql("table", con, if_exists="truncate") Will yield implementation-defined behavior in case of failure during append.

Same holds for if_exists="replace", no? A DROP TABLE preceeding a failed CREATE also won't be rolled back in Oracle and this behavior is inconsistent across databases.

(As an aside: unlike truncate, if_exists=replace fails to even load unicode data into MSSQL properly, #35627 since forever, so pretending pandas' database abstractions are or even can be non-leaky is wishful thinking.)

gmcrocetti · 2024-10-15T11:06:10Z

Nice analysis @nickolay . From what I understood this is indeed what happens today. pandas is not resilient to non-transactional DDL statements (Oracle and its derivations). The existing API is not atomic thus adding the truncate option will not change existing behavior.

tokorhon · 2024-10-15T12:30:36Z

Thanks to everyone for great views and opinions regarding this feature request! I may not understand every detail, but for me

the name of the value of if_exists is not relevant. It can be something else than truncate since SQL truncate operation is not available in all databases
important is that there should be another way to empty a table that drop + recreate. In databases not supporting truncate it can lead to delete from table.
as stated in the original post, drop + recreate does not work if there are objects depending of the table.

WillAyd · 2024-10-15T12:50:30Z

Same holds for if_exists="replace", no? A DROP TABLE preceeding a failed CREATE also won't be rolled back in Oracle and this behavior is inconsistent across databases.

That second link in particular is a good reason why we do not issue these statements directly from within pandas and rely on third parties like SQLAlchemy to abstract this as best as possible. It sounds like your statement may be true for Oracle versions < 11g Release 2 but possibly not thereafter.

nickolay · 2024-10-15T15:02:30Z

I'm pretty sure what I said about DROP implicitly committing an active transaction applies to the current versions of Oracle as well. I also don't see any evidence of sqlalchemy or pandas using edition-based redefinition for if_exists="replace" (and would be very surprised if it did so by default!)

But it's true that unlike DROP+CREATE, TRUNCATE TABLE is an optional and underspecified feature of the SQL standard, so I can see where you're coming from. I was surprised to find that pandas' sqlalchemy backend seems to have 0 instances of raw SQL as of now, and can understand if the maintainers would like to keep it that way.

If that's the case, is @tokorhon's suggestion of providing a mode to run DELETE FROM table; instead a viable alternative? It IS widely compatible across databases, works fine with transactions, and has an API in sqlalchemy. I agree performance is not the main concern here, as in many scenarios the sqlalchemy-based load process will be the bottleneck before the DELETE.

WillAyd · 2024-10-15T15:57:19Z

That's an interesting idea; I think delete may be more viable given it is provided by sqlalchemy

Maybe the keyword becomes if_exists="replace_rows" to differentiate from the existing case?

gmcrocetti · 2024-10-15T17:02:43Z

The usage of DELETE FROM instead of TRUNCATE for sure has some side effects. But being pragmatic they will achieve the very same result by the end of the day. I'd suggest using delete_rows to replicate the semantics of what's going on behind the scenes.

gmcrocetti · 2024-10-17T17:29:31Z

So just to summarize what we all discussed so far and members of the core team @pandas-dev/pandas can vote without reading the whole issue:

`if_exists=truncate` 1️⃣

The truncate option would mimic SQL's standard TRUNCATE TABLE behavior.

Pros

Performance wise Truncate is pretty fast and has very few drawbacks when compared to DELETE FROM
The table is not deleted, but data is wiped outn - truncate preserves index, permissions and etc.

Cons

Not widely support by all databases and when supported the implementation varies a ton;
- In some databases running this command creates a implicit commit which can lead to non-atomic actions on pandas.
SQLAlchemy does not provide an API for that therefore we need to add raw SQL into the codebase.

`if_exists=delete_rows` (replace_rows) 2️⃣

don't get hung up on the name (delete_rows). Anything that represents that rows are deleted but not the table is acceptable.

Pros

SQLAlchemy provides an API.
Supported by all SQL databases

Cons

Not as performant as truncate
Will activate triggers (in case they exist)

Feel free to vote with 1️⃣ or 2️⃣ reactions.

gmcrocetti · 2024-11-19T21:04:28Z

Hello @WillAyd,

Have you had the time to think about this matter ? I tried to summarize what we discussed right here ☝️.

@pandas-dev/pandas-core second thoughts are welcome. Let's bring a conclusion to this issue :)

WillAyd · 2024-11-19T21:08:31Z

I think it makes sense to go the DELETE route. I don't have a strong opinion on naming - maybe delete_replace?

It's a shame that what we have currently as "replace" is better thought of as "recreate" but there's too much history to try and change that

tokorhon added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 17, 2020

jbrockmendel added the IO SQL to_sql, read_sql, read_sql_query label Oct 25, 2020

mroeschke added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 13, 2021

lzieglerpenn mentioned this issue Dec 6, 2022

Truncate table option for to sql #50088

Closed

5 tasks

lzieglerpenn mentioned this issue Apr 27, 2023

ENH: Add 'truncate' option for if_exists arg for DataFrame.to_sql method #52961

Closed

5 tasks

gmcrocetti mentioned this issue Aug 2, 2024

ENH: Implement option 'truncate' of argument if_exists in DataFrame.to_sql #59391

Closed

5 tasks

gmcrocetti linked a pull request Nov 20, 2024 that will close this issue

feat: Implement option 'delete_rows' of argument 'if_exists' in 'DataFrame.to_sql' API. #60376

Draft

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

tokorhon commented Oct 17, 2020 •

edited

Loading

ukarroum commented Oct 18, 2020

erfannariman commented Oct 18, 2020

tokorhon commented Oct 18, 2020

fiendish commented Sep 22, 2021

mancunian1792 commented Jan 3, 2022

igonro commented Mar 24, 2022

igonro commented Mar 24, 2022

kylemcmearty commented Mar 9, 2023

igonro commented Mar 10, 2023

gmcrocetti commented Aug 23, 2024

erfannariman commented Oct 12, 2024

gmcrocetti commented Oct 12, 2024

rhshadrach commented Oct 13, 2024

gmcrocetti commented Oct 13, 2024 •

edited

Loading

WillAyd commented Oct 14, 2024

shortywz commented Oct 14, 2024 •

edited

Loading

gmcrocetti commented Oct 14, 2024 •

edited

Loading

Dr-Irv commented Oct 14, 2024

WillAyd commented Oct 14, 2024

Dr-Irv commented Oct 14, 2024

gmcrocetti commented Oct 14, 2024 •

edited

Loading

WillAyd commented Oct 14, 2024

gmcrocetti commented Oct 14, 2024

WillAyd commented Oct 14, 2024

nickolay commented Oct 15, 2024

gmcrocetti commented Oct 15, 2024 •

edited

Loading

tokorhon commented Oct 15, 2024

WillAyd commented Oct 15, 2024

nickolay commented Oct 15, 2024

WillAyd commented Oct 15, 2024

gmcrocetti commented Oct 15, 2024

gmcrocetti commented Oct 17, 2024 •

edited

Loading

gmcrocetti commented Nov 19, 2024 •

edited

Loading

WillAyd commented Nov 19, 2024

ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

Comments

tokorhon commented Oct 17, 2020 • edited Loading

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Describe alternatives you've considered

ukarroum commented Oct 18, 2020

erfannariman commented Oct 18, 2020

tokorhon commented Oct 18, 2020

fiendish commented Sep 22, 2021

mancunian1792 commented Jan 3, 2022

igonro commented Mar 24, 2022

igonro commented Mar 24, 2022

kylemcmearty commented Mar 9, 2023

igonro commented Mar 10, 2023

gmcrocetti commented Aug 23, 2024

erfannariman commented Oct 12, 2024

gmcrocetti commented Oct 12, 2024

rhshadrach commented Oct 13, 2024

gmcrocetti commented Oct 13, 2024 • edited Loading

WillAyd commented Oct 14, 2024

shortywz commented Oct 14, 2024 • edited Loading

gmcrocetti commented Oct 14, 2024 • edited Loading

Dr-Irv commented Oct 14, 2024

WillAyd commented Oct 14, 2024

Dr-Irv commented Oct 14, 2024

gmcrocetti commented Oct 14, 2024 • edited Loading

WillAyd commented Oct 14, 2024

gmcrocetti commented Oct 14, 2024

WillAyd commented Oct 14, 2024

nickolay commented Oct 15, 2024

gmcrocetti commented Oct 15, 2024 • edited Loading

tokorhon commented Oct 15, 2024

WillAyd commented Oct 15, 2024

nickolay commented Oct 15, 2024

WillAyd commented Oct 15, 2024

gmcrocetti commented Oct 15, 2024

gmcrocetti commented Oct 17, 2024 • edited Loading

if_exists=truncate 1️⃣

Pros

Cons

if_exists=delete_rows (replace_rows) 2️⃣

Pros

Cons

gmcrocetti commented Nov 19, 2024 • edited Loading

WillAyd commented Nov 19, 2024

tokorhon commented Oct 17, 2020 •

edited

Loading

gmcrocetti commented Oct 13, 2024 •

edited

Loading

shortywz commented Oct 14, 2024 •

edited

Loading

gmcrocetti commented Oct 14, 2024 •

edited

Loading

gmcrocetti commented Oct 14, 2024 •

edited

Loading

gmcrocetti commented Oct 15, 2024 •

edited

Loading

gmcrocetti commented Oct 17, 2024 •

edited

Loading

`if_exists=truncate` 1️⃣

`if_exists=delete_rows` (replace_rows) 2️⃣

gmcrocetti commented Nov 19, 2024 •

edited

Loading