ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210
Comments
One of the problems I can see with using truncate instead of drop is that it adds the limitation that the user's DataFrame needs to have the same columns as the table (which may not always be the case, depending on the use case). |
Yes, I agree with @ukarroum; the type of a column can also change over time. But then again, OP is suggesting adding a new method. I can see the benefit of truncate over drop, since the former will also be more efficient. |
There are pros and cons of truncate vs. drop. So maybe a new method is the answer. Call it if_exists='delete' or something else. If truncate table is not supported by the DBMS, then this method should fall back to 'delete from table' (which is slower than truncate). |
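A rough sketch of the fallback described above, choosing between TRUNCATE and DELETE by database. The helper name `clear_table_statement` and the dialect set are hypothetical and only illustrate the idea; nothing here is part of pandas or SQLAlchemy.

```python
import re

# Assumption: dialect names follow SQLAlchemy's naming. The set is illustrative,
# not exhaustive; SQLite is left out because it has no TRUNCATE statement.
TRUNCATE_DIALECTS = {"postgresql", "mysql", "mssql", "oracle"}

# Only plain identifiers are accepted, because the name is interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def clear_table_statement(dialect_name: str, table: str) -> str:
    """Return a statement that empties `table` without dropping it."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if dialect_name in TRUNCATE_DIALECTS:
        return f"TRUNCATE TABLE {table}"
    # Fallback for databases without TRUNCATE: slower, but widely supported.
    return f"DELETE FROM {table}"
```

The statement would then be executed before a `to_sql(..., if_exists="append")` call. Whether it can be rolled back together with the append still depends on the database.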
Could it use truncate if the columns are the same and drop otherwise? |
I would assume that if the structure of the table changes a lot, then it's not the right use case for a relational table in the first place? Although, I can understand that the datatype changing over time would cause an issue. I feel the truncate use case is more common and would benefit more people, including me. |
This is a great idea. I think it is better to have both options, "replace" and "truncate"; sometimes you will need "replace" when you want to delete and insert new columns. But "truncate" is a needed option when you need to keep some column options in the SQL database. For example, with replace the autoincrement option in the "id" column will disappear. TLDR:
|
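To make the autoincrement point concrete, here is a small sketch using SQLite from the Python standard library. The table `items` and the helper `table_sql` are made up for the illustration, and the second `CREATE TABLE` is only an assumption about what a schema inferred from the data alone looks like, not the exact DDL pandas emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("INSERT INTO items (name) VALUES ('old')")


def table_sql(connection, table):
    """Return the CREATE TABLE text SQLite stores for `table`."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row[0]


# Empty and refill: the table definition is left untouched.
conn.execute("DELETE FROM items")
conn.execute("INSERT INTO items (name) VALUES ('new')")
schema_after_delete = table_sql(conn, "items")

# Drop and recreate from the data alone: options that are not in the data are lost.
conn.execute("DROP TABLE items")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("INSERT INTO items (id, name) VALUES (1, 'new')")
schema_after_drop = table_sql(conn, "items")
conn.commit()

print(schema_after_delete)
print(schema_after_drop)
```

The first schema still carries `PRIMARY KEY AUTOINCREMENT`; the second does not, because the recreated table only knows what the new data implies.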
Also @lam-juice in #8673 pointed out a behaviour that I was also having in my SQL database:
So why not add |
+1 for @igonro 's solution. I'm having issues with my table view object because Having I'm gonna try out this solution for now and see if it works: https://stackoverflow.com/a/67235797 |
Yes, @kmcmearty; that's an easy workaround (truncating manually and then doing the to_sql with append). |
I recently created a pull request adding the |
@pandas-dev/pandas-core if one or more of you have time, could you reply whether you're +1 or -1 on this addition? I am +1 since our team would use this quite often. @gmcrocetti thank you for taking the time to create the PR. I think it's better that we get some replies from the core team before we spend time on the PR. |
Thanks a ton, much appreciated ! |
There are several proposals here, could you clarify which you're referring to? I think it's the addition of |
Hello @rhshadrach , yes you got it right. It is the addition of |
What's the precedent from other libraries for using truncate? I definitely understand that there are performance implications and that for some databases this can affect auto-increment behavior, but I'm also under the impression that the standard replace with drop was chosen with intention. Is there a prior art for this suggestion in sqlalchemy? What databases support TRUNCATE within a transaction and which don't? |
PostgreSQL and its derivatives support TRUNCATE within a transaction - as an end user looking forward to this feature, the drop implementation requires additional handling to restore permissions, and is not usable on tables with dependencies - TRUNCATE solves both of these problems. |
Hi @WillAyd o/. The proposal is to preserve the behavior of AFAIK there's no public API available in sqlalchemy to truncate tables despite this functionality being part of the SQL standard since 2008. There might be a reason for that but I don't know. |
I agree with you @WillAyd , but it does seem that I'm +1 on the proposal to add the |
I am not sure how relevant this is given it was written 15 years ago, but here is what I see upstream for SQLAlchemy: The trap here is going to be assuming that what works for some databases (i.e. postgres and MSSQL) is going to work across all databases. AFAIU, we have very little (if any) hard-coded statements in our implementation that are RDBMS-specific, and we don't have a very good CI setup to cover non-standard behavior. So generally I'd say I'm -1 on this, unless it were to be implemented upstream in SQLAlchemy or the DBAPI first. In the interim, users can manage the atomicity of the behavior as needed without too much extra effort, i.e. if you wanted an atomic truncate in postgres I think you could do:
    from sqlalchemy import text

    engine = ...
    # engine.begin() opens a transaction, commits on success, and rolls back on error
    with engine.begin() as conn:
        conn.execute(text("TRUNCATE TABLE foo"))
        df.to_sql("foo", con=conn, if_exists="append") |
Thanks for this analysis. Given that |
Sorry @WillAyd , could you expand on what you meant by that ?
I'm not sure I followed but we don't need to add RDBMS-specific statements. The proposed implementation is using sqlalchemy behind the scenes ( If you're talking about the specifics of how each RDBMS implements the execution of |
Although vaguely worded, I wasn't referring to the
Yes it should, which is why I think adding the df.to_sql("table", con, if_exists="truncate") Will yield implementation-defined behavior in case of failure during append. For postgres, it would treat that as an atomic operation and rollback the truncate. But for a DB like Oracle my guess is it would commit the truncation of "table" and leave it without any records. Since that is tucked away within our API, that puts the onus on us to either bridge those differences or convince people that certain parts of the API are good for some databases yet not for others. |
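As a hedged illustration of the atomic case, the sketch below uses SQLite (because it ships with Python) and DELETE in place of TRUNCATE, which SQLite lacks. The table `events` and the helper `replace_rows` are hypothetical, and the insert stands in for an append such as `to_sql(..., if_exists="append")`. It shows only the rollback behavior on a database with transactional DML; it says nothing about databases that commit implicitly.

```python
import sqlite3


def replace_rows(conn, rows):
    """Delete every row in `events` and insert `rows` in a single transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("DELETE FROM events")
        conn.executemany("INSERT INTO events (id, label) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
replace_rows(conn, [(1, "a"), (2, "b")])

try:
    # The second row violates NOT NULL, so the append step fails midway.
    replace_rows(conn, [(3, "c"), (4, None)])
except sqlite3.IntegrityError:
    pass

# The failed refresh was rolled back, so the earlier rows are still there.
remaining = conn.execute("SELECT id, label FROM events ORDER BY id").fetchall()
print(remaining)
```

On a database where the emptying statement is committed implicitly, the same sequence would leave the table empty after the failure, which is the difference in behavior being discussed here.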
Got it, thanks for taking the time to reply. Yep, I do have to recognize you have a fair point. The proposed feature can lead to potential unexpected behaviors on databases where I'm gonna wait for the voting to end but in case it is rejected do you believe we could add the outcome of this discussion somewhere into the documentation? Or going one step further, would it be possible to extend the behavior of this function via a third party (kept out of the core for good reasons)? |
Absolutely +1 to documenting this, either in the I/O tools guide, the to_sql docstrings, or both. As far as a third-party package goes, are you thinking of an entirely new package just for this feature or are you asking if there's a way to integrate that behavior into our existing to_sql call from a third party? The latter does not exist, but I know I/O plugins has historically been an interest of @datapythonista |
Same holds for (As an aside: unlike |
Nice analysis @nickolay. From what I understood, this is indeed what happens today: pandas is not resilient to non-transactional DDL statements (Oracle and its derivatives). The existing API is not atomic thus adding the |
Thanks to everyone for great views and opinions regarding this feature request! I may not understand every detail, but for me
|
That second link in particular is a good reason why we do not issue these statements directly from within pandas and rely on third parties like SQLAlchemy to abstract this as best as possible. It sounds like your statement may be true for Oracle versions < 11g Release 2 but possibly not thereafter. |
I'm pretty sure what I said about DROP implicitly committing an active transaction applies to the current versions of Oracle as well. I also don't see any evidence of sqlalchemy or pandas using edition-based redefinition for But it's true that unlike DROP+CREATE, If that's the case, is @tokorhon's suggestion of providing a mode to run |
That's an interesting idea; I think delete may be more viable given it is provided by sqlalchemy Maybe the keyword becomes |
The usage of |
So just to summarize what we all discussed so far and members of the core team @pandas-dev/pandas can vote without reading the whole issue:
|
I think it makes sense to go the DELETE route. I don't have a strong opinion on naming - maybe delete_replace? It's a shame that what we have currently as "replace" is better thought of as "recreate" but there's too much history to try and change that |
Is your feature request related to a problem?
Dropping the table when if_exists=‘replace’ will fail if the table has any objects, like a view, depending on it. Also, some databases like Oracle will end a transaction implicitly when a DDL statement like 'drop table' is issued.
Describe the solution you'd like
A better alternative would be to issue a 'truncate table' command instead.
API breaking implications
Should not require any changes to the API
Describe alternatives you've considered
As above or a new flag if_exists=‘truncate’