-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Copy attrs on join (possibly depending on left, right, etc.) #60351
Comments
This should be handled the same way as
Line 6056 in 7fe270c
but we may want to delegate the attrs combination back to the operation instead, i.e.
Note that Alternatively, one could leave the combination logic in |
Thanks for the report! @timhoffm - can you post a reproducible example. |
This uses the same logic as `pd.concat()`: Copy `attrs` only if all input `attrs` are identical. I've refactored the handling in __finalize__ from special-casing based on th the method name (previously only "concat") to handling "other" parameters that have an `input_objs` attribute. This is a more scalable architecture compared to hard-coding method names in __finalize__. Tests added for `concat()` and `merge()`. Closes pandas-dev#60351.
#60357 should fix this. I've choosen the somewhat smaller refactoring and not pushed the combination logic back into the "combination objects". In fact #59141 removed Note that |
This uses the same logic as `pd.concat()`: Copy `attrs` only if all input `attrs` are identical. I've refactored the handling in __finalize__ from special-casing based on th the method name (previously only "concat") to handling "other" parameters that have an `input_objs` attribute. This is a more scalable architecture compared to hard-coding method names in __finalize__. Tests added for `concat()` and `merge()`. Closes pandas-dev#60351.
This uses the same logic as `pd.concat()`: Copy `attrs` only if all input `attrs` are identical. I've refactored the handling in __finalize__ from special-casing based on th the method name (previously only "concat") to handling "other" parameters that have an `input_objs` attribute. This is a more scalable architecture compared to hard-coding method names in __finalize__. Tests added for `concat()` and `merge()`. Closes pandas-dev#60351.
This uses the same logic as `pd.concat()`: Copy `attrs` only if all input `attrs` are identical. I've refactored the handling in __finalize__ from special-casing based on th the method name (previously only "concat") to handling "other" parameters that have an `input_objs` attribute. This is a more scalable architecture compared to hard-coding method names in __finalize__. Tests added for `concat()` and `merge()`. Closes pandas-dev#60351.
This uses the same logic as `pd.concat()`: Copy `attrs` only if all input `attrs` are identical. I've refactored the handling in __finalize__ from special-casing based on th the method name (previously only "concat") to handling "other" parameters that have an `input_objs` attribute. This is a more scalable architecture compared to hard-coding method names in __finalize__. Tests added for `concat()` and `merge()`. Closes pandas-dev#60351.
I disagree because concat is a symmetric operation operating on a list of dfs but join is an operation that joins other data to a specific dataframe. The properties of the initial dataframe should then be maintained unless pandas is instructed otherwise. It's as if you start dropping rows from your dataframe after a join just because the added data has some NaNs. |
I recommend to be defensive on all operations that have multiple DataFrames as input. The inputs may have contradicting Information in their attrs and joining them can lead to misinterpretation. Say, Instead, I propose to implement a |
Something like set_attrs() saves half a line of boilerplate code. But the main cost is cognitive: users constantly need to chase after the attrs to make sure they are copied. If the manual says that most operations copy them, I would expect any operation to copy those attrs that look like a reasonable default. Otherwise I expect the manual to say that attrs are usually dropped unless pandas is absolutely sure that the user wants to maintain the attrs. I would even argue that on a concat the most sensible thing to do would be to merge the attrs by field and throw a warning in case of conflicting fields. |
We can possibly improve the docs. Concerning the behavior, there are two aspects stemming from the experimental state of the feature
This has lead to the suggestion to deprecate My personal take (as a non-pandas developer but advocate for attrs) on the intended behavior is to be very defensive and minimal:
|
I'm supportive of @timhoffm's proposal. Users can also override this behavior via subclassing. import functools as ft
import pandas as pd
class Foo(pd.DataFrame):
@property
def _constructor(self):
return Foo
@ft.wraps(DataFrame.join)
def join(self, other, *args, **kwargs):
self_attrs = self.attrs
other_attrs = other.attrs
result = super().join(other, *args, **kwargs)
result.attrs = self_attrs | other_attrs
return result
df1 = Foo({"a": [1, 2, 3], "b": [3, 4, 5]}).set_index("a")
df1.attrs = {"key1": "value1"}
df2 = Foo({"a": [2, 3, 4], "c": [6, 7, 8]}).set_index("a")
df2.attrs = {"key2": "value2"}
result = df1.join(df2, how="inner")
print(result)
# b c
# a
# 2 4 6
# 3 5 7
print(result.attrs)
# {'key1': 'value1', 'key2': 'value2'} Perhaps that's not helpful in all cases, but hopefully is in many. |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
df.join() does not retain the attrs of the dataset. Given that the attrs manual states that "many operations that create new datasets will retain attrs", this seems like an omission.
Feature Description
Join is different from concat because there is a clear dataframe that the operation is on. Therefore, it would seem natural if df.join() would retain the attrs of the initial dataframe.
Alternative Solutions
It would also be possible to make the attrs dependent on "how" but this would only be natural for "left" and "right".
Additional Context
No response
The text was updated successfully, but these errors were encountered: