Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST (string dtype): clean-up assorted xfails #60354

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

simonjayhawkins
Copy link
Member

@jorisvandenbossche @WillAyd opening as draft and will add more to this PR over time (as I find solvable ones).

@simonjayhawkins simonjayhawkins added the Strings String extension data type and string data label Nov 18, 2024
@simonjayhawkins simonjayhawkins added this to the 2.3 milestone Nov 18, 2024
@simonjayhawkins simonjayhawkins changed the title TST (string dtype): resolve xfails TST (string dtype): clean-up assorted xfails Nov 18, 2024
def test_pivot_columns_is_none(self):
# GH#48293
df = DataFrame({None: [1], "b": 2, "c": 3})
df = DataFrame([[1, 2, 3]], columns=Index([None, "b", "c"], dtype="object"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need to figure out how this should work with the new string data type; the larger discussion to that is being had in #60329 (comment)

def test_get_dummies_with_str_dtype(any_string_dtype):
@pytest.mark.parametrize("use_string_repr", [True, False])
def test_get_dummies_with_any_string_dtype(
request, any_string_dtype, any_string_dtype2, use_string_repr, using_infer_string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new feature for 3.0.0

(that doesn't appear to be fully tested and consistent - hence additional parameterisation addded).

so using using_infer_string is probably not needed as we won't need backwards compatibility once inference is enable by default for 3.0.0

also any changes won't need backporting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented about this on the PR adding this test, see #59577 (comment), because I don't think it makes much sense to allow this. I see there is a PR following up on it to disallow dtype=str in get_dummies (#59786), but we should follow-up on it to ensure it gets into 3.0

(but so conclusion: I would leave this alone for this PR, and focus on fixing it in #59786)

@@ -13,6 +13,10 @@
from pandas._libs import missing as libmissing
from pandas._libs.sparse import IntIndex

from pandas.core.dtypes.cast import (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit has code changes but have pushed to this PR while we discuss

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants