Bug Fix: #60343 Construction of Series / Index fails from dict keys when "str" dtype is specified explicitly #60383
base: main
Hi, I'm a student contributing to this PR and am in a bit of a time crunch due to finals. For my school project, my task is to get the PR merged as quickly as possible with the help and guidance of maintainers. I was able to fix the bug, but I am a bit stuck on how to fix the checks. Could @jorisvandenbossche or anyone else help, especially with the unit test checks? I tried to fix the pre-commit check (using ruff's lint fix), but every time I fixed a formatting issue, running pre-commit reverted it to its state before the fix. For the Doc build and upload check (it raised an error for every ipython declaration that lacked import pandas as pd), I inserted the import manually, but I don't know if there is an easier way.
The default behaviour (pd.Index(d.keys())) worked correctly, but explicitly setting dtype="str" raised a ValueError. The issue stemmed from dict_keys not being converted to a proper array-like structure before being passed to StringDtype, which couldn't handle such inputs.
To fix the issue:
KeyView was introduced to identify and preprocess dict_keys before passing them to pandas internals. The keys are now converted to a list for compatibility.
Updated the logic in Index and sanitize_array to map dtype="str" to StringDtype(storage="python").
Updated check_array_indexer to allow empty boolean indexers for StringArray.
Added a new test, test_index_from_dict_keys_with_dtype, to ensure that:
Default inference (pd.Index(d.keys())) works.
Explicit dtype="str" works, resulting in string[python].
Updated existing tests (test_is_object and test_empty_fancy) to handle new behaviours introduced by the fix.
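As a rough illustration of the preprocessing step described above (not the actual code in this PR), the dict_keys check could look like the sketch below. It assumes that "KeyView" corresponds to collections.abc.KeysView from the standard library, and normalize_keys is a hypothetical helper name, not a pandas function.

```python
from collections.abc import KeysView


def normalize_keys(data):
    """Return a list if data is a dict keys view; otherwise return data unchanged.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    if isinstance(data, KeysView):
        # dict_keys is registered as a KeysView, so d.keys() takes this branch.
        return list(data)
    return data


d = {"a": 1, "b": 2}
keys = normalize_keys(d.keys())
print(type(keys).__name__, keys)  # list ['a', 'b']
```

Converting to a list up front means the downstream array construction only has to handle a plain sequence, and inputs that are not key views are returned untouched.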
After the fix, both the default (pd.Index(d.keys())) and explicit (pd.Index(d.keys(), dtype="str")) cases work.