BUG (string): construction of Series / Index fails from dict keys when "str" dtype is specified explicitly #60343
Comments
Hi Joris! If I fix this, I could send you a PR. Would you be able to merge my PR then, or give suggestions on it so it can be merged? I have a school assignment deadline for working on an open source good first issue, where the owner will merge my PR at the end. I was wondering if you could assign this to me and help me? I am a 4th year Computer Engineering major.
Also, would you be able to tell me which files I should look at for this so I can start? Do I fork the main branch?
Hi @tasfia8, kindly check the contributing docs: https://pandas.pydata.org/docs/development/contributing.html for guidance regarding GitHub issue assignment, proper format of PRs, etc. I recommend you work on an issue with a label
I have already started working on this, would you be able to assign it to me? I think I can do it, and I have read the contributing files. Thank you.
@tasfia8 - issue assignment can be found in the contributing docs
take
@jorisvandenbossche The issue was that dict_keys was passed directly to the StringDtype's _from_sequence method, which could not handle non-array-like inputs like dict_keys. The fix involved updating the handling of dict_keys during the construction of an Index or Series.
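As a rough illustration of the kind of handling described in the comment above, here is a minimal sketch that turns dict views into a list before they reach code expecting an array-like. The helper name `materialize_dict_view` is hypothetical and this is not the actual change made in pandas; it only shows the general idea, using the standard library alone.

```python
from collections.abc import KeysView, Sequence, ValuesView


def materialize_dict_view(data):
    """Hypothetical helper (not a pandas function): convert dict key/value
    views into a list and pass any other input through unchanged."""
    if isinstance(data, (KeysView, ValuesView)):
        return list(data)
    return data


d = {"a": 1, "b": 2}
keys = d.keys()

# dict_keys is iterable and sized, but it is not a Sequence (no indexing).
is_sequence = isinstance(keys, Sequence)

# After materializing, downstream code receives an ordinary list.
normalized = materialize_dict_view(keys)
untouched = materialize_dict_view(["x", "y"])
print(is_sequence, normalized, untouched)
```

A caller hitting the bug could presumably apply the same idea by hand, passing `list(d.keys())` instead of `d.keys()` to the constructor.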
@tasfia8 apologies for the slow response. The output you show is indeed the expected behaviour.
Done @jorisvandenbossche. Please see #60383.
When not specifying a dtype (inferring the type), construction of `Index` or `Series` from dict keys goes fine. But if you explicitly specify the dtype, then it fails.

The reason is that at that point we pass the data directly to the dtype's array `_from_sequence` instead of first pre-processing the data into a numpy array, and `_from_sequence` calling `ensure_string_array` directly doesn't seem to be able to handle dict keys (although we do call `np.asarray(..)` inside `ensure_string_array`, so not entirely sure what is going wrong).
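One possible explanation for why the `np.asarray(..)` call does not help, offered here as an assumption rather than something confirmed in this thread: NumPy does not treat a `dict_keys` view as a sequence, so `np.asarray` wraps the whole view in a zero-dimensional object array instead of iterating over the keys. The sketch below shows only that NumPy behaviour; it does not call any pandas internals.

```python
import numpy as np

d = {"a": 1, "b": 2}

# dict_keys is not a sequence, so NumPy stores the view itself as a single
# object element instead of building a 1-D array from the keys.
wrapped = np.asarray(d.keys())
print(wrapped.ndim, wrapped.dtype)

# Materializing the view as a list first gives a regular 1-D array.
materialized = np.asarray(list(d.keys()))
print(materialized.ndim, materialized.tolist())
```

If this is what happens inside `ensure_string_array`, the function would see a single scalar-like object rather than two string elements, which would be consistent with the failure described above.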