BUG: Convert output type in Excel for MultiIndex with period levels #60182

Open: wants to merge 16 commits into base: main

Contributor

@ZKaoChi ZKaoChi commented Nov 4, 2024

Member

@rhshadrach rhshadrach left a comment

Thanks for the PR! Please add tests whenever changing functionality.

pandas/io/formats/excel.py (resolved)
pandas/io/formats/excel.py (resolved)
@rhshadrach rhshadrach changed the title Convert output type in Excel for PeriodIndex in Index or MultiIndex b… BUG: Convert output type in Excel for MultiIndex with period levels Nov 6, 2024
@rhshadrach rhshadrach added Bug IO Excel read_excel, to_excel labels Nov 6, 2024
Member

@rhshadrach rhshadrach left a comment

Looking good. Can you also add one test in excel/test_writers.py for round-tripping (write a DataFrame to an Excel file, read it back, and assert it's the same as the original DataFrame)?

pandas/tests/io/excel/test_style.py (outdated, resolved)
pandas/tests/io/excel/test_writers.py (outdated, resolved)
pandas/tests/io/excel/test_writers.py (outdated, resolved)
Contributor Author

ZKaoChi commented Nov 12, 2024

> Looking good. Can you also add one test in excel/test_writers.py for round-tripping (write a DataFrame to an Excel file, read it back, and assert it's the same as the original DataFrame)?

My test case is called 8 times. In 6 of the calls the dtype of result is 'datetime64[us]', and in the other two it is 'datetime64[s]'. I don't know why this is happening, so I had to convert the units of result.

Contributor Author

ZKaoChi commented Nov 13, 2024

> > Looking good. Can you also add one test in excel/test_writers.py for round-tripping (write a DataFrame to an Excel file, read it back, and assert it's the same as the original DataFrame)?
>
> My test case is called 8 times. In 6 of the calls the dtype of result is 'datetime64[us]', and in the other two it is 'datetime64[s]'. I don't know why this is happening, so I had to convert the units of result.

I've found that the dtype of result depends on the suffix of tmp_excel: only for .ods is the dtype datetime64[s]; in all other cases it is datetime64[us]. The latest commit changes expected to match result. Please take a look. Thanks a lot!
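For readers following along, the unit mismatch described in this comment can be reproduced without any Excel engine. The sketch below is not code from the PR: the two indexes are hypothetical stand-ins for what was read back in the .ods case and in the other cases, and `as_unit` (available on `DatetimeIndex` in pandas 2.0 and later) is just one way the units could be aligned before comparing.

```python
import pandas as pd

# Hypothetical stand-ins for the values read back from the file:
# second resolution for the .ods case, microsecond resolution otherwise.
result_ods = pd.DatetimeIndex(["2006-10-06", "2006-10-07"]).as_unit("s")
result_other = pd.DatetimeIndex(["2006-10-06", "2006-10-07"]).as_unit("us")

# The instants are the same; only the stored resolution differs.
print(result_ods.dtype, result_other.dtype)

# Converting to a common unit makes the two indexes directly comparable.
normalized = result_ods.as_unit("us")
assert normalized.equals(result_other)
```

Whether a test should convert result or build expected with the matching unit is a judgment call; per the comment above, the PR took the second route.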

Contributor Author

ZKaoChi commented Nov 13, 2024

Maybe unifying the datetime units across the different tmp_excel formats could be tracked as a new issue.

Member

@rhshadrach rhshadrach left a comment

Can you add a note in the whatsnew for v3.0.0 under the I/O section?

pandas/tests/io/excel/test_writers.py (resolved)
Contributor Author

ZKaoChi commented Nov 17, 2024

> Can you add a note in the whatsnew for v3.0.0 under the I/O section?

Sure.

Contributor Author

ZKaoChi commented Nov 21, 2024

@rhshadrach I'm sorry for the trouble, but could you please let me know if there is anything else to change? Thank you very much!

formatted_cells = formatter._format_hierarchical_rows()

for cell in formatted_cells:
    assert not isinstance(
Member

Could we assert that this is a Timestamp instead?
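A positive type check of the kind requested here is stricter than a negative one, because it rules out every unexpected type rather than a single one. The sketch below illustrates the idea on plain pandas objects, not on the formatter's private cell objects. `to_cell_value` is a hypothetical helper, not the PR's implementation, and the index contents are made up for illustration.

```python
import pandas as pd

# A MultiIndex whose first level is a PeriodIndex, as in the linked issue.
periods = pd.period_range("2006-10-06", periods=2, freq="D")
mi = pd.MultiIndex.from_product([periods, ["a", "b"]], names=["date", "key"])


def to_cell_value(value):
    """Hypothetical helper: turn a Period into a Timestamp, pass others through."""
    if isinstance(value, pd.Period):
        return value.to_timestamp()
    return value


cell_values = [to_cell_value(v) for v in mi.get_level_values("date")]

# Positive assertion: every converted value must be a Timestamp.
for value in cell_values:
    assert isinstance(value, pd.Timestamp)
```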

@@ -691,6 +691,7 @@ I/O
- Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
- Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
- Bug in :meth:`ExcelFormatter._format_hierarchical_rows` where output type in excel for multiIndex with period levels is not a date (:issue:`60099`)
Member

Suggested change
- Bug in :meth:`ExcelFormatter._format_hierarchical_rows` where output type in excel for multiIndex with period levels is not a date (:issue:`60099`)
- Bug in :meth:`DataFrame.to_excel` where a :class:`MultiIndex` index with a period level was not a date (:issue:`60099`)

Labels
Bug IO Excel read_excel, to_excel
Development

Successfully merging this pull request may close these issues.

BUG: Inconsistent output type in Excel for PeriodIndex in Index or MultiIndex
4 participants