feat(repr): add show_count option to interactive repr
#10518
+50
−47
Resolves #10231
This doesn't have tests yet. I want to get feedback on the semantics before I write them. I assume CI is going to fail all over the place because all the docstrings etc. are going to change.
Currently looks like
I made the config default be `show_count=True`. I can change this with pushback, but this is what I would start with. I would love to brainstorm a few benchmark test cases to run to see the perf difference. Some ideas:
- `ibis.duckdb.connect("mydb.db").table("my_table")` (I expect ~0 difference)
- `ibis.duckdb.connect().read_parquet("t.pq")` (I expect ~0 difference)
- `ibis.duckdb.connect().read_csv("thousand_rows.csv")` (I expect small difference)
- `ibis.duckdb.connect().read_csv("billion_rows.csv")` (I expect large difference)
- `ibis.duckdb.connect().read_csv("thousand_rows.csv").some_expensive_computation()` (I expect a difference, size depends on semantics of the function)

I considered adding a `.repr_options` attribute to expressions as described in #10231 (comment), but I decided that was too complicated.

I considered showing the table name in the repr, eg with `table.get_name()`, but that is a separate question.

Currently, the option has the semantics of `show_count: bool`, and we ALWAYS show the column count. I considered other encodings such as `show_shape: Literal["rows", "cols", "both", None]`, but I thought that was overkill.

I considered adding the row count to the bottom of the table, eg something like
but then if you set a high `max_rows` it would be hard to see. Plus then this info would need to be repeated in every column. Anyway, if you have other ideas on the graphic design of where to present the counts, I'm all ears.