Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(repr): add show_count option to interactive repr #10518

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Nov 21, 2024

Resolves #10231

Doesn't have tests yet. I want to get feedback on the semantics before I do that. I assume CI is going to fail all over the place because all the docstrings etc are going to change.

Currently looks like

import ibis

ibis.options.repr.interactive.max_rows = 3
ibis.options.repr.interactive.show_count
# True

t = ibis.memtable({"foo": range(1_000), "bar": range(1_000), "baz": range(1_000)})
t
#   3 cols by 1_000 rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.preview(show_count=False)
#     3 cols by … rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.foo
#   1_000
#   rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘
t.foo.preview(show_count=False)
# We could be more concise here, and not show anything.
# But I think showing something is good for consistency and discoverability.
#  … rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘

I made the config default be show_count=True. I can change this with pushback, but this is what I would start with. I would love to brainstorm a few benchmark test cases to run to see the perf difference. Some ideas:

  1. ibis.duckdb.connect("mydb.db").table("my_table") (I expect ~0 difference)
  2. ibis.duckdb.connect().read_parquet("t.pq") (I expect~0 difference)
  3. ibis.duckdb.connect().read_csv("thousand_rows.csv") (I expect small difference)
  4. ibis.duckdb.connect().read_csv("billion_rows.csv") (I expect large difference)
  5. ibis.duckdb.connect().read_csv("thousand_rows.csv").some_expensive computation() (I expect a difference, size depends on semantics of the function)

I considered adding a .repr_options attribute to expressions as described in #10231 (comment), but I decided that was too complicated.

I considered showing the table name in the repr, eg with table.get_name(), but that is a separate question.

Currently, the option has the semantics of show_count: bool, and we ALWAYS show the column count. I considered other encodings such as show_shape: Literal["rows", "cols", "both", None], but I thought that was overkill.

I considered adding the row count to the bottom of the table, eg something like

┏━━━━━━━┓
┃ foo   ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     … │
│ 1_000 total rows│
└───────┘

but then if you set a high max_rows it would be hard to see. Plus then this info would need to be repeated in every column. Anyway, if you have other ideas on the graphic design of where to present the counts, I'm all ears.

@lboller-pwbm
Copy link

lboller-pwbm commented Nov 21, 2024

This looks fantastic!

I tried it out in a Jupyter kernel with a fairly wide table, which appears as a horizontally scrollable output using VSCode's interactive window. The dimensions appear but are center-justified, so the user has to scroll horizontally quite a bit to see them. I would suggest making it left-justified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: show df.count(), len(df.schema()) when displaying interactive = True tables
2 participants