Optimized traversal of schema tree for schema cleaning (GenerateSchema.clean_schema) #1487
base: main
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1487      +/-   ##
==========================================
- Coverage   90.21%   89.63%    -0.58%
==========================================
  Files         106      113        +7
  Lines       16339    17984     +1645
  Branches       36       40        +4
==========================================
+ Hits        14740    16120     +1380
- Misses       1592     1844      +252
- Partials        7       20       +13

... and 54 files with indirect coverage changes
CodSpeed Performance Report: Merging #1487 will not alter performance.
please review
This seems reasonable to me. I haven't thought too much about the individual schema types, but I had a couple of possible optimization ideas :)
src/schema_traverse.rs (Outdated)
    meta_with_keys: Option<(Bound<'py, PyDict>, &'a Bound<'py, PySet>)>,
    def_refs: Bound<'py, PyDict>,
    recursive_def_refs: Bound<'py, PySet>,
    recursively_seen_refs: HashSet<String>,
There might be an optimization to store these as Python strings to avoid round-trips:
Suggested change:
-    recursively_seen_refs: HashSet<String>,
+    recursively_seen_refs: HashSet<PyBackedStr>,
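For illustration only, a minimal sketch of how a `HashSet<PyBackedStr>` could be populated without allocating a fresh Rust `String` per ref; the helper name is hypothetical and this assumes a pyo3 version where `PyBackedStr` implements `Hash` and `Eq`:

```rust
use std::collections::HashSet;

use pyo3::prelude::*;
use pyo3::pybacked::PyBackedStr;
use pyo3::types::PyString;

// Hypothetical helper: record a definition ref while keeping the string
// data backed by the Python object rather than copying into a Rust `String`.
fn remember_ref(
    seen: &mut HashSet<PyBackedStr>,
    ref_obj: &Bound<'_, PyString>,
) -> PyResult<bool> {
    let ref_str: PyBackedStr = ref_obj.extract()?;
    // `insert` returns false if the ref was already seen.
    Ok(seen.insert(ref_str))
}
```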
I wonder whether that helps here, since the first `contains` check converts it into a Rust string right away anyway?
Changed to using a PySet; it seems to be the fastest option here (also faster than HashSet<PyBackedStr>).
I think Python strings might cache their hash, though I don't think PyO3 takes advantage of that right now. Interesting observation 👍
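For reference, a minimal sketch of the PySet-based bookkeeping discussed above; this is not the pydantic-core implementation, and the function name is made up:

```rust
use pyo3::prelude::*;
use pyo3::types::{PySet, PyString};

// Illustrative only: membership tracking that keeps the refs as Python
// strings, so `contains`/`add` operate on the Python objects directly
// instead of converting each ref to a Rust `String` first.
fn mark_seen<'py>(
    seen_refs: &Bound<'py, PySet>,
    ref_obj: &Bound<'py, PyString>,
) -> PyResult<bool> {
    if seen_refs.contains(ref_obj)? {
        return Ok(true); // already visited
    }
    seen_refs.add(ref_obj)?;
    Ok(false)
}
```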
As far as I am concerned, this side looks good to me! Thanks 👍
This looks good in principle, but it needs comprehensive docstrings.
Change Summary
Adds a schema tree traversal that gathers the schema nodes and information needed for schema inlining and discriminator handling. The traversal is done in a single pass that collects everything required, and it is used by GenerateSchema.clean_schema. Required for PR pydantic/pydantic#10655. This makes schema cleaning much more efficient: the biggest bottleneck has been the Python-side tree traversal, especially with many models or deeply nested models.
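To make the single-pass idea concrete, here is a heavily simplified sketch (illustrative only, with hypothetical names; the real traversal also has to visit every nested schema kind and collect metadata and discriminator information):

```rust
use pyo3::prelude::*;
use pyo3::types::{PyDict, PySet};

// Simplified single-pass walk: note each `definition-ref` node as it is
// encountered and flag refs seen more than once as (potentially) recursive,
// so no second traversal is needed on the Python side.
fn walk<'py>(
    schema: &Bound<'py, PyDict>,
    seen_refs: &Bound<'py, PySet>,
    recursive_def_refs: &Bound<'py, PySet>,
) -> PyResult<()> {
    if let Some(ty) = schema.get_item("type")? {
        if ty.eq("definition-ref")? {
            if let Some(schema_ref) = schema.get_item("schema_ref")? {
                if seen_refs.contains(&schema_ref)? {
                    recursive_def_refs.add(&schema_ref)?;
                } else {
                    seen_refs.add(&schema_ref)?;
                }
            }
        }
    }
    // Child schemas (fields, items, choices, ...) would be visited here in
    // the same pass; child discovery is elided from this sketch.
    Ok(())
}
```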
Related issue number
See the Pydantic-side PR above.
Checklist
Pydantic tests pass with this pydantic-core (except for expected changes)
Selected Reviewer: @sydney-runkle