
Optimized traversal of schema tree for schema cleaning (GenerateSchema.clean_schema) #1487

Open
MarkusSintonen wants to merge 3 commits into main from MarkusSintonen:schema-gather

Contributor

@MarkusSintonen MarkusSintonen commented Oct 18, 2024

Change Summary

Adds a schema tree traversal that gathers the schema nodes and information needed for schema inlining and discriminator handling, all in a single pass. It is used by GenerateSchema.clean_schema and is required for PR pydantic/pydantic#10655. This makes schema cleaning much more efficient, because the biggest bottleneck has been the Python-side tree traversal, especially with many models or deeply nested models.
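To make the single-pass idea concrete, here is a minimal, standard-library-only sketch. It is not the code in src/schema_traverse.rs: the `Schema` enum, the `Gathered` struct, and the `gather` and `example_gather` functions are hypothetical stand-ins for illustration, and only the three field names echo the ones visible in the review thread. It assumes that gathering means counting how often each definition is referenced and flagging definitions that refer back to themselves.

```rust
use std::collections::{HashMap, HashSet};

/// Toy schema node. A hypothetical stand-in for core schema dicts.
enum Schema {
    Leaf,
    List(Box<Schema>),
    Model(Vec<Schema>),
    DefinitionRef(String),
}

/// Everything collected during one walk over the tree.
#[derive(Default)]
struct Gathered {
    /// Number of times each definition is referenced.
    def_refs: HashMap<String, usize>,
    /// Definitions found to reference themselves.
    recursive_def_refs: HashSet<String>,
    /// Definitions currently being expanded on the path from the root.
    recursively_seen_refs: HashSet<String>,
}

/// Walks the tree once, recording reference counts and recursive refs.
fn gather(node: &Schema, defs: &HashMap<String, Schema>, out: &mut Gathered) {
    match node {
        Schema::Leaf => {}
        Schema::List(item) => gather(item, defs, out),
        Schema::Model(fields) => {
            for field in fields {
                gather(field, defs, out);
            }
        }
        Schema::DefinitionRef(name) => {
            *out.def_refs.entry(name.clone()).or_insert(0) += 1;
            if out.recursively_seen_refs.contains(name) {
                // Already expanding this definition: it is recursive.
                out.recursive_def_refs.insert(name.clone());
                return;
            }
            if let Some(definition) = defs.get(name) {
                out.recursively_seen_refs.insert(name.clone());
                gather(definition, defs, out);
                out.recursively_seen_refs.remove(name);
            }
        }
    }
}

/// Builds a small example: "Node" refers to itself, "Leafy" does not.
fn example_gather() -> Gathered {
    let mut defs = HashMap::new();
    defs.insert(
        "Node".to_string(),
        Schema::Model(vec![
            Schema::Leaf,
            Schema::List(Box::new(Schema::DefinitionRef("Node".to_string()))),
        ]),
    );
    defs.insert("Leafy".to_string(), Schema::Leaf);
    let root = Schema::Model(vec![
        Schema::DefinitionRef("Node".to_string()),
        Schema::DefinitionRef("Leafy".to_string()),
        Schema::DefinitionRef("Leafy".to_string()),
    ]);
    let mut out = Gathered::default();
    gather(&root, &defs, &mut out);
    out
}
```

In this toy example, `example_gather()` counts two references each to "Node" and "Leafy" and flags only "Node" as recursive. A cleaning step could then decide from the counts alone which definitions are candidates for inlining, without walking the tree again.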

Related issue number

See the Pydantic-side PR above.

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 98.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.63%. Comparing base (ab503cb) to head (29b0ebc).
Report is 230 commits behind head on main.

Files with missing lines    Patch %    Lines
src/schema_traverse.rs      98.75%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1487      +/-   ##
==========================================
- Coverage   90.21%   89.63%   -0.58%     
==========================================
  Files         106      113       +7     
  Lines       16339    17984    +1645     
  Branches       36       40       +4     
==========================================
+ Hits        14740    16120    +1380     
- Misses       1592     1844     +252     
- Partials        7       20      +13     
Files with missing lines            Coverage Δ
python/pydantic_core/__init__.py    93.10% <100.00%> (+0.51%) ⬆️
src/lib.rs                          100.00% <100.00%> (+12.85%) ⬆️
src/schema_traverse.rs              98.75% <98.75%> (ø)

... and 54 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 061711f...29b0ebc. Read the comment docs.

codspeed-hq bot commented Oct 18, 2024

CodSpeed Performance Report

Merging #1487 will not alter performance

Comparing MarkusSintonen:schema-gather (29b0ebc) with main (6472887)

Summary

✅ 155 untouched benchmarks

🆕 2 new benchmarks

Benchmarks breakdown

Benchmark                           main    MarkusSintonen:schema-gather    Change
🆕 test_nested_schema_inlined       N/A     11.2 ms                         N/A
🆕 test_nested_schema_using_defs    N/A     61 µs                           N/A

@MarkusSintonen MarkusSintonen changed the title from "Optimized traversal for schema node gathering for schema cleaning" to "Optimized traversal of schema nodes for schema cleaning" Oct 18, 2024
@MarkusSintonen
Contributor Author

please review

@MarkusSintonen MarkusSintonen changed the title from "Optimized traversal of schema nodes for schema cleaning" to "Optimized traversal of schema tree for schema cleaning (GenerateSchema.clean_schema)" Oct 19, 2024
Contributor

@davidhewitt davidhewitt left a comment

This seems reasonable to me. I haven't thought too much about the individual schema types, but I had a couple of possible optimization ideas :)

pyproject.toml (outdated, resolved)
src/schema_traverse.rs (outdated, resolved)
meta_with_keys: Option<(Bound<'py, PyDict>, &'a Bound<'py, PySet>)>,
def_refs: Bound<'py, PyDict>,
recursive_def_refs: Bound<'py, PySet>,
recursively_seen_refs: HashSet<String>,
Contributor

There might be an optimization to store these as Python strings to avoid round-trips:

Suggested change
- recursively_seen_refs: HashSet<String>,
+ recursively_seen_refs: HashSet<PyBackedStr>,

Contributor Author

I wonder whether that helps here, since the first contains check converts it into a Rust string right away anyway?

Contributor Author

Changed to use PySet, which seems to be the fastest option here (also faster than HashSet<PyBackedStr>).

Contributor

I think Python strings might cache their hash, though I don't think PyO3 uses that right now. Interesting observation 👍
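As a rough illustration of why a cached hash could matter for set lookups, the sketch below pairs a string with a hash computed once at construction, so that later set operations feed the hasher a single `u64` instead of re-hashing every byte. `CachedHashStr` and `seen_before` are hypothetical names invented for this example; they are not part of PyO3 or pydantic-core, and the sketch makes no claim about how PySet or PyBackedStr behave internally.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// A string paired with a hash computed once at construction.
/// Hypothetical type for illustration only.
#[derive(Clone, Debug)]
struct CachedHashStr {
    text: Rc<str>,
    hash: u64,
}

impl CachedHashStr {
    fn new(text: &str) -> Self {
        // Hash the bytes once, up front.
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        Self {
            text: Rc::from(text),
            hash: hasher.finish(),
        }
    }
}

impl PartialEq for CachedHashStr {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first, full text comparison second.
        self.hash == other.hash && self.text == other.text
    }
}

impl Eq for CachedHashStr {}

impl Hash for CachedHashStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the cached value instead of re-hashing the string bytes.
        state.write_u64(self.hash);
    }
}

/// Returns true if `key` was already in `seen`, inserting it otherwise.
fn seen_before(seen: &mut HashSet<CachedHashStr>, key: &CachedHashStr) -> bool {
    !seen.insert(key.clone())
}
```

With a plain `HashSet<String>`, every lookup hashes the full string again; with a cached hash, repeated lookups of the same key skip that work. Whether this accounts for the PySet result reported above is not established here.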

@MarkusSintonen MarkusSintonen force-pushed the schema-gather branch 2 times, most recently from b4aed80 to 14e3137, October 23, 2024 17:16
Contributor

@davidhewitt davidhewitt left a comment

As far as I am concerned, this side looks good to me! Thanks 👍

@MarkusSintonen MarkusSintonen force-pushed the schema-gather branch 2 times, most recently from 1a3071e to 22aa6a2, November 14, 2024 13:35
Member

This looks good in principle, but it needs comprehensive docstrings.

4 participants