Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse checkout for git pulls #15824

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

tetracionist
Copy link

@tetracionist tetracionist commented Oct 28, 2024

Added support for sparse-checkout when cloning a GitHub repo (#15185).
This is ideal for teams that have larger repos who don't want to clone all the folders in their repo.

When directories are specified in the Prefect.yaml or GitHubRepository storage class, we will use sparse checkout to get the directories and any file in the root directory (cone-mode). Also supports without cone-mode, although this is not recommended in the official git docs (see https://git-scm.com/docs/git-sparse-checkout)

Example usage in a Prefect.yaml

pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/tetracionist/prefect.git
    branch: sparse-checkout-for-git-pulls
    access_token:
    directories: [src/integrations/prefect-azure, src/integrations/prefect-dask]
    cone_mode: True # set to true by default, only need to include if False

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

Copy link

codspeed-hq bot commented Oct 28, 2024

CodSpeed Performance Report

Merging #15824 will not alter performance

Comparing tetracionist:sparse-checkout-for-git-pulls (cb5d049) with main (567138f)

Summary

✅ 3 untouched benchmarks

@desertaxle desertaxle linked an issue Oct 28, 2024 that may be closed by this pull request
Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @tetracionist! This is looking pretty good, but I have a couple of questions about this implementation after a first pass.

Uses sparse-checkout on repository
"""

cmd = ["git", "sparse-checkout", "init"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the init subcommand has been deprecated, and the set subcommand is the preferred approach based on the git docs. It seems like we would update this method to only call git sparse-checkout set with the provided directories, but let me know if that would cause issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be okay, I will try using git sparse-checkout set

# Limit git history and set path to clone to
cmd += ["--depth", "1", str(self.destination)]
# For sparse-checkout it is recommended to use --filter=blob:none to reduce disk-space
if self._sparse_checkout_mode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if sparse checkout mode is used then you won't be able to include submodules in the checkout. Is is possible to have both?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in the else block as I was unsure what submodules did, but are they something to do with adding code from another repo into the cloned repo?

I think this would be easy with cone mode but might be trickier without.
Would it be useful to also sparse-checkout the submodule?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, submodules allow you to reference another git repository from a parent repository. I'm not sure how sparse checkout would interact with submodules, but I think it'd be useful to apply sparse checkout to submodules, also.

Copy link
Contributor

This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sparse checkout for git pulls
2 participants