Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement automated migrations #56

Open
infinisil opened this issue Apr 22, 2024 · 13 comments
Open

Implement automated migrations #56

infinisil opened this issue Apr 22, 2024 · 13 comments

Comments

@infinisil
Copy link
Member

infinisil commented Apr 22, 2024

Most packages in Nixpkgs aren't defined by pkgs/by-name yet, even if they could. While one could migrate them manually, that would take a long time. Instead we should automate this. Figuring out how to do large-scale automated migrations will also be of great benefit to many other future changes in Nixpkgs.

In this issue I'll try to explain the pre-existing work and my ideas on how to go about this in the current codebase, such that others could implement this.

The task

The pkgs/by-name directory has certain validity checks that need to be fulfilled. For the migration in particular, these are most important:

  1. Each package directory (e.g. pkgs/by-name/he/hello) must not refer to files outside itself using symlinks or Nix path expressions.
  2. Evaluate Nixpkgs with system set to x86_64-linux and check that:
    a. For each package directory, the pkgs.${name} attribute must be defined as callPackage pkgs/by-name/${shard}/${name}/package.nix args for some args.
    b. For each package directory, pkgs.lib.isDerivation pkgs.${name} must be true.

A lot of packages under pkgs.* do fulfill these requirements, these are the ones we can migrate automatically. The ones that don't fulfil these requirements typically need some manual refactoring.

After verifying these requirements, migration can happen in two separate stages:

  • Move all files needed for the package into the appropriate pkgs/by-name directory, with the entry-point at package.nix.

    Rewrite the packages definition in all-packages.nix appropriately to point to ../by-name/...

  • Optionally, if the definition in all-packages.nix has { } as the second callPackage argument, remove the definition completely.

Examples

  • pkgs.kody (defined here) can't be migrated automatically, because its default.nix contains ../../../top-level/kodi-packages.nix, therefore not fulfilling requirement 1.
  • pkgs.termdown (defined here) can't be migrated automatically, because it's not defined using pkgs.callPackage
  • pkgs.qiv (defined here) fulfills all requirements and therefore can be migrated by:
    • Moving pkgs/applications/graphics/qiv/default.nix to pkgs/by-name/qi/qiv/package.nix and adjusting the definition in all-packages.nix to
        qiv = callPackage ../by-name/qi/qiv/package.nix {
          imlib2 = imlib2Full;
        };
    • We cannot remove the definition entirely because the second callPackage argument is not { }
  • pkgs.scli (defined here) fulfills all requirements and therefore can be migrated by:
    • Moving pkgs/applications/misc/scli/default.nix to pkgs/by-name/sc/scli/package.nix and adjusting the definition in all-packages.nix to
        scli = callPackage ../by-name/sc/scli/package.nix { };
    • And because the second callPackage argument is { } we can remove the entire definition.

Ratchet checks

Currently this tool implements ratchet checks, which prevent the introduction of new instances of deprecated patterns. This happens here: https://github.com/NixOS/nixpkgs-check-by-name/blob/86202962eed024d778f2bece8a68e81361898405/src/eval.rs#L515-L517

This says that the ratchet for a specific package is "loose" (aka legacy), meaning we can migrate away from it (therefore tightening the ratchet).

Currently these ratchet states are only used to compare the base branch and the head of a PR, such that we can detect invalid transitions between ratchet states, see ratchet.rs. This allows detect when a PR introduces a new instance of a legacy pattern.

This notably also means that even if somebody merges a PR that introduces new instances of deprecated patterns (happens if a pattern is deprecated between the PRs CI running and it being merged), it won't break master, because these ratchets aren't invalid on their own: Only the transition from tight to loose is invalid.

Using ratchets for migrations

We can also use this ratchet abstraction to determine migrations, because the loose state is exactly what needs to be migrated!

The tool can have two modes:

Thus in addition to defining how ratchets can be compared, we just need a way to define how ratchets can be migrated.

I've started a very rough WIP branch of this before here. Note how:

Problem: Which files are part of a package?

It's really tricky to figure out which files are part of a package and should therefore be moved. Because outside of pkgs/by-name, each package's entry-point file (usually default.nix) can refer to either more of its own files which should be moved (e.g. ./Cargo.lock), or refer to arbitrary files from other packages in which case it can't be moved without breaking check 1 (like ../../some/file.sh).

I did manage to get that to work almost perfectly in an older version of this tooling. It works by creating an index of all (static) references between Nix files to figure out whether a file was only references by a single package or many packages, and then revert that relation to figure out which packages only referred to files used by that package itself.

This tool was used to create this PR: NixOS/nixpkgs#211832

Notably, the current nixpkgs-check-by-name tool doesn't have such functionality, which also means that when somebody creates a PR that introduces a new package outside of pkgs/by-name, it tells them to migrate it to pkgs/by-name, even if it can't be directly migrated due to it referring to shared files! This is the reason the workaround for packages with multiple versions is necessary.

Problem: It's non-trivial to efficiently rewrite many sections of files

This is what I've been struggling with while experimenting. I wrote some notes on this. Basically it's really hard to use rowan (the parsing library used underneath rnix) to edit files in multiple places.

In this case I'd like to have a basic edit operation implemented that can modify or remove attributes (for all-packages.nix). I did get that working in the older tooling, but it's a bit too hacky (still, it shows that it can be done!).

Todo

This is my current idea on how to progress with this:

  1. Port the reference index code from the old tool to nixpkgs-check-by-name, therefore removing the need for the multi-versioned package workaround. This means that we only have a loose ratchet exactly when we can figure out how to migrate automatically.

  2. Implement decent abstractions for representing tree changes, including moving files and changing Nix files. And implement those abstractions efficiently using rnix.

  3. Make ratchets declare how to migrate from a loose state to a tight one using the above tree change abstraction. This should be almost trivial at this point, since we already know that we can migrate, and information from determining this should allow exactly specifying how we can migrate using the abstraction for doing changes.

  4. Set up CI to run the migration for every push to Nixpkgs master, and force push the resulting changes into a separate branch, which can then be merged at any point.

    Bonus: Also make it open a tracking issue for all the packages that can't be migrated manually and update it as they are fixed.

Every future addition of a new ratchet check (for any pattern we want to deprecate) will have to come with an implementation of how to migrate from it, which will then automatically run and be pushed to the branch, which can then be merged again as necessary.

@willbush
Copy link
Member

willbush commented Apr 23, 2024

This might be a good tool for this https://github.com/getgrit/gritql well it doesn't support nix yet. I wonder how hard it is to add given it's based on tree-sitter grammars.

Adding Nix support to that tool could create a lot of value for Nix.

@philiptaron
Copy link
Contributor

I tried to add Nix support to sg and didn't get too far, likely because I didn't really understand why simple pattern matching wasn't working. Agree that a tree-sitter / codemod approach is the right way to go. tree-sitter-nix exists and I think could be resuscitated for this usecase.

@morgante
Copy link

In case anyone ends up working on this, I'd be very receptive to adding Nix support in GritQL and happy to debug any patterns that don't match.

@infinisil
Copy link
Member Author

Problem: Which files are part of a package?

Was pointed out by @emilazy that we can avoid this problem by just not migrating packages that are ambiguous in such a way, which simplifies this a bunch, at the cost of not migrating everything that could be migrated

@emilazy
Copy link
Member

emilazy commented Jul 22, 2024

I think another simple but more general approach would be to migrate the entire directory, check that the result follows the by-name rules, and that its derivation hash remains unchanged?

@eclairevoyant
Copy link

at the cost of not migrating everything that could be migrated

My unscientific survey of files in pkgs shows ~1000 packages with ../ in them

$ rg '\.\./' -c | rg '\.nix:' | wc -l
971

~100 of which are already in by-name. 900ish packages is a fair amount to migrate manually, so automation probably is needed. But since we're migrating piecemeal I don't think it's entirely out of the question to do a few smaller PRs - starting with one to catch the low-hanging fruit.

@infinisil
Copy link
Member Author

Yeah totally, I'm all in favor for dropping the more complicated automated migration to start out with. Honestly at this point it might be best to just go for an ugly solution, even if it's not perfect.

Tbh, https://github.com/nixpkgs-architecture/nix-spp worked decently well too (it's what was used to create NixOS/nixpkgs#211832). Iirc it had some bugs for the more complicated cases, so perhaps it could be simplified to just not do those.

@Frontear
Copy link
Member

I'm curious how this plans to handle packages that appear in distinct namespaces (such as language-specific stuff). An example is python3Packages.<drv>, which would not be possible with pkgs/by-name, due to it being a generic entrypoint. Would we need to preserve pkgs/top-level/python-packages.nix in that case for this situation, or is there a different plan going forwards?

@emilazy
Copy link
Member

emilazy commented Oct 17, 2024

The plan is only to migrate simple cases that by-name already handles for now.

@mrgiles
Copy link

mrgiles commented Oct 23, 2024

Hello!

I've been struggling to merge NixOS/nixpkgs#348479 into pkgs/by-name. I had to update the package version, so I wanted to use the opportunity to also move it under the new pkgs structure.

I went back and forth, getting different errors as I iterated and I finally gave up. I ended up leaving the pkg in the old format but I'd really like to move it under by-name.

Further details are in the Conversation thread in the PR but basically the last error I got is because, besides changing (and/or removing) the old pkg definition, I've also removed it from the all-packages.nix file, so the nixpkgs-merge-bot detected that those other files also changed, and failed to merge the PR.

Question: To move an existing package from pkgs/applications to pkgs/by-name, does it have to be done manually?
If so, do I have to mention the PR# under a separate, specific Issue#?

Thanks and sorry for bothering you guys with these basic questions, I read the Contributing docs but I couldn't find an answer for this specific issue.

@willbush
Copy link
Member

Thanks for sharing! We definitely want to improve the UX and documentation. I'll look more into the issues and respond more in-depth in a day or two. Ping @infinisil @philiptaron

@uncenter
Copy link
Member

Fwiw @Aleksanaa has created a tool for this: https://github.com/Aleksanaa/by-name-migrate. Used in NixOS/nixpkgs#354531.

@liberodark
Copy link

Hi,

I send my little work in this subject :

https://gist.github.com/liberodark/5c0074968b40611c1dfbdc1263a22b53

Best Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants