Skip to content

Commit

Permalink
[CHG] runbot_merge: switch staging from github API to local
Browse files Browse the repository at this point in the history
It has been a consideration for a while, but the pain of subtly
interacting with git via the ignominous CLI kept it back. Then ~~the
fire nation attacked~~ github got more and more tight-fisted (and in
some ways less reliable) with their API.

Staging pretty much just interacts with the git database, so it's both
a facultative github operator (it can just interact with git directly)
and a big consumer of API requests (because the git database endpoints
are very low level so it takes quite a bit of work to do anything
especially when high-level operations like rebase have to be
replicated by hand).

Furthermore, an issue has also been noticed which can be attributed to
using the github API (and that API's reliability getting worse): in
some cases github will fail to propagate a ref update / reset, so when
staging 2 PRs it's possible that the second one is merged on top of
the temporary branch of the first one, yielding a kinda broken commit
(in that it's a merge commit with a broken error message) instead of
the rebase / squash commit we expected.

As it turns out it's a very old issue but only happened very early so
was misattributed and not (sufficiently) guarded against:

- 41bd82244bb976bbd4d4be5e7bd792417c7dae6b (October 8th 2018) was
  spotted but thought to be a mergebot issue (might have been one of
  the opportunities where ref-checks were added though I can't find
  any reference to the commit in the runbot repo).
- 2be25052e147b151d1d8a5bc73cceb351586ce03 (October 15th, 2019) was
  missed (or ignored).
- 5a9fe7a7d05a9df7186072a7bffd60c6b428fd0e (July 31st, 2023) was
  spotted, but happened at a moment where everything kinda broke
  because of github rate-limiting ref updates, so the forensics were
  difficult and it was attributed to rate limiting issues.
- f10d03bf0f2e8f88f62a5d8356b84f714196130f (August 24th, 2023) broke
  the camel's back (and the head block): the logs were not too
  interspersed with other garbage and pretty clear that github ack'd a
  ref update, returned the correct oid when checking the ref, then
  returned the wrong oid when fetching it later on.

No Working Copy
===============

The working copy turns out to not be necessary, the plumbing commands
we *need* work just fine on a bare repository.

Working without a WC means we had to reimplement the high level
operations (rebase) by hand much as we'd done previously, *but* we
needed to do that anyway as git doesn't seem to provide any way to
retrieve the mapping when rebasing/cherrypicking, and cherrypicking by
commit doesn't work well as it can't really find the *merge base* it
needs.

Forward-porting can almost certainly be implemented similarly (with
some overhead), issue #803 has been opened to keep track of the idea.

No TMP
======

The `tmp.` branches are no more, the process of creating stagings is
based entirely around oids, if staging something fails we can just
abandon the oids (they'll be collected by the weekly GC), we only
need to update the staging branches at the very end of the process.

This simplifies things a fair bit.

For now we have stopped checking for visibility / backoff as we're
pushing via git, hopefully it is a more reliable reference than the
API.

Commmit Message Formatting
==========================

There's some unfortunate churn in the test, as the handling of
trailing newlines differs between github's APIs and git itself.

Fixes #247

PS: It might be a good idea to use pygit2 instead of the CLI
    eventually, the library is typed which is nice, and it avoids
    shelling out although that's really unlikely to be a major cost.
  • Loading branch information
xmo-odoo committed Aug 25, 2023
1 parent 2fbbe3f commit 85a7890
Show file tree
Hide file tree
Showing 6 changed files with 307 additions and 210 deletions.
3 changes: 3 additions & 0 deletions conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -537,6 +537,9 @@ def __init__(self, session, fullname, repos):
self.hook = False
repos.append(self)

def __repr__(self):
return f'<conftest.Repo {self.name}>'

@property
def owner(self):
return self.name.split('/')[0]
Expand Down
7 changes: 3 additions & 4 deletions mergebot_test_utils/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,8 @@
closes {repo}#{number}
{headers}Signed-off-by: {name} <{email}>"""
{headers}Signed-off-by: {name} <{email}>
"""
# target branch '-' source branch '-' base64 unique '-fw'
REF_PATTERN = r'{target}-{source}-[a-zA-Z0-9_-]{{4}}-fw'

Expand Down Expand Up @@ -136,6 +137,4 @@ def to_pr(env, pr):
return pr

def part_of(label, pr_id, *, separator='\n\n'):
""" Adds the "part-of" pseudo-header in the footer.
"""
return f'{label}{separator}Part-of: {pr_id.display_name}'
return f'{label}{separator}Part-of: {pr_id.display_name}\n'
118 changes: 117 additions & 1 deletion runbot_merge/git.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
import dataclasses
import itertools
import logging
import os
import pathlib
import resource
import subprocess
from typing import Optional, TypeVar
from typing import Optional, TypeVar, Union, Sequence, Tuple, Dict

from odoo.tools.appdirs import user_cache_dir
from .github import MergeError, PrCommit

_logger = logging.getLogger(__name__)

Expand All @@ -17,6 +19,7 @@ def source_url(repository, prefix: str) -> str:
repository.name,
)

Authorship = Union[Tuple[str, str], Tuple[str, str, str]]

def get_local(repository, prefix: Optional[str]) -> 'Optional[Repo]':
repos_dir = pathlib.Path(user_cache_dir('mergebot'))
Expand Down Expand Up @@ -104,6 +107,119 @@ def clone(self, to: str, branch: Optional[str] = None) -> Self:
)
return Repo(to)

def rebase(self, dest: str, commits: Sequence[PrCommit]) -> Tuple[str, Dict[str, str]]:
"""Implements rebase by hand atop plumbing so:
- we can work without a working copy
- we can track individual commits (and store the mapping)
It looks like `--merge-base` is not sufficient for `merge-tree` to
correctly keep track of history, so it loses contents. Therefore
implement in two passes as in the github version.
"""
repo = self.stdout().with_config(text=True, check=False)

logger = _logger.getChild('rebase')
logger.debug("rebasing %s on %s (reset=%s, commits=%s)",
self._repo, dest, len(commits))
if not commits:
raise MergeError("PR has no commits")

new_trees = []
parent = dest
for original in commits:
if len(original['parents']) != 1:
raise MergeError(
f"commits with multiple parents ({original['sha']}) can not be rebased, "
"either fix the branch to remove merges or merge without "
"rebasing")

new_trees.append(check(repo.merge_tree(parent, original['sha'])).stdout.strip())
parent = check(repo.commit_tree(
tree=new_trees[-1],
parents=[parent, original['sha']],
message=f'temp rebase {original["sha"]}',
)).stdout.strip()

mapping = {}
for original, tree in zip(commits, new_trees):
authorship = check(repo.show('--no-patch', '--pretty="%an%n%ae%n%ai%n%cn%n%ce"', original['sha']))
author_name, author_email, author_date, committer_name, committer_email =\
authorship.stdout.splitlines()

c = check(repo.commit_tree(
tree=tree,
parents=[dest],
message=original['commit']['message'],
author=(author_name, author_email, author_date),
committer=(committer_name, committer_email),
)).stdout.strip()

logger.debug('copied %s to %s (parent: %s)', original['sha'], c, dest)
dest = mapping[original['sha']] = c

return dest, mapping

def merge(self, c1: str, c2: str, msg: str, *, author: Tuple[str, str]) -> str:
repo = self.stdout().with_config(text=True, check=False)

t = repo.merge_tree(c1, c2)
if t.returncode:
raise MergeError(t.stderr)

c = self.commit_tree(
tree=t.stdout.strip(),
message=msg,
parents=[c1, c2],
author=author,
)
if c.returncode:
raise MergeError(c.stderr)
return c.stdout.strip()

def commit_tree(
self, *, tree: str, message: str,
parents: Sequence[str] = (),
author: Optional[Authorship] = None,
committer: Optional[Authorship] = None,
) -> subprocess.CompletedProcess:
authorship = {}
if author:
authorship['GIT_AUTHOR_NAME'] = author[0]
authorship['GIT_AUTHOR_EMAIL'] = author[1]
if len(author) > 2:
authorship['GIT_AUTHOR_DATE'] = author[2]
if committer:
authorship['GIT_COMMITTER_NAME'] = committer[0]
authorship['GIT_COMMITTER_EMAIL'] = committer[1]
if len(committer) > 2:
authorship['GIT_COMMITTER_DATE'] = committer[2]

return self.with_config(
stdout=subprocess.PIPE,
text=True,
env={
**os.environ,
**authorship,
# we don't want git to use the timezone of the machine it's
# running on: previously it used the timezone configured in
# github (?), which I think / assume defaults to a generic UTC
'TZ': 'UTC',
}
)._run(
'commit-tree',
tree,
'-m', message,
*itertools.chain.from_iterable(('-p', p) for p in parents),
)

def check(p: subprocess.CompletedProcess) -> subprocess.CompletedProcess:
if not p.returncode:
return p

_logger.info("rebase failed at %s\nstdout:\n%s\nstderr:\n%s", p.args, p.stdout, p.stderr)
raise MergeError(p.stderr or 'merge conflict')


@dataclasses.dataclass
class GitCommand:
Expand Down
Loading

0 comments on commit 85a7890

Please sign in to comment.