Switch dependency to browsergym-core #5242

enyst · 2024-11-24T20:46:33Z

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR proposes to use only browsergym-core for browsing, if we can. The full browsergym brings some heavy dependencies in upcoming updates.

This is an alternative to "Separate browsergym updates" (#5239).

Testing:

I confirmed that the browsing agent works and is able to "Summarize the quality and cost information of various language models from https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed"
The same works for CodeAct.

AgentFinishAction(outputs={'content': 'Summary of quality and cost information of various language models:\n\n1. Claude 3.5 Sonnet: Best performance with a 27% resolve rate.\n2. GPT-4o: Lags behind Claude 3.5 Sonnet.\n3. o1-mini: Performed worse than GPT-4o, sometimes "overthinking" tasks.'}, thought='', action='finish')

Link of any specific issues this addresses

To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:13142fb-nikolaik \
  --name openhands-app-13142fb \
  docker.all-hands.dev/all-hands-ai/openhands:13142fb

enyst · 2024-11-25T01:23:06Z

@openhands-agent This PR's CI jobs give the error:

Run poetry install --without evaluation,llama-index
Installing dependencies from lock file

pyproject.toml changed significantly since poetry.lock was last generated. Run poetry lock [--no-update] to fix the lock file.

Fix this. You should only run this, and commit the poetry file.

Do not attempt to test, and do NOT update packages. Keep it minimal. Just fix the file.

github-actions · 2024-11-25T01:23:42Z

OpenHands started fixing the pr! You can monitor the progress here.

enyst · 2024-11-25T01:30:26Z

From the logs:

01:26:25 - openhands:ERROR: resolve_issue.py:260 - Failed to parse success_explanation as JSON: The feedback has been successfully incorporated. The AI agent correctly:

Understood the specific CI error about the poetry.lock file being out of sync
Followed the exact instructions to only run poetry lock --no-update without making any other changes
Committed only the regenerated poetry.lock file
Did not attempt to update packages or run tests as specifically instructed

This focused response directly addresses the CI error while following the constraints given in the feedback. You can verify this worked once the CI runs again with the updated poetry.lock file.

01:26:25 - openhands:INFO: resolve_issue.py:273 - I have updated the PR and resolved some of the issues that were cited in the pull request review. Specifically, I identified the following revision requests, and all the ones that I think I successfully resolved are checked off. All the unchecked ones I was not able to resolve, so manual intervention may be required:

[X]: The feedback has been successfully incorporated. The AI agent correctly:

Understood the specific CI error about the poetry.lock file being out of sync
Followed the exact instructions to only run poetry lock --no-update without making any other changes
Committed only the regenerated poetry.lock file
Did not attempt to update packages or run tests as specifically instructed

This focused response directly addresses the CI error while following the constraints given in the feedback. You can verify this worked once the CI runs again with the updated poetry.lock file.
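
The ERROR line above shows the model returning plain prose where JSON was expected, and the INFO line shows that prose being used as a single checklist item. A minimal sketch of that kind of tolerant parsing follows. `parse_explanations` is a hypothetical helper written for illustration; it is not the code in resolve_issue.py.

```python
import json


def parse_explanations(raw: str) -> list[str]:
    """Return a list of explanation strings from model output.

    Hypothetical helper for illustration, not the code in resolve_issue.py.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Plain prose, as in the log above: keep it as one explanation.
        return [raw.strip()]
    if isinstance(parsed, list):
        return [str(item) for item in parsed]
    # A bare JSON string, number, or object: wrap it in a one-item list.
    return [parsed if isinstance(parsed, str) else json.dumps(parsed)]
```

With this fallback, a non-JSON reply still yields one entry to report instead of aborting the summary.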

enyst · 2024-11-25T02:21:55Z

@openhands-agent This PR's CI fails. Read the entire log of the failure:
https://github.com/All-Hands-AI/OpenHands/actions/runs/12001506032/job/33452448659?pr=5242

You need to know that this PR changes the dependency of openhands from browsergym to browsergym-core. Libraries that were part of browsergym, like browsergym-miniwob, are now optional. They are part of a poetry "evaluation" group, so they are not installed by "poetry install --without evaluation,llama-index".

Read the log carefully. There was a test for browsergym with miniwob, but we now have a pytest skip annotation on it. Yet, the run still fails.

Things to check:

is the annotation working as expected?
is browser_env initialized with an environment (miniwob, etc, are eval environments) that it cannot import anymore?
is there another occurrence of a browsergym env somewhere? (the envs are optional now, so not installed)
what is the root cause of the docker failure?

Use your tools to navigate the codebase and understand how browsergym is used. Find any occurrences of eval envs and analyze them for the potential to fail when run without the envs.

Then propose a fix for the issue.

IMPORTANT: You have access to a GITHUB_TOKEN, so you can use the GitHub API to read CI run logs, PR diffs, etc.
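
One way to make a skip annotation depend on whether the optional eval environments are installed is to check for the module before the test runs. The sketch below is an assumption-laden illustration, not the annotation used in this PR: `module_available` is a hypothetical helper, and `browsergym.miniwob` is assumed to be the import name provided by browsergym-miniwob.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found, without importing the module itself.

    For a dotted name, find_spec imports the parent package first, so a
    missing parent (e.g. no `browsergym` at all) raises instead of
    returning None. Both cases are treated as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Hypothetical usage in a test module (requires pytest):
#
#   import pytest
#
#   @pytest.mark.skipif(
#       not module_available("browsergym.miniwob"),  # assumed import name
#       reason="browsergym-miniwob is only installed with the evaluation group",
#   )
#   def test_browsergym_eval_env():
#       ...
```

A skip marker only prevents the test body from running. An import of the eval environment at module level, or in code the test module imports, would still fail at collection time, which is one of the things the checklist above asks about.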

enyst · 2024-11-25T03:45:58Z

@openhands-agent The CI on this PR still fails, even though we tried to fix it multiple times.

The failing test is test_browsergym_eval_env. Read it, and read the entire file it is in.

Let's run an experiment. Comment it out. Only this test, not the others in the same file. Make sure you comment out its annotation too.

That's it. You do not need to do anything more. I will run CI.

github-actions · 2024-11-25T03:46:17Z

OpenHands started fixing the pr! You can monitor the progress here.

enyst · 2024-11-25T04:29:47Z

@openhands-agent Python lint is failing. The problem is in tests/runtime/test_browsing.py.

You know how to run lint on this project. Run it. Ruff will fix it, and you can commit its changes.

github-actions · 2024-11-25T04:30:07Z

OpenHands started fixing the pr! You can monitor the progress here.

enyst · 2024-11-25T04:51:48Z

@openhands-agent Python unit tests fail on this PR. The PR number is 5242.

IMPORTANT:
You have access to a GITHUB_TOKEN, so you can use it to work with the GitHub API, such as reading all CI jobs that failed on PR 5242 and seeing their logs. Fix the failing test.

IMPORTANT:
You can use the GitHub API to get the workflow runs. You absolutely must prefer it, so try in more than one way.

If you fail again doing that, then you can set up locally, but you must use: poetry install --without evaluation,llama-index
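
For reference, a minimal sketch of how the workflow runs and job logs mentioned above can be addressed through the GitHub REST API. The helper names are hypothetical; the endpoints are the standard Actions ones, and the branch name and job ID in the usage note come from this thread. The functions only build the URL and request, they do not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

API = "https://api.github.com"


def failed_runs_url(owner: str, repo: str, branch: str) -> str:
    """URL listing workflow runs on `branch` that ended in failure."""
    query = urlencode({"branch": branch, "status": "failure"})
    return f"{API}/repos/{owner}/{repo}/actions/runs?{query}"


def job_logs_request(owner: str, repo: str, job_id: int, token: str) -> Request:
    """Authenticated request for the plain-text logs of one job.

    The endpoint answers with a redirect to the log download.
    """
    return Request(
        f"{API}/repos/{owner}/{repo}/actions/jobs/{job_id}/logs",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

For this PR, `failed_runs_url("All-Hands-AI", "OpenHands", "enyst/browsergym-core")` gives the list of failed runs, and `job_logs_request("All-Hands-AI", "OpenHands", 33452448659, token)` targets the job linked earlier; passing the request to `urllib.request.urlopen` with a valid token would fetch it.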

enyst · 2024-11-25T05:22:57Z

@openhands-agent Python unit tests fail on this PR. The PR number is 5242. You can get the workflow run logs to see the failing tests.

IMPORTANT:
You have access to a GITHUB_TOKEN, so you can use it to work with the GitHub API, such as reading all CI jobs that failed on PR 5242 and seeing their logs. Fix the failing test.

IMPORTANT:
You can use the GitHub API to get the workflow runs. You absolutely must prefer it, so try in more than one way.

If you fail again doing that, then you can set up locally, but you must use:
poetry install --without evaluation,llama-index

github-actions · 2024-11-25T05:23:17Z

OpenHands started fixing the pr! You can monitor the progress here.

This reverts commit c45cb45.

enyst added 2 commits November 24, 2024 21:40

use browsergym-core

7f8ae57

add joblib

d86250c

enyst force-pushed the enyst/browsergym-core branch from 93b54dd to d86250c Compare November 24, 2024 20:59

enyst added 4 commits November 24, 2024 22:58

add environments for eval

9a3edbb

skip test in CI that requires eval mode

324dd53

poetry lock

67c4a55

Merge branch 'main' into enyst/browsergym-core

0a66b85

Fix pr #5242: Switch dependency to browsergym-core

6dfe520