AutoGPT/agbenchmark
Silen Naihin 5e3bbb946f
fix suite dependencies (#194)
2023-07-26 01:50:53 +01:00
challenges hotfix reports (#191) 2023-07-25 19:07:24 +01:00
reports hotfix reports (#191) 2023-07-25 19:07:24 +01:00
README.md Dynamic home path for runs (#119) 2023-07-16 18:24:06 -07:00
ReportManager.py internal_info.json dynamic changes (#163) 2023-07-17 09:39:24 -04:00
__init__.py init agbenchmark 2023-06-18 11:14:54 -04:00
agent_interface.py Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
challenge.py Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
config.json Dynamic home path for runs (#119) 2023-07-16 18:24:06 -07:00
conftest.py fix suite dependencies (#194) 2023-07-26 01:50:53 +01:00
metrics.py start click, fixtures, types, challenge creation, mock run -stable (#37) 2023-06-21 11:43:18 -04:00
regression_tests.json Dynamic cutoff and other quality of life (#101) 2023-07-15 22:10:20 -04:00
start_benchmark.py Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
utils.py hotfix reports (#191) 2023-07-25 19:07:24 +01:00

README.md

As a user

  1. pip install auto-gpt-benchmarks
  2. Add boilerplate code to run and kill your agent
  3. agbenchmark start
    • --category challenge_category to run tests in a specific category
    • --mock to only run mock tests if they exist for each test
    • --noreg to skip any tests that have passed in the past. When you run without this flag and a challenge that previously passed now fails, it is no longer treated as a regression test
  4. agbenchmark calls your agent's boilerplate code
  5. agbenchmark shows the pass rate of tests, logs, and any other metrics
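To make the --noreg behaviour concrete, here is a minimal sketch of the bookkeeping described in the steps above. It assumes regression tests are stored as a JSON object keyed by challenge name. The repo does ship a regression_tests.json, but its real schema and all helper names below are hypothetical, so treat this as an illustration rather than agbenchmark's implementation.

```python
import json
from pathlib import Path

# Hypothetical schema: {"challenge_name": {"passed": true}, ...}.
# agbenchmark's real regression_tests.json may store different fields.


def load_regression(path: Path) -> dict:
    """Return recorded regression tests, or {} if the file doesn't exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def select_challenges(challenges: list, regression: dict, noreg: bool) -> list:
    """With --noreg, skip every challenge that has passed before."""
    if noreg:
        return [name for name in challenges if name not in regression]
    return list(challenges)


def update_regression(regression: dict, results: dict) -> dict:
    """Record new passes; drop previously passing challenges that now fail."""
    updated = dict(regression)
    for name, passed in results.items():
        if passed:
            updated[name] = {"passed": True}
        else:
            updated.pop(name, None)
    return updated


def save_regression(path: Path, regression: dict) -> None:
    """Persist the updated regression tests as pretty-printed JSON."""
    path.write_text(json.dumps(regression, indent=4))
```

For example, with the made-up challenge names "write_file" and "read_file", a failing "write_file" that had passed before is dropped from the regression set while a newly passing "read_file" is added.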

Contributing

Diagrams: https://whimsical.com/agbenchmark-5n4hXBq1ZGzBwRsK4TVY7x

To run the existing mocks

  1. clone the repo auto-gpt-benchmarks
  2. pip install poetry
  3. poetry shell
  4. poetry install
  5. cp .env_example .env
  6. Run agbenchmark start --mock, keep the config the same, and watch the logs :)

To run with mini-agi

  1. Navigate to auto-gpt-benchmarks/agent/mini-agi
  2. pip install -r requirements.txt
  3. cp .env_example .env, set PROMPT_USER=false and add your OPENAI_API_KEY=. Set MODEL="gpt-3.5-turbo" if you don't have access to gpt-4 yet. Also make sure you have Python 3.10 or later installed
  4. Follow the commands above, but run agbenchmark start without the --mock flag
  • To add requirements, run poetry add <requirement>.
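Before running mini-agi, a small pre-flight check can catch a misconfigured .env. This is a hypothetical helper, not part of mini-agi or agbenchmark. It parses simple KEY=VALUE lines and checks only the settings named in step 3. It deliberately does not validate MODEL, since the accepted model names are not documented here.

```python
import sys


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. MODEL="gpt-3.5-turbo"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_mini_agi_env(values: dict) -> list:
    """Return human-readable problems with the step 3 settings."""
    problems = []
    if values.get("PROMPT_USER", "").lower() != "false":
        problems.append("PROMPT_USER should be set to false")
    # An empty OPENAI_API_KEY= (as copied from .env_example) counts as missing
    if not values.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing")
    return problems


def python_ok(version=sys.version_info) -> bool:
    """mini-agi expects Python 3.10 or later."""
    return tuple(version[:2]) >= (3, 10)
```

From the mini-agi folder you could call check_mini_agi_env(parse_env(open(".env").read())) and expect an empty list once step 3 is done.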

Feel free to create PRs to merge with main at will (but also feel free to ask for review). If you can't, send a message in the R&D chat for access.

If you push at any point and break things (it'll happen to everyone), fix it ASAP. Step 1 is to revert master to the last working commit.

Let people know what your beautiful code does, and document everything well.

Share your progress :)

Workspace

If the --mock flag is used, the workspace is at agbenchmark/workspace. Otherwise, for mini-agi, it is at C:/Users/<name>/miniagi; it is set automatically in the config.
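The workspace rule is a two-way choice. A minimal sketch, assuming the non-mock path is built from the user's home directory via Path.home(); resolve_workspace is a hypothetical name, and agbenchmark's actual config logic may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_workspace(mock: bool, benchmark_dir: Path, home: Optional[Path] = None) -> Path:
    """Pick the workspace directory as described in the Workspace section.

    benchmark_dir and home are parameters only so the logic is testable;
    agbenchmark itself sets the path in its config.
    """
    if mock:
        # Mock runs write into the benchmark package's own workspace folder
        return benchmark_dir / "workspace"
    # Real mini-agi runs use a folder under the user's home directory,
    # e.g. C:/Users/<name>/miniagi on Windows
    return (home if home is not None else Path.home()) / "miniagi"
```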

Dataset

Manually created challenges, existing challenges within Auto-GPT, and https://osu-nlp-group.github.io/Mind2Web/

How do I add new agents to agbenchmark?

Example with smol developer.

1- Create a GitHub branch with your agent, following the same pattern as this example:

https://github.com/smol-ai/developer/pull/114/files

2- Create the submodule and the GitHub workflow, following the same pattern as this example:

https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/pull/48/files

How do I run the agent in different environments?

To just use it as the benchmark for your agent, pip install the package and run agbenchmark start.
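Using agbenchmark with your own agent still requires the "run and kill" boilerplate mentioned in the user steps, and agbenchmark's real agent interface is not documented in this README. As a rough illustration only, a hypothetical run_agent helper could start the agent as a subprocess and kill it if it runs past a time limit. The timeout and the main.py entry point are assumptions, not agbenchmark settings.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical agent entry point, run with the current Python interpreter
EXAMPLE_COMMAND = [sys.executable, "main.py"]


def run_agent(command: List[str], timeout_s: float) -> Optional[int]:
    """Start the agent, wait up to timeout_s seconds, kill it if it overruns.

    Returns the agent's exit code, or None if it had to be killed.
    """
    proc = subprocess.Popen(command)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # stop the runaway agent
        proc.wait()  # reap it so no zombie process is left behind
        return None
```

Killing on timeout keeps one stuck agent from blocking the whole benchmark run; returning None lets the caller tell "killed" apart from a normal non-zero exit.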

For internal Auto-GPT CI runs, specify the AGENT_NAME you want to use and set HOME_ENV. For example: HOME_ENV=ci AGENT_NAME=mini-agi
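If a CI job is a Python script rather than a shell step, the same variables can be passed through the environment. A minimal sketch: ci_env and start_benchmark are hypothetical helpers, and only the agbenchmark start command, the --mock flag, and the HOME_ENV/AGENT_NAME variables come from this README.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def ci_env(agent_name: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the current (or given) environment and add the CI settings."""
    env = dict(os.environ if base is None else base)
    env["HOME_ENV"] = "ci"
    env["AGENT_NAME"] = agent_name
    return env


def start_benchmark(agent_name: str, mock: bool = False) -> int:
    """Equivalent to: HOME_ENV=ci AGENT_NAME=<agent_name> agbenchmark start [--mock]"""
    cmd = ["agbenchmark", "start"]
    if mock:
        cmd.append("--mock")
    return subprocess.run(cmd, env=ci_env(agent_name), check=False).returncode
```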

To develop an agent alongside the benchmark, specify the AGENT_NAME you want to use and add the agent as a submodule to the repo.