README.md

This is the official challenge library for https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks

The goal of this repo is to make it easy to create challenges for test-driven development with the Auto-GPT-Benchmarks package. It is essentially a library for crafting challenges using a DSL (JSON files, in this case).
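As a sketch of what such a JSON challenge definition looks like, here is a minimal file-writing challenge. The exact field set may differ between versions of the package, so treat the structure below as illustrative:

```json
{
  "name": "WriteFile",
  "category": ["general"],
  "task": "Write the word 'Washington' to a .txt file",
  "dependencies": [],
  "cutoff": 60,
  "ground": {
    "answer": "The word 'Washington', printed to a .txt file",
    "should_contain": ["Washington"],
    "should_not_contain": [],
    "files": [".txt"],
    "eval": { "type": "file" }
  },
  "info": {
    "difficulty": "interface",
    "description": "Tests the agent's ability to write to a file",
    "side_effects": []
  }
}
```

The `ground` block tells the evaluator what to check in the agent's output files, while `dependencies` lets challenges build on each other to form a skill tree.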

This is the up-to-date dependency graph: https://sapphire-denys-23.tiiny.site/

How to use

Make sure you have the package installed with `pip install agbenchmark`.

If you would just like to use the default challenges, don't worry about this repo. Just install the package and you will have access to the default challenges.

To add new challenges as you develop, add this repo as a submodule to your project/agbenchmark folder. Any new challenges you add within the submodule will get registered automatically.