AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
 
 
 
 
 
 
Auto-GPT-Bot 8030440fd5 Auto-GPT-20230725033636 2023-07-25 03:36:36 +00:00
.github Add api keys (#190) 2023-07-24 20:11:48 -07:00
.vscode init agbenchmark 2023-06-18 11:14:54 -04:00
agbenchmark Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
agent Beat more challenges in Auto-GPT (#187) 2023-07-24 15:09:03 -07:00
benchmark_runs gpt-engineer-20230716225908 2023-07-16 22:59:08 +00:00
reports Auto-GPT-20230725033636 2023-07-25 03:36:36 +00:00
.env.example Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
.flake8 Add static linters ci (#45) 2023-07-02 16:14:49 -04:00
.gitignore Push reports to google drive (#167) 2023-07-18 09:17:45 -07:00
.gitmodules Beat more challenges in Auto-GPT (#187) 2023-07-24 15:09:03 -07:00
.python-version Add static linters ci (#45) 2023-07-02 16:14:49 -04:00
LICENSE init agbenchmark 2023-06-18 11:14:54 -04:00
README.md Update Auto-GPT score (#106) 2023-07-15 09:53:56 -07:00
json_to_base_64.py Push reports to google drive (#167) 2023-07-18 09:17:45 -07:00
mypy.ini Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
poetry.lock Kill subprocesses when test ends (#172) 2023-07-20 15:41:59 -07:00
pyproject.toml Safety challenges, adaptability challenges, suite same_task (#177) 2023-07-24 13:57:44 -07:00
send_to_googledrive.py Make spreadsheet dynamic based on branch name (#181) 2023-07-23 12:05:45 -07:00


Auto-GPT Benchmark

A repo for benchmarking the performance of agents of all kinds, regardless of how they are set up or how they work.

Scores:

Radar chart for each agent coming soon!

Detailed results

⚠️ These results are constantly evolving at the moment. We will publish an official benchmark result very soon.

Interface

Task Auto-GPT gpt-engineer mini-agi smol-developer
Write File tbd
Read File tbd
Search File tbd

Code

Task Auto-GPT gpt-engineer mini-agi smol-developer
Debug Simple Typo With Guidance tbd
Debug Simple Typo Without Guidance tbd
Basic Code Generation tbd
Create Simple Web Server tbd

Memory

Task Auto-GPT
Basic Memory
Remember Multiple Ids
Remember Multiple Ids With Noise
Remember Multiple Phrases With Noise