Auto-GPT Benchmark
A repository for benchmarking the performance of agents, regardless of how they are set up or how they work.
Scores:
Radar chart for each agent coming soon!
Detailed results
⚠️ These results are still changing. We will publish an official benchmark result very soon.
Interface
Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
---|---|---|---|---|
Write File | ❌ | ✅ | tbd | ✅ |
Read File | ❌ | ❌ | tbd | ❌ |
Search File | ❌ | ❌ | tbd | ❌ |
Code
Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
---|---|---|---|---|
Debug Simple Typo With Guidance | ❌ | ❌ | tbd | ❌ |
Debug Simple Typo Without Guidance | ❌ | ❌ | tbd | ❌ |
Basic Code Generation | ❌ | ✅ | tbd | ✅ |
Create Simple Web Server | ❌ | ❌ | tbd | ❌ |
Memory
Task | Auto-GPT |
---|---|
Basic Memory | ❌ |
Remember Multiple Ids | ❌ |
Remember Multiple Ids With Noise | ❌ |
Remember Multiple Phrases With Noise | ❌ |
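
To compare agents at a glance, the per-task results in the tables above can be tallied into a pass count per agent. The snippet below is only a minimal sketch: `AGENTS`, `TABLES` and `tally` are hypothetical names used for illustration and are not part of agbenchmark. The data is transcribed by hand from the tables, with `tbd` entries left unscored, and the Memory table only has an Auto-GPT column.

```python
# Hypothetical helper, not part of agbenchmark: tally the tables above.
AGENTS = ["Auto-GPT", "gpt-engineer", "mini-agi", "smol-developer"]

# Y = pass, N = fail, T = tbd (not yet scored).
Y, N, T = True, False, None

# One row per task, one entry per agent in AGENTS order.
# Memory rows only list Auto-GPT, so they hold a single entry.
TABLES = {
    "Interface": {
        "Write File": [N, Y, T, Y],
        "Read File": [N, N, T, N],
        "Search File": [N, N, T, N],
    },
    "Code": {
        "Debug Simple Typo With Guidance": [N, N, T, N],
        "Debug Simple Typo Without Guidance": [N, N, T, N],
        "Basic Code Generation": [N, Y, T, Y],
        "Create Simple Web Server": [N, N, T, N],
    },
    "Memory": {
        "Basic Memory": [N],
        "Remember Multiple Ids": [N],
        "Remember Multiple Ids With Noise": [N],
        "Remember Multiple Phrases With Noise": [N],
    },
}


def tally(tables):
    """Return {agent: (passed, scored)}, skipping tbd entries."""
    counts = {agent: [0, 0] for agent in AGENTS}
    for tasks in tables.values():
        for row in tasks.values():
            # zip stops at the shorter sequence, so short rows
            # only touch the agents they actually cover.
            for agent, outcome in zip(AGENTS, row):
                if outcome is None:
                    continue
                counts[agent][1] += 1
                counts[agent][0] += int(outcome)
    return {agent: (p, s) for agent, (p, s) in counts.items()}


if __name__ == "__main__":
    for agent, (passed, scored) in tally(TABLES).items():
        print(f"{agent}: {passed}/{scored} tasks passed")
```

Because the results are still changing, any totals produced this way are a snapshot of the tables as shown here, not an official score.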