AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
 
 
 
 
 
 
merwanehamadi 44436fe1a3
Fix Chart generation (#346)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-01 08:31:17 -07:00
.github Tic tac toe challenge (#345) 2023-08-31 20:45:31 -07:00
.vscode Adding Auto-GPT-Turbo (#322) 2023-08-19 11:32:38 -07:00
agbenchmark Tic tac toe challenge (#345) 2023-08-31 20:45:31 -07:00
agent Updating Turbo (#343) 2023-08-31 07:09:41 -04:00
backend adding backend and a basic ui (#309) 2023-08-27 03:18:30 -04:00
frontend@c6a9572bed update frontend hash, run.sh 2023-08-28 19:55:44 -07:00
notebooks working bar and radar charts (#221) 2023-07-31 12:22:38 +01:00
reports Fix Chart generation (#346) 2023-09-01 08:31:17 -07:00
.env.example Update .env.example (#298) 2023-08-12 19:52:15 -07:00
.flake8 Cleanup skill tree (#287) 2023-08-10 16:29:58 -07:00
.gitignore Updated ignore 2023-08-29 15:47:35 +02:00
.gitmodules Update Turbo (#324) 2023-08-23 14:39:20 -07:00
.pre-commit-config.yaml AUTO-25: Add the ability to run multiple categories and to skip categories (#270) 2023-08-07 12:29:00 +01:00
.python-version Add static linters ci (#45) 2023-07-02 16:14:49 -04:00
LICENSE init agbenchmark 2023-06-18 11:14:54 -04:00
README.md adding backend and a basic ui (#309) 2023-08-27 03:18:30 -04:00
json_to_base_64.py Push reports to google drive (#167) 2023-07-18 09:17:45 -07:00
mypy.ini Add all agent protocol tests (#260) 2023-08-06 09:52:46 -07:00
poetry.lock Support agent protocol (#337) 2023-08-30 19:44:39 -07:00
pyproject.toml Support agent protocol (#337) 2023-08-30 19:44:39 -07:00
run.sh update frontend hash, run.sh 2023-08-28 19:55:44 -07:00
send_to_googledrive.py Fix linter 2 (#319) 2023-08-16 16:56:02 -07:00
server.py Tic tac toe challenge (#345) 2023-08-31 20:45:31 -07:00

README.md

Auto-GPT Benchmarks

Built to benchmark the performance of agents, regardless of how they work.

Objectively know how well your agent is performing in categories like code, retrieval, memory, and safety.

Save time and money while doing it through smart dependencies. The best part? It's all automated.
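
One way to picture "smart dependencies": if a challenge's prerequisite did not pass, the harder challenge is skipped instead of being run, so no time or API spend goes to a run that is expected to fail. The sketch below is illustrative only and is not the agbenchmark implementation. The function `run_with_dependencies`, the challenge names, and the `depends_on` mapping are all hypothetical, and it assumes challenges are listed with prerequisites first.

```python
from typing import Callable, Dict, List


def run_with_dependencies(
    challenges: Dict[str, Callable[[], bool]],
    depends_on: Dict[str, List[str]],
) -> Dict[str, str]:
    """Run challenges in listed order, skipping any whose prerequisites did not pass.

    Hypothetical helper for illustration; assumes prerequisites appear
    before the challenges that depend on them.
    """
    results: Dict[str, str] = {}
    for name, run in challenges.items():
        prerequisites = depends_on.get(name, [])
        # A prerequisite that failed, was skipped, or never ran blocks this challenge.
        if any(results.get(p) != "passed" for p in prerequisites):
            results[name] = "skipped"
            continue
        results[name] = "passed" if run() else "failed"
    return results


# Hypothetical challenges: each callable stands in for a full agent run.
challenges = {
    "write_file": lambda: True,
    "read_file": lambda: False,
    "basic_memory": lambda: True,
}
depends_on = {
    "read_file": ["write_file"],
    "basic_memory": ["read_file"],
}

results = run_with_dependencies(challenges, depends_on)
print(results)
```

Here `basic_memory` is never executed because `read_file` failed, which is where the saving comes from.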

Scores:

Screenshot 2023-07-25 at 10 35 01 AM

Ranking overall:

Detailed results:

Screenshot 2023-07-25 at 10 42 15 AM

Click here to see the results and the raw data!

More agents coming soon!