AutoGPT/agbenchmark/metrics.py

# how well the agent did on the challenges, the metrics calculation for the future if we're tracking specific tests

# POTENTIAL METRICS
# pass/fail - in the future could have a % metric of challenge completed, milestones achieved
# convergence - how long it took to get the result
# difficulty of the task - defined by previous comparing to runs against other agents
# consistency
# time passed
# budget used
# divergence (distractions not related to task at hand)