AutoGPT/.github/workflows/hackathon.yml

134 lines
4.3 KiB
YAML
Raw Normal View History

name: Hackathon
on:
workflow_dispatch:
inputs:
agents:
description: "Agents to run (comma-separated)"
required: false
2023-11-16 13:48:41 +00:00
default: "autogpt" # Default agents if none are specified
jobs:
matrix-setup:
runs-on: ubuntu-latest
2023-11-16 13:48:41 +00:00
# Service containers to run with `matrix-setup`
services:
# Label used to access the service container
postgres:
# Docker Hub image
image: postgres
# Provide the password for postgres
env:
POSTGRES_PASSWORD: postgres
# Set health checks to wait until postgres has started
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
# Maps tcp port 5432 on service container to the host
- 5432:5432
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
env-name: ${{ steps.set-matrix.outputs.env-name }}
steps:
- id: set-matrix
run: |
if [ "${{ github.event_name }}" == "schedule" ]; then
echo "::set-output name=env-name::production"
2023-11-16 13:48:41 +00:00
echo "::set-output name=matrix::[ 'irrelevant']"
elif [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
IFS=',' read -ra matrix_array <<< "${{ github.event.inputs.agents }}"
matrix_string="[ \"$(echo "${matrix_array[@]}" | sed 's/ /", "/g')\" ]"
echo "::set-output name=env-name::production"
echo "::set-output name=matrix::$matrix_string"
else
echo "::set-output name=env-name::testing"
2023-11-16 13:48:41 +00:00
echo "::set-output name=matrix::[ 'irrelevant' ]"
fi
tests:
environment:
name: "${{ needs.matrix-setup.outputs.env-name }}"
needs: matrix-setup
env:
min-python-version: "3.10"
name: "${{ matrix.agent-name }}"
runs-on: ubuntu-latest
2023-11-16 13:48:41 +00:00
services:
# Label used to access the service container
postgres:
# Docker Hub image
image: postgres
# Provide the password for postgres
env:
POSTGRES_PASSWORD: postgres
# Set health checks to wait until postgres has started
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
# Maps tcp port 5432 on service container to the host
- 5432:5432
timeout-minutes: 50
strategy:
fail-fast: false
matrix:
agent-name: ${{fromJson(needs.matrix-setup.outputs.matrix)}}
steps:
- name: Print Environment Name
run: |
echo "Matrix Setup Environment Name: ${{ needs.matrix-setup.outputs.env-name }}"
2023-11-16 13:48:41 +00:00
- name: Check Docker Container
id: check
run: docker ps
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: true
- name: Set up Python ${{ env.min-python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ env.min-python-version }}
- id: get_date
name: Get date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python -
2023-11-16 13:48:41 +00:00
- name: Install Node.js
uses: actions/setup-node@v4
2023-11-16 13:48:41 +00:00
with:
node-version: v18.15
2023-10-27 14:21:29 +00:00
- name: Run benchmark
run: |
2023-10-27 14:21:29 +00:00
link=$(jq -r '.["github_repo_url"]' arena/$AGENT_NAME.json)
branch=$(jq -r '.["branch_to_benchmark"]' arena/$AGENT_NAME.json)
2023-10-27 14:27:43 +00:00
git clone "$link" -b "$branch" "$AGENT_NAME"
2023-10-27 14:21:29 +00:00
cd $AGENT_NAME
cp ./$AGENT_NAME/.env.example ./$AGENT_NAME/.env || echo "file not found"
2023-10-27 14:21:29 +00:00
./run agent start $AGENT_NAME
2023-11-16 13:48:41 +00:00
cd ../benchmark
2023-10-27 14:27:43 +00:00
poetry install
AGBenchmark codebase clean-up (#6650) * refactor(benchmark): Deduplicate configuration loading logic - Move the configuration loading logic to a separate `load_agbenchmark_config` function in `agbenchmark/config.py` module. - Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to `load_agbenchmark_config` function. * fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py - Fixed type errors and linting errors in `__main__.py` - Improved the readability of CLI argument validation by introducing a separate function for it * refactor(benchmark): Lint and typefix app.py - Rearranged and cleaned up import statements - Fixed type errors caused by improper use of `psutil` objects - Simplified a number of `os.path` usages by converting to `pathlib` - Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema` * refactor(benchmark): Replace `.agent_protocol_client` by `agent-protcol-client`, clean up schema.py - Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`). - Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`. - Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`). - Remove all unused types from schema.py (= most of them). * refactor(benchmark): Use pathlib in agent_interface.py and agent_api_interface.py * refactor(benchmark): Improve typing, response validation, and readability in app.py - Simplified response generation by leveraging type checking and conversion by FastAPI. - Introduced use of `HTTPException` for error responses. - Improved naming, formatting, and typing in `app.py::create_evaluation`. - Updated the docstring on `app.py::create_agent_task`. - Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py. - Added default values to optional attributes on models in report_types_v2.py. - Removed unused imports in `generate_test.py` * refactor(benchmark): Clean up logging and print statements - Introduced use of the `logging` library for unified logging and better readability. - Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`. - Improved descriptiveness of log statements. - Removed unnecessary print statements. - Added log statements to unspecific and non-verbose `except` blocks. - Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format. - Added `.utils.logging` module with `configure_logging` function to easily configure the logging library. - Converted raw escape sequences in `.utils.challenge` to use `colorama`. - Renamed `generate_test.py::generate_tests` to `load_challenges`. * refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent - Remove unused server.py file - Remove unused run_agent function from agent_interface.py * refactor(benchmark): Clean up conftest.py - Fix and add type annotations - Rewrite docstrings - Disable or remove unused code - Fix definition of arguments and their types in `pytest_addoption` * refactor(benchmark): Clean up generate_test.py file - Refactored the `create_single_test` function for clarity and readability - Removed unused variables - Made creation of `Challenge` subclasses more straightforward - Made bare `except` more specific - Renamed `Challenge.setup_challenge` method to `run_challenge` - Updated type hints and annotations - Made minor code/readability improvements in `load_challenges` - Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module * fix(benchmark): Fix and add type annotations in execute_sub_process.py * refactor(benchmark): Simplify const determination in agent_interface.py - Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS` * fix(benchmark): Register category markers to prevent warnings - Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime. * refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieval_2/data.json * refactor(benchmark): Update agent_api_interface.py - Add type annotations to `copy_agent_artifacts_into_temp_folder` function - Add note about broken endpoint in the `agent_protocol_client` library - Remove unused variable in `run_api_agent` function - Improve readability and resolve linting error * feat(benchmark): Improve and centralize pathfinding - Search path hierarchy for applicable `agbenchmark_config`, rather than assuming it's in the current folder. - Create `agbenchmark.utils.path_manager` with `AGBenchmarkPathManager` and exporting a `PATH_MANAGER` const. - Replace path constants defined in __main__.py with usages of `PATH_MANAGER`. * feat(benchmark/cli): Clean up and improve CLI - Updated commands, options, and their descriptions to be more intuitive and consistent - Moved slow imports into the entrypoints that use them to speed up application startup - Fixed type hints to match output types of Click options - Hid deprecated `agbenchmark start` command - Refactored code to improve readability and maintainability - Moved main entrypoint into `run` subcommand - Fixed `version` and `serve` subcommands - Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility) - Renamed `--no_dep` to `--no-dep` for consistency - Fixed string formatting issues in log statements * refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py - Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`. - Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`. - Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`. - Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties. - Update all code references according to the changes mentioned above. * refactor(benchmark): Fix ReportManager init parameter types and use pathlib - Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, was mistyped as `str` instead of `datetime`. - Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`. - Rename `self.filename` with `self.report_file` in `ReportManager`. - Change the way the report file is created, opened and saved to use the `Path` object. * refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation - Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py. - Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class. - Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from `ChallengeData class. - Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py. - Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py. - Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py). * refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py - Cleaned up generate_test.py and conftest.py - Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method. - Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py. - Converted methods in the `Challenge` class to class methods where appropriate. - Improved argument handling in the `run_benchmark` function in `__main__.py`. * refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state - Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management. - Remove the `.path_manager` module containing `AGBenchmarkPathManager`. - Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity. * feat(benchmark/serve): Configurable port for `serve` subcommand - Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on. - If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set. * feat(benchmark/cli): Add `config` subcommand - Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config. * fix(benchmark): Gracefully handle incompatible challenge spec files in app.py - Added a check to skip deprecated challenges - Added logging to allow debugging of the loading process - Added handling of validation errors when parsing challenge spec files - Added missing `spec_file` attribute to `ChallengeData` * refactor(benchmark): Move `run_benchmark` entrypoint to main.py, use it in `/reports` endpoint - Move `run_benchmark` and `validate_args` from __main__.py to main.py - Replace agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark` - Move `get_unique_categories` from __main__.py to challenges/__init__.py - Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py - Reduce operations on updates.json (including `initialize_updates_file`) outside of API * refactor(benchmark): Remove unused `/updates` endpoint and all related code - Remove `updates_json_file` attribute from `AgentBenchmarkConfig` - Remove `get_updates` and `_initialize_updates_file` in app.py - Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py - Remove call to `append_updates_file` in challenge.py * refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig` - Add and update docstrings - Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility - Make naming of path attributes on `AgentBenchmarkConfig` more consistent - Remove unused `agent_home_directory` attribute - Remove unused `workspace` attribute * fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config * fix(benchmark): Update agent-protocol-client to v1.1.0 - Fixes issue with fetching task artifact listings
2024-01-02 21:23:09 +00:00
poetry run agbenchmark --no-dep
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
2023-11-16 13:48:41 +00:00
SERP_API_KEY: ${{ secrets.SERP_API_KEY }}
SERPAPI_API_KEY: ${{ secrets.SERP_API_KEY }}
WEAVIATE_API_KEY: ${{ secrets.WEAVIATE_API_KEY }}
WEAVIATE_URL: ${{ secrets.WEAVIATE_URL }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
GOOGLE_CUSTOM_SEARCH_ENGINE_ID: ${{ secrets.GOOGLE_CUSTOM_SEARCH_ENGINE_ID }}
Clean up & fix GitHub workflows (#6313) * ci: Mitigate security issues in autogpt-ci.yml - Remove unnecessary pull_request_target paths and related variables and config - Set permissions for contents to read only * ci: Simplify steps in autogpt-ci.yml workflow using GitHub CLI - Simplify step in 'autogpt-ci.yml' by using GitHub CLI instead of API for adding label and comment functionality - Replace curl command with 'gh issue edit' to add "behaviour change" label to the pull request - Replace gh api command with 'gh issue comment' to leave a comment about the changed behavior of AutoGPT in the pull request * ci: Fix issues in workflows - Move environment variable definition to top level in benchmark-ci.yml (because the other job also needs it) - Removed invalid 'branches: [hackathon]' restriction in hackathon.yml workflow - Removed redundant 'ref' and 'repository' fields in the 'checkout' step of both workflows. * ci: Delete legacy benchmarks.yml workflow * ci: Add triggers for CI workflows - Add triggers to run CI workflows when they are edited. - Update the paths for the CI workflows in the trigger configuration. * fix: Fix benchmark lint error - Removed unnecessary blank lines in report_types.py - Fixed string quotes in challenge.py to maintain consistency * fix: Update task description in password generator data.json - Update task description in `data.json` file for the password generator challenge to clarify the input requirements and error handling. - This change is made in an attempt to make the Benchmark CI pass. * fix: Fix PasswordGenerator challenge in CI - Fix the behavior of the reference password_generator.py to align with the task description - Use default password length 8 instead of a random length in the generate_password function - Retrieve the password length from the command line arguments if "--length" is provided, else set it to 8
2023-11-21 09:58:54 +00:00
AGENT_NAME: ${{ matrix.agent-name }}