- Added `task_cumulative_cost` and `task_total_cost` attributes to `Step.additional_output` in the `AgentProtocolServer.execute_step` endpoint (sketched below)
- Updated `agbenchmark` dependency in Agent and Forge
- Added `n_steps` attribute to `TestResult` type
- Added logic to `BuiltinChallenge.test_method`, `WebArenaChallenge.test_method`, and `.reports.add_test_result_to_report` to record the number of steps
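A minimal sketch of what attaching these figures can look like; `attach_cost_info` and the cost values are hypothetical, only the two output keys come from the change above:

```python
# Hypothetical helper: merge cost figures into a step's additional_output
# before it is returned from the execute_step endpoint.
def attach_cost_info(step, cumulative_cost: float, total_cost: float):
    step.additional_output = {
        **(step.additional_output or {}),
        "task_cumulative_cost": cumulative_cost,  # LLM spend on the task so far
        "task_total_cost": total_cost,            # final total for the task
    }
    return step
```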
- Add `challenge list` command with options `--all`, `--names`, `--json`
- Add `tabulate` dependency
- Add `.utils.utils.sorted_by_enum_index` function to easily sort lists by an enum value/property, following the order of the enum's definition (see the sketch after this list)
- Add `challenge info [name]` command with option `--json`
- Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models
- Refactor `config` subcommand to use `pretty_print_model`
- Reduce duplicate and nested statements
- Add `skip_unavailable` parameter
Related changes:
- Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec`
- Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run
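A minimal sketch of what `sorted_by_enum_index` can look like; the real signature in `.utils.utils` may differ:

```python
from enum import Enum
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def sorted_by_enum_index(
    sortable: Iterable[T],
    enum: type[Enum],
    *,
    key: Callable[[T], Enum | None] = lambda x: x,  # by default, items are enum members
    reverse: bool = False,
) -> list[T]:
    """Sort `sortable` by the definition order of `enum`; unknown values go last."""
    order = {member: index for index, member in enumerate(enum)}
    return sorted(
        sortable,
        key=lambda item: order.get(key(item), len(order)),
        reverse=reverse,
    )


# Example: list challenges in the order their Category enum is declared
class Category(Enum):
    CODING = "coding"
    SCRAPE_SYNTHESIZE = "scrape_synthesize"
    WEB = "web"

challenges = [{"name": "a", "category": Category.WEB}, {"name": "b", "category": Category.CODING}]
challenges = sorted_by_enum_index(challenges, Category, key=lambda c: c["category"])
```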
- Make `AgentBenchmarkConfig.reports_folder` directly configurable (through `REPORTS_FOLDER` env variable). The default is still `./agbenchmark_config/reports`.
- Change all mentions of `REPORT_LOCATION` (which previously served the same purpose) to `REPORTS_FOLDER`.
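For illustration only, assuming a Pydantic v1 `BaseSettings`-style config (the project's actual class may be wired differently):

```python
from pathlib import Path
from pydantic import BaseSettings, Field

class AgentBenchmarkConfig(BaseSettings):
    # REPORTS_FOLDER env var overrides the location; otherwise the old default applies
    reports_folder: Path = Field(
        default=Path("./agbenchmark_config/reports"), env="REPORTS_FOLDER"
    )
```

With this, setting `REPORTS_FOLDER=./my_reports` in the environment redirects report output without editing any config file.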
- Added a helper function `.app.utils.vcs_state_diverges_from_master()`. This function determines whether the relevant part of the codebase diverges from our `master`.
- Updated `.app.telemetry._setup_sentry()` to determine the default environment name using `vcs_state_diverges_from_master`.
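A rough sketch of what such a check can boil down to; the real helper may compare against the upstream remote, and the `autogpt/` path filter is a placeholder:

```python
import subprocess

def vcs_state_diverges_from_master(paths: tuple[str, ...] = ("autogpt/",)) -> bool:
    """Sketch: True if the checked-out code under `paths` differs from `master`,
    whether through commits or uncommitted changes."""
    # `git diff --quiet` exits with 0 when there is no diff, non-zero otherwise
    result = subprocess.run(
        ["git", "diff", "--quiet", "master", "--", *paths], check=False
    )
    return result.returncode != 0
```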
- Added a helper function `wait_until_conn_ready(port)` to wait for the benchmark and agent applications to finish starting
- Improved the CLI's own logging (within the `agent start` command)
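The connection-readiness helper can be as simple as a TCP polling loop; a sketch, assuming the actual function behaves similarly:

```python
import socket
import time

def wait_until_conn_ready(port: int, timeout: float = 30.0) -> None:
    """Block until something accepts a TCP connection on localhost:`port`, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return  # the application is up
        except OSError:
            time.sleep(0.5)  # not listening yet; retry
    raise TimeoutError(f"Nothing is listening on port {port} after {timeout} seconds")
```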
- Fixed `--mock` mode
- Moved the interrupt to the beginning of the step iterator pipeline (from `BuiltinChallenge` to `agent_api_interface.py:run_api_agent`). This ensures that any finish-up code still runs after a single step is executed.
- Implemented mock mode in `WebArenaChallenge`
- Fixed `fixture 'i_attempt' not found` error when `--attempts`/`-N` is omitted
- Fixed handling of `python`/`pytest` evals in `BuiltinChallenge`
- Disabled left-over Helicone code (see 056163e)
- Fixed a couple of challenge definitions
- WebArena task 107: fix spelling of months (Sepetember, Octorbor *lmao*)
- synthesize/1_basic_content_gen (SynthesizeInfo): remove empty string from `should_contain` list
- Added some debug logging in agent_api_interface.py and challenges/builtin.py
OpenAI's newest models return JSON with markdown fences around it, breaking the `json.loads` parser.
This commit adds an `extract_list_from_response` function to json_utils/utilities.py and uses this function to replace `json.loads` in `_process_text`.
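In essence, the new parser strips any surrounding markdown fence before handing the payload to `json.loads`; a rough sketch (details of the actual function may differ):

```python
import json
import re

def extract_list_from_response(response_content: str) -> list:
    """Parse a JSON list even when the model wraps it in markdown code fences."""
    fenced = re.search(r"```(?:json)?\s*(.+?)\s*```", response_content, re.DOTALL)
    if fenced:
        response_content = fenced.group(1)
    result = json.loads(response_content)
    if not isinstance(result, list):
        raise ValueError(f"Response does not contain a JSON list: {result!r}")
    return result
```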
* Add Sentry integration for telemetry
- Add `sentry_sdk` dependency
- Add setup logic and config flow using `TELEMETRY_OPT_IN` environment variable
- Add app/telemetry.py with `setup_telemetry` helper routine (sketched after this list)
- Call `setup_telemetry` in `cli()` in app/cli.py
- Add `TELEMETRY_OPT_IN` to .env.template
- Add helper function `env_file_exists` and routine `set_env_config_value` to app/utils.py
- Add unit tests for `set_env_config_value` in test_utils.py
- Add a prompt at startup that asks whether the user wants to enable telemetry, if the env variable isn't set
* Add `capture_exception` statements for LLM parsing errors and command failures
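Putting the pieces above together, the opt-in flow is roughly as follows; the Sentry DSN is a placeholder, the prompt wording is invented, and the helper import path is assumed from the entries above:

```python
import os

import sentry_sdk

from autogpt.app.utils import (  # helpers mentioned above; import path assumed
    env_file_exists,
    set_env_config_value,
    vcs_state_diverges_from_master,
)

def setup_telemetry() -> None:
    opt_in = os.getenv("TELEMETRY_OPT_IN", "")
    if not opt_in and env_file_exists():
        # First run: ask once and persist the answer to .env
        answer = input("Help improve AutoGPT by sharing error telemetry? [y/N] ")
        opt_in = "true" if answer.strip().lower() in ("y", "yes") else "false"
        set_env_config_value("TELEMETRY_OPT_IN", opt_in)

    if opt_in.lower() == "true":
        sentry_sdk.init(
            dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
            environment="dev" if vcs_state_diverges_from_master() else "production",
        )
```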
- Change default `SMART_LLM` from `gpt-4` to `gpt-4-turbo-preview`
- Change default `FAST_LLM` from `gpt-3.5-turbo-16k` to `gpt-3.5-turbo-0125`
- Change default `EMBEDDING_MODEL` from `text-embedding-ada-002` to `text-embedding-3-small`
- Update .env.template, azure.yaml.template, and documentation accordingly
- Add `text-embedding-3-small` and `text-embedding-3-large` as `EMBEDDING_v3_S` and `EMBEDDING_v3_L` respectively
- Add `gpt-3.5-turbo-0125` as `GPT3_v4`
- Add `gpt-4-1106-vision-preview` as `GPT4_v3_VISION`
- Add GPT-4V models to info map
- Change chat model info mapping to derive info for aliases (e.g. `gpt-3.5-turbo`) from specific versions instead of the other way around
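The last bullet can be illustrated with a simplified sketch; the class and figures below are stand-ins, not the project's actual model registry:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ChatModelInfo:  # simplified stand-in for the real info class
    name: str
    max_tokens: int
    prompt_token_cost: float      # $ per token (approximate figures)
    completion_token_cost: float

# Info is defined once for the specific, dated snapshots...
CHAT_MODELS = {
    m.name: m
    for m in [
        ChatModelInfo("gpt-3.5-turbo-0125", 16_385, 0.5 / 1_000_000, 1.5 / 1_000_000),
        ChatModelInfo("gpt-4-turbo-preview", 128_000, 10 / 1_000_000, 30 / 1_000_000),
    ]
}

# ...and the aliases are derived from whichever snapshot they currently point to,
# instead of the snapshots being copied from the alias.
ALIASES = {
    "gpt-3.5-turbo": "gpt-3.5-turbo-0125",
    "gpt-4-turbo": "gpt-4-turbo-preview",
}
for alias, pinned in ALIASES.items():
    CHAT_MODELS[alias] = replace(CHAT_MODELS[pinned], name=alias)
```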
* Add `_sideload_chrome_extensions` subroutine to `open_page_in_browser` in web_selenium.py (see the sketch below)
* Sideloads the uBlock Origin and "I Still Don't Care About Cookies" extensions, downloading them if necessary
* Add 2-second delay to `open_page_in_browser` to allow time for handling cookie walls
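A sketch of how such sideloading can be done with Selenium; the Web Store IDs and download URL pattern are shown for illustration and should be verified before use:

```python
from pathlib import Path
from urllib.request import urlretrieve

from selenium.webdriver.chrome.options import Options as ChromeOptions

def _sideload_chrome_extensions(options: ChromeOptions, dl_folder: Path) -> None:
    crx_url = (
        "https://clients2.google.com/service/update2/crx"
        "?response=redirect&prodversion=121.0&acceptformat=crx3&x=id%3D{crx_id}%26uc"
    )
    extensions = {
        "uBlock Origin": "cjpalhdlnbpafiamejdnhcphjbkeiagm",
        "I Still Don't Care About Cookies": "edibdbjcniadpccecjdfdjjppcpchdlm",
    }
    dl_folder.mkdir(parents=True, exist_ok=True)
    for name, crx_id in extensions.items():
        crx_path = dl_folder / f"{crx_id}.crx"
        if not crx_path.exists():
            # Download the packed extension on first use
            urlretrieve(crx_url.format(crx_id=crx_id), crx_path)
        options.add_extension(str(crx_path))
```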
Commit 956cdc7 "fix(agent/json_utils): Decode as JSON rather than Python objects" broke these unit tests because they generated "JSON" by stringifying a Python object.
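The difference is easy to miss: `str()` of a dict produces a Python repr with single quotes, which `json.loads` rejects, so such fixtures have to be built with `json.dumps` instead:

```python
import json

data = {"a": 1, "b": "x"}

str(data)          # "{'a': 1, 'b': 'x'}"  <- Python repr, not valid JSON
json.dumps(data)   # '{"a": 1, "b": "x"}'  <- actual JSON

json.loads(json.dumps(data))  # OK
json.loads(str(data))         # raises json.JSONDecodeError
```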
* Compress steps in the prompt to reduce token usage and to let the agent run longer when using models with limited context windows
* Move multiple copies of step formatting code to `Episode.format` method
* Add `EpisodicActionHistory.handle_compression` method to handle compression of new steps
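Conceptually, compression replaces each finished step's full output with a short summary in later prompts; a simplified, synchronous sketch (the real method is async and LLM-backed):

```python
def handle_compression(episodes, summarize) -> None:
    for episode in episodes:
        if episode.result is None or episode.summary:
            continue  # step still running, or already compressed
        # From now on, prompts include this summary instead of the full
        # Episode.format() output for this step.
        episode.summary = summarize(episode.format())
```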
* Implement `extract_information` function in `autogpt.processing.text` module. This function extracts pieces of information from a body of text based on a list of topics of interest.
* Add `topics_of_interest` and `get_raw_content` parameters to the `read_webpage` command
* Limit maximum content length if `get_raw_content=true` is specified
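A rough sketch of how `extract_information` can be built on top of an LLM call; the prompt wording and provider call shape are assumptions:

```python
async def extract_information(
    source_text: str,
    topics_of_interest: list[str],
    llm_provider,
) -> list[str]:
    topics = "\n".join(f"* {topic}" for topic in topics_of_interest)
    prompt = (
        "From the text below, extract every piece of information relevant to these "
        f"topics and return it as a JSON list of strings:\n{topics}\n\n"
        f"TEXT:\n{source_text}"
    )
    response = await llm_provider.create_chat_completion(prompt)  # call shape assumed
    return extract_list_from_response(response)  # the fence-tolerant parser sketched earlier
```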