Commit Graph

121 Commits (f4e7b1c61c876b839b9071c218d4f0c46a095f24)

Author SHA1 Message Date
merwanehamadi f4e7b1c61c
Add eval_id and sync Skill Tree with Frontend(#5287)
Add eval_id to skill tree

Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-21 13:36:17 -07:00
Swifty 3f83e20387
Update frontend build (#5282)
Co-authored-by: GitHub Action <action@github.com>
2023-09-21 08:07:15 +02:00
hunteraraujo 62efc6b07e Add Firebase Analytics dependency 2023-09-20 20:40:12 -07:00
hunteraraujo 22ea449850 Integrate LeaderboardService into SkillTreeViewModel
This commit integrates the `LeaderboardService` into `SkillTreeViewModel` to enable benchmark report submissions to the leaderboard. A `BenchmarkRun` object is created from the evaluation response and submitted using the `submitReport` method from `LeaderboardService`.
2023-09-20 19:36:25 -07:00
hunteraraujo 1a471b73cd Fix _leaderboardBaseUrl 2023-09-20 14:44:29 -07:00
hunteraraujo 7901c750b6 Extend RestApiUtility to Support Leaderboard Base URL
This commit extends the `RestApiUtility` class to include support for a new leaderboard base URL. A new `ApiType` enum value `ApiType.leaderboard` has been added, and the `_getEffectiveBaseUrl` method has been updated to handle this new type. The leaderboard base URL is "https://leaderboard.vercel.app/".
2023-09-20 14:43:45 -07:00
hunteraraujo a0512254ca Add LeaderboardService with submitReport Method
This commit adds a new `LeaderboardService` class featuring a `submitReport` method. This method allows for the submission of `BenchmarkRun` objects to the leaderboard via a POST request to the `/api/reports` endpoint. The new service uses the `ApiType.leaderboard` enum value.
2023-09-20 14:38:48 -07:00
hunteraraujo fe96664afb Update ApiType Enum to Include Leaderboard 2023-09-20 14:31:27 -07:00
hunteraraujo cfc6180233 Add BenchmarkRun Class to Model Complete Benchmark Runs
This commit introduces the `BenchmarkRun` class, designed to model a complete benchmark run. The class encapsulates all data and sub-models related to a benchmark, providing a centralized object to handle various aspects of a benchmark run.

The `BenchmarkRun` class includes the following sub-models:
- `RepositoryInfo`: Information about the repository and team.
- `RunDetails`: Specific details like the run identifier, command, and timings.
- `TaskInfo`: Information about the task being benchmarked.
- `Metrics`: Performance metrics for the benchmark run.
- `Config`: Configuration settings for the benchmark run.

A `reachedCutoff` field is also included to indicate whether a certain cutoff was reached during the benchmark run.

Methods for serializing and deserializing the object to and from JSON are also provided.
2023-09-20 13:24:19 -07:00
hunteraraujo 311f69b7cf Add RepositoryInfo Class for Benchmark Repository and Team Details
This commit introduces the RepositoryInfo class, designed to encapsulate details about the repository and team associated with a benchmark run.

The class includes the following fields:
- repoUrl: The URL of the repository where the benchmark code resides.
- teamName: The name of the team responsible for the benchmark.
- benchmarkGitCommitSha: The Git commit SHA for the benchmark code.
- agentGitCommitSha: The Git commit SHA for the agent code.

The class supports JSON serialization and deserialization, making it easy to use with Flutter's JSON handling mechanisms.
2023-09-20 13:17:46 -07:00
hunteraraujo fc193568b9 Add RunDetails class for encapsulating benchmark run information
Added a new Dart class called `RunDetails` to represent specific details related to a benchmark run.

The class includes fields for:
- The unique run identifier (`runId`)
- The command used to initiate the benchmark (`command`)
- The time the benchmark was completed (`completionTime`)
- The time the benchmark started (`benchmarkStartTime`)
- The name of the test being run (`testName`)

Serialization and deserialization methods are also provided for JSON compatibility.
2023-09-20 13:12:03 -07:00
hunteraraujo afe77bbc4f Add TaskInfo class with serialization and documentation
Added a new TaskInfo class to encapsulate information related to a specific benchmark task.

- The TaskInfo class holds attributes like the data file path, regression status, task categories, task details, expected answer, and description.
- Included methods for JSON serialization and deserialization.
- Added comprehensive documentation to describe the purpose, properties, and methods of the TaskInfo class.
2023-09-20 13:07:54 -07:00
hunteraraujo 50ef7b31eb Add Metrics class with serialization and documentation
Added a new Metrics class to represent key performance metrics of a benchmark test run.

- The Metrics class encapsulates various data points like difficulty, success rate, attempted status, success percentage, cost, and runtime.
- Included serialization and deserialization methods for converting between Metrics objects and JSON.
- Added comprehensive documentation to describe the purpose, properties, and methods of the Metrics class.
2023-09-20 13:04:47 -07:00
hunteraraujo 39f8ae515b Add Config Class for Benchmark Configuration Management
This commit introduces a new `Config` class, designed to manage and store configuration settings related to the benchmark run. The class contains two key fields:

1. `agentBenchmarkConfigPath`: The path to the agent's benchmark configuration file.
2. `host`: The address of the host where the benchmark is running.

The class includes methods for serialization and deserialization, allowing easy conversion between `Config` objects and JSON maps.

Documentation comments have also been added for better code readability and understanding.
2023-09-20 13:00:22 -07:00
Swifty 8897e47691
Update frontend build (#5270)
Co-authored-by: GitHub Action <action@github.com>
2023-09-20 09:53:00 +02:00
hunteraraujo 377d0af228
Refactor SkillTreeViewModel and Update TaskQueueView UI for Task Status (#5269)
* Refactor SkillTreeViewModel and Update TaskQueueView UI for Task Status

* Notify UI when updating benchmark status
2023-09-19 23:30:22 -07:00
hunteraraujo 99035103e0 Rename benchmark_service directory to benchmark 2023-09-19 22:16:58 -07:00
hunteraraujo 525571c32e
Enhance runBenchmark with TestSuite Tracking (#5268) 2023-09-19 21:31:02 -07:00
hunteraraujo 80682b41cb
Add Early Termination to runBenchmark on Benchmark Failure (#5267) 2023-09-19 20:24:52 -07:00
hunteraraujo a37b486227
Enhance SkillTreeViewModel to Manage Benchmark Status (#5266)
Enhance SkillTreeViewModel to Manage Benchmark Execution and Status
2023-09-19 20:20:31 -07:00
hunteraraujo f130aa7972 Correct triggerEvaluation endpoint 2023-09-19 17:19:59 -07:00
hunteraraujo 5afab461ee
Refactor Benchmarking Workflow and Introduce New Data Models (#5264)
* New benchmark data models

* Update _benchmarkBaseUrl

* Remove ReportRequestBody

* Update benchmark service methods for proxy approach

* Add eval id to SkillNodeData

* Refactor runBenchmark Method for proxy approach
2023-09-19 17:01:15 -07:00
Swifty ccd0eb800b
Update frontend build (#5258)
Co-authored-by: GitHub Action <action@github.com>
2023-09-19 13:06:20 +02:00
SwiftyOS 172d256e15 Switched pull request step 2023-09-19 12:57:49 +02:00
SwiftyOS 2c187b66b7 More messing with the action 2023-09-19 12:50:44 +02:00
SwiftyOS 9a94ce31d8 Testing PR creation 2023-09-19 12:44:21 +02:00
SwiftyOS c7f4bd265d Changed to push to a branch and make a pr 2023-09-19 12:35:04 +02:00
SwiftyOS de4839b050 Testing build action 2023-09-19 12:11:32 +02:00
SwiftyOS 833a37e9a6 Added action to build and commit the frontend 2023-09-19 12:02:50 +02:00
hunteraraujo bf03dd8739 Refactor runBenchmark in SkillTreeViewModel for New Report Generation Flow
This commit updates the runBenchmark method in the SkillTreeViewModel class to align with the new report generation flow. The updated method does the following:

1. Checks if a benchmark is already running to prevent overlapping runs.
2. Sets a flag to indicate that the benchmark is running and notifies the UI.
3. Reverses the selected node hierarchy for report generation.
4. Loops through each node in the reversed hierarchy to:
  - Generate a unique UUID for each test run.
  - Create a ReportRequestBody object.
  - Call the generateSingleReport method in the BenchmarkService.
  - Update the UI after each single report is generated.

5. After all single reports are generated, it calls the generateCombinedReport method in the BenchmarkService, passing in all the generated UUIDs.

6. Finally, it sets the benchmark running flag to false and notifies the UI.

This change improves the report generation flow and allows for both individual and combined reports.
2023-09-18 19:55:01 -07:00
hunteraraujo 5814c5a365 Change mock property to be required in ReportRequestBody 2023-09-18 19:46:56 -07:00
hunteraraujo b3d0cf9a22 Add UUID dependency 2023-09-18 19:42:42 -07:00
hunteraraujo 0e069c2679 Add generateCombinedReport Method and Rename Existing Method
This commit introduces two major updates to the BenchmarkService class:

1. Renamed the `generateReport` method to `generateSingleReport` for better clarity and specificity.

2. Added a new method called `generateCombinedReport` that takes a list of test run IDs and generates a combined report by posting to the `/reports/query` endpoint.

These changes aim to improve the modularity and readability of the code, while also extending its functionality to handle combined reports.
2023-09-18 17:15:44 -07:00
hunteraraujo da9fd926c8 Refactor ReportRequestBody for a single test 2023-09-18 17:09:23 -07:00
hunteraraujo 8923e79b29 Refactor TaskView to Support Combined Data and Test Suite Detail View
This commit introduces substantial improvements to the TaskView class to accommodate both tasks and test suites in a unified view. It also integrates the TestSuiteDetailView to display test suite details when a test suite is selected.

Key Enhancements:

1. Modified the `initState` method to call `fetchAndCombineData()` from TaskViewModel, thereby populating the combined data source.
2. Replaced the ListView that was rendering tasks with a ListView that can render both tasks and test suites.
3. Introduced conditional rendering for TestSuiteDetailView when a test suite is selected.
4. Updated onTap actions to select and deselect tasks and test suites appropriately.
5. Moved to using a Stack layout to allow overlay of TestSuiteDetailView on top of the existing layout.

This refactor enhances the TaskView's capabilities to manage and display both tasks and test suites, offering a more integrated user experience.
2023-09-18 15:08:22 -07:00
hunteraraujo 93094c7223 Extend TaskViewModel to Support Test Suites and Combined Data Sources
This commit significantly expands the functionalities of TaskViewModel to manage both tasks and test suites in a unified manner. The view model now serves as the primary business logic class that interacts with the UI for task and test suite management.

Key Enhancements:
- Introduced `_testSuites` list to store TestSuite objects.
- Added `combinedDataSource` to hold both tasks and test suites.
- Introduced `selectTestSuite` and `deselectTestSuite` methods for TestSuite selection management.
- Added methods for TestSuite CRUD operations (`addTestSuite`, `fetchTestSuites`, `_saveTestSuitesToPrefs`).
- Created `fetchAndCombineData` method to fetch and combine tasks and test suites into a single list, `combinedDataSource`.

This update provides a more robust and unified approach for managing tasks and test suites, thereby improving the application's modularity and scalability.
2023-09-18 15:03:53 -07:00
hunteraraujo 9f92488443 Add TestSuiteDetailView for Detailed Test Suite Management
This commit introduces a new StatefulWidget, TestSuiteDetailView, to offer a dedicated view for managing and interacting with individual Test Suites.

Key Features:
- Created a TestSuiteDetailView class that takes a TestSuite object and a TaskViewModel as parameters.
- Added an AppBar with a back button for easy navigation.
- Utilized ListView.builder to display a list of tasks that belong to the selected Test Suite.
- Integrated with existing TaskViewModel to select and delete tasks within the Test Suite.
- Included a Provider for the ChatViewModel to update the current task ID when a task is selected.

This new view enhances the user experience by providing a focused interface for managing tasks within individual Test Suites. This facilitates better organization and navigation for the user.
2023-09-18 14:59:26 -07:00
hunteraraujo 3cbe5a84e4 Implement TestSuiteListTile Widget for Displaying Test Suites
This commit adds a new StatelessWidget, TestSuiteListTile, designed to display individual TestSuite items in a list.

Key Features:
- Created a TestSuiteListTile class that takes a TestSuite object and a VoidCallback for the onTap event as parameters.
- Utilized Material Design with custom styling to ensure the tile fits well within the application's UI.
- The tile displays the timestamp of the TestSuite, which serves as its title.
- Included a play arrow icon to indicate that the tile is actionable.
- Utilized MediaQuery to adapt the tile width based on the screen size, capped at a maximum width of 260.

By adding this widget, we improve the UX by providing a consistent and intuitive way to interact with TestSuite objects in the UI.
2023-09-18 14:55:03 -07:00
hunteraraujo 1d735caf40 Add TestSuite Model with Serialization and Deserialization Support
This commit introduces a new class, TestSuite, designed to encapsulate a collection of Task objects under a common timestamp. This will help in grouping tasks that belong to a particular test suite.

Key Features:
- Add a TestSuite class with fields for `timestamp` and a list of `tests` (Task objects).
- Implement `toJson` method for serializing TestSuite objects to JSON-compatible format.
- Implement `fromJson` factory method for deserializing JSON data back into a TestSuite object.

By providing serialization and deserialization support directly in the model, we facilitate easier storage and data exchange for test suites.
2023-09-18 14:41:25 -07:00
hunteraraujo e446d723ee Extend Task Model to Include Serialization
This commit adds serialization support to the Task model by including a `toJson` method. This will allow easy conversion of Task objects to a JSON-compatible format, facilitating storage or network transmission.
2023-09-18 14:35:34 -07:00
hunteraraujo e90eb0fd61 Update ApiSettingsViewModel _baseURL 2023-09-18 13:31:48 -07:00
merwanehamadi 2cf350b783
Agent Protocol v1 (#5254)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-18 11:09:55 -07:00
Swifty 8b3a915b2f
Serving frontend from the forge agent server (#5252) 2023-09-18 16:27:03 +02:00
merwanehamadi f4d319cee4
Refactor benchmark (#5247)
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>
2023-09-17 06:55:20 -07:00
hunteraraujo 4463f75756 Fix issue where side bar view is not disabled 2023-09-16 22:19:42 -07:00
hunteraraujo 60ae12dfd5 Implement UI Disable Feature During Benchmark Run
Added a state variable isBenchmarkRunning in SkillTreeViewModel to track the status of benchmark execution. This state variable is used to conditionally disable specific UI components:

- The "Initiate test suite" button in TaskQueueView is disabled during the benchmark.
- All IconButtons in SideBarView are disabled during the benchmark.
- Node selection in SkillTreeView is disabled during the benchmark.

This ensures that the user cannot interact with these components while a benchmark test is running, thereby improving UX and preventing potential issues.
2023-09-16 19:24:54 -07:00
hunteraraujo 11101286a3 Remove comment 2023-09-16 19:02:45 -07:00
hunteraraujo 6b921b5eda Refactor test suite button + rename method to runBenchmark 2023-09-16 18:56:42 -07:00
hunteraraujo 25ce1d6be0 Fix regression with deleting tasks 2023-09-16 17:28:58 -07:00
hunteraraujo d48eb99669 Pass Real Data to callGenerateReport in TaskQueueView
This commit updates the `TaskQueueView` to pass real data from the `selectedNodeHierarchy` to the `callGenerateReport` method in `SkillTreeViewModel`. An array of test names is constructed from the reversed node hierarchy, and these names are used as the `tests` field in the `ReportRequestBody`.

Note: The `category` field is now an empty string as per the new requirement, and `mock` continues to be set to true.
2023-09-16 15:14:34 -07:00