This commit introduces the `BenchmarkRun` class, designed to model a complete benchmark run. The class encapsulates all data and sub-models related to a benchmark, providing a centralized object to handle various aspects of a benchmark run.
The `BenchmarkRun` class includes the following sub-models:
- `RepositoryInfo`: Information about the repository and team.
- `RunDetails`: Specific details like the run identifier, command, and timings.
- `TaskInfo`: Information about the task being benchmarked.
- `Metrics`: Performance metrics for the benchmark run.
- `Config`: Configuration settings for the benchmark run.
A `reachedCutoff` field is also included to indicate whether a certain cutoff was reached during the benchmark run.
Methods for serializing and deserializing the object to and from JSON are also provided.
This commit introduces the RepositoryInfo class, designed to encapsulate details about the repository and team associated with a benchmark run.
The class includes the following fields:
- repoUrl: The URL of the repository where the benchmark code resides.
- teamName: The name of the team responsible for the benchmark.
- benchmarkGitCommitSha: The Git commit SHA for the benchmark code.
- agentGitCommitSha: The Git commit SHA for the agent code.
The class supports JSON serialization and deserialization, making it easy to use with Flutter's JSON handling mechanisms.
Added a new Dart class called `RunDetails` to represent specific details related to a benchmark run.
The class includes fields for:
- The unique run identifier (`runId`)
- The command used to initiate the benchmark (`command`)
- The time the benchmark was completed (`completionTime`)
- The time the benchmark started (`benchmarkStartTime`)
- The name of the test being run (`testName`)
Serialization and deserialization methods are also provided for JSON compatibility.
Added a new TaskInfo class to encapsulate information related to a specific benchmark task.
- The TaskInfo class holds attributes like the data file path, regression status, task categories, task details, expected answer, and description.
- Included methods for JSON serialization and deserialization.
- Added comprehensive documentation to describe the purpose, properties, and methods of the TaskInfo class.
Added a new Metrics class to represent key performance metrics of a benchmark test run.
- The Metrics class encapsulates various data points like difficulty, success rate, attempted status, success percentage, cost, and runtime.
- Included serialization and deserialization methods for converting between Metrics objects and JSON.
- Added comprehensive documentation to describe the purpose, properties, and methods of the Metrics class.
This commit introduces a new `Config` class, designed to manage and store configuration settings related to the benchmark run. The class contains two key fields:
1. `agentBenchmarkConfigPath`: The path to the agent's benchmark configuration file.
2. `host`: The address of the host where the benchmark is running.
The class includes methods for serialization and deserialization, allowing easy conversion between `Config` objects and JSON maps.
Documentation comments have also been added for better code readability and understanding.
* New benchmark data models
* Update _benchmarkBaseUrl
* Remove ReportRequestBody
* Update benchmark service methods for proxy approach
* Add eval id to SkillNodeData
* Refactor runBenchmark Method for proxy approach
This commit updates the runBenchmark method in the SkillTreeViewModel class to align with the new report generation flow. The updated method does the following:
1. Checks if a benchmark is already running to prevent overlapping runs.
2. Sets a flag to indicate that the benchmark is running and notifies the UI.
3. Reverses the selected node hierarchy for report generation.
4. Loops through each node in the reversed hierarchy to:
- Generate a unique UUID for each test run.
- Create a ReportRequestBody object.
- Call the generateSingleReport method in the BenchmarkService.
- Update the UI after each single report is generated.
5. After all single reports are generated, it calls the generateCombinedReport method in the BenchmarkService, passing in all the generated UUIDs.
6. Finally, it sets the benchmark running flag to false and notifies the UI.
This change improves the report generation flow and allows for both individual and combined reports.
This commit introduces two major updates to the BenchmarkService class:
1. Renamed the `generateReport` method to `generateSingleReport` for better clarity and specificity.
2. Added a new method called `generateCombinedReport` that takes a list of test run IDs and generates a combined report by posting to the `/reports/query` endpoint.
These changes aim to improve the modularity and readability of the code, while also extending its functionality to handle combined reports.
This commit introduces substantial improvements to the TaskView class to accommodate both tasks and test suites in a unified view. It also integrates the TestSuiteDetailView to display test suite details when a test suite is selected.
Key Enhancements:
1. Modified the `initState` method to call `fetchAndCombineData()` from TaskViewModel, thereby populating the combined data source.
2. Replaced the ListView that was rendering tasks with a ListView that can render both tasks and test suites.
3. Introduced conditional rendering for TestSuiteDetailView when a test suite is selected.
4. Updated onTap actions to select and deselect tasks and test suites appropriately.
5. Moved to using a Stack layout to allow overlay of TestSuiteDetailView on top of the existing layout.
This refactor enhances the TaskView's capabilities to manage and display both tasks and test suites, offering a more integrated user experience.
This commit significantly expands the functionalities of TaskViewModel to manage both tasks and test suites in a unified manner. The view model now serves as the primary business logic class that interacts with the UI for task and test suite management.
Key Enhancements:
- Introduced `_testSuites` list to store TestSuite objects.
- Added `combinedDataSource` to hold both tasks and test suites.
- Introduced `selectTestSuite` and `deselectTestSuite` methods for TestSuite selection management.
- Added methods for TestSuite CRUD operations (`addTestSuite`, `fetchTestSuites`, `_saveTestSuitesToPrefs`).
- Created `fetchAndCombineData` method to fetch and combine tasks and test suites into a single list, `combinedDataSource`.
This update provides a more robust and unified approach for managing tasks and test suites, thereby improving the application's modularity and scalability.
This commit introduces a new StatefulWidget, TestSuiteDetailView, to offer a dedicated view for managing and interacting with individual Test Suites.
Key Features:
- Created a TestSuiteDetailView class that takes a TestSuite object and a TaskViewModel as parameters.
- Added an AppBar with a back button for easy navigation.
- Utilized ListView.builder to display a list of tasks that belong to the selected Test Suite.
- Integrated with existing TaskViewModel to select and delete tasks within the Test Suite.
- Included a Provider for the ChatViewModel to update the current task ID when a task is selected.
This new view enhances the user experience by providing a focused interface for managing tasks within individual Test Suites. This facilitates better organization and navigation for the user.
This commit adds a new StatelessWidget, TestSuiteListTile, designed to display individual TestSuite items in a list.
Key Features:
- Created a TestSuiteListTile class that takes a TestSuite object and a VoidCallback for the onTap event as parameters.
- Utilized Material Design with custom styling to ensure the tile fits well within the application's UI.
- The tile displays the timestamp of the TestSuite, which serves as its title.
- Included a play arrow icon to indicate that the tile is actionable.
- Utilized MediaQuery to adapt the tile width based on the screen size, capped at a maximum width of 260.
By adding this widget, we improve the UX by providing a consistent and intuitive way to interact with TestSuite objects in the UI.
This commit introduces a new class, TestSuite, designed to encapsulate a collection of Task objects under a common timestamp. This will help in grouping tasks that belong to a particular test suite.
Key Features:
- Add a TestSuite class with fields for `timestamp` and a list of `tests` (Task objects).
- Implement `toJson` method for serializing TestSuite objects to JSON-compatible format.
- Implement `fromJson` factory method for deserializing JSON data back into a TestSuite object.
By providing serialization and deserialization support directly in the model, we facilitate easier storage and data exchange for test suites.
This commit adds serialization support to the Task model by including a `toJson` method. This will allow easy conversion of Task objects to a JSON-compatible format, facilitating storage or network transmission.
Added a state variable isBenchmarkRunning in SkillTreeViewModel to track the status of benchmark execution. This state variable is used to conditionally disable specific UI components:
- The "Initiate test suite" button in TaskQueueView is disabled during the benchmark.
- All IconButtons in SideBarView are disabled during the benchmark.
- Node selection in SkillTreeView is disabled during the benchmark.
This ensures that the user cannot interact with these components while a benchmark test is running, thereby improving UX and preventing potential issues.
This commit updates the `TaskQueueView` to pass real data from the `selectedNodeHierarchy` to the `callGenerateReport` method in `SkillTreeViewModel`. An array of test names is constructed from the reversed node hierarchy, and these names are used as the `tests` field in the `ReportRequestBody`.
Note: The `category` field is now an empty string as per the new requirement, and `mock` continues to be set to true.
This commit integrates the `callGenerateReport` method from `SkillTreeViewModel` into the `TaskQueueView`. Now, when the user clicks the green checkmark button, the `callGenerateReport` method is triggered with hardcoded values for testing purposes.
Note: The implementation is still temporary and will be updated for dynamic behavior in the future.
This commit adds a new boolean field, "mock", to the `ReportRequestBody` class. This additional field is in line with the new requirements to specify whether the report is a mock or not.
The `toJson()` method is also updated to include this new field during serialization.
This commit integrates Firebase Authentication into the application and refactors the `main.dart` file to streamline the user authentication flow. The application will now automatically switch between the main layout and the authentication UI based on the user's sign-in status.
- Added Firebase initialization to the `main()` function for Firebase Authentication.
- Replaced the home page in `MyApp` with a `StreamBuilder` that listens for Firebase auth state changes.
- Based on the auth state, the home page will either show `MainLayout` (if the user is signed in) or `FirebaseAuthView` (if the user is not signed in).
- Added necessary imports for the Firebase and authentication services.
This commit adds the `FirebaseAuthView` class, a Flutter widget that serves as the UI for user authentication using Google and GitHub. The class uses the `AuthService` to handle the actual sign-in logic.
Features:
- Added an OutlinedButton for Google Sign-In, styled with Google's colors and logo.
- Added an OutlinedButton for GitHub Sign-In, styled with GitHub's colors and logo.
- Integrated the `AuthService` methods for Google and GitHub sign-in.
This commit introduces the AuthService class, which encapsulates the logic for signing in with Google and GitHub using Firebase Authentication. The class provides methods for initiating the OAuth flows for both providers and for signing out the user.
- Implemented `signInWithGoogle()` to handle Google Sign-In.
- Implemented `signInWithGitHub()` to handle GitHub Sign-In via popup.
- Added `signOut()` method for logging out the user.
- Added `getCurrentUser()` method to fetch the currently authenticated user.
This commit integrates the `BenchmarkService` into the main application setup through the `MultiProvider` in `main.dart`. The changes include:
1. Adding `BenchmarkService` to the list of service providers, allowing it to be accessible throughout the application via dependency injection.
2. Using `ProxyProvider` to ensure `BenchmarkService` gets the `RestApiUtility` instance as a dependency.
3. Modifying the `MyApp` class to fetch the `BenchmarkService` from the provider, making it ready for use in the application's lifecycle.
This addition allows `BenchmarkService` to be centrally managed and readily available for any part of the application that requires benchmark-related functionalities.
This commit enhances the `RestApiUtility` class to support multiple base URLs by incorporating an `ApiType` enum parameter in its methods. The changes include:
1. `_agentBaseUrl`: The base URL for the agent-related API calls.
2. `_benchmarkBaseUrl`: A hard-coded base URL for benchmark-related API calls.
3. `_getEffectiveBaseUrl`: A new private method that determines the effective base URL based on the given `ApiType`.
All public methods (`get`, `post`, `getBinary`) have been updated to include an optional `ApiType` parameter, which defaults to `ApiType.agent`. Based on this parameter, `_getEffectiveBaseUrl` is called to decide the base URL for the HTTP request.
This change allows for flexible API calls without the need to instantiate multiple `RestApiUtility` objects for different services.
This commit extends the `SkillTreeViewModel` to include `BenchmarkService` as a dependency. This integration allows for leveraging benchmark-related API calls within the skill tree logic.
Two new methods have been added to `SkillTreeViewModel`:
1. `callGenerateReport`: This method attempts to call the `generateReport` function from the `BenchmarkService`. Currently, it only prints the API response and is incomplete in terms of full functionality.
2. `callPollUpdates`: Similar to `callGenerateReport`, this method aims to call `pollUpdates` from `BenchmarkService` and prints the API response. This is also incomplete and will require further development.
Both methods are preliminary and will require additional features to become fully functional.
This commit introduces the `BenchmarkService` class, which encapsulates API calls related to benchmarking operations. The class includes two methods:
1. `generateReport`: Takes a `ReportRequestBody` object as input and performs a POST request to the `/reports` URL, serializing the object to JSON format.
2. `pollUpdates`: Accepts a UNIX UTC timestamp as an argument and performs a GET request to the `/updates?last_update_time=<timestamp>` URL.
Both methods use the `RestApiUtility` for making HTTP requests and are designed to work with different base URLs, controlled by the `ApiType` enum.
This commit adds a new enum, `ApiType`, to allow dynamic selection between different base URLs for API calls. The enum has two values: `agent` and `benchmark`, corresponding to different services.
The `ApiType` enum is designed to be passed as a parameter to the `RestApiUtility` methods, enabling the utility to decide which base URL to use for HTTP requests.
This commit introduces a new Dart class, `ReportRequestBody`, which represents the request body for generating reports in the `BenchmarkService`. The class includes a `toJson` method for easy serialization to JSON format.
- `category`: Specifies the category of the report (e.g., "coding").
- `tests`: A list of tests to be included in the report.
This commit substantially upgrades the SkillTreeViewModel by incorporating asynchronous initialization from a JSON asset. Now, both nodes and edges of the skill tree are dynamically generated based on the JSON data. This not only enhances the modularity of the code but also simplifies the process of updating or modifying the skill tree.
Other improvements include:
- Changed node IDs from integers to strings for better flexibility.
- Added a function to get a node by its ID, improving code reusability.
- Introduced error handling for potential issues during JSON parsing or node retrieval.
- Updated the sibling, level, and subtree separation configurations for the graph view layout.
These changes make the skill tree more dynamic and maintainable, setting the stage for future extensions.
This commit refactors the SkillTreeView class to include asynchronous initialization through FutureBuilder. The new version also replaces the integer-based node IDs with string-based IDs, aligning better with the SkillTreeNode model. The code now clears previous graph nodes and edges before adding new ones, preventing duplication. Additionally, the TreeNodeView component is now populated dynamically with data from the SkillTreeNode model, making the tree view more robust and integrated.
This commit updates the TreeNodeView class from a stateless widget to a stateful widget to handle hover interactions. The new version also replaces the old simple text-based representation with a more interactive and visually appealing design that includes icons and hover effects. The SkillTreeNode model is now used to populate the node information, making the TreeNodeView more dynamic and integrated with the rest of the application.
This commit extends the SkillTreeNode class to incorporate new attributes such as 'data', 'label', and 'shape', making the model more comprehensive. The JSON deserialization is also updated to handle optional or missing fields by providing default values, improving the robustness of the model.
This commit updates the SkillNodeData class to handle optional or missing JSON fields more robustly. Now, the model provides default values for each field, ensuring that the object can be instantiated successfully even if some JSON fields are missing or set to null.
This commit updates the Info class to provide default values for optional or missing fields in the JSON payload. This ensures that the model can be successfully instantiated even when some JSON fields are absent or set to null.
This commit modifies the Ground class to make it more robust against optional or missing fields in the incoming JSON data. Default values have been added to ensure that the model can be instantiated even if some JSON fields are missing or set to null.
* Add TestQueueView to Main Layout
This commit integrates the TestQueueView into the main layout. The layout now conditionally displays the TestQueueView based on whether a node in the SkillTree is selected.
- TestQueueView appears when a SkillTree node is selected.
- Main layout adjusts to accommodate TestQueueView alongside SkillTreeView and ChatView.
- Implemented responsive layout logic to manage the widths of the different views based on the screen width and the state of the SkillTree.
* Extend SkillTreeViewModel to Track Selected Node Hierarchy
This commit enhances the SkillTreeViewModel to maintain a list of nodes that form a hierarchy from the currently selected node to the root. This allows for more interactive and informative views that can leverage this hierarchical data.
- Added a new property `selectedNodeHierarchy` to keep track of the node hierarchy.
- Modified the `toggleNodeSelection` method to populate or clear `selectedNodeHierarchy` based on node selection.
- Introduced a new method `populateSelectedNodeHierarchy` to build the hierarchy from the selected node to the root.
* Extract skill tree view model reset state to method
* Implement UI enhancements for TaskQueueView
This commit introduces several UI improvements to the TaskQueueView:
- Tiles are padded 20 units from both the leading and trailing edges.
- Tiles now have a white background.
- Added a thin black border to the tiles.
- Incorporated a slight corner radius for the tiles.
- Centered the title and subtitle horizontally within the tiles.
- Added a checkmark button with a tooltip at the bottom-right corner for running a suite of tests.
These changes aim to improve the user experience and visual appeal of the TaskQueueView.
* Make MainLayout a consumer of SkillTreeViewModel
This commit replaces the placeholder implementation of the SkillTreeView class with a complete, functional version.
Features:
- Initializes the skill tree from the SkillTreeViewModel during `initState`.
- Dynamically creates Node and Edge objects for GraphView based on ViewModel data.
- Utilizes the TreeNodeView to render individual nodes.
- Integrates node selection functionality from the ViewModel.
- Adds InteractiveViewer for zoom and pan capabilities.
The new SkillTreeView is designed to work closely with SkillTreeViewModel to manage and display the skill tree.
This commit adds the TreeNodeView class, a StatelessWidget responsible for rendering individual nodes in the skill tree.
Features:
- Displays the node ID in the view.
- Uses the Provider package to interact with the SkillTreeViewModel.
- Includes an onTap method to toggle node selection state.
- Updates the UI to reflect the selected state by changing the background color.
The TreeNodeView is designed to work in conjunction with SkillTreeViewModel to manage node selection states.
The SkillTreeViewModel class serves as the view model for the skill tree and extends Flutter's ChangeNotifier for state management.
Features include:
- Storing and managing the list of SkillTreeNodes and SkillTreeEdges.
- Managing the state of the selected node.
- Initializing the skill tree with predefined nodes and edges.
- Methods for toggling node selection, allowing for only a single node to be selected at any given time.
The view model utilizes the GraphView package for visualization and layout.
The SkillTreeEdge model represents the relationship between different skill nodes.
It includes:
- Edge ID
- Source node ID
- Destination node ID
- Arrows property to indicate directionality
The SkillNodeData model aggregates various data related to a skill node.
It includes:
- Node name
- Node category
- Associated task
- Dependencies
- Cutoff value
- Ground object for evaluation details
- Info object for metadata
The Info data model holds metadata about a skill node.
It includes:
- The difficulty level of the skill node
- A description of the skill node
- A list of potential side effects related to the skill node
The Ground data model stores evaluation information for each skill node.
It includes:
- The answer to be evaluated
- A list of terms that should be contained in the answer
- A list of terms that should not be contained in the answer
- A list of associated files
- A map for additional evaluation criteria
Added the downloadArtifact method to the ChatService class, enabling the download of artifacts in the Flutter web application. The function uses the dart:html package to trigger a browser-based file download, allowing users to save artifacts locally. This implementation complements the existing REST API and enhances the user experience.
Implemented auto-scrolling in the chat message list to ensure that the view scrolls down to the most recent message when a new chat is added. This behavior only triggers if the user is already at the bottom of the list, providing a seamless user experience.
- Added a new boolean state `isContinuousMode` to the `ChatInputField` widget to handle the continuous mode feature.
- Introduced a new callback function `onContinuousModePressed` to manage the state of the continuous mode from an external source.
- Conditionally rendered the send button based on the `isContinuousMode` state.
- Enhanced the UI by adding a button to toggle between continuous mode and single message mode, which triggers the `onContinuousModePressed` callback.
Added a new state variable `_isContinuousMode` to the ChatViewModel to track whether the chat is in continuous mode or not. This state is toggled via a setter and triggers UI updates through `notifyListeners()`.
Enhanced the `sendChatMessage` method to automatically send a null message if continuous mode is active, triggering the next step in the chat.
Updated the StepRequestBody class to allow both 'input' and 'additionalInput' to be optional. Added logic in toJson() method to return an empty JSON object if both fields are null.