Add the following functions to ingest data into memory before an Auto-GPT run; a rough sketch follows the list.
- split_file: given some content, split it into chunks of at most max_length characters, with or without a specified overlap
- ingest_file: read a file, use split_file to split it into chunks, and load each chunk into memory
- ingest_directory: ingest all files in a directory into memory
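A minimal sketch of what these helpers might look like, assuming a memory backend that exposes an add() method (the defaults and the exact overlap handling here are assumptions, not the merged implementation):

```python
import os

def split_file(content, max_length=4000, overlap=0):
    """Split content into chunks of at most max_length characters,
    with an optional overlap between consecutive chunks (assumes overlap < max_length)."""
    chunks = []
    start = 0
    while start < len(content):
        end = min(start + max_length, len(content))
        chunks.append(content[start:end])
        if end == len(content):
            break
        start = end - overlap  # step back so consecutive chunks share context
    return chunks

def ingest_file(filename, memory, max_length=4000, overlap=200):
    """Read a file, split it with split_file, and load each chunk into memory."""
    with open(filename, "r", encoding="utf-8") as f:
        content = f.read()
    for i, chunk in enumerate(split_file(content, max_length, overlap)):
        memory.add(f"Filename: {filename}\nChunk {i}: {chunk}")

def ingest_directory(directory, memory, max_length=4000, overlap=200):
    """Ingest every file under a directory into memory."""
    for root, _, files in os.walk(directory):
        for name in files:
            ingest_file(os.path.join(root, name), memory, max_length, overlap)
```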
This pull request aims to improve the organization, readability, and clarity of the .env.template file for users modifying their settings. The changes organize the file into a tree-like structure with appropriate comments, giving users clear guidance on the purpose of each variable, its possible values, and its default setting where applicable.
As a user with no prior knowledge of best practices for contributing to a project or documenting a .env.template file, I took the liberty of making the changes I would have liked to see when I first encountered the file. My goal was to include every configurable option, for ease of use and a better understanding of how the code works.
The key improvements made in this pull request, illustrated by the excerpt after the list, are:
1. Grouping related variables under appropriate headers for better organization and ease of navigation.
2. Adding informative comments for each variable to help users understand their purpose and possible values.
3. Including default values in the comments so users know the consequences of leaving a variable unset, allowing them to make informed decisions when configuring the application.
4. Formatting the file consistently for better readability.
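For illustration, a section of the reorganized file might look like this (the variable names, comments, and defaults below are examples of the style, not the exact file contents):

```ini
################################################################################
### LLM MODELS
################################################################################

## FAST_LLM_MODEL - LLM model used for fast, cheap tasks (Default: gpt-3.5-turbo)
# FAST_LLM_MODEL=gpt-3.5-turbo

## TEMPERATURE - Sampling temperature, between 0 and 2 (Default: 0)
# TEMPERATURE=0
```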
These changes will enhance the user experience by simplifying the configuration process and reducing potential confusion. Users can quickly and easily configure the application without having to search through the code to determine default values or to understand how settings relate to one another. Additionally, well-organized configuration and documentation lead to fewer issues and misunderstandings, saving time for both users and maintainers of the project.
Please review these changes and let me know if you have any questions or suggestions for further improvement so I can make any necessary adjustments.
- Change the way the User-Agent header is handled when making requests to browse websites
- Add each chunk to memory before and after summarization. We do not save the "summary of summaries", as it wasn't performing well and added noise when the "question" couldn't be answered.
- Use the newly added config parameters for max_length and max_token
I added two new config parameters (see the sketch after this list):
- browse_chunk_max_length: defines the max_length of a chunk sent to memory and to FAST_LLM_MODEL for summarization
- browse_summary_max_token: defines the maximum number of tokens passed to the model used for summary creation. Raising it can help with complex subjects, allowing the agent to be more verbose in its attempts to summarize each chunk and the chunk summaries.
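A rough sketch of how the two parameters could drive chunked summarization; cfg, memory, and create_chat_completion stand in for Auto-GPT-style helpers and are assumptions, not the merged code:

```python
def summarize_website_text(text, question, cfg, memory, create_chat_completion):
    """Summarize browsed text chunk by chunk, answering `question` when possible."""
    length = cfg.browse_chunk_max_length
    chunks = [text[i:i + length] for i in range(0, len(text), length)]
    summaries = []
    for chunk in chunks:
        memory.add(chunk)  # save the raw chunk before summarizing
        prompt = (
            f'"""{chunk}""" Using the above text, answer: "{question}" '
            "-- if the question cannot be answered, summarize the text."
        )
        summary = create_chat_completion(
            model=cfg.fast_llm_model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=cfg.browse_summary_max_token,
        )
        memory.add(summary)  # save the chunk summary afterwards
        summaries.append(summary)
    # Per the note above, the combined "summary of summaries" is not saved.
    return "\n".join(summaries)
```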
I've also edited the way the user_agent is handled.
By not having correct_json(json_str) inside the try/except, it was still easily possible to throw Invalid JSON errors. When a response contained no JSON at all, parsing would fail when attempting to locate the braces.
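A minimal sketch of the fix, with correct_json (the repair helper named above) passed in and kept inside the try/except; the wrapper name and fallback behavior are illustrative assumptions:

```python
import json

def fix_and_parse_json(json_str, correct_json):
    """Parse a model response as JSON, attempting repair before giving up."""
    try:
        return json.loads(json_str)
    except json.JSONDecodeError:
        pass
    try:
        corrected = correct_json(json_str)
        # Locate the outermost braces; str.index raises ValueError
        # when the response contains no JSON at all.
        start = corrected.index("{")
        end = corrected.rindex("}") + 1
        return json.loads(corrected[start:end])
    except (ValueError, json.JSONDecodeError):
        return {}  # fail gracefully instead of raising Invalid JSON errors
```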
- Updated README.md to explain the new config
- Added an Azure YAML loader to the config class
- Centralized model retrieval into the config class

This commit effectively combines and replaces #700 and #580.
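A sketch of what the Azure YAML loader and centralized model retrieval could look like; the file name, YAML keys, and attribute names are assumptions, not the exact merged code:

```python
import yaml  # PyYAML

class Config:
    def load_azure_config(self, config_file="azure.yaml"):
        """Load Azure OpenAI settings from a YAML file, tolerating its absence."""
        try:
            with open(config_file) as f:
                params = yaml.safe_load(f) or {}
        except FileNotFoundError:
            params = {}
        self.openai_api_base = params.get("azure_api_base", "")
        self.openai_api_version = params.get("azure_api_version", "")
        self.azure_model_to_deployment = params.get("azure_model_map", {})

    def get_deployment_id_for_model(self, model):
        """Centralized retrieval: map a model name to its Azure deployment id."""
        return self.azure_model_to_deployment.get(model, model)
```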