This adds the ability for the data generator to write to many databases. A new command line argument, `bucket_list`, is added which should be a file name. The file should contain a list of databsaes, one per line, with the structure of <org>_<bucket>. This is a little odd given the data generator expects org and bucket separately, but I expect the file that we'll be using will be database names, which have this format.
The configuration can specify what percentage of the list should get written to by which agents at what sampling interval. This should allow configurations where databases get different levels of ingest and different types (as specified via different agent specs). The structure is a little wonky, but I think it'll get the job done. The next step is to run some perf tests to see how the data generator performs if writing to 10k databases.
This is work leading up to giving the data generator the ability to write to many databases. The plan is to specify which agents databases will use to write data.
* chore: Update datafusion deps to pre-release
* refactor: Update IOx to use new datafusion Statistics
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>