This allows the `max_parquet_fanout` to be specified in the CLI for the `influxdb3 serve` command. This could be done previously via the `--datafusion-config` CLI argument, but the drawbacks to that were:
1. that is a fairly advanced option given the available key/value pairs are not well documented
2. if `iox.max_parquet_fanout` was not provided to that argument, the default would be set to `40`
This PR maintains the existing `--datafusion-config` CLI argument (with one caveat, see below) which allows users to provide a set key/value pairs that will be used to build the internal DataFusion config, but in addition provides the `--datafusion-max-parquet-fanout` argument:
```
--datafusion-max-parquet-fanout <MAX_PARQUET_FANOUT>
When multiple parquet files are required in a sorted way (e.g. for de-duplication), we have two options:
1. **In-mem sorting:** Put them into `datafusion.target_partitions` DataFusion partitions. This limits the fan-out, but requires that we potentially chain multiple parquet files into a single DataFusion partition. Since chaining sorted data does NOT automatically result in sorted data (e.g. AB-AB is not sorted), we need to preform an in-memory sort using `SortExec` afterwards. This is expensive. 2. **Fan-out:** Instead of chaining files within DataFusion partitions, we can accept a fan-out beyond `target_partitions`. This prevents in-memory sorting but may result in OOMs (out-of-memory) if the fan-out is too large.
We try to pick option 2 up to a certain number of files, which is configured by this setting.
[env: INFLUXDB3_DATAFUSION_MAX_PARQUET_FANOUT=]
[default: 1000]
```
with the default value of `1000`, which will override the core `iox_query` default of `40`.
A test was added to check that this is propagated down to the `IOxSessionContext` that is used during queries.
The only change to the `datafusion-config` CLI argument was to rename `INFLUXDB_IOX` in the environment variable to `INFLUXDB3`:
```
--datafusion-config <DATAFUSION_CONFIG>
Provide custom configuration to DataFusion as a comma-separated list of key:value pairs.
# Example ```text --datafusion-config "datafusion.key1:value1, datafusion.key2:value2" ```
[env: INFLUXDB3_DATAFUSION_CONFIG=]
[default: ]
```