Trevor Hilton 03ea565802
feat: cli arg to specify max parquet fanout (#25714)
This allows the `max_parquet_fanout` to be specified in the CLI for the `influxdb3 serve` command. This could be done previously via the `--datafusion-config` CLI argument, but the drawbacks to that were:
1. that is a fairly advanced option given the available key/value pairs are not well documented
2. if `iox.max_parquet_fanout` was not provided to that argument, the default would be set to `40`

This PR maintains the existing `--datafusion-config` CLI argument (with one caveat, see below), which allows users to provide a set of key/value pairs that will be used to build the internal DataFusion config, and in addition provides the `--datafusion-max-parquet-fanout` argument:
```
    --datafusion-max-parquet-fanout <MAX_PARQUET_FANOUT>
          When multiple parquet files are required in a sorted way (e.g. for de-duplication), we have two options:

          1. **In-mem sorting:** Put them into `datafusion.target_partitions` DataFusion partitions. This limits the fan-out, but requires that we potentially chain multiple parquet files into a single DataFusion partition. Since chaining sorted data does NOT automatically result in sorted data (e.g. AB-AB is not sorted), we need to perform an in-memory sort using `SortExec` afterwards. This is expensive.
          2. **Fan-out:** Instead of chaining files within DataFusion partitions, we can accept a fan-out beyond `target_partitions`. This prevents in-memory sorting but may result in OOMs (out-of-memory) if the fan-out is too large.

          We try to pick option 2 up to a certain number of files, which is configured by this setting.

          [env: INFLUXDB3_DATAFUSION_MAX_PARQUET_FANOUT=]
          [default: 1000]
```
with the default value of `1000`, which will override the core `iox_query` default of `40`.
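
The trade-off described in the help text can be sketched as a small decision function. This is a minimal illustration, not the `iox_query` implementation: `ScanStrategy` and `choose_strategy` are hypothetical names, and the rule that files fitting within `target_partitions` never need chaining is an assumption made for the example.

```rust
/// How a set of sorted parquet files could be scanned (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum ScanStrategy {
    /// One DataFusion partition per file, so no in-memory re-sort is needed.
    FanOut { partitions: usize },
    /// Files are chained into `target_partitions` partitions and must be
    /// re-sorted in memory afterwards (e.g. with a `SortExec`).
    InMemSort { partitions: usize },
}

/// Hypothetical helper: pick a strategy from the number of files to scan.
fn choose_strategy(
    num_files: usize,
    target_partitions: usize,
    max_parquet_fanout: usize,
) -> ScanStrategy {
    // Assumption: files that fit within `target_partitions` need no chaining,
    // so the fan-out limit only matters beyond that point.
    let fanout_limit = target_partitions.max(max_parquet_fanout);
    if num_files <= fanout_limit {
        ScanStrategy::FanOut { partitions: num_files }
    } else {
        ScanStrategy::InMemSort { partitions: target_partitions }
    }
}
```

With 100 files and 8 target partitions, this sketch picks the in-memory sort under the old limit of `40` and the fan-out under the new default of `1000`.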

A test was added to check that this is propagated down to the `IOxSessionContext` that is used during queries.

The only change to the `datafusion-config` CLI argument was to rename `INFLUXDB_IOX` in the environment variable to `INFLUXDB3`:
```
    --datafusion-config <DATAFUSION_CONFIG>
          Provide custom configuration to DataFusion as a comma-separated list of key:value pairs.

          # Example ```text --datafusion-config "datafusion.key1:value1, datafusion.key2:value2" ```

          [env: INFLUXDB3_DATAFUSION_CONFIG=]
          [default: ]
```
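
To make the `key:value` format concrete, here is a minimal sketch of how such a comma-separated list might be split into a map. It is not the parser used by `influxdb3`: the function name `parse_datafusion_config` and its error messages are hypothetical, and trimming whitespace and rejecting duplicate keys are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical parser for a string such as
/// "datafusion.key1:value1, datafusion.key2:value2".
fn parse_datafusion_config(raw: &str) -> Result<HashMap<String, String>, String> {
    let mut config = HashMap::new();
    for pair in raw.split(',') {
        let pair = pair.trim();
        // An empty input (the documented default) yields an empty map.
        if pair.is_empty() {
            continue;
        }
        // Split on the first ':' only, so values may themselves contain ':'.
        let (key, value) = pair
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in pair `{pair}`"))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(format!("empty key in pair `{pair}`"));
        }
        // Assumption: a repeated key is treated as an error.
        if config.insert(key.to_string(), value.trim().to_string()).is_some() {
            return Err(format!("duplicate key `{key}`"));
        }
    }
    Ok(config)
}
```

Splitting on the first `:` keeps the sketch tolerant of values that contain a colon; whether the real argument accepts such values is not stated in the source.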
2024-12-27 12:42:30 -05:00
.cargo chore: Upgrade to Rust 1.78.0 (#24953) 2024-05-02 13:39:20 -04:00
.circleci chore: invalidate CloudFront on upload (#25651) 2024-12-12 16:00:23 -05:00
.github chore: Remove dependabot for our repo (#24693) 2024-02-26 13:38:20 -05:00
assets chore: Update README for InfluxDB main repo (#25101) 2024-06-27 12:50:05 -04:00
docker fix: Add docker folder back for CI (#24720) 2024-02-29 16:47:41 -05:00
influxdb3 feat: cli arg to specify max parquet fanout (#25714) 2024-12-27 12:42:30 -05:00
influxdb3_cache feat: suport projection pushdown in metadata cache (#25675) 2024-12-17 20:13:25 -05:00
influxdb3_catalog fix: do not count deleted dbs and tables toward limit (#25702) 2024-12-21 16:47:17 -05:00
influxdb3_clap_blocks feat: cli arg to specify max parquet fanout (#25714) 2024-12-27 12:42:30 -05:00
influxdb3_client feat: Add json lines support to query output (#25698) 2024-12-20 14:57:19 -05:00
influxdb3_id fix: move to fetch_update from fetch_add for IDs (#25663) 2024-12-16 11:32:33 -05:00
influxdb3_load_generator feat(processing_engine): initial implementation of Processing Engine plugins and triggers (#25639) 2024-12-13 14:11:38 -08:00
influxdb3_process chore: remove check for VERSION_HASH in build.rs (#25271) 2024-08-26 12:20:38 -04:00
influxdb3_py_api feat(processing_engine): Runtime and write-back improvements (#25672) 2024-12-17 16:38:12 -08:00
influxdb3_server refactor: porting changes in pro to oss (#25712) 2024-12-27 15:02:22 +00:00
influxdb3_sys_events refactor: porting changes in pro to oss (#25712) 2024-12-27 15:02:22 +00:00
influxdb3_telemetry feat: telem uptime and rename (#25682) 2024-12-20 08:52:02 +00:00
influxdb3_test_helpers test: add test helpers for object store types (#25420) 2024-10-02 14:45:12 -04:00
influxdb3_wal fix(catalog): consistent ordering of catalog operations (#25690) 2024-12-20 15:17:38 -08:00
influxdb3_write fix(catalog): consistent ordering of catalog operations (#25690) 2024-12-20 15:17:38 -08:00
iox_query_influxql_rewrite feat: extend InfluxQL rewriter for SELECT and EXPLAIN (#24726) 2024-03-05 15:40:16 -05:00
.editorconfig chore: editor config spacing for shell scripts 2022-12-13 11:12:11 +01:00
.gitattributes feat: implement jaeger-agent protocol directly (#2607) 2021-09-22 17:30:37 +00:00
.gitignore chore: clean up heappy, pprof, and jemalloc (#24967) 2024-05-06 15:21:18 -04:00
.kodiak.toml chore: Set default to squash 2022-01-25 15:57:10 +01:00
CONTRIBUTING.md docs: rename influxdb_iox to influxdata (#24577) 2024-01-16 13:34:23 -05:00
Cargo.lock feat: cli arg to specify max parquet fanout (#25714) 2024-12-27 12:42:30 -05:00
Cargo.toml chore: update core dependencies (#25708) 2024-12-24 14:21:59 +00:00
Dockerfile fix: the cache target for build artefacts in Dockerfile (#25510) 2024-11-01 17:20:00 -04:00
Dockerfile.dockerignore fix: Readd the Dockerfile for the main branch (#24719) 2024-02-29 16:33:36 -05:00
LICENSE-APACHE fix: Add LICENSE (#430) 2020-11-10 12:10:07 -05:00
LICENSE-MIT fix: Add LICENSE (#430) 2020-11-10 12:10:07 -05:00
PROFILING.md docs: `PROFILING.md` (#25075) 2024-07-24 11:01:36 -04:00
README.md chore: Update README for InfluxDB main repo (#25101) 2024-06-27 12:50:05 -04:00
SECURITY.md chore: tweak wording and don't reference gpg key in SECURITY.md (#24838) 2024-03-25 14:34:36 -05:00
deny.toml chore: update core dependencies (#25708) 2024-12-24 14:21:59 +00:00
run-tests.sh fix: check num items to prune before pruning parquet cache (#25447) 2024-10-10 14:03:26 +01:00
rust-toolchain.toml chore: upgrade to rust 1.83.0 (#25605) 2024-11-29 18:21:48 -05:00
rustfmt.toml chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00

README.md

InfluxDB is the leading open source time series database for metrics, events, and real-time analytics.

Project Status

This main branch contains InfluxDB v3 in pre-release and under active development. Build artifacts are not yet generally available and official installation instructions will be coming later this year. For now, a Dockerfile is provided and can be adapted or used for inspiration by intrepid users.

Learn InfluxDB

Documentation | Community Forum | Community Slack | Blog | InfluxDB University | YouTube

Try InfluxDB Cloud for free and get started fast with no local setup required. Click here to start building your application on InfluxDB Cloud.

Installation

We have nightly and versioned Docker images, Debian packages, RPM packages, and tarballs of InfluxDB available on the InfluxData downloads page. We also provide the InfluxDB command line interface (CLI) client as a separate binary available at the same location.

If you are interested in building from source, see the building from source guide for contributors.

To begin using InfluxDB, visit our Getting Started with InfluxDB documentation.

License

The open source software we build is licensed under the permissive MIT and Apache 2 licenses. We've long held the view that our open source code should be truly open and our commercial code should be separate and closed.

Interested in joining the team building InfluxDB?

Check out current job openings at www.influxdata.com/careers today!