influxdb

database go influxdb metrics monitoring react time-series

Trevor Hilton 03ea565802 feat: cli arg to specify max parquet fanout (#25714)

This allows the `max_parquet_fanout` to be specified in the CLI for the `influxdb3 serve` command. This could be done previously via the `--datafusion-config` CLI argument, but the drawbacks to that were:

1. that is a fairly advanced option, given that the available key/value pairs are not well documented
2. if `iox.max_parquet_fanout` was not provided to that argument, the default would be set to `40`

This PR maintains the existing `--datafusion-config` CLI argument (with one caveat, see below), which allows users to provide a set of key/value pairs that will be used to build the internal DataFusion config, but in addition provides the `--datafusion-max-parquet-fanout` argument:

```
--datafusion-max-parquet-fanout <MAX_PARQUET_FANOUT>
    When multiple parquet files are required in a sorted way (e.g. for de-duplication), we have
    two options:

    1. In-mem sorting: Put them into `datafusion.target_partitions` DataFusion partitions. This
       limits the fan-out, but requires that we potentially chain multiple parquet files into a
       single DataFusion partition. Since chaining sorted data does NOT automatically result in
       sorted data (e.g. AB-AB is not sorted), we need to perform an in-memory sort using
       `SortExec` afterwards. This is expensive.
    2. Fan-out: Instead of chaining files within DataFusion partitions, we can accept a fan-out
       beyond `target_partitions`. This prevents in-memory sorting but may result in OOMs
       (out-of-memory) if the fan-out is too large.

    We try to pick option 2 up to a certain number of files, which is configured by this setting.

    [env: INFLUXDB3_DATAFUSION_MAX_PARQUET_FANOUT=]
    [default: 1000]
```

with the default value of `1000`, which will override the core `iox_query` default of `40`.

A test was added to check that this is propagated down to the `IOxSessionContext` that is used during queries.
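The trade-off described in the help text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the `iox_query` implementation: the names `SortStrategy`, `choose_strategy`, and `is_sorted` are hypothetical, and the sketch assumes that fan-out means one DataFusion partition per file.

```rust
/// Hypothetical illustration only; not the actual `iox_query` planner code.
#[derive(Debug, PartialEq, Eq)]
enum SortStrategy {
    /// Assumption: one DataFusion partition per file, so no re-sort is needed.
    FanOut { partitions: usize },
    /// Files are chained into `target_partitions` partitions and re-sorted in memory.
    InMemSort { partitions: usize },
}

/// Pick fan-out while the file count stays within `max_parquet_fanout`,
/// otherwise fall back to chaining files and sorting in memory.
fn choose_strategy(
    num_files: usize,
    target_partitions: usize,
    max_parquet_fanout: usize,
) -> SortStrategy {
    if num_files <= max_parquet_fanout {
        SortStrategy::FanOut { partitions: num_files }
    } else {
        SortStrategy::InMemSort { partitions: target_partitions }
    }
}

/// Returns true if the values are in non-decreasing order.
fn is_sorted(values: &[char]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    // Chaining two individually sorted files ("AB" then "AB") is not sorted overall.
    let chained: Vec<char> = "AB".chars().chain("AB".chars()).collect();
    println!("chained sorted? {}", is_sorted(&chained)); // prints "chained sorted? false"

    // 100 files with the old default of 40: falls back to the in-memory sort.
    println!("{:?}", choose_strategy(100, 8, 40));
    // 100 files with the new default of 1000: fan-out is accepted.
    println!("{:?}", choose_strategy(100, 8, 1000));
}
```

With the same 100 files, raising the limit from `40` to `1000` is what moves the sketch from the in-memory sort branch to the fan-out branch.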
The only change to the `datafusion-config` CLI argument was to rename `INFLUXDB_IOX` in the environment variable to `INFLUXDB3`:

```
--datafusion-config <DATAFUSION_CONFIG>
    Provide custom configuration to DataFusion as a comma-separated list of key:value pairs.

    # Example
    ```text
    --datafusion-config "datafusion.key1:value1, datafusion.key2:value2"
    ```

    [env: INFLUXDB3_DATAFUSION_CONFIG=]
    [default: ]
```

2024-12-27 12:42:30 -05:00
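To make the `key:value` format concrete, here is a minimal sketch of how a comma-separated list like the one in the example could be split into pairs. The function `parse_config_pairs` is hypothetical and is not the parser used by `influxdb3`; trimming whitespace around keys and values and skipping empty segments are assumptions of this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: split "k1:v1, k2:v2" into a sorted key/value map.
/// Assumptions: surrounding whitespace is trimmed, empty segments are skipped,
/// and a segment without ':' is reported as an error.
fn parse_config_pairs(input: &str) -> Result<BTreeMap<String, String>, String> {
    let mut pairs = BTreeMap::new();
    for segment in input.split(',') {
        let segment = segment.trim();
        if segment.is_empty() {
            continue;
        }
        // Split on the first ':' only, so values may themselves contain ':'.
        let (key, value) = segment
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in pair `{segment}`"))?;
        pairs.insert(key.trim().to_string(), value.trim().to_string());
    }
    Ok(pairs)
}

fn main() {
    let parsed = parse_config_pairs("datafusion.key1:value1, datafusion.key2:value2")
        .expect("example input is well formed");
    for (key, value) in &parsed {
        println!("{key} = {value}");
    }
}
```

Splitting on the first `:` only is a design choice of the sketch; the real argument parser may treat such input differently.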
.cargo	chore: Upgrade to Rust 1.78.0 (#24953 )	2024-05-02 13:39:20 -04:00
.circleci	chore: invalidate CloudFront on upload (#25651 )	2024-12-12 16:00:23 -05:00
.github	chore: Remove dependabot for our repo (#24693 )	2024-02-26 13:38:20 -05:00
assets	chore: Update README for InfluxDB main repo (#25101 )	2024-06-27 12:50:05 -04:00
docker	fix: Add docker folder back for CI (#24720 )	2024-02-29 16:47:41 -05:00
influxdb3	feat: cli arg to specify max parquet fanout (#25714 )	2024-12-27 12:42:30 -05:00
influxdb3_cache	feat: suport projection pushdown in metadata cache (#25675 )	2024-12-17 20:13:25 -05:00
influxdb3_catalog	fix: do not count deleted dbs and tables toward limit (#25702 )	2024-12-21 16:47:17 -05:00
influxdb3_clap_blocks	feat: cli arg to specify max parquet fanout (#25714 )	2024-12-27 12:42:30 -05:00
influxdb3_client	feat: Add json lines support to query output (#25698 )	2024-12-20 14:57:19 -05:00
influxdb3_id	fix: move to fetch_update from fetch_add for IDs (#25663 )	2024-12-16 11:32:33 -05:00
influxdb3_load_generator	feat(processing_engine): initial implementation of Processing Engine plugins and triggers (#25639 )	2024-12-13 14:11:38 -08:00
influxdb3_process	chore: remove check for VERSION_HASH in build.rs (#25271 )	2024-08-26 12:20:38 -04:00
influxdb3_py_api	feat(processing_engine): Runtime and write-back improvements (#25672 )	2024-12-17 16:38:12 -08:00
influxdb3_server	refactor: porting changes in pro to oss (#25712 )	2024-12-27 15:02:22 +00:00
influxdb3_sys_events	refactor: porting changes in pro to oss (#25712 )	2024-12-27 15:02:22 +00:00
influxdb3_telemetry	feat: telem uptime and rename (#25682 )	2024-12-20 08:52:02 +00:00
influxdb3_test_helpers	test: add test helpers for object store types (#25420 )	2024-10-02 14:45:12 -04:00
influxdb3_wal	fix(catalog): consistent ordering of catalog operations (#25690 )	2024-12-20 15:17:38 -08:00
influxdb3_write	fix(catalog): consistent ordering of catalog operations (#25690 )	2024-12-20 15:17:38 -08:00
iox_query_influxql_rewrite	feat: extend InfluxQL rewriter for SELECT and EXPLAIN (#24726 )	2024-03-05 15:40:16 -05:00
.editorconfig	chore: editor config spacing for shell scripts	2022-12-13 11:12:11 +01:00
.gitattributes	feat: implement jaeger-agent protocol directly (#2607 )	2021-09-22 17:30:37 +00:00
.gitignore	chore: clean up heappy, pprof, and jemalloc (#24967 )	2024-05-06 15:21:18 -04:00
.kodiak.toml	chore: Set default to squash	2022-01-25 15:57:10 +01:00
CONTRIBUTING.md	docs: rename influxdb_iox to influxdata (#24577 )	2024-01-16 13:34:23 -05:00
Cargo.lock	feat: cli arg to specify max parquet fanout (#25714 )	2024-12-27 12:42:30 -05:00
Cargo.toml	chore: update core dependencies (#25708 )	2024-12-24 14:21:59 +00:00
Dockerfile	fix: the cache target for build artefacts in Dockerfile (#25510 )	2024-11-01 17:20:00 -04:00
Dockerfile.dockerignore	fix: Readd the Dockerfile for the main branch (#24719 )	2024-02-29 16:33:36 -05:00
LICENSE-APACHE	fix: Add LICENSE (#430 )	2020-11-10 12:10:07 -05:00
LICENSE-MIT	fix: Add LICENSE (#430 )	2020-11-10 12:10:07 -05:00
PROFILING.md	docs: `PROFILING.md` (#25075 )	2024-07-24 11:01:36 -04:00
README.md	chore: Update README for InfluxDB main repo (#25101 )	2024-06-27 12:50:05 -04:00
SECURITY.md	chore: tweak wording and don't reference gpg key in SECURITY.md (#24838 )	2024-03-25 14:34:36 -05:00
deny.toml	chore: update core dependencies (#25708 )	2024-12-24 14:21:59 +00:00
run-tests.sh	fix: check num items to prune before pruning parquet cache (#25447 )	2024-10-10 14:03:26 +01:00
rust-toolchain.toml	chore: upgrade to rust 1.83.0 (#25605 )	2024-11-29 18:21:48 -05:00
rustfmt.toml	chore: use Rust edition 2021	2021-10-25 10:58:20 +02:00

README.md

InfluxDB is the leading open source time series database for metrics, events, and real-time analytics.

Project Status

This main branch contains InfluxDB v3 in pre-release and under active development. Build artifacts are not yet generally available and official installation instructions will be coming later this year. For now, a Dockerfile is provided and can be adapted or used for inspiration by intrepid users.

Learn InfluxDB

Try InfluxDB Cloud for free and get started fast with no local setup required. You can start building your application on InfluxDB Cloud right away.

Installation

We have nightly and versioned Docker images, Debian packages, RPM packages, and tarballs of InfluxDB available on the InfluxData downloads page. We also provide the InfluxDB command line interface (CLI) client as a separate binary available at the same location.

For v1 installation, use the main 1.x branch or install InfluxDB OSS directly.
For v2 installation, use the main 2.x branch.
v3 development is on this main branch. This project is actively under development and is not considered stable.

If you are interested in building from source, see the building from source guide for contributors.

To begin using InfluxDB, visit our Getting Started with InfluxDB documentation.

License

The open source software we build is licensed under the permissive MIT and Apache 2 licenses. We’ve long held the view that our open source code should be truly open and our commercial code should be separate and closed.

Interested in joining the team building InfluxDB?

Check out current job openings at www.influxdata.com/careers today!
