Commit Graph

49735 Commits (feat/implement-gen1-file-retention-period)

Author SHA1 Message Date
wayne warren 66c81eb00a refactor: move deleter module into its own crate 2025-06-06 20:54:16 -06:00
wayne warren 89fdb6010d chore(WIP): add Arc<dyn ObjectStore> to ObjectStoreDeleter 2025-06-06 20:54:16 -06:00
wayne warren 93276712cb feat: spawn deleter background task separately from catalog updater 2025-06-06 20:54:16 -06:00
wayne warren bd4b5aa8a0 feat: add DeleteTaskQueuer to WriteBufferImpl 2025-06-06 20:54:15 -06:00
wayne warren e4021c98f8 feat: introduce stubbed implementation of ObjectStoreDeleter 2025-06-06 20:54:15 -06:00
wayne warren ae79ed849c feat: add DeleteTask::ParqeutFile to support retention policy deletions 2025-06-06 20:54:15 -06:00
wayne warren b91a55d30b refactor: initialize DeleteManager with constructor 2025-06-06 20:54:15 -06:00
wayne fa646a6f64
chore: backport retention period implementation from enterprise (#26501)
Includes two main components:

* Removal of expired data from `PersistedFiles`.
* Modified `ChunkFilter` that precisely excludes expired data from query
  results even if the expired data hasn't been removed from the object
  store yet.

---------

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>
2025-06-06 18:07:34 -06:00
wayne d499c59bb1
chore: backport hard delete time in Catalog and deleter service from enterprise (#26500)
* Merge pull request #881 from influxdata/sgc/26156/hard_delete_table_apis

feat: Catalog tracks hard delete time; implement deleter service

* Merge pull request #885 from influxdata/sgc/26156/pr_881_followup

chore: PR #881 followup

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
2025-06-05 14:41:01 -06:00
praveen-influx e392a3803e
chore: fix flakey test in write buffer (#26494) 2025-06-03 22:27:21 +01:00
Trevor Hilton 531f8dc9b6
chore: update `influxdb3_core` (#26491)
* chore: patch influxdb3_core 634a3f142

* chore: update tower-http for cargo audit

* chore: update to latest sha on influxdb3_core
2025-06-03 13:52:59 -04:00
praveen-influx 9486e0ae13
feat: add precise load generator based on given tput (#26492)
This commit adds another subcommand to the load generator that produces
line protocol data at a constrained throughput shared between a given
number of writers. It uses a very naive approach to generate data
which may contain some duplicates. However it is useful when you need to
generate a very specific amount of data per writer. This approach has
been used to reproduce OOMs observed in perf tests.

This does not create a report like other sub-commands, and it also does
not observe any errors in the writes.

pro PR: https://github.com/influxdata/influxdb_pro/pull/886
2025-06-03 17:37:08 +01:00
praveen-influx a67b50dac5
feat: add concurrency limit for WAL replay (#26483)
WAL replay currently loads _all_ WAL files concurrently, running into
OOM. This commit adds a CLI parameter `--wal-replay-concurrency-limit`
that allows the user to set a lower limit and run WAL replay again.

closes: https://github.com/influxdata/influxdb/issues/26481
2025-06-03 16:34:31 +01:00
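
The concurrency limit described in the commit above can be illustrated with a minimal, standard-library-only sketch. This is not the influxdb3 implementation: `replay_with_limit`, its parameters, and the use of plain threads are all assumptions made for illustration. A fixed number of workers pull file names from a shared queue, so no more than `limit` files are ever in flight at once.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `replay` over every file with at most `limit` files in flight at once.
/// Returns the peak number of files that were being replayed concurrently.
/// Hypothetical helper for illustration; not taken from influxdb3.
pub fn replay_with_limit<F>(files: Vec<String>, limit: usize, replay: F) -> usize
where
    F: Fn(&str) + Sync,
{
    let queue = Mutex::new(VecDeque::from(files));
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    thread::scope(|scope| {
        // A limit of zero would never make progress, so treat it as one.
        for _ in 0..limit.max(1) {
            scope.spawn(|| loop {
                // Take the next file; the lock is released before replaying.
                let next = queue.lock().unwrap().pop_front();
                let Some(file) = next else { break };

                // Track how many files are being replayed right now.
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                replay(&file);
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });

    peak.load(Ordering::SeqCst)
}
```

Because the number of workers equals the limit, memory use is bounded by roughly `limit` files' worth of decoded data rather than the whole WAL directory.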
wayne 041c2c43d7
chore: address a couple post-merge PR comments (#26489) 2025-06-02 19:50:02 -06:00
wayne acdb8f650e
feat: add retention period to catalog (#26479)
* feat: add retention period to catalog

* fix: handle humantime parsing error properly

* refactor: use new iox_http_util types

---------

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>
2025-06-02 18:36:04 -06:00
Stuart Carnie 494847b3e9
fix: Automatic intermediate directory cleanup for file object store (#26480)
Removes empty intermediate directories when a key is removed from
local file system object storage, which matches cloud-based providers.
2025-06-03 09:31:40 +10:00
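
The cleanup behaviour described in the commit above can be sketched with the standard library alone. The function name `remove_and_prune` and its `root` parameter are hypothetical, and the actual change may be structured differently. After deleting the file, the sketch walks up the parent chain and removes each directory that has become empty, stopping at the store root.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Deletes `file`, then removes each parent directory that is now empty,
/// stopping at `root` (which is never removed).
/// Hypothetical helper for illustration; not the code from the commit.
pub fn remove_and_prune(root: &Path, file: &Path) -> io::Result<()> {
    fs::remove_file(file)?;

    let mut dir = file.parent();
    while let Some(d) = dir {
        // Stop at the store root, or if the path is not under it at all.
        if d == root || !d.starts_with(root) {
            break;
        }
        // `remove_dir` only succeeds on an empty directory, so a failure
        // here most commonly means other keys still live below `d`.
        if fs::remove_dir(d).is_err() {
            break;
        }
        dir = d.parent();
    }
    Ok(())
}
```

Relying on `remove_dir` failing for non-empty directories avoids a separate "is it empty?" check, which would race with concurrent writers.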
Carol (Nichols || Goulding) 4c62b5273c
refactor: Use iox_http_util to make updating to hyper 1 easier (#26436)
* refactor: Use iox_http_util::Request instead of hyper::Request

* refactor: Use iox_http_util::RequestBuilder instead of hyper::Request::builder

* refactor: Use iox_http_util::empty_request_body instead of Body::empty

* refactor: Use iox_http_util::bytes_to_request_body instead of Body::from

* refactor: Use http_body::Body instead of hyper::body::HttpBody

* refactor: Use iox_http_util::Response instead of hyper::Response

* refactor: Use iox_http_util::ResponseBuilder instead of hyper::Response::builder

* refactor: Use iox_http_util::empty_response_body instead of Body::empty

* refactor: Use iox_http_util::bytes_to_response_body instead of Body::from

* refactor: Use iox_http_util::stream_results_to_response_body instead of Body::wrap_stream

* refactor: Use the read_body_bytes_for_tests helper fn
2025-06-02 14:57:55 -04:00
praveen-influx 1c8b428917
feat: separate out write path executor with unbounded memory (#26455)
Currently when there is an OOM while snapshotting, the process keeps
going without crashing. This behaviour is observed in main (commit:
be25c6f52b). This means the WAL files keep accumulating to the point that
a restart can never replay all of them.

This is happening because of the distribution of memory, in enterprise
especially there is no need for an ingester to be allocated just 20% for
datafusion memory pool (which runs the snapshot) as parquet cache is not
in use at all. This 20% is too conservative for an ingester, so instead
of redistributing the memory settings based on the mode it's running,
a separate write path executor is introduced in this commit with no
bound on memory (still uses `GreedyMemoryPool` under the hood with
`usize::MAX` as upper limit). This means that when the write path executor
runs out of memory, the OOM stops the whole process.

Also, it is important to let snapshotting process use as much memory
as it needs as without that, the buffer will keep getting bigger and run
into OOM anyway.

closes: https://github.com/influxdata/influxdb/issues/26422
2025-06-02 17:54:39 +01:00
Trevor Hilton be25c6f52b
test: deduplication across memory and parquet chunks (#26477) 2025-05-29 16:27:32 -04:00
Trevor Hilton 756a50daa6
docs: minor release docs (#26476) 2025-05-29 12:16:23 -04:00
Trevor Hilton 561824ec47
chore: update install script to point to 3.1.0 (#26473) 2025-05-29 10:37:39 -04:00
Trevor Hilton 5bf3a1aef8
test: add integration tests to influxdb3_server (#26474) 2025-05-28 21:39:40 -04:00
Trevor Hilton 6289a258cb
chore: update to version 3.2.0-nightly on main (#26472) 2025-05-28 20:17:37 -04:00
wayne 147e272172
fix: use associated constant to ensure correct file ID is always used for serialization (#873) (#26469)
* fix: always serialize catalog to latest version
* refactor: use associated constant fixed size array of bytes
2025-05-28 08:01:25 -06:00
Adrian Thurston cbf5e3a806
fix: docker: fetch python deps ahead of copying in full source tree (#26462)
The current build order:

 1. Copy in full source tree
 2. Fetch python build dependencies
 3. Build

The issue with this is any source tree change causes the python dependencies to
be re-fetched, then the rust components of those deps rebuilt.

Instead, copy in just the .circleci directory, which informs the python build
dependency fetch. After we have the python deps, then copy in the full source
tree and build.
2025-05-27 11:26:29 -07:00
Trevor Hilton d955574593
chore: add workspace lints to influxdb3_authz crate (#26467) 2025-05-27 14:07:56 -04:00
Trevor Hilton d1bf17017d
refactor: move jemalloc to `influxdb3 serve` and address workspace lints (#26466)
* refactor: move jemalloc code to influxdb3 serve

* refactor: enable workspace lints in influxdb3

* feat: add FromStr impl to catalog identifier types
2025-05-27 12:51:23 -04:00
praveen-influx 881aca4fb7
feat: port v3 catalog along with conversions (#26464)
This commit brings in the v3 catalog changes that have been made in pro. The
pro PR is https://github.com/influxdata/influxdb_pro/pull/866. There is
no real change to the catalog that affects core, but it is important that
the core and enterprise catalog versions are in sync so that it's easier to
track changes to the catalog across core and enterprise.

Includes:
- catalog log file conversions and v3 log file
- catalog snapshot file conversions and v3 snapshot file
- update deserialization function to return v3 version of log/snapshot
  file after applying conversion
- `CatalogSnapshot` and all the types that it depends on that implement
  `Snapshot` trait are moved to `mod.rs` as only recent version of
  snapshot should implement that trait
2025-05-27 15:35:27 +01:00
Stuart Carnie 959555ab22
fix: Distinct Value Cache handles `NULL` values (#26457)
Closes #26451
2025-05-27 09:24:17 +10:00
Trevor Hilton 4dc61df77f
chore: update to latest influxdb3_core (#26429)
* chore: update to latest core

* chore: allow CDLA permissive 2 license

* chore: update insta snapshot for new internal df tables

* test: update assertion in flightsql test

* fix: object store size hinting workaround in clap_blocks

Applied a workaround from upstream to strip size hinting from the object
store get request options. See:

https://github.com/influxdata/influxdb_iox/issues/13771

* fix: query_executor tests use object store size hinting workaround

* fix: insta snapshot test for show system summary command

* chore: update windows- crates for advisories

* chore: update to latest sha on influxdb3_core branch

* chore: update to latest influxdb3_core rev

* refactor: pr feedback

* refactor: do not use object store size hint layer

Instead of using the ObjectStoreStripSizeHint layer, just provide the
configuration to datafusion to disable the use of size hinting from
iox_query.

This is used in IOx and not relevant to Monolith.

* fix: use parquet cache for get_opts requests

* test: that the parquet cache is being hit from write buffer
2025-05-26 14:11:06 -04:00
Trevor Hilton 9ed1af7d7a
fix: use correct ENV vars in Dockerfile (#26461)
The `INFLUXDB3_OBJECT_STORE` and `INFLUXDB3_DB_DIR` env vars were not
being used in the Dockerfile.
2025-05-26 14:09:59 -04:00
Stuart Carnie c5ed113c5b
chore: Update rust toolchain to 1.87.0 (#26456)
Changes were due to a number of clippy improvements
2025-05-26 09:22:32 -04:00
Stuart Carnie 1abbb525db
fix: Ensure series key metadata is persisted to Parquet snapshots (#26449)
* chore: Ensure Parquet sort key is serialised with snapshots

* chore: PR feedback, rename state variable to match intent

* chore: Use `Default` trait to implement `TableBuffer::new`

* chore: Fix change in file size with extra metadata

* chore: Add rustdoc for `sort_key` field
2025-05-26 09:27:07 +10:00
Trevor Hilton 760c89873d
fix: add key variant back to wal to fix bitcode deserialization (#26453) 2025-05-23 13:04:18 -04:00
Trevor Hilton 6e9446a8bb
test: reproduce problem with NULL backfill of omitted tag cols (#26448)
* test: reproduce problem with NULL backfill of omitted tag cols

* fix: do not fill tag columns with empty string on persist

* chore: clippy
2025-05-22 12:31:53 -04:00
Trevor Hilton 56df158afd
docs: add rust docs to catalog types (#26447) 2025-05-22 11:18:42 -04:00
Trevor Hilton d1c10f4b29
fix: backfill new tags with NULL instead of empty string (#26446)
* fix: backfill new tags with NULL instead of empty string

* refactor: use helper for append_null

* test: add a test to check null back/forward fill
2025-05-21 17:23:51 -04:00
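
The NULL back-fill and forward-fill behaviour named in the commit above can be illustrated with a toy column buffer. `TagBuffer` and its methods are hypothetical names chosen for this sketch and do not reflect the write buffer's real types. `None` stands for NULL: a tag column that first appears in a later row is NULL for all earlier rows, and a column omitted from a row gets NULL rather than an empty string.

```rust
use std::collections::BTreeMap;

/// A toy column-oriented buffer: one `Vec<Option<String>>` per tag column,
/// where `None` stands for NULL. Hypothetical type, for illustration only.
#[derive(Default)]
pub struct TagBuffer {
    rows: usize,
    columns: BTreeMap<String, Vec<Option<String>>>,
}

impl TagBuffer {
    /// Appends one row given as (tag name, tag value) pairs.
    pub fn push_row(&mut self, tags: &[(&str, &str)]) {
        for (name, value) in tags {
            let rows = self.rows;
            let col = self
                .columns
                .entry(name.to_string())
                // Back-fill: a tag first seen now is NULL for all earlier rows.
                .or_insert_with(|| vec![None; rows]);
            col.push(Some(value.to_string()));
        }
        // Forward-fill: columns omitted from this row get NULL, not "".
        for col in self.columns.values_mut() {
            if col.len() == self.rows {
                col.push(None);
            }
        }
        self.rows += 1;
    }

    /// Returns the values of one tag column, if it exists.
    pub fn column(&self, name: &str) -> Option<&[Option<String>]> {
        self.columns.get(name).map(Vec::as_slice)
    }
}
```

Keeping NULL distinct from `""` matters for queries: a filter on a missing tag should not match rows where the tag was simply never written.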
Trevor Hilton 4a917c5a9f
refactor: remove variants for unused series key type (#26443)
* refactor: remove unused Key type from write buffer

The write buffer had a Key variant for handling the experimental v3
write API that was phased out and removed from an earlier iteration
of influxdb3.

* refactor: remove key column type from last cache
2025-05-21 15:45:23 -04:00
praveen-influx 1ec063b0c4
feat: support named admin tokens (#26434)
* feat: support named admin tokens

- `--name` and `--expiry` are now allowed for `--admin` subcommand
- `--regenerate` is restricted to operator token only
- the endpoint is not allowed if auth is disabled

closes: https://github.com/influxdata/influxdb_pro/issues/854

This is a port of https://github.com/influxdata/influxdb_pro/pull/850 (hash:156981e4a1)

* refactor: address PR feedback
2025-05-20 15:30:19 +01:00
Stuart Carnie bf83e7fbb3
feat: `/ping` API contains versioning headers (#26433)
* feat: `/ping` API contains versioning headers

Further, the product version can be modified by updating the metadata in
the `influxdb3_process` `Cargo.toml`.

* chore: PR feedback

* chore: placate linter
2025-05-20 08:50:27 +10:00
peterbarnett03 b615e5c370
chore: update install_influxdb.sh (#26428) 2025-05-19 11:27:34 -04:00
Stuart Carnie a967e23171
chore: Export APIs necessary for future work in Pro (#26424) 2025-05-19 09:27:44 +10:00
Trevor Hilton 4822886495
docs: update release.md with additional details (#26427) 2025-05-16 13:38:04 -04:00
praveen-influx 1f076b69c8
feat: add trigger count to telemetry (#26426)
* feat: add trigger count to telemetry

closes: https://github.com/influxdata/influxdb/issues/26285

* refactor: do trigger counts by type
2025-05-16 17:18:26 +01:00
praveen-influx b404e8475c
fix: prevent operator token from being deleted (#26418)
* fix: prevent operator token from being deleted

closes: https://github.com/influxdata/influxdb_pro/issues/819

* refactor: address PR feedback

* fix: add a word and clarifying colon

* fix: failing test

---------

Co-authored-by: Peter Barnett <peter.barnett03@gmail.com>
2025-05-15 09:10:37 +01:00
Trevor Hilton 2a94f4232b
feat: query duration metrics in lvc (#26388) 2025-05-14 10:25:13 -04:00
praveen-influx 8aab3cc607
feat: allow health,ping,metrics to opt out of auth (#26406)
* feat: allow health,ping,metrics to opt out of auth

This commit introduces `--disable-authz <DISABLE_AUTHZ_RESOURCES>`. The
options for `DISABLE_AUTHZ_RESOURCES` are health, ping and metrics. By
default, all of these resources are guarded.

closes: https://github.com/influxdata/influxdb_pro/issues/774

* chore: update influxdb3/src/commands/helpers.rs

space after comma in help text

Co-authored-by: Trevor Hilton <thilton@influxdata.com>

* chore: update influxdb3/src/help/serve.txt

space after comma in help text

Co-authored-by: Trevor Hilton <thilton@influxdata.com>

* chore: update influxdb3/src/help/serve_all.txt

space after comma in help text

Co-authored-by: Trevor Hilton <thilton@influxdata.com>

* refactor: use statics to reduce clones/copies

---------

Co-authored-by: Trevor Hilton <thilton@influxdata.com>
2025-05-13 15:47:53 +01:00
Trevor Hilton ed80c852c2
refactor: use truncate in lvc to ensure elements removed (#26401)
This removes the `pop_back` methods from the LVC and uses `truncate` instead,
which ensures that the cache is at its desired size.

Also adjusts the eviction logic of the LVC to be a bit more efficient.
2025-05-12 15:14:10 -04:00
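
The difference between popping and truncating, as described in the commit above, can be shown with a small sketch on `std::collections::VecDeque`. The function `push_and_trim` and the choice of `i64` values are assumptions for illustration; the real last value cache stores different data. A single `pop_back` per insert can leave the cache over its limit when several values arrive at once, whereas `truncate` always brings it back to the target size.

```rust
use std::collections::VecDeque;

/// Inserts `values` at the front (newest first) and trims the cache to
/// at most `max_len` entries. Hypothetical helper for illustration.
pub fn push_and_trim(cache: &mut VecDeque<i64>, values: &[i64], max_len: usize) {
    for v in values {
        cache.push_front(*v);
    }
    // `truncate` keeps the first `max_len` elements and drops the rest from
    // the back; it is a no-op when the cache is already small enough.
    cache.truncate(max_len);
}
```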
Stuart Carnie 510325fff1
chore: Increase the visibility of some APIs to the crate (#26385)
These will be used in Enterprise.
2025-05-12 08:52:22 +10:00
praveen-influx 8a3d98a273
feat: support `Basic $TOKEN` for all apis (#26363)
* feat: support `Basic $TOKEN` for all apis

closes: https://github.com/influxdata/influxdb/issues/25833

* refactor: address PR feedback to return MalformedRequest error when `:` is used more than once in user-pass pair

* refactor: change the message sent back for malformed auth header
2025-05-09 18:11:37 +01:00
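
The malformed-request rule mentioned in the commit above (more than one `:` in the user-pass pair) can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be the already base64-decoded credential string, the token is assumed to travel in the password position, and `AuthError` and `token_from_user_pass` are hypothetical names, not the server's actual types.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum AuthError {
    MalformedRequest,
}

/// Extracts the token from an already base64-decoded `user:pass` pair.
/// Assumption: the token is carried in the password position.
pub fn token_from_user_pass(decoded: &str) -> Result<&str, AuthError> {
    let mut parts = decoded.split(':');
    match (parts.next(), parts.next(), parts.next()) {
        // Exactly one `:` separates the two halves.
        (Some(_user), Some(token), None) => Ok(token),
        // No colon at all, or more than one colon.
        _ => Err(AuthError::MalformedRequest),
    }
}
```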