Commit Graph

204 Commits (main)

Author SHA1 Message Date
Trevor Hilton 5d7cb88f87
feat: track catalog retries as prometheus metric (#26251)
Adds a metric to track total retried catalog operations due to the catalog
being updated elsewhere. Includes a test to check the counter increments
on basic catalog operations.
2025-04-11 15:24:10 -04:00
praveen-influx 1983818e36
feat: porting token work from enterprise (#26239)
* feat: generate persistable admin token

- this commit allows admin token creation using `influxdb3 create token
  --admin` and also allows regeneration of admin token by `influxdb3
  create token --admin --regenerate`
- `influxdb3_authz` crate hosts all low level token types and behaviour
- catalog log and snapshot types updated to use the token repo
- tests that relied on auth have been updated to use the new token
  generation mechanism and new admin token generation/regeneration tests
  have been added

* feat: list admin tokens

- allows listing admin tokens
- uses _internal db for token system table
- mostly test fixes due to _internal db
2025-04-09 16:31:59 +01:00
Trevor Hilton 7d95c48ecf
fix: empty writes do not corrupt wal (#26223)
* test: reproducer for empty write issue

* fix: empty writes do not corrupt wal
2025-04-05 16:44:34 -04:00
praveen-influx e653f736d9
chore: couple of updates to fix cargo audit job (#26209)
* chore: couple of updates to fix cargo audit job

- remove humantime ignore in deny.toml
- update pyo3 to use 0.24.1 (https://rustsec.org/advisories/RUSTSEC-2025-0020.html)

* chore: moved pyo3 version to root cargo.toml
2025-04-01 15:36:01 +01:00
Trevor Hilton 9401137825
feat: handle graceful shutdown (#26197)
* feat: add influxdb3_shutdown crate

provides basic wait methods for unix/windows OS's

* feat: graceful shutdown

* docs: add rust docs and test to influxdb3_shutdown

Added rustdoc comments to types and methods in the influxdb3_shutdown
crate as well as a test that shows the ordering of a shutdown.
2025-03-31 09:58:40 -04:00
Michael Gattozzi b792f70960
fix: maybe fix flakiness for parquet cache test (#26159)
This adds a sleep so that the parquet cache has a little bit of time to
populate before we make another request to the query buffer. Sometimes
it does not populate and so we have a race condition where the new
request comes in and actually goes to object store. This is fine in
practice because it would also take time to fill the cache in production
as well. I haven't really seen the test fail since adding this, but
triggering it in the first place is really hard and in practice does not
happen all that often.
2025-03-18 11:31:30 -04:00
Trevor Hilton 863a6d0b4a
feat: ack catalog update broadcast (#26118)
This creates a CatalogUpdateMessage type that is used to send
CatalogUpdates; this type performs the send on the oneshot Sender so
that the consumer of the message does not need to do so.

Subscribers to the catalog get a CatalogSubscription, which uses the
CatalogUpdateMessage type to ACK the message broadcast from the catalog.

This means that catalog message broadcast can fail, but this commit does
not provide any means of rolling back a catalog update.

A test was added to check that it works.
2025-03-17 20:20:07 -04:00
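The acknowledgement flow described in #26118 can be pictured with a minimal sketch. It assumes, for illustration only, that the ACK is sent when the message is dropped, and it uses `std::sync::mpsc` as a dependency-free stand-in for an async oneshot channel. The field names and the `broadcast_and_wait` helper are hypothetical and are not taken from the crate.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a catalog update payload.
#[derive(Debug, Clone, PartialEq)]
struct CatalogUpdate {
    sequence: u64,
}

/// Wraps an update together with the sender used to acknowledge it.
struct CatalogUpdateMessage {
    update: CatalogUpdate,
    ack: Option<Sender<()>>,
}

impl CatalogUpdateMessage {
    /// Returns the message plus the receiver the broadcaster waits on.
    fn new(update: CatalogUpdate) -> (Self, Receiver<()>) {
        let (tx, rx) = channel();
        (Self { update, ack: Some(tx) }, rx)
    }
}

// Assumption: the ACK is sent on drop, so a subscriber only has to
// finish with the message and never touches the sender itself.
impl Drop for CatalogUpdateMessage {
    fn drop(&mut self) {
        if let Some(tx) = self.ack.take() {
            // The broadcaster may already be gone; ignore that case.
            let _ = tx.send(());
        }
    }
}

/// Hands one update to a subscriber and reports whether it was ACKed.
fn broadcast_and_wait(subscriber: impl FnOnce(CatalogUpdateMessage), sequence: u64) -> bool {
    let (msg, ack_rx) = CatalogUpdateMessage::new(CatalogUpdate { sequence });
    subscriber(msg);
    ack_rx.recv().is_ok()
}
```

A subscriber such as `|msg| println!("applied {}", msg.update.sequence)` acknowledges implicitly when `msg` goes out of scope. As the commit notes, a failed broadcast is only detected here, not rolled back.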
Jamie Strandboge ba7e6c6986
feat(python): update to python-build-standalone 3.13.2 (#26125)
* feat(python): update to python-build-standalone 3.13.2

References:
- https://github.com/influxdata/influxdb/issues/26044

* fix: update fetch-python-standalone.bash to properly set 'executable'

* fix: use PYO3_CONFIG_FILE to find PYTHONHOME.

* fix: add comment about PYO3_CONFIG_FILE.

* fix: remove ensure_pyo3().

* fix: add some sleep so catalog is updated.

---------

Co-authored-by: Jackson Newhouse <jnewhouse@influxdata.com>
2025-03-14 14:44:05 -05:00
Trevor Hilton 0c8d17fb89
refactor: use repositories in catalog (#26135)
* refactor: use repository in catalog

The catalog was refactored to use identifiers on everything, and store
everything in a consistent structure. This structure makes use of the
`Repository` type that holds a `SerdeVecMap` of Id to Resource, along
with the next Id, and a bi-map of Id to resource name.

The `Repository` type is used at each level of the catalog where a
resource is stored.

This simplified repeated logic for snapshot'ing, insert and update of
resources in the catalog, as well as accessor methods for getting by id
or name, and mapping names to ids and vice-versa.

In addition, the process for catalog batch verification and permit was
altered so that the permit process induces a retry if the catalog was
updated while the catalog batch function was producing the batch, i.e., if
the catalog sequence incremented while the caller was waiting for a permit.
This eliminated the need for verifying the catalog batch after it had been
generated, and allows for a single path to apply a catalog batch after it
has been persisted to object store.

This assumes that the generation of the catalog batch implies validity.

Irrelevant tests were removed.

Last and Distinct caches now rely more heavily on Ids, though the
processing engine still needs to switch over to use Ids for
starting/stopping triggers.
2025-03-13 22:42:18 -04:00
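The `Repository` structure described in #26135 can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: a `BTreeMap` stands in for `SerdeVecMap`, two `HashMap`s stand in for the bi-map, ids are plain `u32`s, and all method names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Sketch of a repository: id -> resource, the next id to hand out,
/// and a two-way mapping between ids and resource names.
struct Repository<R> {
    resources: BTreeMap<u32, R>, // stand-in for `SerdeVecMap`
    next_id: u32,
    id_to_name: HashMap<u32, String>, // one half of the bi-map
    name_to_id: HashMap<String, u32>, // the other half
}

impl<R> Repository<R> {
    fn new() -> Self {
        Self {
            resources: BTreeMap::new(),
            next_id: 0,
            id_to_name: HashMap::new(),
            name_to_id: HashMap::new(),
        }
    }

    /// Inserts a resource under a fresh id; duplicate names are rejected.
    fn insert(&mut self, name: &str, resource: R) -> Result<u32, String> {
        if self.name_to_id.contains_key(name) {
            return Err(format!("resource '{name}' already exists"));
        }
        let id = self.next_id;
        self.next_id += 1;
        self.resources.insert(id, resource);
        self.id_to_name.insert(id, name.to_string());
        self.name_to_id.insert(name.to_string(), id);
        Ok(id)
    }

    /// Looks a resource up by id.
    fn get_by_id(&self, id: u32) -> Option<&R> {
        self.resources.get(&id)
    }

    /// Looks a resource up by name via the name -> id mapping.
    fn get_by_name(&self, name: &str) -> Option<&R> {
        self.name_to_id.get(name).and_then(|id| self.resources.get(id))
    }

    /// Maps a name to its id.
    fn id_of(&self, name: &str) -> Option<u32> {
        self.name_to_id.get(name).copied()
    }

    /// Maps an id back to its name.
    fn name_of(&self, id: u32) -> Option<&str> {
        self.id_to_name.get(&id).map(String::as_str)
    }
}
```

Because every level of the catalog can reuse the same type, insert, lookup, and name/id mapping logic is written once rather than per resource kind.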
Michael Gattozzi 00aa90fe6f
feat: Add PersistedSnapshotVersion for snapshots (#26117)
Continuing our work of creating versioned files before Beta, this commit
adds a PersistedSnapshotVersion which is used at the boundary of
serializing and deserializing so that we can easily upgrade to a newer
version and handle old versions without breaking things for users.
2025-03-12 13:21:54 -04:00
Trevor Hilton b6cb6dd51e
chore: back-port catalog debug log cleanup from enterprise (#26128)
* chore: back-port debug log cleanup for catalog

* chore: back-port debug log cleanup for wal

* chore: back-port debug log cleanup for write
2025-03-12 13:20:21 -04:00
Trevor Hilton 503819468e
feat: catalog checkpoints (#26126) 2025-03-11 18:20:36 -04:00
Trevor Hilton 72dc4458fd
chore: backport changes to catalog from enterprise (#26116)
* chore: backport changes to influxdb3_catalog crate

* chore: backport changes to influxdb3_cache crate

* chore: backport changes to influxdb3_write crate

* chore: backport changes to influxdb3_proc_eng crate

* chore: backport influxdb3 crate changes for catalog

* chore: backport changes to influxdb3_id crate

* chore: backport changes to influxdb3_wal crate

* chore: backport changes to influxdb3_clap_blocks crate

* chore: backport changes to influxdb3_client crate

* chore: backport influxdb3_server crate changes

* chore: fix after full backport

* fix: ordering of catalog broadcast
2025-03-11 12:11:51 -04:00
Jackson Newhouse 5fa417c3f0
feat: remove system-py (#26087)
* feat: remove system-py

* chore: allow Apache-2.0 WITH LLVM-exception license.
2025-03-10 11:10:33 -07:00
Michael Gattozzi 329ef2f11b
feat: allow new tags in schema again (#26108)
This commit restores the old behavior we had where new tags can be added
to a schema. To do this we made tags nullable, which brings us in line with
our other products. These changes were made in this PR:

https://github.com/influxdata/influxdb3_core/pull/41.

Changes to accomplish this new behavior were:

- Queries no longer return an empty string for null tags; instead they
  are returned as null, or in many formats not at all.
- References to v1 for parsing and validating lines were removed as we
  only have one path for doing so these days shared amongst all the
  write_lp endpoints.
- We fixed failing tests that expected us to not be able to have new
  tags or depended on that functionality indirectly
- Tests had their snapshot files updated to reflect that tags are
  nullable by default
- Behavior for making a schema and checking whether a column can be null
  were updated in a separate repo and integrated here
- The series_key is updated whenever we get a new tag added to the
  schema
- New tests were added to show that you can add a new tag and that the
  series key is updated as part of that

With the above changes we can now allow tags to be added again by users
like they would expect, especially with v1 and v2 apis and Telegraf
plugins.
2025-03-06 13:59:15 -05:00
praveen-influx c724e06e3f
feat: query path instrumentation (#26106)
- spans added for buffer and parquet chunks, along with the number of
  files that are already in the parquet cache and the sql
2025-03-06 17:24:34 +00:00
Michael Gattozzi 1f72bfcc33
feat: Update to Rust 1.85 and 2024 Edition (#26046) 2025-02-20 14:58:07 -05:00
Trevor Hilton 9646691d96
fix: serialize distinct cache in catalog (#25990)
The distinct cache info for tables was not serialized in the catalog.
This fixes it, but also updates the catalog serialization to use the
snapshot type serialization from the Catalog type all the way down.

The Eq and PartialEq impls were removed from Catalog and InnerCatalog
as they were only used in tests, and were replaced by pure insta snapshot
tests.

A test was added to check that the distinct cache serializes/deserializes.
2025-02-11 11:04:31 -05:00
praveen-influx 5b2354c7ab
feat: port changes back to core from enterprise (#25975)
Includes 2 main changes
- update the function signature for `cache_parquet_files`
- bring in `Evict` variant for parquet `CacheRequest`
2025-02-05 22:22:04 +00:00
wayne 27653f5a76
fix: enable workspace lints on all crates, fix all lints (#25961) 2025-02-03 17:38:20 -07:00
wayne 0fffcc8c37
refactor: introduce influxdb3_types crate (#25946)
Partially fixes https://github.com/influxdata/influxdb/issues/24672

* move most HTTP req/resp types into `influxdb3_types` crate
* removes the use of locally-scoped request type structs from the `influxdb3_client` crate
* fix plugin dependency/package install bug
  * it looks like the `DELETE` http method was being used where `POST` was expected for `/api/v3/configure/plugin_environment/install_packages` and `/api/v3/configure/plugin_environment/install_requirements`
2025-02-03 11:28:47 -07:00
praveen-influx 911ba92ab4
feat: clear query buffer incrementally when snapshotting (#25948)
* feat: clear query buffer incrementally when snapshotting

This commit clears the query buffer incrementally as soon as a table's
data in buffer is written into parquet file and cached. Previously,
clearing the buffer happened at the end in the background

* refactor: only clear buffer after adding to persisted files

* refactor: rename function
2025-02-02 16:51:53 +00:00
Trevor Hilton 23b77946f4
refactor: remove buffer index and literal guarantee analysis in filter (#25949)
This removes the buffer index from the write buffer in core and lifts the
literal guarantee analysis from the ChunkFilter.
2025-01-31 14:43:10 -05:00
praveen-influx 56ca85ef8e
feat: introduce parquet caching in query path (#25937)
* feat: introduce parquet caching in query path

This commit scans the parquet files that will be used in query to check
if they can be cached. There are three conditions to satisfy,
  - not cached already
  - cache has enough space
  - file times overlap with the cache policy times

closes: https://github.com/influxdata/influxdb/issues/25906

* refactor: rename env var
2025-01-30 21:16:37 +00:00
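The three caching conditions listed in #25937 amount to a single predicate. The sketch below is only an illustration: the struct names, fields, and the inclusive overlap test are assumptions, not the code from the commit.

```rust
/// Hypothetical description of a parquet file considered for caching.
struct ParquetFile {
    size_bytes: u64,
    min_time: i64,
    max_time: i64,
    already_cached: bool,
}

/// Hypothetical cache state: remaining space and the policy time window.
struct CacheState {
    free_bytes: u64,
    policy_start: i64,
    policy_end: i64,
}

/// Returns true only if all three conditions from the commit hold.
fn should_cache(file: &ParquetFile, cache: &CacheState) -> bool {
    // 1. not cached already
    let not_cached = !file.already_cached;
    // 2. cache has enough space
    let fits = file.size_bytes <= cache.free_bytes;
    // 3. file times overlap with the cache policy times (inclusive bounds assumed)
    let overlaps = file.min_time <= cache.policy_end && file.max_time >= cache.policy_start;
    not_cached && fits && overlaps
}
```

A file that fails any one of the checks is simply read from object store as before.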
wayne 05da40fa9b
fix: clarify table creation conflict error message (#25936)
Also include a basic CLI integration test to exemplify the new error message.
2025-01-30 13:08:35 -07:00
Jackson Newhouse 8840d99e9d
feat(processing_engine): integration with virtual environments. (#25895)
* feat(processing_engine): integration with virtual environments.

* feat: Initial scaffolding for environment managers (pip, pipx, uv).

* feat(processing_engine): CLI for package management, remove pipx support.

* feat(processing_engine): test installations in virtualenvs.

* feat(processing_engine): Automatically setup virtual environment on startup.
2025-01-28 15:30:17 -08:00
Michael Gattozzi b9a8adbe98
feat: persist snapshots in parallel (#25901)
This speeds up snapshot persistence by taking all of the persist jobs
and running them simultaneously on a JoinSet. With this we can speed
things up a bit by not waiting for each file to persist before the next
one can be persisted. Instead we now can run all the persisting at the
same time using the tokio runtime.

Closes #24658
2025-01-27 11:44:23 -05:00
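The idea behind #25901, starting every persist job before waiting on any of them, can be shown with a small sketch. To stay dependency-free it uses scoped threads from the standard library rather than a tokio `JoinSet`, and the `persist` function is a hypothetical placeholder that only computes a size instead of writing a file.

```rust
use std::thread;

/// Hypothetical persist job: pretends to write `rows` to `path` and
/// returns the path with the number of bytes "written".
fn persist(path: &str, rows: &[u64]) -> (String, usize) {
    (path.to_string(), rows.len() * std::mem::size_of::<u64>())
}

/// Runs all persist jobs concurrently and collects their results.
fn persist_all(jobs: &[(String, Vec<u64>)]) -> Vec<(String, usize)> {
    thread::scope(|s| {
        // Spawn every job first so none waits for an earlier one to finish.
        let handles: Vec<_> = jobs
            .iter()
            .map(|(path, rows)| s.spawn(move || persist(path, rows)))
            .collect();
        // Only then join them, in submission order.
        handles
            .into_iter()
            .map(|h| h.join().expect("persist job panicked"))
            .collect()
    })
}
```

One difference from the approach named in the commit: a `JoinSet` yields results as tasks complete, whereas this sketch returns them in submission order.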
Paul Dix d49276a7fb
feat: Refactor plugins to only require creating trigger (#25914)
This refactors plugins and triggers so that plugins no longer need to be "created". Since plugins exist in either the configured local directory or on the Github repo, a user now only needs to create a trigger and reference the plugin filename.

Closes #25876
2025-01-27 11:26:46 -05:00
Michael Gattozzi 43e186d761
feat: add no_sync write_lp param for fast writes (#25902) 2025-01-24 13:34:38 -05:00
praveen-influx 4ef972eab4
feat: first stab at locally updating parquet cache (#25904)
* feat: first stab at locally updating parquet cache

closes: https://github.com/influxdata/influxdb/issues/25887

* refactor: use enums to separate out the modes

This commit introduced the `Immediate` and `Eventual` modes for
fulfilling the cache request. In immediate mode since the data is
readily available to be cached, we can avoid extra requests to object
store.

part of: https://github.com/influxdata/influxdb/issues/25887
2025-01-24 14:36:06 +00:00
Trevor Hilton 07bd04b423
fix: add alias for node_id to writer_id for backward compatibility (#25910) 2025-01-23 22:10:16 -05:00
Trevor Hilton d451ef0de6
refactor: writer-id to node-id (#25905) 2025-01-23 18:09:24 -05:00
Michael Gattozzi 63bd5096f5
feat: loosen 72 hour query/write restriction (#25890)
This commit does a few key things:

- Removes the 72 hour query and write restrictions in Core
- Limits the queries to a default number of parquet files. We chose 432
  as this is about 72 hours using default settings for the gen1
  timeblock
- The file limit can be increased, but the help text and error message
  when exceeded note that query performance will likely be degraded as
  a result.
- We warn users to use smaller time ranges if possible if they hit this
  query error

With this we eliminate the hard restriction we have in place, but
instead create a soft one that users can choose to take the performance
hit with. If they can't take that hit then it's recommended that they
upgrade to Enterprise which has the compactor built in to make
performant historical queries.
2025-01-23 10:02:26 -05:00
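The 432 figure in #25890 follows from simple arithmetic if one assumes a 10-minute gen1 block. That duration is inferred here from 432 files covering 72 hours; it is not stated in the commit. The function names and the error text below are hypothetical.

```rust
/// Number of gen1 files produced over `hours`, at one file per `gen1_minutes`.
fn files_for_window(hours: u64, gen1_minutes: u64) -> u64 {
    hours * 60 / gen1_minutes
}

/// Soft limit: a query touching more files than `limit` is rejected with a hint.
fn check_file_limit(file_count: u64, limit: u64) -> Result<(), String> {
    if file_count > limit {
        Err(format!(
            "query would scan {file_count} parquet files (limit {limit}); \
             use a smaller time range or raise the limit at a performance cost"
        ))
    } else {
        Ok(())
    }
}
```

With these assumptions, `files_for_window(72, 10)` gives 432, so a 72-hour query sits exactly at the default limit while anything longer trips the check.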
Trevor Hilton 44ca7a4d36
refactor: reduce catalog locks when getting chunks (#25896)
* refactor: reduce catalog locks when getting chunks

The main refactor was to change the ChunkContainer trait to use the
DatabaseSchema and TableDefinition types directly in the signature, vs.
the names, which then required an additional catalog lock and lookups for
both entities. This was already handled upstream in the QueryTable, so
there was no need to do the lookups again.

This required the addition of a test helper in influxdb3_write::test_helpers
that provides convenience methods for getting record batches from the
WriteBuffer. We have been implementing such a method manually in several
places, so this is nice to have it unified. This provides a blanket impl
so that anything implementing WriteBuffer gets the method.

Some other house cleaning was included.

* refactor: clean up test helpers in influxdb3_write

* refactor: pass original df filters forward with ChunkFilter

* chore: clippy
2025-01-22 14:38:46 -05:00
Trevor Hilton d1fd155b21
feat: use u64 hash in buffer index instead of str literal (#25883)
* feat: use u64 hash in buffer index instead of str literal

* refactor: move hash of column after if branch and add docs
2025-01-21 09:09:25 -05:00
Trevor Hilton 7eb99569b5
chore: fix main (#25882) 2025-01-20 20:55:24 -05:00
Trevor Hilton b9a79277ef
feat: expr analyzer for buffer to filter table chunks (#25866)
Related to https://github.com/influxdata/influxdb_pro/issues/436

This PR updates the filter handling in the `WriteBuffer` so that sets of `Expr`s provided in a query will better prune both chunks from the in-memory buffer, as well as the set of parquet file chunks that are forwarded to DataFusion, for query execution.

### New `BufferFilter` type

This introduces the [`BufferFilter`](bab428f0eb/influxdb3_write/src/lib.rs (L496)) type. This converts a set of `Expr`s from a logical query plan into a filter that can be used to:
* prune chunks based on a provided lower/upper `time` boundary from both the buffer and parquet
* prune chunks from the buffer based on any literal guarantees predicated on tag columns in the query, e.g., `WHERE tag = 'a'` or `WHERE tag IN ('a', 'b')`

This type is exposed such that it will be easy to use from replicated buffers and from the compactor when producing `Arc<dyn QueryChunk>`s in Enterprise.

### Tests

* Tests in the [`table_buffer`](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs) module were updated to use the `WriteValidator`. This allows construction of rows based on line protocol directly, and in cleaning up the tests a bit, allowed me to extend some of the test cases in [this test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L979)).
* I added [a test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L1243)) that checks the buffer chunk index filtering for expressions against multiple tag columns. 
* Added [a test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L1153)) that checks time pruning
* Added [a test](bab428f0eb/influxdb3_write/src/write_buffer/persisted_files.rs (L279)) that checks time pruning in `PersistedFiles`
* I renamed several tests to start with `test_`.
2025-01-20 20:20:03 -05:00
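The two kinds of pruning described for `BufferFilter` in #25866 can be illustrated with a simplified sketch. The types below are hypothetical and far smaller than the real ones: each chunk carries a single tag value, and the filter holds optional inclusive time bounds plus an optional set of allowed tag literals.

```rust
/// Hypothetical summary of a chunk: its time range and the one value
/// it holds for a single tag column.
struct Chunk {
    min_time: i64,
    max_time: i64,
    tag: &'static str,
}

/// Hypothetical filter derived from a query's expressions.
struct Filter {
    lower: Option<i64>,                 // inclusive lower `time` bound
    upper: Option<i64>,                 // inclusive upper `time` bound
    tag_in: Option<Vec<&'static str>>,  // literal guarantee, e.g. tag IN ('a', 'b')
}

impl Filter {
    /// Returns false when the chunk can be pruned without reading it.
    fn keeps(&self, chunk: &Chunk) -> bool {
        // Time pruning: the chunk must overlap the requested time range.
        let after_lower = self.lower.map_or(true, |lo| chunk.max_time >= lo);
        let before_upper = self.upper.map_or(true, |hi| chunk.min_time <= hi);
        // Literal guarantee: the chunk's tag value must be one of the literals.
        let tag_ok = self
            .tag_in
            .as_ref()
            .map_or(true, |vals| vals.contains(&chunk.tag));
        after_lower && before_upper && tag_ok
    }
}
```

A filter with no bounds and no literals keeps every chunk, which matches the case of a query without a usable `WHERE` clause.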
praveen-influx d3ad071e5a
chore: add out of order tests (#25869)
* chore: add out of order tests

- assertions for what remains in the queryable buffer when out of order
  timestamps are encountered. This could be true for back filling, and
  in that case back filled data takes over the queryable buffer, moving
  all the recent data into parquet files (as part of snapshotting)
- assertions to check last cache still retains the most recent values
  when out of order data is encountered

* chore: update comment

Co-authored-by: Trevor Hilton <thilton@influxdata.com>

---------

Co-authored-by: Trevor Hilton <thilton@influxdata.com>
2025-01-20 16:26:25 +00:00
praveen-influx 4eccc38129
fix: reproducer for the empty snapshot file issue (#25835)
* fix: reproducer for the empty snapshot file issue

* fix: avoid creating empty (0 dbs) snapshot file
2025-01-15 20:01:57 +00:00
Michael Gattozzi aa8a8c560d
feat: Set 72 hour query/write limit for Core (#25810)
This commit sets InfluxDB 3 Core to have a 72 hour limit for queries and
writes. What this means is that writes that contain historical data
older than 72 hours will be rejected and queries will filter out data
older than 72 hours. Core is intended to be a recent timeseries database
and performance over data older than 72 hours will degrade without a
compactor, a core feature of InfluxDB 3 Enterprise. InfluxDB 3
Enterprise does not have this write or query limit in place.

Note that this does *not* mean older data is deleted. Older data is
still accessible in object storage as Parquet files that can still be
used in other services and analyzed with dataframe libraries like pandas
and polars.

This commit does a few things:
- Uses timestamps in the year 2065 for tests as these should not break
  for longer than many of us will be working in our lifetimes. This is
  only needed for the integration tests as other tests use the
  MockProvider for time.
- Filters the buffer and persisted files to only show data newer than
  3 days ago
- Fixes the integration tests to work with the fact that writes older
  than 3 days are rejected
2025-01-12 13:08:01 -05:00
Trevor Hilton db24a62658
refactor: change host-id to writer-id (#25804)
This changes the CLI arg `host-id` to `writer-id` to more accurately
indicate meaning.

This change also goes through the codebase and changes struct fields,
methods, and variables to use the term `writer_id` or `writer_identifier_prefix`
instead of `host_id` etc., to make the meaning clear in the code.

This also changes the catalog serialization to use the field `writer_id`
instead of `host_id`, which is a breaking change.
2025-01-12 11:40:47 -05:00
praveen-influx 50963443a4
feat: introduce num wal files to keep (#25801)
* feat: introduce num wal files to keep

This commit allows a configurable number of wal files to be left behind
in object store. This is necessary as enterprise replicas rely on these files.

closes: https://github.com/influxdata/influxdb/issues/25788

* refactor: address PR feedback

* refactor: address PR comment
2025-01-12 00:33:13 +00:00
Trevor Hilton 0bdc2fa953
chore: patch enterprise back to core (#25798) 2025-01-11 17:26:41 -05:00
Trevor Hilton 1ff4f76896
feat: only load wal files after most recent snapshot (#25787) 2025-01-11 10:27:58 -05:00
Trevor Hilton c71dafc313
refactor: rename metadata cache to distinct value cache (#25775) 2025-01-10 08:48:51 -05:00
Paul Dix 7230148b58
feat: Update WAL plugin for new structure (#25777)
* feat: Update WAL plugin for new structure

This ended up being a very large change set. In order to get around circular dependencies, the processing engine had to be moved into its own crate, which I think is ultimately much cleaner.

Unfortunately, this required changing a ton of things. There's more testing and things to add on to this, but I think it's important to get this through and build on it.

Importantly, the processing engine no longer resides inside the write buffer. Instead, it is attached to the HTTP server. It is now able to take a query executor, write buffer, and WAL so that the full range of functionality of the server can be exposed to the plugin API.

There are a bunch of system-py feature flags littered everywhere, which I'm hoping we can remove soon.

* refactor: PR feedback
2025-01-10 05:52:33 -05:00
Paul Dix 2d18a61949
feat: Add query API to Python plugins (#25766)
This ended up being a couple things rolled into one. In order to add a query API to the Python plugin, I had to pull the QueryExecutor trait out of server into a place so that the python crate could use it.

This implements the query API, but also fixes up the WAL plugin test CLI a bit. I've added a test in the CLI section so that it shows end-to-end operation of the WAL plugin test API and exercise of the entire Plugin API.

Closes #25757
2025-01-09 20:13:20 -05:00
Trevor Hilton 63d3b867f1
chore: patch changes from enterprise (#25776)
- reduce parquet row group size to 100k
- add cli option to disable cached parquet loader
2025-01-09 16:02:12 -05:00
praveen-influx aa9213c4f4
feat: check mem and force snapshot (#25767)
This commit allows checking memory in the background and force
snapshotting if query buffer size is > mem threshold. This hooks into
the function (`force_flush_buffer`) to achieve it.

closes: https://github.com/influxdata/influxdb/issues/25685
2025-01-09 18:40:14 +00:00
praveen-influx 6e2e39cd4c
feat: snapshot when wal buffer is empty (#25765)
* feat: snapshot when wal buffer is empty

- This commit changes the functionality to allow snapshots to happen even when
  wal buffer is empty. For snapshots wal periods are still required but
  not the wal buffer. To allow this, we write a no-op into wal file with
  snapshot details. This enables force snapshotting functionality

closes: https://github.com/influxdata/influxdb/issues/25685

* refactor: address PR feedback
2025-01-09 12:12:37 +00:00