Commit Graph

15 Commits (mgattozzi/serde-catalog)

Author SHA1 Message Date
Trevor Hilton d265e111ce
feat: support projection pushdown in metadata cache (#25675) 2024-12-17 20:13:25 -05:00
Trevor Hilton df84f9e68e
feat: support LIMIT in metadata cache (#25658) 2024-12-14 13:33:46 -08:00
Jackson Newhouse 486d79d801
feat(processing_engine): initial implementation of Processing Engine plugins and triggers (#25639) 2024-12-13 14:11:38 -08:00
Michael Gattozzi 9292a3213d
feat: Significantly decrease startup times for WAL (#25643)
* feat: add startup time to logging output

This change adds a startup time counter to the output when starting up
a server. The main purpose of this is to verify whether changes actually
speed up the loading of the server.

* feat: Significantly decrease startup times for WAL

This commit does a few important things to speed up startup times:
1. We avoid converting the series key's Arc<str> into a String, since the
   From<String> impl calls with_column, which turns it back into an
   Arc<str>. Instead we can just call `with_column` directly and pass in
   the iterator without also collecting into a Vec<String>.
2. We switch to using bitcode as the serialization format for the WAL.
   This significantly reduces startup time, as this format is much faster
   to read than JSON, which was eating up massive amounts of time.
   Part of this change involves not using the tag feature of serde as
   it's currently not supported by bitcode.
3. We also parallelize reading and deserializing the WAL files before
   we then apply them in order. This reduces time spent waiting on IO, and
   we eagerly evaluate each spawned task in order as much as possible.

This gives us about a 189% speedup over what we were doing before.

Closes #25534
2024-12-12 11:27:51 -05:00
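The parallel replay in the commit above can be sketched roughly as follows. The `WalContents` type, the file list, and the `apply` callback are hypothetical stand-ins; only the spawn-then-await-in-order pattern and the bitcode decoding come from the commit message.

```rust
use std::io;
use std::path::PathBuf;

// Hypothetical stand-in for the deserialized contents of one WAL file.
#[derive(serde::Serialize, serde::Deserialize)]
struct WalContents {
    ops: Vec<String>,
}

async fn replay(paths: Vec<PathBuf>, mut apply: impl FnMut(WalContents)) -> io::Result<()> {
    // Spawn one task per WAL file so reads and bitcode deserialization run concurrently.
    let handles: Vec<_> = paths
        .into_iter()
        .map(|path| {
            tokio::spawn(async move {
                let bytes = tokio::fs::read(&path).await?;
                let contents: WalContents = bitcode::deserialize(&bytes)
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e.to_string()))?;
                Ok::<_, io::Error>(contents)
            })
        })
        .collect();

    // Await the handles in their original order so WAL entries are still applied sequentially.
    for handle in handles {
        let contents = handle.await.expect("wal replay task panicked")?;
        apply(contents);
    }
    Ok(())
}
```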
Trevor Hilton 37219af9d4
feat: track parquet cache metrics (#25632)
* feat: parquet cache metrics

* feat: track parquet cache metrics

Adds metrics to track the following in the in-memory parquet cache:
* cache size in bytes (a fix to that calculation is also included)
* cache size in number of files
* cache hits
* cache misses
* cache misses while the oracle is fetching a file

A test was added to check this functionality.

* refactor: clean up logic and fix cache removal tracking error

Some logic and naming was cleaned up and the boolean to optionally track
metrics on entry removal was removed, as it was incorrect in the first place:
a fetching entry still has a size, which counts toward the size of the
cache. So, this makes it such that anytime an entry is removed, whether
its state is success or fetching, its size will be decremented from
the cache size metrics.

The sizing calculations were made to be correct, and the cache metrics
test was updated with more thorough assertions
2024-12-10 09:32:15 -05:00
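A rough sketch of the counters described in the commit above, using plain atomics rather than the project's actual metrics registry; the struct and method names here are assumptions, while the tracked quantities come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counters for the in-memory parquet cache metrics.
#[derive(Default)]
struct ParquetCacheMetrics {
    size_bytes: AtomicU64,
    size_files: AtomicU64,
    hits: AtomicU64,
    misses: AtomicU64,
    misses_while_fetching: AtomicU64,
}

impl ParquetCacheMetrics {
    fn record_insert(&self, bytes: u64) {
        self.size_bytes.fetch_add(bytes, Ordering::Relaxed);
        self.size_files.fetch_add(1, Ordering::Relaxed);
    }

    // Per the commit: an entry that is still fetching has a size that counts toward
    // the cache size, so removal always decrements, regardless of the entry's state.
    fn record_remove(&self, bytes: u64) {
        self.size_bytes.fetch_sub(bytes, Ordering::Relaxed);
        self.size_files.fetch_sub(1, Ordering::Relaxed);
    }

    fn record_hit(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    fn record_miss(&self, entry_is_fetching: bool) {
        if entry_is_fetching {
            self.misses_while_fetching.fetch_add(1, Ordering::Relaxed);
        } else {
            self.misses.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```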
Trevor Hilton 0bfef47ff9
refactor: move parquet cache to influxdb3_cache crate (#25630) 2024-12-09 11:56:52 -05:00
Trevor Hilton 154ff7da23
feat: LastCacheExec to track predicate pushdown in last cache queries (#25621) 2024-12-06 10:53:19 -08:00
Trevor Hilton 9b87cd7a65
refactor: move last cache to influxdb3_cache crate (#25620)
Moved all of the last cache implementation into the `influxdb3_cache`
crate. This also splits out the implementation into three modules:
- `cache.rs`: the core cache implementation
- `provider.rs`: the cache provider used by the database to hold multiple
  caches.
- `table_function.rs`: same as before, holds the DataFusion impls

Tests were preserved and moved to `mod.rs`; however, they were updated to
not rely on the WriteBuffer implementation and instead use the types in
the `influxdb3_cache::last_cache` module directly. This simplified the
test code without changing any of the test assertions.
2024-12-05 14:04:25 -05:00
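A minimal sketch of the cache/provider split described in the commit above; only the module roles (`cache.rs`, `provider.rs`, `table_function.rs`) come from the commit message, and the types and fields below are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical core cache state (lives in cache.rs in the split described above).
struct LastCache {
    // e.g. most-recent values per key-column combination
    entries: HashMap<String, Vec<String>>,
}

// Hypothetical provider (provider.rs): the database holds many named caches.
#[derive(Default)]
struct LastCacheProvider {
    // keyed here by (db, table, cache name) for illustration
    caches: HashMap<(String, String, String), LastCache>,
}

impl LastCacheProvider {
    fn get(&self, db: &str, table: &str, name: &str) -> Option<&LastCache> {
        self.caches
            .get(&(db.to_string(), table.to_string(), name.to_string()))
    }
}
```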
Trevor Hilton dbb1f55b5e
chore: update core for latest sync (#25617) 2024-12-04 14:11:13 -05:00
Trevor Hilton b7fd8e2386
feat: remove metadata caches on db and table delete (#25599) 2024-11-28 11:35:29 -05:00
Trevor Hilton 81715fbfea
refactor: display column names for predicates in EXPLAIN for metadata cache (#25598) 2024-11-28 11:18:12 -05:00
Trevor Hilton 9ead1dfe4b
feat: meta_caches system table (#25593)
This adds a new system table "meta_caches" that allows users to view the
state of their metadata caches on a per-db basis.

An integration test was added to verify that it works.
2024-11-28 08:57:02 -05:00
Trevor Hilton 234d37329a
feat: metacache REST APIs to create and delete (#25587) 2024-11-27 08:41:46 -05:00
Trevor Hilton 8e23032ceb
feat: add metadata cache provider with APIs for write and query (#25566)
This adds the MetaDataCacheProvider for managing metadata caches in the
influxdb3 instance. This includes APIs to create caches through the WAL
as well as from a catalog on initialization, to write data into the
managed caches, and to query data out of them.

The query side is fairly involved, relying on DataFusion's TableFunctionImpl
and TableProvider traits to make querying the cache using a user-defined
table function (UDTF) possible.

The predicate code was modified to only support two kinds of predicates:
IN and NOT IN. This simplifies the code and maps nicely onto DataFusion's
LiteralGuarantee, which we leverage to derive the predicates from the
incoming queries.

A custom ExecutionPlan implementation was added specifically for the
metadata cache that can report the predicates that are pushed down to
the cache during query planning/execution.

A big set of tests was added to check that queries are working, and
that predicates are being pushed down properly.
2024-11-22 10:57:26 -05:00
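The IN / NOT IN predicate shape described in the commit above could be modeled roughly as below. In the real code these are derived from DataFusion's LiteralGuarantee during planning; the names and the string value type here are assumptions.

```rust
use std::collections::BTreeSet;

// Hypothetical simplified predicate: only IN and NOT IN are supported.
#[derive(Debug, Clone)]
enum Predicate {
    In(BTreeSet<String>),
    NotIn(BTreeSet<String>),
}

impl Predicate {
    // Decide whether a cached value satisfies the predicate.
    fn matches(&self, value: &str) -> bool {
        match self {
            Predicate::In(set) => set.contains(value),
            Predicate::NotIn(set) => !set.contains(value),
        }
    }
}
```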
Trevor Hilton 53f54a6845
feat: metadata cache core impl (#25552)
* feat: core metadata cache structs with basic tests

Implement the base MetaCache type that holds the hierarchical structure
of the metadata cache, providing methods to create and push rows from the
WAL into the cache.

Added a prune method as well as a method for gathering record batches
from a meta cache. A test was added to check the latter for various
predicates and that the former works; however, pruning showed that we need
to modify how record batches are produced such that expired entries are
not emitted.

* refactor: filter expired entries and do some clean up in the meta cache
2024-11-18 12:28:12 -05:00
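A small sketch of the hierarchical structure and pruning behavior described in the commit above; the node layout, names, and expiry policy shown here are assumptions for illustration.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Hypothetical node: each level holds the distinct values seen for one cached column.
#[derive(Default)]
struct MetaCacheNode {
    last_seen: Option<Instant>,
    children: BTreeMap<String, MetaCacheNode>,
}

impl MetaCacheNode {
    // Push one row's column values down the hierarchy, updating last-seen times.
    fn push(&mut self, values: &[String], now: Instant) {
        self.last_seen = Some(now);
        if let Some((first, rest)) = values.split_first() {
            self.children.entry(first.clone()).or_default().push(rest, now);
        }
    }

    // Drop entries older than `max_age`; expired entries also need to be filtered
    // when producing record batches, per the follow-up refactor in this commit.
    fn prune(&mut self, now: Instant, max_age: Duration) {
        self.children.retain(|_, child| {
            child.prune(now, max_age);
            child
                .last_seen
                .map(|seen| now.duration_since(seen) <= max_age)
                .unwrap_or(false)
        });
    }
}
```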