In theory, on a multi-threaded tokio executor, the following could have
happened:
| Thread 1 | Thread 2 |
| --------------------- | ----------------------------------- |
| | Running query begin |
| | ... |
| | `loader.await` finished |
| `Cache::set` begin | |
| state locked | |
| | try state lock, blocking |
| running query removed | |
| ... | |
| state unlocked | |
| `Cache::set` end | |
| | state locked |
| | panic because running query is gone |
Another issue could arise if we:
1. issue a get request, loader takes a while, this results in task1
2. side-load data into the running query (task1 still running)
3. the underlying cache backend drops the result very quickly (task1
still running)
4. we request the same data again, resulting in yet another query task
(task2), task1 is still running at this point
In this case the original not-yet-finalized query task (task1) would
remove the new query task (task2) from the active query set, even
though task2 is actually not done.
We fix this with the following measures (a rough sketch follows the list):
- **task tagging:** tasks are tagged so if two tasks for the same key
are running, we can tell them apart
- **task->backend propagation:** let the query task only write to the
underlying backend if it is actually sure that it is running
- **prefer side-loaded results:** restructure the query task to strongly
prefer side-loaded data over whatever comes from the loader
- **async `Cache::set`:** let `Cache::set` wait until a running query
  task completes. This has NO correctness implications; it's probably
  just nicer for resource management.
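As a rough illustration of the task-tagging and task->backend propagation measures (all names and types below are made up for this sketch, not the actual querier code): each spawned query task gets a unique tag, and only the task whose tag still matches the registered entry may remove it and write its result to the backend.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical running-query registry keyed by cache key.
/// Each entry remembers the tag of the task that is allowed to
/// write its result into the backend.
#[derive(Default)]
struct RunningQueries {
    next_tag: u64,
    active: HashMap<String, u64>,
}

impl RunningQueries {
    /// Register a new query task for `key` and return its tag.
    fn register(&mut self, key: &str) -> u64 {
        self.next_tag += 1;
        self.active.insert(key.to_string(), self.next_tag);
        self.next_tag
    }

    /// Returns true (and removes the entry) only if the task with `tag` is
    /// still the active one for `key`. A stale task that was superseded by a
    /// newer one gets `false` and must NOT touch the backend or remove the
    /// newer task.
    fn finish_if_active(&mut self, key: &str, tag: u64) -> bool {
        let is_active = self.active.get(key) == Some(&tag);
        if is_active {
            self.active.remove(key);
        }
        is_active
    }
}

fn main() {
    let registry = Arc::new(Mutex::new(RunningQueries::default()));

    // task1 starts...
    let tag1 = registry.lock().unwrap().register("partition-1");
    // ...the backend drops the entry and the same key is requested again,
    // so task2 starts and supersedes task1:
    let tag2 = registry.lock().unwrap().register("partition-1");

    // task1 finishes late: it is no longer active, so it must not write.
    assert!(!registry.lock().unwrap().finish_if_active("partition-1", tag1));
    // task2 is still the active task and may write its result.
    assert!(registry.lock().unwrap().finish_if_active("partition-1", tag2));
}
```

A stale task that was superseded simply discards its result instead of panicking or clobbering the newer task's state.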
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
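Conceptually the change looks like the following sketch (the trait body and names here are simplified stand-ins, not the real object store API): consumers hold an `Arc` of a trait object instead of being parameterized over a concrete store type, so the backend can be picked at runtime.

```rust
use std::sync::Arc;

/// Stand-in for the object store interface; the real trait has async
/// read/write methods, omitted here for brevity.
trait ObjectStoreApi: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Alias in the spirit of `DynObjectStore`: a dynamically dispatched store.
type DynObjectStore = dyn ObjectStoreApi;

struct InMemory;

impl ObjectStoreApi for InMemory {
    fn name(&self) -> &'static str {
        "memory"
    }
}

/// Consumers take the trait object instead of a concrete type parameter,
/// so the concrete backend is chosen at runtime, not per monomorphization.
struct Querier {
    store: Arc<DynObjectStore>,
}

fn main() {
    let store: Arc<DynObjectStore> = Arc::new(InMemory);
    let querier = Querier { store };
    println!("backed by {}", querier.store.name());
}
```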
* feat: `Cache::set`
This will be helpful for filling caches when we obtain the information from
somewhere else.
For #3985.
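Roughly what side-loading via `set` enables from a caller's point of view; this toy cache is a synchronous stand-in with an assumed signature, not the real async implementation.

```rust
use std::collections::HashMap;

/// Toy stand-in for the querier cache: `get` would normally fall back to a
/// loader; `set` lets us fill the cache when the data arrived via another
/// code path (e.g. it came along with a write or another query).
struct Cache<V> {
    entries: HashMap<String, V>,
}

impl<V: Clone> Cache<V> {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Side-load a value we already have, skipping the loader entirely.
    fn set(&mut self, key: String, value: V) {
        self.entries.insert(key, value);
    }

    /// Falls back to `load` only if the key was never set or loaded before.
    fn get(&mut self, key: &str, load: impl FnOnce() -> V) -> V {
        self.entries
            .entry(key.to_string())
            .or_insert_with(load)
            .clone()
    }
}

fn main() {
    let mut cache = Cache::new();
    cache.set("table-42".to_string(), "schema from the write path".to_string());
    // The loader closure is never invoked because the value was side-loaded.
    let schema = cache.get("table-42", || unreachable!("loader not needed"));
    println!("{schema}");
}
```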
* docs: improve
Co-authored-by: Edd Robinson <me@edd.io>
* docs: explain lock gap
* feat: add debug log to `Cache`
Co-authored-by: Edd Robinson <me@edd.io>
* feat: `CacheBackend::as_any`
* refactor: add TTL cache backend
This is based on the new `AddressableHeap`, which simplifies the
implementation quite a lot.
For #3985.
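A hedged sketch of the TTL idea (the structure below stands in for the real backend and uses a `BTreeMap` instead of the actual `AddressableHeap`): each entry records its expiry time in an ordered structure, and expired entries are evicted before every access.

```rust
use std::collections::{BTreeMap, HashMap};
use std::time::{Duration, Instant};

/// Toy TTL layer: `expiries` plays the role of the addressable heap,
/// ordering keys by their expiry time so expired entries can be popped
/// cheaply from the front.
struct TtlBackend<V> {
    ttl: Duration,
    inner: HashMap<String, V>,
    expiries: BTreeMap<Instant, Vec<String>>,
}

impl<V> TtlBackend<V> {
    fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            inner: HashMap::new(),
            expiries: BTreeMap::new(),
        }
    }

    fn set(&mut self, key: String, value: V) {
        let expires_at = Instant::now() + self.ttl;
        self.expiries.entry(expires_at).or_default().push(key.clone());
        self.inner.insert(key, value);
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        self.evict_expired(Instant::now());
        self.inner.get(key)
    }

    /// Remove everything whose expiry time lies in the past.
    fn evict_expired(&mut self, now: Instant) {
        // `split_off` keeps the still-valid tail and leaves the expired
        // head behind, which we then swap out and drop.
        let still_valid = self.expiries.split_off(&now);
        let expired = std::mem::replace(&mut self.expiries, still_valid);
        for key in expired.into_values().flatten() {
            self.inner.remove(&key);
        }
    }
}

fn main() {
    let mut backend = TtlBackend::new(Duration::from_millis(10));
    backend.set("k".to_string(), 1);
    assert_eq!(backend.get("k"), Some(&1));
    std::thread::sleep(Duration::from_millis(20));
    assert_eq!(backend.get("k"), None);
}
```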
* refactor: `TtlBackend::{update->evict_expired}`
* docs: explain ttl cache eviction
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add addressable heap for query cache
This will be used as a helper data structure for TTL and LRU. It's
probably not the most performant implementation but it's good enough for
now.
This is for #3985.
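For context, "addressable heap" here means a priority queue that also supports lookup and removal by key. A minimal sketch, with made-up method names, pairs a key→priority map with an ordered set of `(priority, key)` tuples:

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy addressable heap: pop the smallest priority, but also remove an
/// arbitrary key in O(log n). Ties between equal priorities are broken by
/// key order because the key is part of the ordered tuple.
struct AddressableHeap<K: Ord + Clone + std::hash::Hash, P: Ord + Clone> {
    priorities: HashMap<K, P>,
    order: BTreeSet<(P, K)>,
}

impl<K: Ord + Clone + std::hash::Hash, P: Ord + Clone> AddressableHeap<K, P> {
    fn new() -> Self {
        Self {
            priorities: HashMap::new(),
            order: BTreeSet::new(),
        }
    }

    /// Insert or update the priority of `key`.
    fn insert(&mut self, key: K, priority: P) {
        if let Some(old) = self.priorities.insert(key.clone(), priority.clone()) {
            self.order.remove(&(old, key.clone()));
        }
        self.order.insert((priority, key));
    }

    /// Remove and return the entry with the smallest priority.
    fn pop(&mut self) -> Option<(K, P)> {
        let (priority, key) = self.order.pop_first()?;
        self.priorities.remove(&key);
        Some((key, priority))
    }

    /// Remove an arbitrary key, e.g. because its cache entry was dropped.
    fn remove(&mut self, key: &K) -> Option<P> {
        let priority = self.priorities.remove(key)?;
        self.order.remove(&(priority.clone(), key.clone()));
        Some(priority)
    }
}

fn main() {
    let mut heap = AddressableHeap::new();
    heap.insert("a", 3);
    heap.insert("b", 1);
    heap.insert("a", 2); // update priority of an existing key
    assert_eq!(heap.remove(&"b"), Some(1));
    assert_eq!(heap.pop(), Some(("a", 2)));
    assert_eq!(heap.pop(), None);
}
```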
* fix: test + explain tie breaking in `AddressableHeap`
* feat: extract "backend" from querier cache
The backend will implement pruning policies like LRU and TTL as well as
decide where and how the data is stored. Having a proper interface for that
simplifies the implementation, since we don't need one massive
`Cache` object with a super complex mechanism.
This is for #3985.
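The extracted interface is, conceptually, something like this sketch (the exact method set and signatures are assumptions): the cache only talks to a small get/set/remove trait, and where the data lives and when it is pruned is up to the implementation.

```rust
use std::collections::HashMap;

/// Rough shape of the extracted backend interface: how and where entries
/// are stored (and when they are pruned) is hidden behind this trait.
trait CacheBackend {
    type K;
    type V;

    fn get(&mut self, k: &Self::K) -> Option<Self::V>;
    fn set(&mut self, k: Self::K, v: Self::V);
    fn remove(&mut self, k: &Self::K);
}

/// Simplest possible backend: an unbounded in-memory map.
struct HashMapBackend<K, V> {
    entries: HashMap<K, V>,
}

impl<Key, Val> CacheBackend for HashMapBackend<Key, Val>
where
    Key: std::hash::Hash + Eq,
    Val: Clone,
{
    type K = Key;
    type V = Val;

    fn get(&mut self, k: &Key) -> Option<Val> {
        self.entries.get(k).cloned()
    }

    fn set(&mut self, k: Key, v: Val) {
        self.entries.insert(k, v);
    }

    fn remove(&mut self, k: &Key) {
        self.entries.remove(k);
    }
}

fn main() {
    let mut backend = HashMapBackend { entries: HashMap::new() };
    backend.set("namespace", 42);
    assert_eq!(backend.get(&"namespace"), Some(42));
    backend.remove(&"namespace");
    assert_eq!(backend.get(&"namespace"), None);
}
```

A pruning policy like TTL or LRU can then wrap an inner value implementing the same trait, instead of being baked into one big `Cache` type.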
* refactor: `Backend` -> `CacheBackend`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: querier test system, ground work
See #3985 for the motivation.
This introduces a cache system for the querier which can later be
extended to support the remaining features listed in #3985 (e.g.
metrics, LRU/TTL).
All current caches are wired up to go through the new cache system. Once
we move away from (ab)using `db`, the set of caches will be different
but the system will remain.
* test: explain it
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* refactor: simplify cache result broadcast
* refactor: introduce `Loader` crate
* fix: docs
* docs: explain why we manually drop removed hashmap entries
* docs: fix intra-doc link
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
chunks it gets via `QueryDatabase::chunks` (which includes a table
name and a predicate).
- All other users of that API are NOT really query-related.
- This was not actually used by the query engine.
- The query engine doesn't have a concept of a "partition", it only
cares about chunks.
- Unbound access to all partitions in the database is quite expensive
(esp. on NG).
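In hedged form (placeholder types, and only the one method the text mentions), the interface the query engine is left with looks roughly like this:

```rust
/// Placeholder types standing in for the real chunk and predicate types.
struct Chunk;
struct Predicate;

/// What the query engine actually needs: chunks for a single table,
/// already scoped by a predicate. Broad accessors like "all chunks" or
/// "all partitions" are deliberately absent from this interface.
trait QueryDatabase {
    fn chunks(&self, table_name: &str, predicate: &Predicate) -> Vec<Chunk>;
}

struct Db;

impl QueryDatabase for Db {
    fn chunks(&self, _table_name: &str, _predicate: &Predicate) -> Vec<Chunk> {
        // A real implementation would prune partitions/chunks using the
        // table name and predicate before handing anything to the engine.
        vec![Chunk]
    }
}

fn main() {
    let db = Db;
    let chunks = db.chunks("cpu", &Predicate);
    println!("{} chunk(s) to scan", chunks.len());
}
```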
* refactor: wire execution context to Deduplicator
* feat: example trace to chunk read_filter
* refactor: make execution context required
* refactor: expose metadata API
* refactor: more span context for chunk read_filter
* refactor: fix build
* refactor: push context into result stream
* refactor: make executor optional
* feat: add `success` column to system.queries
* refactor: Remove lifetime from QueryCompletedToken and thread through flight
* test: update test to make incomplete query clearer
* refactor: use better pattern to set complete
* fix: logical merge conflict
Before adding more and more features, here is a bit of cleanup and
prep work:
- Pull out caching into its own module and add proper tests for it.
- Start to build a test infrastructure so tests are shorter and easier
to read. This doesn't fully pay off just yet but gets more and more
important when we actually sync tables and chunks.
* feat: skeleton of querier CLI
* chore: wrap metrics in opt&arc in querier to satisfy new api
* chore: derive debug in querier handler
* chore: add join handles and their shutdown to nascent querier server
* chore: querier server http unimpl -> 404
* fix: join/shutdown fix in querier; removed unused delegates
* feat: Add a way to run ingester with an in-memory catalog from the CLI
If you set the --catalog-dsn string to "mem", an in-memory catalog is
created rather than treating that value as a Postgres connection URL.
Planning on using this in tests, so not documenting.
* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC
Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.
* feat: Add a flight API (handshake only) to ingester
* fix: Create partitions if using file-based write buffer
* fix: Change the server fixture to handle ingester server type
For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.
* feat: Start implementing ingester do_get, namely decoding the query
Skip serialization of the predicate for the moment.
* refactor: Rename ingest protos to ingester to match crate name
* refactor: Rename QueryResults to QueryData
* feat: Move ingester flight client to new querier crate
* fix: Off by one error, different starting indexes in sequencers
* fix: Create new CLI argument to pick the catalog type
* fix: Create a CLI option to set the number of topics to auto-create in the write buffer
* fix: Check the arrow flight service's health to tell that the ingester gRPC is up
* fix: Set postgres as the default catalog type
* fix: Return an error rather than panicking if CLI args aren't right