Commit Graph

439 Commits (0d111c4672102ceb373c19e77e2f9cc98103e12d)

Author SHA1 Message Date
Marco Neumann 97d595e4fb
feat: `Cache::set` (#4036)
* feat: `Cache::set`

This will be helpful to fill caches if we got the information from
somewhere else.

For #3985.

* docs: improve

Co-authored-by: Edd Robinson <me@edd.io>

* docs: explain lock gap

* feat: add debug log to `Cache`

Co-authored-by: Edd Robinson <me@edd.io>
2022-03-15 12:12:26 +00:00
Marco Neumann 4b5cf6a70e
feat: cache processed tombstones (#4030)
* test: allow to mock time in `iox_test`

* feat: cache processed tombstones

For #3974.

* refactor: introduce `TTL_NOT_PROCESSED`
2022-03-15 10:28:08 +00:00
Marco Neumann 87e53f30d1
refactor: add TTL cache backend (#4027)
* feat: `CacheBackend::as_any`

* refactor: add TTL cache backend

This is based on the new `AddressableHeap`, which simplifies the
implementation quite a lot.

For #3985.

* refactor: `TtlBackend::{update->evict_expired}`

* docs: exlain ttl cache eviction

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-14 15:35:27 +00:00
Marco Neumann 27efb66237
test: add proptest for `AddressableHeap` (#4025)
* test: add proptest for `AddressableHeap`

For #3985.

* refactor: simplify code

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
2022-03-14 12:38:30 +00:00
Marco Neumann 632c4953b4
feat: add addressable heap for query cache (#4016)
* feat: add addressable heap for query cache

This will be used as a helper data structure for TTL and LRU. It's
probably not the most performant implementation but it's good enough for
now.

This is for #3985.

* fix: test + explain tie breaking in `AddressableHeap`
2022-03-14 09:35:38 +00:00
Marco Neumann d46de98183
feat: extract "backend" from querier cache (#4015)
* feat: extract "backend" from querier cache

The backend will implement pruning policies like LRU and TTL as well as
where/how the data is stored. Having a proper interface for that
simplifies the implementation since we don't need to have one massive
`Cache` object with a super complex mechanism.

This is for #3985.

* refactor: `Backend` -> `CacheBackend`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-11 11:49:57 +00:00
Nga Tran f03ebd79ab
refactor: move querier's test utils to a new crate to get reused by tests in other crates (#4013)
* refactor: move querier's test utils to a new crate to be able resued by tests in other crates

* chore: remove unused import
2022-03-10 18:17:58 +00:00
Paul Dix 27999ff72f
feat: add compaction_level and created_at to parquet_file (#3972) 2022-03-10 15:56:57 +00:00
Andrew Lamb 2c3d30ca32
chore: Update datafusion, arrow, flight and parquet (#4000)
* chore: Update datafusion, arrow, flight and parquet

* fix: api change

* fix: fmt

* fix: update test metadata size

* fix: Update sizes in parquet test

* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Marco Neumann 30d1c77d36
feat: querier test system, ground work (#3991)
* feat: querier test system, ground work

See #3985 for the motivation.

This introduces a cache system for the querier which can later be
extended to support the remaining features listed in #3985 (e.g.
metrics, LRU/TTL).

All current caches are wired up to go throw the new cache system. Once
we move away from (ab)using `db`, the set of caches will be different
but the system will remain.

* test: explain it

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* refactor: simplify cache result broadcast

* refactor: introduce `Loader` crate

* fix: docs

* docs: explain why we manually drop removed hashmap entries

* docs: fix intra-doc link

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2022-03-10 11:27:24 +00:00
Nga Tran c6cab3538f
refactor: move parquet chunk's new and decode to parquet_file crate (#3987) 2022-03-08 22:04:32 +00:00
Marco Neumann 77f6153f72
refactor: remove `QueryDatabase::chunk_summaries` (#3977)
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
  chunks it gets via `QueryDatabase::chunks` (which includes a table
  name and a predicate).
- All other users of that API are NOT really query-related.
2022-03-08 11:34:26 +00:00
Marco Neumann 5cc1c697fc
refactor: remove `QueryDatabase::partition_addr` (#3976)
- This was not actually used by the query engine.
- The query engine doesn't have a concept of a "partition", it only
  cares about chunks.
- Unbound access to all partitions in the database is quite expensive
  (esp. on NG).
2022-03-08 11:17:31 +00:00
Marco Neumann a3e952847a
refactor: split `querier::namespace` into submodules (#3975)
This makes it easier to see what's required to support the query
interface.
2022-03-08 10:49:00 +00:00
Marco Neumann db3f1e8db7
feat: wire up tombstones into querier (#3962)
* feat: `TombstoneRepo::list_by_namespace`

* test: model sequencer properly

* feat: wire up tombstones into querier

Closes #3932.

* refactor: `override_delete_predicates` => `set_delete_predicates`
2022-03-08 10:06:22 +00:00
Carol (Nichols || Goulding) 9961efd702
feat: Send parquet and tombstone seq nums with ingester query response (#3925)
Fixes #3867.
2022-03-04 15:22:29 +00:00
Marco Neumann 8d00aaba90
feat: sync chunks in querier (#3911)
* feat: `ParquetFileRepo::list_by_namespace_not_to_delete`

* feat: `ChunkAddr: Clone`

* test: ensure that querier keeps same partition objects

* test: improve `create_parquet_file` flexibility

* feat: sync chunks in querier

* test: improve `test_parquet_file`
2022-03-04 08:53:39 +00:00
Raphael Taylor-Davies e304613546
feat: include trace ID in query log (#3912) (#3923)
* feat: include trace ID in query log (#3912)

* chore: fmt

* chore: lint
2022-03-03 17:50:49 +00:00
Marco Neumann bbeba73345
feat: sync partitions in querier (#3900)
* feat: sync partitions in querier

* docs: explain per-table partition grouping
2022-03-03 09:28:44 +00:00
kodiakhq[bot] caba3e9fd2
Merge branch 'main' into cn/querier-flight-request 2022-03-02 20:30:00 +00:00
Edd Robinson 3d047073b9
feat: add tracing down to the chunk level (#3804)
* refactor: wire exectution context to Deduplicator

* feat: example trace to chunk read_filter

* refactor: make execution context required

* refactor: expose metadata API

* refactor: more span context for chunk read_filter

* refactor: fix build

* refactor: push context into result stream

* refactor: make executor optional
2022-03-02 19:08:22 +00:00
Carol (Nichols || Goulding) 3f2a58b47f
refactor: pub use data_types from data_types2
So it's clearer which parts of data_types the NG design is using, and
which types can be cleaned up eventually.
2022-03-02 13:55:31 -05:00
Carol (Nichols || Goulding) 2a90841715
refactor: Move IngesterQueryRequest to data_types2
So that querier doesn't need to depend on ingester.
2022-03-02 13:52:13 -05:00
Carol (Nichols || Goulding) 8f3e44bf76
refactor: Extract a crate for shared data types in the new design 2022-03-02 12:16:15 -05:00
Carol (Nichols || Goulding) 16d86ed05b
feat: Deserialize metadata to get max_sequencer_number
And add an end-to-end test for the flight request to the ingester.
2022-03-02 11:50:47 -05:00
Carol (Nichols || Goulding) 141a6087d0
feat: Querier able to send Flight queries to Ingester
Fixes #3773.
2022-03-02 11:50:45 -05:00
Marco Neumann 2fd68ea75f
feat: sync tables and schemas in querier (#3895)
* feat: convert `iox_catalog` schema to `schema::Schema`

* fix: remove leftover println statements

* feat: sync tables and schemas in querier

* feat: `PartitionRepo::list_by_namespace`

* docs: explain `QuerierNamespace` data structs a bit

* refactor: improve variable naming

* test: extend `test_sync_schemas

* fix: do not block forever when namespace is gone
2022-03-02 15:32:03 +00:00
Andrew Lamb 286d5f7b2b
feat: add `success` column to system.queries (#3891)
* feat: add `success` column to system.queries

* refactor: Remove lifetime from QueryCompletedToken and thread through flight

* test: update test to make incomplete query clearer

* refactor: use better patter to set complete

* fix: logical merge conflict
2022-03-02 15:05:06 +00:00
Marco Neumann af57664f53
feat: wire up flight into the querier (#3889) 2022-03-02 09:20:26 +00:00
Marco Neumann 936f51013d
test: querier shutdown and background task handling (#3881)
This is similar to what we already have for the ingester.
2022-03-02 08:59:50 +00:00
Marco Neumann daf14f6506 refactor: clean up `querier` a bit
Before adding more and more features, here is a bit of a clean up and
prep work:

- Pull out caching into its own module and add proper tests for it.
- Start to build a test infrastructure so tests are shorter and easier
  to read. This doesn't fully pay off just yet but gets more and more
  important when we actually sync tables and chunks.
2022-03-01 13:24:20 +01:00
Marco Neumann 48722783f9
feat: offer metrics for in-mem catalog (#3876)
This can be quite helpful to test certain caching behavior w/o writing
yet-another abstraction layer.
2022-03-01 11:33:54 +00:00
Marco Neumann 6e2470bf5f
feat: create CatalogChunk in querier (#3862)
* feat: `NamespaceRepo::get_by_id`

* feat: create `CatalogChunk` in querier
2022-02-28 17:20:38 +00:00
Marco Neumann b213796c98
feat: sync namespaces in querier (#3865)
* feat: `NamespaceRepo::list`

* feat: sync namespaces in querier

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:01:28 +00:00
Marco Neumann f966f4c7a4
feat: create `ParquetChunk` in querier (#3857)
Adds a small adapter that is able to produce `ParquetChunk`s for NG.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 08:54:16 +00:00
Luke Bond 4731913c44
feat: skeleton of querier CLI (#3843)
* feat: skeleton of querier CLI

* chore: wrap metrics in opt&arc in querier to satisfy new api

* chore: derive debug in querier handler

* chore: add join handles and their shutdown to nascent querier server

* chore: querier server http unimpl -> 404

* fix: join/shutdown fix in querier; removed unused delegates
2022-02-24 15:42:56 +00:00
dependabot[bot] 5a79b3a68b
chore(deps): Bump arrow-flight from 9.0.2 to 9.1.0 (#3829)
Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:03:22 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding) 73828323ac
feat: Ingester Flight gRPC API (#3623)
* feat: Add a way to run ingester with an in-memory catalog from the CLI

If you set the --catalog-dsn string to "mem", rather than using that as
a Postgres connection URL, create an in-memory catalog.

Planning on using this in tests, so not documenting.

* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC

Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.

* feat: Add a flight API (handshake only) to ingester

* fix: Create partitions if using file-based write buffer

* fix: Change the server fixture to handle ingester server type

For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.

* feat: Start implementing ingester do_get, namely decoding the query

Skip serialization of the predicate for the moment.

* refactor: Rename ingest protos to ingester to match crate name

* refactor: Rename QueryResults to QueryData

* feat: Move ingester flight client to new querier crate

* fix: Off by one error, different starting indexes in sequencers

* fix: Create new CLI argument to pick the catalog type

* fix: Create a CLI option to set the number of topics to auto-create in the write buffer

* fix: Check the arrow flight service's health to tell that the ingester gRPC is up

* fix: Set postgres as the default catalog type

* fix: Return an error rather than panicking if CLI args aren't right
2022-02-09 19:07:44 +00:00