493 Commits (be9b3a48534e2292d9fc26c4fd1c88c1db72254f)

Author SHA1 Message Date
Marco Neumann 294c304491 feat: impl catalog checkpointing infrastructure
This implements a way to add checkpoints to the preserved catalog and
speed up replay.

Note: This leaves the "hook it up into the actual DB" for a future PR.

Issue: #1381.
2021-06-10 15:42:21 +02:00
kodiakhq[bot] 3ba27bdbd9
Merge branch 'main' into crepererum/clippy_future_not_send_part1 2021-06-10 07:19:31 +00:00
kodiakhq[bot] 5f863a59fd
Merge branch 'main' into crepererum/extract_server_init 2021-06-10 07:14:57 +00:00
kodiakhq[bot] 44d8fb9472
Merge branch 'main' into crepererum/clippy_future_not_send_part1 2021-06-10 07:10:11 +00:00
kodiakhq[bot] eed73a30c5
Merge branch 'main' into ntran/dedup_within_chunk 2021-06-09 18:19:17 +00:00
Nga Tran c1c58018fc refactor: address review comments 2021-06-09 14:17:47 -04:00
Marco Neumann 4fe2d7af9c chore: enforce `clippy::future_not_send` for `parquet_file` 2021-06-09 18:18:27 +02:00
Marco Neumann d9c38dfe88 refactor: extract server init code
This prepares for #1624, so the end result looks a bit cleaner.
2021-06-09 16:53:11 +02:00
kodiakhq[bot] b49abf9b02
Merge branch 'main' into crepererum/lazy_db_loading 2021-06-09 07:23:35 +00:00
Raphael Taylor-Davies 07c4277ca7
refactor: schema merge to give more control over field merging (#1653)
* refactor: schema merge to give more control over field merging

* chore: review feedback
2021-06-09 06:30:45 +00:00
Nga Tran 3e10351538 test: add tests for the sort plan 2021-06-08 21:40:46 -04:00
Nga Tran 68e3a2121f feat: add SortExec 2021-06-08 15:04:31 -04:00
Andrew Lamb fd8a87484e feat: Hook up chunk grouping into provider 2021-06-08 14:42:37 -04:00
Nga Tran edbf1b7d5e Merge branch 'main' into ntran/dedup_within_chunk 2021-06-08 13:18:40 -04:00
Nga Tran 40cb4f741f feat: initial implementation 2021-06-08 13:17:36 -04:00
Carol (Nichols || Goulding) 50a69a7f18 fix: Don't mention Kafka unless it's absolutely necessary 2021-06-07 13:01:04 -04:00
Carol (Nichols || Goulding) 2bb2c4ba47 docs: Add some doc comments about the WriteBuffer trait 2021-06-07 11:22:33 -04:00
Carol (Nichols || Goulding) a8a4a5f29d fix: Return the Sequence type from the write buffer, not vague WriteMetadata 2021-06-07 11:15:46 -04:00
Carol (Nichols || Goulding) a63c12acfb fix: Remove references to Kafka from db tests 2021-06-07 10:58:34 -04:00
Carol (Nichols || Goulding) 45a3547978 refactor: Take ownership of Entry and transform into SequencedEntry
Rather than cloning the data. The Entry is no longer used after this
point.
2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding) 8ab8544d4a feat: Wire up a WriteBuffer trait implemented by a mock
With an `unimplemented!` where the Kafka implementation will be.
2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding) 2418e91001 feat: Add a DatabaseRule field for an optional Kafka write buffer connection string 2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding) b5fac8cd59 refactor: Rearrange database rule checks and SequencedEntry construction
There are going to be more cases here when the Kafka write buffer is
introduced that affect how the SequencedEntry is created and whether a
database being immutable is an error or not.
2021-06-07 09:37:22 -04:00
Carol (Nichols || Goulding) 7ff2c5c951 refactor: Rearrange reading of db rules and locking 2021-06-07 09:37:22 -04:00
Carol (Nichols || Goulding) 0139167c98 refactor: Extract a Sequence type
A sequencer id and sequence number should always go together, so convey
that with a type. Also, this removes lots of repetition of "sequence" 😅
2021-06-07 09:37:22 -04:00
Carol (Nichols || Goulding) 4d6569583e fix: Partially restore SequencedEntry as Entry+sequencer_id+sequence_num 2021-06-04 14:40:19 -04:00
Carol (Nichols || Goulding) f4a9a5ae56 fix: Remove write buffer 2021-06-04 14:40:17 -04:00
Andrew Lamb 42f26b609b
refactor: Move `query_tests` and `server_benchmarks` into their own crate --> smaller `server` (#1628)
* refactor: Separate query_tests into its own crate

* fix: references

* refactor: break out server benchmarks

* fix: Update query_tests/src/lib.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-04 17:31:19 +00:00
Andrew Lamb ff3215e6a9
feat: Implement Chunk Pruning (#1567) 2021-06-04 13:05:22 +00:00
Marco Neumann 195644da04 docs: document semaphore design in server 2021-06-04 12:52:13 +02:00
kodiakhq[bot] 402ef0ebde
Merge branch 'main' into crepererum/limit_cleanup_amount 2021-06-04 10:47:33 +00:00
Marco Neumann e06d65bb2a refactor: migrate "DBs initialized" RPC to "server status" 2021-06-04 11:33:41 +02:00
Marco Neumann b30d7e2821 feat: move DB loading into background worker
Before this change we loaded databases eagerly when a serverID was
passed on startup BEFORE starting up the gRPC server. Since loading
(esp. at its current state without checkpoints and with too many small
parquet files) can take very long, K8s thinks IOx is unhealthy. With
this change we are now loading databases in the server background worker
once a serverID is available. Until then we block all DB-related
interactions including adding new databases (since without inspecting
the object store there is no way we can check if the DB already
exists).

Furthermore we now load databases no matter if the serverID was passed on
startup (via CLI or environment variable) or was set later via gRPC
call. Before this change the latter case was somewhat forgotten.
2021-06-04 11:33:41 +02:00
Raphael Taylor-Davies 696ebdc4db
feat: recover failed lifecycle actions (#1099) (#1592)
* feat: recover failed lifecycle actions (#1099)

* chore: review feedback

* chore: fix logical conflicts
2021-06-03 15:46:33 +00:00
Marco Neumann 91df8a30e7 feat: limit number of files during storage cleanup
Since the number of parquet files can potentially be unbounded (aka very
very large) and we do not want to hold the transaction lock for too
long and also want to limit memory consumption of the cleanup routine,
let's limit the number of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
Edd Robinson e583e1fbda
Merge branch 'main' into er/feat/read_buffer/float_int 2021-06-03 14:48:36 +01:00
Andrew Lamb eaa5b75437
refactor: Make it clear only partition_key and table name pruning happens in catalog (#1608)
* refactor: Make it clear only partition_key and table name pruning is happening in catalog

* fix: clippy

* fix: Update server/src/db/catalog.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: use TableNameFilter enum rather than Option

* docs: Add docstring to the `From` implementation

* fix: Update server/src/db/catalog/partition.rs

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Edd Robinson <me@edd.io>
2021-06-03 13:09:09 +00:00
Edd Robinson 65bfa4dd10 test: fix tests 2021-06-03 12:32:40 +01:00
Marco Neumann 27b9477aa4 test: fix flaky test 2021-06-03 11:23:29 +02:00
Marco Neumann 7b2663a38a test: make tests faster 2021-06-03 11:23:29 +02:00
Marco Neumann 3c9fd81697 refactor: split overlong line 2021-06-03 11:23:29 +02:00
Marco Neumann bbd73e59be feat: jitter background clean-up job + wait on first job 2021-06-03 11:23:29 +02:00
Marco Neumann ce412dbce2 fix: use structured error for background cleanup task reporting 2021-06-03 11:23:29 +02:00
kodiakhq[bot] 1c764c47a2
Merge branch 'main' into ntran/deduplicate 2021-06-02 17:42:36 +00:00
Nga Tran 40bd932fff refactor: address Andrew's comment 2021-06-02 13:41:46 -04:00
Andrew Lamb 32c6ed1f34
refactor: More cleanup related to multi-table chunks (#1604)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Nga Tran e7a97f3ac1 test: merge main and add more tests for deduplicate work 2021-06-02 12:00:40 -04:00
Marco Neumann 80f4d84ce8 refactor: isolate DB loading and streamline error handling
There are no functional changes here (except that errors look slightly
different) but it should allow for an easier move of the DB loading into
a delayed task.
2021-06-02 13:42:24 +02:00
kodiakhq[bot] 0e09b20ca8
Merge branch 'main' into crepererum/issue1513-b 2021-06-02 07:08:29 +00:00
Nga Tran 40df7def0e test: tests for the deduplicate work 2021-06-01 18:06:35 -04:00