Commit Graph

4038 Commits (c386ac013c3669adcc6e65217a0f0534a8c6f692)

Author SHA1 Message Date
Marco Neumann 50241bae9e refactor: do not abuse `uint64::MAX` as sentinel for `None` 2021-07-22 12:51:43 +02:00
Marco Neumann 47ad397918 fix: address review comments 2021-07-22 12:37:07 +02:00
Paul Dix d95b5df03e refactor: move cache to ObjectStore
Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.
2021-07-21 18:27:56 -04:00
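The restructuring described in d95b5df03e can be pictured roughly as follows. This is only a sketch: `Cache`, its `limit_bytes` field, the variants of `ObjectStoreIntegration`, and the `set_cache`/`cache` methods are hypothetical stand-ins, since the log does not show the real definitions.

```rust
use std::sync::Arc;

/// Hypothetical placeholder for the file cache; the real type is not shown in the log.
#[derive(Debug)]
pub struct Cache {
    pub limit_bytes: u64,
}

/// Hypothetical stand-in for the enum of storage backends.
#[derive(Debug)]
pub enum ObjectStoreIntegration {
    InMemory,
    File,
}

/// Before: a tuple struct wrapping the integration.
#[derive(Debug)]
pub struct ObjectStoreBefore(pub ObjectStoreIntegration);

/// After: a regular struct that also holds a pointer to the cache.
#[derive(Debug)]
pub struct ObjectStore {
    pub integration: ObjectStoreIntegration,
    cache: Option<Arc<Cache>>,
}

impl ObjectStore {
    /// Start without a cache; it can be configured later.
    pub fn new(integration: ObjectStoreIntegration) -> Self {
        Self { integration, cache: None }
    }

    /// Assumed hook for the "future work" of configuring the cache from the server.
    pub fn set_cache(&mut self, cache: Arc<Cache>) {
        self.cache = Some(cache);
    }

    /// Shared handle to the cache, if one has been configured.
    pub fn cache(&self) -> Option<&Arc<Cache>> {
        self.cache.as_ref()
    }
}
```

Keeping the cache on the concrete struct means the trait stays free of cache concerns, which matches the motivation given in the commit body.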
Paul Dix 47044d537c chore: make cache a type in object store trait 2021-07-21 18:27:56 -04:00
Paul Dix d0ea812041 feat: add skeleton for object store file cache 2021-07-21 18:27:56 -04:00
Nga Tran b2063fb29f test: fix the stats and discover a bug in compaction/split/deduplication 2021-07-21 17:40:48 -04:00
Marco Neumann 57a9d5ade0 refactor: correctly track "seen" ranges in persistence checkpoints
Now we can handle all these cases:

There are two partitions w/ a single write each:

1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
   during replay (2 in this case, so B would be gone)

1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
   all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
   re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
   something to persist)
2021-07-21 19:19:49 +02:00
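The second case in 57a9d5ade0 (a partition that has nothing left to replay but must still remember what it has seen) might be modelled like this. It is a minimal sketch for a single sequencer; `PartitionTracker`, `PartitionCheckpoint`, and `resume_from` are hypothetical names, not the types used in the repository.

```rust
/// Hypothetical checkpoint written when a partition is persisted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionCheckpoint {
    /// Lowest sequence number that still needs replay, if any.
    pub min_unpersisted: Option<u64>,
    /// Highest sequence number this partition has read, persisted or not.
    pub max_seen: u64,
}

/// Hypothetical in-memory state of one partition for one sequencer.
#[derive(Debug, Default)]
pub struct PartitionTracker {
    unpersisted: Vec<u64>,
    max_seen: Option<u64>,
}

impl PartitionTracker {
    /// Record a write that arrived with the given sequence number.
    pub fn observe(&mut self, sequence_number: u64) {
        self.unpersisted.push(sequence_number);
        self.max_seen = Some(self.max_seen.map_or(sequence_number, |m| m.max(sequence_number)));
    }

    /// Persist all buffered writes. The checkpoint keeps `max_seen`
    /// even though nothing is left to replay.
    pub fn persist(&mut self) -> Option<PartitionCheckpoint> {
        self.unpersisted.clear();
        self.max_seen.map(|max_seen| PartitionCheckpoint {
            min_unpersisted: None,
            max_seen,
        })
    }
}

/// First sequence number to read once replay has finished.
pub fn resume_from(checkpoints: &[PartitionCheckpoint]) -> u64 {
    checkpoints
        .iter()
        .map(|c| c.max_seen + 1)
        .max()
        .unwrap_or(0)
}
```

With partition A having read sequence number 2 and then been persisted, the checkpoint carries `max_seen: 2`, so ingestion resumes at 3 instead of re-reading 2.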
kodiakhq[bot] e1b2909818
Merge pull request #2079 from influxdata/crepererum/fix_db_checkpoints
fix: checkpoint collection (replay preparation)
2021-07-21 16:53:52 +00:00
kodiakhq[bot] 8c4f5cb237
Merge branch 'main' into crepererum/fix_db_checkpoints 2021-07-21 16:46:13 +00:00
kodiakhq[bot] 13ae2f0d78
Merge pull request #2070 from influxdata/ntran/dedup_compare_cols_order
feat: new algorithm to compute key ranges for deduplication
2021-07-21 15:50:11 +00:00
kodiakhq[bot] 18dd108ba6
Merge branch 'main' into ntran/dedup_compare_cols_order 2021-07-21 15:42:30 +00:00
Nga Tran 86add39175 refactor: address review comments 2021-07-21 11:41:21 -04:00
kodiakhq[bot] 56dd430d8f
Merge pull request #2077 from influxdata/crepererum/sequencer_metrics
feat: write buffer ingestion metrics
2021-07-21 13:30:22 +00:00
kodiakhq[bot] 91acf3911c
Merge branch 'main' into crepererum/sequencer_metrics 2021-07-21 13:23:23 +00:00
Marco Neumann 55490c279a fix: Kafka watermark error for new partitions 2021-07-21 15:21:52 +02:00
Marco Neumann cddf94653c refactor: use `write_buffer` subsystem for ingest metrics 2021-07-21 15:07:59 +02:00
Marco Neumann fd00206fbb refactor: increase watermark update frequency to once per 10s 2021-07-21 15:02:48 +02:00
Marco Neumann 2f1efcf517 docs: clarify difference 2021-07-21 15:00:53 +02:00
Marco Neumann 4d5f209030 docs: do not repeat unix that often 2021-07-21 14:59:07 +02:00
Marco Neumann a5fc1c7d38 fix: collect min AND max in database checkpoints
This is required to correctly handle the following case:

1. There are two partitions A and B w/ a single write each (from the same
   sequencer).
2. We persist A:
   - The partition checkpoint for A will be empty because after persistence
     there will be nothing to replay (the single write is persisted and
     we're ready).
   - The database checkpoint that contains the global minimum of all ranges
     recognizes that for the sequencer there is indeed something left (the
     minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
   with a sequence minimum but (w/o the change in this commit) there is no
   maximum. Only partition checkpoints contain maxima, and the only partition
   checkpoint that was persisted was the one for partition A and that one was
   empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
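To make the scenario in a5fc1c7d38 concrete, here is a small sketch of a database checkpoint that folds the per-partition ranges into both a global minimum and a global maximum. `MinMax` and `database_checkpoint` are hypothetical names, and the model assumes a single sequencer, with `None` standing for a partition that has nothing left to replay.

```rust
/// Hypothetical inclusive range of sequence numbers still to be replayed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MinMax {
    pub min: u64,
    pub max: u64,
}

/// Combine the ranges of all partitions into one database-wide range.
/// Partitions with nothing to replay (`None`) do not contribute.
pub fn database_checkpoint(partitions: &[Option<MinMax>]) -> Option<MinMax> {
    partitions
        .iter()
        .flatten()
        .copied()
        .reduce(|acc, r| MinMax {
            min: acc.min.min(r.min),
            max: acc.max.max(r.max),
        })
}
```

In the case from the commit message, partition A is persisted (`None`) and partition B still holds one write, say with sequence number 2 (`Some(MinMax { min: 2, max: 2 })`). The database checkpoint then records both ends of the range, so replay knows where to stop even though A's partition checkpoint was empty.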
Marco Neumann ec866de193 fix: collect checkpoint data from all tables 2021-07-21 14:48:29 +02:00
Marco Neumann 7d597d1d5c refactor: make ingest metrics easier to understand 2021-07-21 13:57:53 +02:00
Raphael Taylor-Davies ffe6e62aee
feat: add instant to datetime conversion (#2078)
* feat: add instant to datetime conversion

* chore: review feedback
2021-07-21 11:43:27 +00:00
Marco Neumann fb931bb1ca feat: write buffer ingestion metrics 2021-07-21 11:59:52 +02:00
Marco Neumann 5df88c70aa feat: add ability to fetch watermarks from write buffer 2021-07-21 11:59:52 +02:00
kodiakhq[bot] 58108b79ec
Merge pull request #2058 from influxdata/pd/add-cache-config
feat: add parquet cache size setting to database rules
2021-07-21 09:42:07 +00:00
kodiakhq[bot] 94a45339fd
Merge branch 'main' into pd/add-cache-config 2021-07-21 09:35:26 +00:00
Andrew Lamb 387667330a
chore: Update datafusion deps (#2073)
* chore: Update datafusion deps

* fix: update tests
2021-07-21 08:27:03 +00:00
Paul Dix a4704dd165 chore: update parquet_cache_limit to u64 and 0 for default 2021-07-20 15:41:06 -04:00
Paul Dix 297e059085 feat: add parquet cache size setting to database rules 2021-07-20 15:41:06 -04:00
Nga Tran d547c22e97 refactor: comments 2021-07-20 15:27:41 -04:00
Nga Tran 150e166813 refactor: fix comments 2021-07-20 15:16:24 -04:00
Nga Tran fa6d216a85 refactor: cleanup 2021-07-20 15:11:02 -04:00
Nga Tran b98888e8d6 feat: implement key_ranges function that uses new range identify algo 2021-07-20 14:58:54 -04:00
Raphael Taylor-Davies 61da0fe4df
fix: update last_instant when rotating into persistable window (#2067) 2021-07-20 16:38:28 +00:00
Raphael Taylor-Davies 091837420f
feat: add PersistenceWindows system table (#2030) (#2062)
* feat: add PersistenceWindows system table (#2030)

* chore: update log

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 13:10:57 +00:00
Raphael Taylor-Davies e4d2c51e8b
fix: update PersistenceWindows on rules update (#2018) (#2060)
* fix: update PersistenceWindows on rules update (#2018)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 12:44:47 +00:00
kodiakhq[bot] 7d9e1f9704
Merge pull request #2059 from influxdata/crepererum/writer_buffer_seek
feat: implement `seek` for write buffer
2021-07-20 12:36:20 +00:00
kodiakhq[bot] 58dd7e9532
Merge branch 'main' into crepererum/writer_buffer_seek 2021-07-20 12:29:18 +00:00
kodiakhq[bot] 2a7848cbf2
Merge pull request #2064 from influxdata/biggermsg
fix: Increase kafka message size to 30MiB
2021-07-20 12:28:55 +00:00
kodiakhq[bot] a4951b5835
Merge branch 'main' into biggermsg 2021-07-20 12:22:19 +00:00
Raphael Taylor-Davies cf8a60252d
refactor: split system_tables module into smaller modules (#2061)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 12:19:20 +00:00
Marko Mikulicic c01cfbc34c
fix: Increase kafka message size 2021-07-20 14:17:37 +02:00
kodiakhq[bot] cf30d19fd7
Merge pull request #2063 from influxdata/er/fix/flaky_compact_test
test: ensure high enough limit
2021-07-20 12:02:27 +00:00
Marco Neumann ec7ebdff29 refactor: use lifetimes to ensure single stream / no seek while streaming 2021-07-20 13:52:33 +02:00
Edd Robinson cc0aaa58a7 test: ensure high enough limit 2021-07-20 12:43:10 +01:00
Marco Neumann b0663a0337 feat: disallow multiple write buffer streams and seeking while streaming
Multiple streams will mess up ordering. Seeking while streaming is
likely a bug and should not work.
2021-07-20 12:35:20 +02:00
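One way to get the restriction from b0663a0337 checked at compile time is to let the stream borrow the reader mutably, which is the idea the later commit ec7ebdff29 names ("use lifetimes"). The sketch below is an assumption about how that could look: `MockReader` and `EntryStream` are hypothetical, and a plain synchronous iterator stands in for whatever stream type the write buffer really uses.

```rust
/// Hypothetical in-memory stand-in for a write buffer reader.
/// Entries are assumed to be sorted by sequence number.
pub struct MockReader {
    entries: Vec<u64>,
    position: usize,
}

/// Holds a mutable borrow of the reader for as long as the stream is alive.
pub struct EntryStream<'a> {
    reader: &'a mut MockReader,
}

impl MockReader {
    pub fn new(entries: Vec<u64>) -> Self {
        Self { entries, position: 0 }
    }

    /// Because this takes `&mut self`, only one stream can exist at a time.
    pub fn stream(&mut self) -> EntryStream<'_> {
        EntryStream { reader: self }
    }

    /// Move to the first entry whose sequence number is >= `sequence_number`.
    /// Also takes `&mut self`, so it cannot be called while a stream is alive.
    pub fn seek(&mut self, sequence_number: u64) {
        self.position = self
            .entries
            .iter()
            .position(|&s| s >= sequence_number)
            .unwrap_or(self.entries.len());
    }
}

impl Iterator for EntryStream<'_> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let item = self.reader.entries.get(self.reader.position).copied()?;
        self.reader.position += 1;
        Some(item)
    }
}
```

Calling `reader.seek(3)` or a second `reader.stream()` while an `EntryStream` is still in scope is rejected by the borrow checker; once the stream is dropped, both are allowed again.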
Raphael Taylor-Davies 767c2a6fe1
refactor: explicit server startup state machine (#2040)
* refactor: explicit server startup state machine

* chore: update `ServerStage` docs

* chore: further docs

* chore: more logging

* chore: format
2021-07-20 10:11:18 +00:00
Andrew Lamb 2c20528c69
chore: use upstream versions of some workarounds (#2057)
* chore: use upstream versions of some workarounds

* docs: update docstring

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 08:53:46 +00:00
Raphael Taylor-Davies 8e5d5928cf
feat: compute WriteSummary from PersistenceWindows (#2030) (#2054)
* feat: compute WriteSummary from PersistenceWindows (#2030)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 08:46:52 +00:00