Commit Graph

3991 Commits (d347750366ae383fdde686e9727bd4ecca48c811)

Author SHA1 Message Date
Carol (Nichols || Goulding) d347750366 refactor: Make collect_rub create the RBChunk
Which gets rid of the need for new_rub_chunk.

This will enable creating RBChunks that are guaranteed to have data.
2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding) 0a724878e6 refactor: Organize uses 2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding) 7371b0aabf refactor: Use existing new_rub_chunk function that has the same code 2021-07-22 11:15:18 -04:00
Carol (Nichols || Goulding) eadcb3265a refactor: Use some TryStreamExt adapters in collect_rub 2021-07-22 11:15:18 -04:00
Raphael Taylor-Davies 38e375d11a
feat: add chunk storage metrics (#2069)
* feat: add chunk storage metrics

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-22 15:13:09 +00:00
Raphael Taylor-Davies 8c974beba0
feat: add access timestamps to CatalogChunk (#2075) (#2081)
* feat: add access timestamps to CatalogChunk (#2075)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-22 12:19:30 +00:00
kodiakhq[bot] f4b9fe20fd
Merge pull request #2084 from influxdata/crepererum/fix_checkpoints_again
refactor: correctly track "seen" ranges in persistence checkpoints
2021-07-22 11:45:39 +00:00
Marco Neumann 50241bae9e refactor: do not abuse `uint64::MAX` as sentinel for `None` 2021-07-22 12:51:43 +02:00
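A minimal sketch of the idea behind this refactor follows. The `SeenMax` type is hypothetical and is not the IOx type. Wrapping the value in `Option<u64>` keeps `u64::MAX` usable as a real sequence number, rather than overloading it to mean "nothing seen yet":

```rust
/// Hypothetical "highest sequence number seen" tracker.
/// `None` means nothing has been observed yet, so every `u64`
/// (including `u64::MAX`) remains a valid sequence number.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct SeenMax(Option<u64>);

impl SeenMax {
    /// Record a sequence number, keeping the maximum observed so far.
    fn observe(&mut self, sequence_number: u64) {
        self.0 = Some(
            self.0
                .map_or(sequence_number, |current| current.max(sequence_number)),
        );
    }

    /// The maximum observed so far, or `None` if nothing was observed.
    fn get(&self) -> Option<u64> {
        self.0
    }
}
```

With a sentinel, observing a real `u64::MAX` would be indistinguishable from "empty". With `Option`, the two states are distinct.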
Marco Neumann 47ad397918 fix: address review comments 2021-07-22 12:37:07 +02:00
Marco Neumann 57a9d5ade0 refactor: correctly track "seen" ranges in persistence checkpoints
Now we can handle all these cases:

There are two partitions w/ a single write each:

1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
   during replay (2 in this case, so B would be gone)

1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
   all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
   re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
   something to persist)
2021-07-21 19:19:49 +02:00
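The two cases above can be sketched with a simplified, hypothetical model of one sequencer feeding two partitions. `Db`, `PartitionState`, `Checkpoints` and `should_replay` are illustrative names, not the IOx data structures. Persisting a partition records its local "seen" max even when nothing is left to replay. The database checkpoint records the global min of unpersisted writes and the global max seen across all partitions:

```rust
use std::collections::BTreeMap;

/// Hypothetical per-partition state for a single sequencer.
#[derive(Debug, Default)]
struct PartitionState {
    /// Highest sequence number this partition has seen, persisted or not.
    seen_max: Option<u64>,
    /// Sequence numbers that are buffered but not yet persisted.
    unpersisted: Vec<u64>,
}

/// What persisting one partition writes out (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Checkpoints {
    /// Partition checkpoint: local "seen" max, kept even if nothing is left to replay.
    partition_seen_max: Option<u64>,
    /// Database checkpoint: (global min unpersisted, global max seen).
    db_replay_range: Option<(u64, u64)>,
}

#[derive(Debug, Default)]
struct Db {
    partitions: BTreeMap<&'static str, PartitionState>,
}

impl Db {
    fn write(&mut self, partition: &'static str, sequence_number: u64) {
        let p = self.partitions.entry(partition).or_default();
        p.seen_max = Some(p.seen_max.map_or(sequence_number, |m| m.max(sequence_number)));
        p.unpersisted.push(sequence_number);
    }

    fn persist(&mut self, partition: &'static str) -> Checkpoints {
        let p = self.partitions.get_mut(partition).expect("unknown partition");
        p.unpersisted.clear();
        let partition_seen_max = p.seen_max;

        // Computed over *all* partitions, not just the one being persisted.
        let global_min = self
            .partitions
            .values()
            .flat_map(|state| state.unpersisted.iter().copied())
            .min();
        let global_max = self.partitions.values().filter_map(|state| state.seen_max).max();
        Checkpoints {
            partition_seen_max,
            db_replay_range: global_min.zip(global_max),
        }
    }
}

/// During replay, re-apply a write to a partition only if it is newer than
/// what that partition's checkpoint says was already seen.
fn should_replay(partition_seen_max: Option<u64>, sequence_number: u64) -> bool {
    partition_seen_max.map_or(true, |seen| sequence_number > seen)
}
```

In the first case, the database range ends at 2 rather than at A's local max of 1, so B's write survives replay. In the second case, A's checkpoint says it has seen 2, so replay does not re-apply 2 to A.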
kodiakhq[bot] e1b2909818
Merge pull request #2079 from influxdata/crepererum/fix_db_checkpoints
fix: checkpoint collection (replay preparation)
2021-07-21 16:53:52 +00:00
kodiakhq[bot] 8c4f5cb237
Merge branch 'main' into crepererum/fix_db_checkpoints 2021-07-21 16:46:13 +00:00
kodiakhq[bot] 13ae2f0d78
Merge pull request #2070 from influxdata/ntran/dedup_compare_cols_order
feat: new algorithm to compute key ranges for deduplication
2021-07-21 15:50:11 +00:00
kodiakhq[bot] 18dd108ba6
Merge branch 'main' into ntran/dedup_compare_cols_order 2021-07-21 15:42:30 +00:00
Nga Tran 86add39175 refactor: address review comments 2021-07-21 11:41:21 -04:00
kodiakhq[bot] 56dd430d8f
Merge pull request #2077 from influxdata/crepererum/sequencer_metrics
feat: write buffer ingestion metrics
2021-07-21 13:30:22 +00:00
kodiakhq[bot] 91acf3911c
Merge branch 'main' into crepererum/sequencer_metrics 2021-07-21 13:23:23 +00:00
Marco Neumann 55490c279a fix: Kafka watermark error for new partitions 2021-07-21 15:21:52 +02:00
Marco Neumann cddf94653c refactor: use `write_buffer` subsystem for ingest metrics 2021-07-21 15:07:59 +02:00
Marco Neumann fd00206fbb refactor: increase watermark update frequency to once per 10s 2021-07-21 15:02:48 +02:00
Marco Neumann 2f1efcf517 docs: clarify difference 2021-07-21 15:00:53 +02:00
Marco Neumann 4d5f209030 docs: do not repeat unix that often 2021-07-21 14:59:07 +02:00
Marco Neumann a5fc1c7d38 fix: collect min AND max in database checkpoints
This is required to correctly handle the following case:

1. There are two partitions A and B w/ a single write each (from the same
   sequencer).
2. We persist A:
   - The partition checkpoint for A will be empty because after persistence
     there will be nothing to replay (the single write is persisted and
     we're ready).
   - The database checkpoint that contains the global minimum of all ranges
     recognizes that for the sequencer there is indeed something left (the
     minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
   with a sequence minimum but (w/o the change in this commit) there is no
   maximum. Only partition checkpoints contain maxima, and the only partition
   checkpoint that was persisted was the one for partition A and that one was
   empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
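The fix can be sketched as folding per-partition summaries into a per-sequencer database checkpoint that stores both ends of the replay range. The names `MinMax`, `PartitionSummary` and `db_checkpoint` are hypothetical, and the sequencer ids and sequence numbers are made up for illustration. A persisted partition such as A contributes no minimum, because nothing is left to replay, but it still contributes its maximum:

```rust
use std::collections::BTreeMap;

/// Hypothetical replay range for one sequencer in a database checkpoint.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MinMax {
    min: u64,
    max: u64,
}

/// Hypothetical per-partition summary: sequencer id ->
/// (smallest unpersisted sequence number, if any; largest sequence number seen).
type PartitionSummary = BTreeMap<u32, (Option<u64>, u64)>;

/// Build the database checkpoint from all partitions, recording the global
/// minimum AND the global maximum per sequencer. Sequencers with nothing left
/// to replay get no entry.
fn db_checkpoint(partitions: &[PartitionSummary]) -> BTreeMap<u32, MinMax> {
    let mut mins: BTreeMap<u32, u64> = BTreeMap::new();
    let mut maxs: BTreeMap<u32, u64> = BTreeMap::new();
    for partition in partitions {
        for (&sequencer, &(min, max)) in partition {
            if let Some(min) = min {
                let e = mins.entry(sequencer).or_insert(min);
                *e = (*e).min(min);
            }
            let e = maxs.entry(sequencer).or_insert(max);
            *e = (*e).max(max);
        }
    }
    // Every sequencer in `mins` also appears in `maxs`, so indexing cannot panic.
    mins.into_iter()
        .map(|(sequencer, min)| (sequencer, MinMax { min, max: maxs[&sequencer] }))
        .collect()
}
```

Because the maximum is stored next to the minimum, replay no longer depends on finding a non-empty partition checkpoint for B.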
Marco Neumann ec866de193 fix: collect checkpoint data from all tables 2021-07-21 14:48:29 +02:00
Marco Neumann 7d597d1d5c refactor: make ingest metrics easier to understand 2021-07-21 13:57:53 +02:00
Raphael Taylor-Davies ffe6e62aee
feat: add instant to datetime conversion (#2078)
* feat: add instant to datetime conversion

* chore: review feedback
2021-07-21 11:43:27 +00:00
Marco Neumann fb931bb1ca feat: write buffer ingestion metrics 2021-07-21 11:59:52 +02:00
Marco Neumann 5df88c70aa feat: add ability to fetch watermarks from write buffer 2021-07-21 11:59:52 +02:00
kodiakhq[bot] 58108b79ec
Merge pull request #2058 from influxdata/pd/add-cache-config
feat: add parquet cache size setting to database rules
2021-07-21 09:42:07 +00:00
kodiakhq[bot] 94a45339fd
Merge branch 'main' into pd/add-cache-config 2021-07-21 09:35:26 +00:00
Andrew Lamb 387667330a
chore: Update datafusion deps (#2073)
* chore: Update datafusion deps

* fix: update tests
2021-07-21 08:27:03 +00:00
Paul Dix a4704dd165 chore: update parquet_cache_limit to u64 and 0 for default 2021-07-20 15:41:06 -04:00
Paul Dix 297e059085 feat: add parquet cache size setting to database rules 2021-07-20 15:41:06 -04:00
Nga Tran d547c22e97 refactor: comments 2021-07-20 15:27:41 -04:00
Nga Tran 150e166813 refactor: fix comments 2021-07-20 15:16:24 -04:00
Nga Tran fa6d216a85 refactor: cleanup 2021-07-20 15:11:02 -04:00
Nga Tran b98888e8d6 feat: implement key_ranges function that uses new range identify algo 2021-07-20 14:58:54 -04:00
Raphael Taylor-Davies 61da0fe4df
fix: update last_instant when rotating into persistable window (#2067) 2021-07-20 16:38:28 +00:00
Raphael Taylor-Davies 091837420f
feat: add PersistenceWindows system table (#2030) (#2062)
* feat: add PersistenceWindows system table (#2030)

* chore: update log

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 13:10:57 +00:00
Raphael Taylor-Davies e4d2c51e8b
fix: update PersistenceWindows on rules update (#2018) (#2060)
* fix: update PersistenceWindows on rules update (#2018)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 12:44:47 +00:00
kodiakhq[bot] 7d9e1f9704
Merge pull request #2059 from influxdata/crepererum/writer_buffer_seek
feat: implement `seek` for write buffer
2021-07-20 12:36:20 +00:00
kodiakhq[bot] 58dd7e9532
Merge branch 'main' into crepererum/writer_buffer_seek 2021-07-20 12:29:18 +00:00
kodiakhq[bot] 2a7848cbf2
Merge pull request #2064 from influxdata/biggermsg
fix: Increase kafka message size to 30MiB
2021-07-20 12:28:55 +00:00
kodiakhq[bot] a4951b5835
Merge branch 'main' into biggermsg 2021-07-20 12:22:19 +00:00
Raphael Taylor-Davies cf8a60252d
refactor: split system_tables module into smaller modules (#2061)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-20 12:19:20 +00:00
Marko Mikulicic c01cfbc34c
fix: Increase kafka message size 2021-07-20 14:17:37 +02:00
kodiakhq[bot] cf30d19fd7
Merge pull request #2063 from influxdata/er/fix/flaky_compact_test
test: ensure high enough limit
2021-07-20 12:02:27 +00:00
Marco Neumann ec7ebdff29 refactor: use lifetimes to ensure single stream / no seek while streaming 2021-07-20 13:52:33 +02:00
Edd Robinson cc0aaa58a7 test: ensure high enough limit 2021-07-20 12:43:10 +01:00
Marco Neumann b0663a0337 feat: disallow multiple write buffer streams and seeking while streaming
Multiple streams will mess up ordering. Seeking while streaming is
likely a bug and should not work.
2021-07-20 12:35:20 +02:00
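The lifetime-based approach from ec7ebdff29 can be sketched with a hypothetical in-memory reader. `MockReader`, `EntryStream` and the index-based `seek` are assumptions, and a std `Iterator` stands in for the async stream a real write buffer would use. Both `stream` and `seek` take `&mut self`, and the returned stream holds that borrow. As a result, the borrow checker rejects a second stream or a `seek` while a stream is still alive:

```rust
/// Hypothetical in-memory write buffer reader.
struct MockReader {
    entries: Vec<u64>,
    /// Index of the next entry to yield (not a real sequence number).
    position: usize,
}

/// Stream of entries; holds `&mut MockReader` for as long as it lives.
struct EntryStream<'a> {
    reader: &'a mut MockReader,
}

impl Iterator for EntryStream<'_> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let entry = self.reader.entries.get(self.reader.position).copied()?;
        self.reader.position += 1;
        Some(entry)
    }
}

impl MockReader {
    fn new(entries: Vec<u64>) -> Self {
        Self { entries, position: 0 }
    }

    /// Requires `&mut self`, so only one stream can exist at a time.
    fn stream(&mut self) -> EntryStream<'_> {
        EntryStream { reader: self }
    }

    /// Also requires `&mut self`, so it cannot overlap with a live stream.
    fn seek(&mut self, position: usize) {
        self.position = position.min(self.entries.len());
    }
}
```

Code such as `let s = reader.stream(); reader.seek(0); s.count();` fails to compile, because `reader` is still mutably borrowed by `s`. Once the stream is dropped, seeking and streaming again work normally.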