Commit Graph

3691 Commits (c53ae41d57cbf8727015f3570cab7c0a20a335ab)

Author SHA1 Message Date
Marco Neumann 09b7405b20
docs: spelling fixes
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-07-06 17:46:36 +02:00
kodiakhq[bot] 27d9bf4b2d
Merge pull request #1896 from influxdata/crepererum/issue1874
fix: `persist_row_threshold` limits the out chunk row count
2021-07-06 15:19:21 +00:00
Marco Neumann f45679183c chore: use saturating sub instead of simple sub 2021-07-06 16:41:36 +02:00
Marco Neumann 677314f52f fix: `persist_row_threshold` limits the out chunk row count
`persist_row_threshold` should limit the rows of the post-compaction
output chunk (and hence the sum of rows over the input chunks), not
the number of rows of each individual input chunk.

Fixes #1874.
2021-07-06 15:17:56 +02:00
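Note on 677314f52f and f45679183c: a minimal sketch of the behaviour described above, not the actual IOx code. The function name `chunks_within_threshold` and the plain `usize` row counts are assumptions made for illustration. It takes input chunks while their combined row count still fits under `persist_row_threshold`, and uses a saturating subtraction for the remaining budget.

```rust
/// Hypothetical helper: returns how many leading input chunks can be
/// compacted together so that the output chunk (the sum of the input rows)
/// stays within `persist_row_threshold`.
fn chunks_within_threshold(chunk_rows: &[usize], persist_row_threshold: usize) -> usize {
    // Row budget left for the output chunk.
    let mut remaining = persist_row_threshold;
    let mut taken = 0;

    for &rows in chunk_rows {
        // Stop as soon as the next chunk would push the output over budget.
        if rows > remaining {
            break;
        }
        // Saturating sub cannot underflow even if the check above changes.
        remaining = remaining.saturating_sub(rows);
        taken += 1;
    }

    taken
}
```

With a threshold of 1000 and three chunks of 400 rows each, a per-chunk check would accept all three (1200 rows in the output), whereas this sketch stops after two.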
kodiakhq[bot] 76d2c317a5
Merge pull request #1866 from influxdata/er/fix/read_buffer/predicate
fix: ensure no rows returned for predicates with disjoint matching expressions
2021-07-06 12:42:06 +00:00
Edd Robinson 2ec9151b32
Merge branch 'main' into er/fix/read_buffer/predicate 2021-07-06 13:35:04 +01:00
Marco Neumann 3d644b63a1 feat: add `Replay` state to DB init 2021-07-06 14:24:39 +02:00
kodiakhq[bot] 246a07f884
Merge pull request #1893 from influxdata/crepererum/fix_query_tests_rebuild
test: don't rebuild `query_tests` all the time
2021-07-05 13:37:30 +00:00
Marco Neumann 8387eaed27 test: do not recompile `query_tests` when test content changes
There is no need to recompile the entire `query_tests` crate when the
CONTENT (not the SET) of the test cases changes, e.g. due to new
optimizations, datafusion upgrades, query additions, etc. We now check
if `cases.rs` really changed before touching it, so that Cargo can rely
on the file's mtime.
2021-07-05 15:30:10 +02:00
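Note on 8387eaed27: the "check before touching" idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit; the helper name `write_if_changed` is hypothetical. The generated file is only rewritten when its content differs, so its mtime stays put otherwise.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: writes `content` to `path` only if the file is
/// missing, unreadable, or currently holds different content.
/// Returns `Ok(true)` when the file was written, `Ok(false)` when it was
/// left untouched (and its mtime therefore unchanged).
fn write_if_changed(path: &Path, content: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Same content on disk: do not touch the file.
        Ok(existing) if existing == content => Ok(false),
        // Missing, unreadable, or different: (re)write it.
        _ => {
            fs::write(path, content)?;
            Ok(true)
        }
    }
}
```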
Marco Neumann d6cff911b6 test: ensure that query tests don't rebuild all the time
Beforehand:

```text
❯ env CARGO_LOG=cargo::core::compiler::fingerprint=info cargo test -p query_tests
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] stale: changed "/home/mneumann/src/influxdb_iox/query_tests/cases"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]           (vs) "/home/mneumann/src/influxdb_iox/target/debug/build/query_tests-0e8f741dfb84437f/output"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]                FileTime { seconds: 1625474716, nanos: 436081357 } != FileTime { seconds: 1625474752, nanos: 52625167 }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Test/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/RunCustomBuild/TargetInner { ..: custom_build_target("build-script-build", "/home/mneumann/src/influxdb_iox/query_tests/build.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Build/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
   Compiling query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)
```

The issue is that both the input and the test output files are located
under `cases/`. `build.rs` used `cargo:rerun-if-changed=cases`, which per
the Cargo docs scans ALL files in that directory. Note that the normal
`exclude` directive in `Cargo.toml` does NOT work, see
https://github.com/rust-lang/cargo/issues/4587 .

So we need to split input and output files into separate directories
(`cases/{in,out}`).
2021-07-05 15:30:10 +02:00
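Note on d6cff911b6: after the split, a build script would presumably watch only the input directory. The sketch below assumes the `cases/in` layout named in the commit message; the helper name `rerun_directive` is hypothetical, and the actual `build.rs` may look different.

```rust
use std::path::Path;

/// Hypothetical helper: builds the Cargo directive that watches only the
/// input half of the cases directory, so that files written under
/// `cases/out` no longer mark the crate as stale.
fn rerun_directive(cases_dir: &Path) -> String {
    format!(
        "cargo:rerun-if-changed={}",
        cases_dir.join("in").display()
    )
}

// In a `build.rs`, the directive is emitted by printing it to stdout:
//
//     fn main() {
//         println!("{}", rerun_directive(Path::new("cases")));
//     }
```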
kodiakhq[bot] 403a2cdde4
Merge pull request #1892 from influxdata/crepererum/move_persistence_windows_code
chore: move persistence windows related code into own crate
2021-07-05 13:28:22 +00:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se. So let's
move them into their own crate. This also makes `parquet_file` no
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
kodiakhq[bot] 060689b050
Merge pull request #1872 from influxdata/crepererum/ckpt_in_parquet
feat: persist part+db checkpoint in parquets and catalog
2021-07-05 07:49:15 +00:00
Marco Neumann d96e15c3f7 docs: explain why we store checkpoints in parquet files 2021-07-05 09:42:46 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
kodiakhq[bot] a35b334ee5
Merge pull request #1880 from influxdata/crepererum/db_state_in_grpc
feat: expose DB state in gRPC interface
2021-07-05 07:28:10 +00:00
kodiakhq[bot] bcf43a3de5
Merge branch 'main' into crepererum/db_state_in_grpc 2021-07-05 07:21:48 +00:00
Raphael Taylor-Davies 5fe49aa017
feat: add flush guard to PersistenceWindows (#1883)
* feat: add flush guard to PersistenceWindows

* docs: Update comments based on code review

* fix: fmt

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2021-07-02 20:15:33 +00:00
Raphael Taylor-Davies b4534883fe
refactor: remove table name from upsert_table (#1882)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-02 15:22:41 +00:00
Marko Mikulicic fba64a41f5
docs: improve profiling docs (#1869)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-02 10:47:13 +00:00
Marco Neumann 54fbb60740 feat: expose DB state in gRPC interface 2021-07-02 11:24:36 +02:00
kodiakhq[bot] 8386b1528e
Merge pull request #1875 from influxdata/pd-remove-mb-size-limit-checks
feat: remove MUB size threshold
2021-07-01 20:08:20 +00:00
kodiakhq[bot] 404da38d6f
Merge branch 'main' into pd-remove-mb-size-limit-checks 2021-07-01 20:01:32 +00:00
Raphael Taylor-Davies 5b00bc69e6
refactor: use Arc<Db> in lifecycle actions (#1873)
* refactor: use Arc<Db> in lifecycle actions

* chore: review feedback
2021-07-01 19:56:33 +00:00
Paul Dix 61917c107f chore: add test for can_move on row count 2021-07-01 15:49:44 -04:00
Paul Dix 91f5478012 feat: remove MUB size threshold
Removes closing of MUB chunks based on size. Also adds a check to the lifecycle policy to move a MUB chunk once it crosses a default row count threshold.
2021-07-01 14:58:29 -04:00
Andrew Lamb 56c8c8d428
feat: Use separate executor for queries and compactions/moves (#1870)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:47:50 +00:00
Raphael Taylor-Davies f1a100c6ae
refactor: remove now unused chunk sort order (#1854)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:39:45 +00:00
Raphael Taylor-Davies 43cabac3ac
feat: don't compact more than row threshold (#1868)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:31:50 +00:00
Andrew Lamb 07826306ed
fix: Always deduplicate data prior to insertion into the ReadBuffer (#1863)
* fix: mark ReadBuffer as always deduplicated

* fix: Use compact plans during merge

* docs: Update server/src/db/chunk.rs

Co-authored-by: Nga Tran <ntran@influxdata.com>

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Nga Tran <ntran@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:23:37 +00:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
kodiakhq[bot] 68c94283bd
Merge pull request #1841 from influxdata/ntran/dedup_less_concat
feat: avoid concat_batches if possible
2021-07-01 16:06:49 +00:00
kodiakhq[bot] e03a1a1def
Merge branch 'main' into ntran/dedup_less_concat 2021-07-01 15:59:22 +00:00
kodiakhq[bot] 26167a9e70
Merge pull request #1867 from influxdata/crepererum/rework_db_init_state_machine
refactor: rework DB init state machine
2021-07-01 15:31:10 +00:00
kodiakhq[bot] 84f2391edd
Merge branch 'main' into crepererum/rework_db_init_state_machine 2021-07-01 15:24:12 +00:00
Edd Robinson 8fc07cf4f0 fix: correctly evaluate exprs matching disjoint rows 2021-07-01 16:05:09 +01:00
Nga Tran d0afc7a176 refactor: clean up and add a missing else case 2021-07-01 11:00:30 -04:00
Nga Tran 5cf623201d fix: deduplicate the last batch before sending it downstream 2021-07-01 10:45:23 -04:00
Andrew Lamb 7235c7b965
refactor: Remove vestigial execution counters (#1865)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 14:08:06 +00:00
Marco Neumann e1e3163752 refactor: rework DB init state machine
Since adding new features like "sequencer replay" or init retries would
make the current code too complex, a refactor is required:

Config:
The config struct now holds a `DatabaseState` which is a simple linear
state machine representing the different stages of the database init.

Init:
The init module now has a fixpoint-loop which looks at the state,
decides what to do based on it and repeats until either the DB is
initialized or an error occurred. This also makes it easier to continue
the init process "in the middle", e.g. when the preserved catalog is
broken or the sequencer (e.g. Kafka) could not be reached.
2021-07-01 13:47:51 +02:00
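Note on e1e3163752: a rough sketch of a linear state machine driven by a fixpoint loop. Only the name `DatabaseState` and the `Replay` state appear in this log; the other variant names, `run_init`, `next_state`, and the `String` error type are assumptions for illustration and do not reflect the real IOx types.

```rust
/// Linear init stages. Variant names other than `Replay` are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DatabaseState {
    Known,
    RulesLoaded,
    Replay,
    Initialized,
}

/// Fixpoint loop: look at the state, run one step, repeat until the DB is
/// initialized or a step fails. On failure the state reached so far is
/// returned, so a later call can continue "in the middle".
fn run_init<F>(
    mut state: DatabaseState,
    mut step: F,
) -> Result<DatabaseState, (DatabaseState, String)>
where
    F: FnMut(DatabaseState) -> Result<DatabaseState, String>,
{
    while state != DatabaseState::Initialized {
        match step(state) {
            Ok(next) => state = next,
            Err(e) => return Err((state, e)),
        }
    }
    Ok(state)
}

/// A step function that always succeeds and moves one stage forward.
fn next_state(state: DatabaseState) -> Result<DatabaseState, String> {
    Ok(match state {
        DatabaseState::Known => DatabaseState::RulesLoaded,
        DatabaseState::RulesLoaded => DatabaseState::Replay,
        DatabaseState::Replay | DatabaseState::Initialized => DatabaseState::Initialized,
    })
}
```

If a step fails, for example because the sequencer cannot be reached during `Replay`, the caller gets back `(DatabaseState::Replay, error)` and can call `run_init` again from that state instead of starting over.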
kodiakhq[bot] 8174af9137
Merge pull request #1856 from influxdata/crepererum/parquet_metadata_protobuf
refactor: use protobuf for in-parquet metadata
2021-07-01 08:00:27 +00:00
kodiakhq[bot] b817ea88dd
Merge branch 'main' into crepererum/parquet_metadata_protobuf 2021-07-01 07:52:39 +00:00
Raphael Taylor-Davies cc038010cd
feat: add persist_age_threshold to LifecycleRules (#1853)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-30 21:27:06 +00:00
Andrew Lamb cfa06e1497
chore: Add query tests for compacted chunks (#1861)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-30 20:59:29 +00:00
Nga Tran e8ef8e2790 chore: Merge branch 'main' into ntran/dedup_less_concat 2021-06-30 16:45:01 -04:00
kodiakhq[bot] 99093b18fb
Merge pull request #1862 from influxdata/ntran/timeout
fix: change timeout to have all tests passed on slow laptop
2021-06-30 20:17:39 +00:00
kodiakhq[bot] 0d24584ed3
Merge branch 'main' into ntran/timeout 2021-06-30 20:10:18 +00:00
Nga Tran f6731c60d7 fix: change timeout to have all tests passed on slow laptop 2021-06-30 16:04:02 -04:00
Nga Tran ba919726b6 test: unit tests 2021-06-30 15:01:31 -04:00
Raphael Taylor-Davies 99a15cd452
refactor: single lifecycle error enumeration (#1859)
* refactor: single lifecycle error enumeration

* fix: fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2021-06-30 18:35:57 +00:00