* refactor: Separate query_tests into its own crate
* fix: references
* refactor: break out server benchmarks
* fix: Update query_tests/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Before this change we loaded databases eagerly when a server ID was
passed on startup, BEFORE starting the gRPC server. Since loading
(especially in its current state, without checkpoints and with too many
small parquet files) can take very long, K8s thinks IOx is unhealthy. With
this change we now load databases in the server background worker
once a server ID is available. Until then we block all DB-related
interactions, including adding new databases (since without inspecting
the object store there is no way to check whether the DB already
exists).
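A minimal sketch of the deferred-loading idea; all names here (such as `ServerId` and `load_databases`) are illustrative rather than the actual IOx types:
```rust
use std::sync::Mutex;

#[derive(Clone, Copy, Debug)]
struct ServerId(u32);

#[derive(Default)]
struct Server {
    server_id: Mutex<Option<ServerId>>,
    databases_loaded: Mutex<bool>,
}

impl Server {
    /// Called periodically by the background worker instead of blocking startup.
    fn background_worker_tick(&self) {
        let id = *self.server_id.lock().unwrap();
        if let Some(id) = id {
            let mut loaded = self.databases_loaded.lock().unwrap();
            if !*loaded {
                // Potentially slow: scans the object store for existing databases.
                load_databases(id);
                *loaded = true;
            }
        }
    }

    /// DB-related calls are rejected until the databases have been loaded.
    fn create_database(&self, name: &str) -> Result<(), String> {
        if !*self.databases_loaded.lock().unwrap() {
            return Err(format!("server not yet initialized, cannot create {}", name));
        }
        Ok(())
    }
}

fn load_databases(id: ServerId) {
    println!("loading databases for server {:?} from the object store...", id);
}

fn main() {
    let server = Server::default();
    assert!(server.create_database("db1").is_err()); // blocked until loaded
    *server.server_id.lock().unwrap() = Some(ServerId(42));
    server.background_worker_tick(); // loading happens in the worker, not at startup
    assert!(server.create_database("db1").is_ok());
}
```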
Furthermore, we now load databases regardless of whether the server ID was
passed on startup (via CLI or environment variable) or was set later via a
gRPC call. Before this change the latter case was somewhat neglected.
Since the number of parquet files is potentially unbounded (i.e. very
large), and we neither want to hold the transaction lock for too
long nor let the cleanup routine consume too much memory,
let's limit the number of files that we collect for cleanup.
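A rough sketch of the bounded collection, assuming a hypothetical `is_unreferenced` check in place of the real catalog lookup:
```rust
/// Collect at most `max_files` cleanup candidates so the transaction lock is
/// held only briefly and memory use stays bounded; remaining files are picked
/// up by a later cleanup run.
fn collect_cleanup_candidates(
    all_files: impl Iterator<Item = String>,
    max_files: usize,
) -> Vec<String> {
    all_files
        .filter(|path| is_unreferenced(path))
        .take(max_files)
        .collect()
}

fn is_unreferenced(_path: &str) -> bool {
    // Placeholder: in reality this would consult the preserved catalog.
    true
}

fn main() {
    let files = (0..10_000).map(|i| format!("db/parquet/{}.parquet", i));
    let batch = collect_cleanup_candidates(files, 1_000);
    assert_eq!(batch.len(), 1_000);
}
```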
* refactor: Make it clear only partition_key and table name pruning is happening in catalog
* fix: clippy
* fix: Update server/src/db/catalog.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: use TableNameFilter enum rather than Option
* docs: Add docstring to the `From` implementation
* fix: Update server/src/db/catalog/partition.rs
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
There are no functional changes here (except that errors look slightly
different), but it should allow for an easier move of the DB loading into
a delayed task.
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API
* fix: remove test that can no longer fail
* fix: update tests + code review comments
* fix: clippy
* fix: clippy
* fix: restore test_measurement_fields_error test
This is the first step towards #1513. However, it leaves all consumers
basically unchanged and also does NOT touch state transitions. These
changes will follow in upcoming PRs.
* feat: only store ChunkSnapshot in Closed state
* chore: review feedback
* feat: record MUB size as closed size
* chore: document column ordering assumption
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Rearrange to allow injection of the current time in tests
* test: Failing test showing a point can be in the wrong partition
* fix: Only get the default time once per ShardedEntry creation, in router
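A minimal sketch of the time-injection idea behind these commits; the partition format and function names are illustrative, not the actual router code:
```rust
use chrono::{DateTime, Utc};

/// Partition key derivation takes the "current" time as a parameter,
/// so tests can pin it instead of relying on a hidden `Utc::now()`.
fn partition_key(default_time: DateTime<Utc>, point_time: Option<DateTime<Utc>>) -> String {
    let t = point_time.unwrap_or(default_time);
    t.format("%Y-%m-%d %H").to_string()
}

fn main() {
    // Production code computes the default time once per ShardedEntry creation...
    let now = Utc::now();
    let _prod_key = partition_key(now, None);

    // ...while tests inject a fixed instant and get a stable partition key.
    let fixed: DateTime<Utc> = "2021-05-01T12:00:00Z".parse().unwrap();
    assert_eq!(partition_key(fixed, None), "2021-05-01 12");
}
```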
The parquet files produced by this code path are only semi-specified and
will lack many important metadata aspects that we will require for data
lineage.
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:
- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test creations of a `Db` are now async
This prepares for #1382.
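Roughly, the shape of that change looks like the following sketch (illustrative types and signatures only, not the real IOx API):
```rust
struct PreservedCatalog;

struct Db {
    _catalog: PreservedCatalog,
}

impl Db {
    /// Constructing a Db now requires a preserved catalog.
    fn new(catalog: PreservedCatalog) -> Self {
        Self { _catalog: catalog }
    }
}

/// Helper that loads (or creates) the preserved catalog from the object store.
async fn load_or_create_preserved_catalog() -> PreservedCatalog {
    PreservedCatalog
}

async fn make_test_db() -> Db {
    Db::new(load_or_create_preserved_catalog().await)
}

/// Because the helper is async, every test that creates a Db becomes async too.
#[tokio::test]
async fn creating_a_db_in_tests_is_now_async() {
    let _db = make_test_db().await;
}
```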
Required a bit of refactoring:
- Add an extra layer between the DB and the catalog: the "preserved
catalog" wrapper. This is required to keep the ownership model
somewhat sane, because during the read operations the "preserved
catalog" is going to act on the in-mem catalog.
- Move the "parquet file written" logic into the binding `preserved catalog <->
catalog state`, so we have a single place where new parquet files are
announced (a sketch follows below). For now this only works for chunks that are already known
(i.e. the writing->written transition when coming from the read buffer),
however in the next PR this will be extended to also handle entirely
new parquet files during transaction playback.
**NOTE: This does NOT include the read path yet!**
Issue: #1382.
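A very rough sketch of the layering described above, with all names illustrative:
```rust
use std::sync::{Arc, Mutex};

/// In-memory catalog: which chunks exist and in which state.
#[derive(Default)]
struct Catalog {
    chunks_with_parquet: Vec<String>,
}

/// "Preserved catalog" wrapper: the single place where newly written parquet
/// files are announced, both on the write path and (later) during transaction
/// playback on the read path.
struct PreservedCatalog {
    inner: Arc<Mutex<Catalog>>,
}

impl PreservedCatalog {
    fn parquet_file_written(&self, chunk_id: &str, path: &str) {
        // Record the file in the preserved (object-store) transaction log here...
        println!("transaction: chunk {} -> {}", chunk_id, path);
        // ...and update the in-memory catalog that the wrapper acts on.
        self.inner
            .lock()
            .unwrap()
            .chunks_with_parquet
            .push(chunk_id.to_string());
    }
}

fn main() {
    let catalog = Arc::new(Mutex::new(Catalog::default()));
    let preserved = PreservedCatalog { inner: Arc::clone(&catalog) };
    // A writing->written style transition coming from the read buffer:
    preserved.parquet_file_written("chunk-0", "db/parquet/chunk-0.parquet");
    assert_eq!(catalog.lock().unwrap().chunks_with_parquet.len(), 1);
}
```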
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GaugeValue non-default-constructible
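The `&[KeyValue]` change follows the usual Rust guideline of accepting slices rather than `&Vec<_>` in APIs; a tiny illustration with a made-up `KeyValue` type:
```rust
#[derive(Debug, Clone)]
struct KeyValue {
    key: String,
    value: String,
}

// Accepting &[KeyValue] lets callers pass a Vec, an array, or a slice,
// whereas &Vec<KeyValue> only accepts a reference to a Vec.
fn record_observation(labels: &[KeyValue]) {
    for kv in labels {
        println!("{}={}", kv.key, kv.value);
    }
}

fn main() {
    let vec_labels = vec![KeyValue { key: "db".into(), value: "mydb".into() }];
    record_observation(&vec_labels); // &Vec<KeyValue> derefs to &[KeyValue]

    let array_labels = [KeyValue { key: "svc".into(), value: "iox".into() }];
    record_observation(&array_labels); // arrays work too, no Vec required
}
```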
ingest_lines_total counts lines (which, confusingly, appear to be the same as points).
No yaks were harmed in the making of this PR.
(NOTE: the code around metrics, especially the handling of the happy and error paths, is very painful;
to be cleaned up in another PR)
This will allow us to:
- handle all-NULL columns correctly
- be in line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they no longer "poison" the
stats)
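A hedged sketch of optional min/max statistics that skip NaNs; the type here is illustrative, not the actual IOx statistics struct:
```rust
/// Min/max are optional so an all-NULL column simply has `None`, matching
/// Parquet's optional statistics, and NaNs are skipped so they cannot
/// "poison" the aggregates.
#[derive(Debug, Default, PartialEq)]
struct FloatStats {
    min: Option<f64>,
    max: Option<f64>,
    count: u64,
}

impl FloatStats {
    fn update(&mut self, value: Option<f64>) {
        self.count += 1;
        match value {
            Some(v) if !v.is_nan() => {
                self.min = Some(self.min.map_or(v, |m| m.min(v)));
                self.max = Some(self.max.map_or(v, |m| m.max(v)));
            }
            _ => {} // NULL or NaN: counted, but leaves min/max untouched
        }
    }
}

fn main() {
    let mut stats = FloatStats::default();
    stats.update(None);           // NULL
    stats.update(Some(f64::NAN)); // does not poison the stats
    stats.update(Some(3.0));
    stats.update(Some(1.0));
    assert_eq!(stats.min, Some(1.0));
    assert_eq!(stats.max, Some(3.0));
    assert_eq!(stats.count, 4);
}
```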
* feat: Expose the storage usage for each column in system.chunk_columns
* fix: fixup logical conflicts
* refactor: move coalesce logic into the read buffer
* fix: Update system_tables to not use coalesce
* fix: Improve comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: add column_type and influxdb_column_type, remove row_count from system.columns
* fix: update tests
* fix: more test update
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Connects to #1157.
Rearrange some code and comments to be consistent with the design. Make
some more places not care whether they're getting an owned or borrowed
SequencedEntry.
The syntax highlighting in my editor broke because of the unmatched
double quote, which got me to look a bit closer at this test. These
tests would have failed on Windows.
This refactors the write buffer to use the sequenced entry structure and the new segment definition. It removes the old replicated write and write_buffer.fbs.
Finally, it updates the SequencedEntry wrapper type around the Flatbuffer structure to be a trait so that SequencedEntry can be initialized from a borrowed Flatbuffer or an owned Vec<u8>.
How writes go into segments in the buffer and any kind of validation will likely have to be updated based on what kinds of guarantees we want to make in the buffer. However, that should probably come after we've rethought the design a bit around the new layout of chunks in the Parquet persistence.
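A simplified sketch of the borrowed-vs-owned idea behind the `SequencedEntry` trait (flatbuffers details omitted; names are illustrative):
```rust
/// The consumer-facing behaviour of a sequenced entry.
trait SequencedEntry {
    fn payload(&self) -> &[u8];
}

/// Backed by an owned buffer, e.g. bytes read back out of the write buffer.
struct OwnedSequencedEntry {
    data: Vec<u8>,
}

/// Backed by a borrowed flatbuffer slice still owned by the caller.
struct BorrowedSequencedEntry<'a> {
    data: &'a [u8],
}

impl SequencedEntry for OwnedSequencedEntry {
    fn payload(&self) -> &[u8] {
        &self.data
    }
}

impl<'a> SequencedEntry for BorrowedSequencedEntry<'a> {
    fn payload(&self) -> &[u8] {
        self.data
    }
}

/// Consumers only see the trait, so they don't care which variant they got.
fn total_len(entries: &[&dyn SequencedEntry]) -> usize {
    entries.iter().map(|e| e.payload().len()).sum()
}

fn main() {
    let owned = OwnedSequencedEntry { data: vec![1, 2, 3] };
    let bytes = [4u8, 5];
    let borrowed = BorrowedSequencedEntry { data: &bytes[..] };
    assert_eq!(total_len(&[&owned, &borrowed]), 5);
}
```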
* test: add test for operations.system_tables
* fix: only show operations for current database
* fix: update test
* fix: improve test
* refactor: filter in Schema provider rather than in job tracker
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
When we get the flatbuffers, we won't have the server ID separately from
the flatbuffers; it's in the flatbuffers. But we want to validate the
`ServerId` once, when the `SequencedEntry` is created, so that its
`server_id` method can assume it has a valid `ServerId`.
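A sketch of that validate-once pattern; the constructor and error type are hypothetical, not the real IOx API:
```rust
use std::num::NonZeroU32;

/// A ServerId that is guaranteed valid (non-zero) by construction.
#[derive(Debug, Clone, Copy)]
struct ServerId(NonZeroU32);

impl ServerId {
    fn try_new(id: u32) -> Result<Self, String> {
        NonZeroU32::new(id)
            .map(ServerId)
            .ok_or_else(|| "server ID must be non-zero".to_string())
    }
}

struct SequencedEntry {
    server_id: ServerId,
    // flatbuffer bytes etc. elided
}

impl SequencedEntry {
    /// Validation happens exactly once, here; afterwards `server_id()` can
    /// hand out a ServerId without re-checking the flatbuffer contents.
    fn new(raw_server_id: u32) -> Result<Self, String> {
        Ok(Self {
            server_id: ServerId::try_new(raw_server_id)?,
        })
    }

    fn server_id(&self) -> ServerId {
        self.server_id
    }
}

fn main() {
    assert!(SequencedEntry::new(0).is_err());
    let entry = SequencedEntry::new(42).expect("valid id");
    println!("{:?}", entry.server_id());
}
```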
Pre-sized channels fill up when the results sent over them are larger than their capacity. This causes significant runtime overhead and slows down query performance.
This commit removes the intermediate channels. The potential downside to this approach is that there may be more buffering, which could increase memory usage during a query and also block a thread for longer periods of time.
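For illustration, a bounded channel from the standard library shows the kind of backpressure the intermediate channels introduced (a toy example, not the removed code):
```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // A pre-sized (bounded) channel: once `capacity` results are queued, the
    // producing thread blocks until the consumer catches up, which is the
    // overhead the intermediate channels added on large query results.
    let (tx, rx) = sync_channel::<u64>(2);

    let producer = thread::spawn(move || {
        for batch in 0..10 {
            // Blocks whenever the channel is full.
            tx.send(batch).unwrap();
        }
    });

    let total: u64 = rx.iter().sum();
    producer.join().unwrap();
    assert_eq!(total, 45);
}
```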
Chunks now always have "data" (i.e. exactly one table including
schema/columns). Open chunks can only be created with data. Rollovers do
NOT create open chunks anymore (this is now only done for incoming
data).
Closes #1296.
This changes the hierarchy from
```
database -> partition -> chunk -> table
```
to
```
database -> partition -> table -> chunk
```
Only the high-level APIs are changed for now. The chunk states (like
MutableBuffer and ReadBuffer) still multiplex tables, although they will
now always be assigned a single table at most (or no table if no data has
been presented yet).
Closes #1256.
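Conceptually, the new nesting can be pictured as maps within maps (a sketch, not the real catalog types):
```rust
use std::collections::BTreeMap;

// Before: database -> partition -> chunk -> table
// After:  database -> partition -> table -> chunk
struct Chunk {
    id: u32,
}

type TableName = String;
type PartitionKey = String;

struct Partition {
    // Each table owns its chunks; a chunk belongs to exactly one table.
    tables: BTreeMap<TableName, Vec<Chunk>>,
}

struct Database {
    partitions: BTreeMap<PartitionKey, Partition>,
}

fn main() {
    let mut db = Database { partitions: BTreeMap::new() };
    let partition = db
        .partitions
        .entry("2021-05-01".into())
        .or_insert(Partition { tables: BTreeMap::new() });
    partition
        .tables
        .entry("cpu".into())
        .or_insert_with(Vec::new)
        .push(Chunk { id: 0 });
    assert_eq!(db.partitions["2021-05-01"].tables["cpu"].len(), 1);
}
```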
* refactor: plumb executor into all Db instances
* refactor: Route all query executions through worker pool
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: use a dedicated tokio threadpool for running queries
* feat: plumb the number of executor threads through from the command line
* fix: Logical merge conflict
* fix: another logical conflict
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
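A minimal sketch of running queries on a dedicated tokio runtime whose thread count is configurable; the environment variable here stands in for the real command-line flag:
```rust
use tokio::runtime::Builder;

fn main() {
    // In IOx the thread count is plumbed through from the CLI; here it is
    // simply read from an (illustrative) environment variable.
    let num_threads: usize = std::env::var("QUERY_THREADS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(4);

    // A dedicated runtime for query execution keeps CPU-heavy query work off
    // the runtime that serves gRPC/HTTP requests.
    let query_runtime = Builder::new_multi_thread()
        .worker_threads(num_threads)
        .thread_name("query-executor")
        .enable_all()
        .build()
        .expect("failed to build query runtime");

    let result = query_runtime.block_on(async {
        // Stand-in for planning and executing a query plan.
        21 * 2
    });
    assert_eq!(result, 42);
}
```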
The initial benchmarks look like this on my i9 MBP:
```
Data in one open chunk and one closed chunk of mutable buffer/tag0/no_pred 1.00 91.0±2.55ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag0/with_pred 1.00 11.5±0.72ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/no_pred 1.00 120.3±5.10ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/with_pred 1.00 11.2±0.22ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/no_pred 1.00 203.2±8.45ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/with_pred 1.00 11.2±0.21ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/no_pred 1.00 100.3±3.73ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/with_pred 1.00 31.2±1.80ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/no_pred 1.00 126.7±2.29ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/with_pred 1.00 33.0±1.70ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/no_pred 1.00 212.0±6.86ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/with_pred 1.00 18.1±0.99ms ? ?/sec
Data in single open chunk of mutable buffer/tag0/no_pred 1.00 98.7±6.08ms ? ?/sec
Data in single open chunk of mutable buffer/tag0/with_pred 1.00 11.2±0.37ms ? ?/sec
Data in single open chunk of mutable buffer/tag1/no_pred 1.00 118.9±3.97ms ? ?/sec
Data in single open chunk of mutable buffer/tag1/with_pred 1.00 11.7±0.64ms ? ?/sec
Data in single open chunk of mutable buffer/tag2/no_pred 1.00 202.1±8.49ms ? ?/sec
Data in single open chunk of mutable buffer/tag2/with_pred 1.00 11.1±0.27ms ? ?/sec
Data in two read buffer chunks/tag0/no_pred 1.00 109.2±5.20ms ? ?/sec
Data in two read buffer chunks/tag0/with_pred 1.00 44.2±1.83ms ? ?/sec
Data in two read buffer chunks/tag1/no_pred 1.00 132.9±3.79ms ? ?/sec
Data in two read buffer chunks/tag1/with_pred 1.00 41.7±2.43ms ? ?/sec
Data in two read buffer chunks/tag2/no_pred 1.00 222.4±7.00ms ? ?/sec
Data in two read buffer chunks/tag2/with_pred 1.00 27.9±0.92ms ? ?/sec
```