Nga Tran
76789e5902
feat: store sort key into the chunk schema of RUB
2021-07-06 17:00:35 -04:00
Marco Neumann
b6185982f7
refactor: make `ProviderBuilder` a build-time-checked builder
...
It's safer and also avoids cloning / copying state around.
2021-07-06 18:20:05 +02:00
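One way to read "build-time-checked" is the typestate pattern, where a required field is tracked in the builder's type so that `build` cannot be called too early. The sketch below is an illustration under that assumption, not the actual `ProviderBuilder`: the `NoSchema`/`HasSchema` markers, the `Provider` struct, and all field names are hypothetical.

```rust
use std::marker::PhantomData;

// Hypothetical marker types: they record at compile time whether a schema was set.
struct NoSchema;
struct HasSchema;

// Hypothetical result type produced by the builder.
#[derive(Debug, PartialEq)]
struct Provider {
    table_name: String,
    schema: Vec<String>,
    chunks: Vec<u32>,
}

struct ProviderBuilder<S> {
    table_name: String,
    schema: Vec<String>,
    chunks: Vec<u32>,
    _state: PhantomData<S>,
}

impl ProviderBuilder<NoSchema> {
    fn new(table_name: impl Into<String>) -> Self {
        Self {
            table_name: table_name.into(),
            schema: Vec::new(),
            chunks: Vec::new(),
            _state: PhantomData,
        }
    }

    // Supplying the schema moves the builder into the `HasSchema` state.
    fn schema(self, schema: Vec<String>) -> ProviderBuilder<HasSchema> {
        ProviderBuilder {
            table_name: self.table_name,
            schema,
            chunks: self.chunks,
            _state: PhantomData,
        }
    }
}

impl<S> ProviderBuilder<S> {
    // Chunks can be added in any state; `self` is moved, so nothing is cloned.
    fn add_chunk(mut self, chunk_id: u32) -> Self {
        self.chunks.push(chunk_id);
        self
    }
}

impl ProviderBuilder<HasSchema> {
    // `build` only exists in the `HasSchema` state, so forgetting the schema
    // is a compile error rather than a runtime error.
    fn build(self) -> Provider {
        Provider {
            table_name: self.table_name,
            schema: self.schema,
            chunks: self.chunks,
        }
    }
}
```

With this shape, `ProviderBuilder::new("cpu").build()` does not compile, while calling `.schema(...)` first does.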
Marco Neumann
4f5fe62428
feat: add DB name to lifecycle logs ( #1900 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-06 16:14:28 +00:00
Marco Neumann
09b7405b20
docs: spelling fixes
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-07-06 17:46:36 +02:00
Marco Neumann
3d644b63a1
feat: add `Replay` state to DB init
2021-07-06 14:24:39 +02:00
Marco Neumann
4ca2d3e148
chore: move persistence windows related code into own crate
...
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se. So let's
move them into their own crate. This also makes `parquet_file` no
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann
cdab1bed05
feat: persist part+db checkpoint in parquets and catalog
...
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
kodiakhq[bot]
bcf43a3de5
Merge branch 'main' into crepererum/db_state_in_grpc
2021-07-05 07:21:48 +00:00
Nga Tran
405a6a691b
feat: initial implementation of #1886 : avoid resort if appropriate
2021-07-02 17:57:48 -04:00
Raphael Taylor-Davies
b4534883fe
refactor: remove table name from upsert_table ( #1882 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-02 15:22:41 +00:00
Marco Neumann
54fbb60740
feat: expose DB state in gRPC interface
2021-07-02 11:24:36 +02:00
kodiakhq[bot]
404da38d6f
Merge branch 'main' into pd-remove-mb-size-limit-checks
2021-07-01 20:01:32 +00:00
Raphael Taylor-Davies
5b00bc69e6
refactor: use Arc<Db> in lifecycle actions ( #1873 )
...
* refactor: use Arc<Db> in lifecycle actions
* chore: review feedback
2021-07-01 19:56:33 +00:00
Paul Dix
61917c107f
chore: add test for can_move on row count
2021-07-01 15:49:44 -04:00
Paul Dix
91f5478012
feat: remove MUB size threshold
...
Removes the MUB chunk close based on size. Also add a check in lifecycle policy to move if the MUB chunk crosses a default row count threshold.
2021-07-01 14:58:29 -04:00
Andrew Lamb
56c8c8d428
feat: Use separate executor for queries and compactions/moves ( #1870 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:47:50 +00:00
Raphael Taylor-Davies
f1a100c6ae
refactor: remove now unused chunk sort order ( #1854 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:39:45 +00:00
Andrew Lamb
07826306ed
fix: Always deduplicate data prior to insertion into the ReadBuffer ( #1863 )
...
* fix: mark ReadBuffer as always deduplicated
* fix: Use compact plans during merge
* docs: Update server/src/db/chunk.rs
Co-authored-by: Nga Tran <ntran@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Nga Tran <ntran@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:23:37 +00:00
Jacob Marble
0779b0d9bd
feat: add gRPC listener for new write protocol ( #1842 )
...
* feat: add gRPC listener for new write protocol
* chore: clippy happy
* chore: lint
* chore: cargo fmt --all
* chore: cargo clippy
* chore: protobuf-lint
* chore: more formatting
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann
e1e3163752
refactor: rework DB init state machine
...
Since adding new features like "sequencer replay" or init retries would
make the current code too complex, a refactor is required:
Config:
The config struct now holds a `DatabaseState` which is a simple linear
state machine representing the different stages of the database init.
Init:
The init module now has a fixpoint-loop which looks at the state,
decides what to do based on it and repeats until either the DB is
initialized or an error occurred. This also makes it easier to continue
the init process "in the middle", e.g. when the preserved catalog is
broken or the sequencer (e.g. Kafka) could not be reached.
2021-07-01 13:47:51 +02:00
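The fixpoint loop described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the state names, the `advance` and `init` functions, and the `catalog_ok` flag are all hypothetical.

```rust
// Hypothetical stages of database init, in linear order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DatabaseState {
    Known,
    RulesLoaded,
    Initialized,
}

// Attempts a single step and returns the next state or an error.
// `catalog_ok` is a stand-in for "the preserved catalog could be loaded".
fn advance(state: DatabaseState, catalog_ok: bool) -> Result<DatabaseState, String> {
    match state {
        DatabaseState::Known => Ok(DatabaseState::RulesLoaded),
        DatabaseState::RulesLoaded if catalog_ok => Ok(DatabaseState::Initialized),
        DatabaseState::RulesLoaded => Err("preserved catalog is broken".to_string()),
        DatabaseState::Initialized => Ok(DatabaseState::Initialized),
    }
}

// Fixpoint loop: look at the state, act on it, and repeat until the DB is
// initialized or an error occurs. On error the state reached so far is
// returned, so a later call can continue "in the middle".
fn init(mut state: DatabaseState, catalog_ok: bool) -> (DatabaseState, Option<String>) {
    while state != DatabaseState::Initialized {
        match advance(state, catalog_ok) {
            Ok(next) => state = next,
            Err(e) => return (state, Some(e)),
        }
    }
    (state, None)
}
```

Because the failed run hands back the last good state, a retry after the problem is fixed does not have to start from `Known` again.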
Andrew Lamb
cfa06e1497
chore: Add query tests for compacted chunks ( #1861 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-30 20:59:29 +00:00
Raphael Taylor-Davies
99a15cd452
refactor: single lifecycle error enumeration ( #1859 )
...
* refactor: single lifecycle error enumeration
* fix: fmt
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2021-06-30 18:35:57 +00:00
Andrew Lamb
817a480cde
refactor: move lifecycle implementations out of db.rs and into their own modules ( #1858 )
...
* refactor: move lifecycle implementations out of db.rs and into their own modules
* fix: clippy
2021-06-30 17:24:04 +00:00
Andrew Lamb
9e1723620c
refactor: rename load_chunk_to_read_buffer to move_chunk_to_read_buffer ( #1857 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-30 16:53:18 +00:00
Marco Neumann
043890369f
refactor: make `MinMaxSequence` safer to use
2021-06-30 16:37:48 +02:00
kodiakhq[bot]
983062f6fa
Merge branch 'main' into crepererum/no_catalog_on_db_creation
2021-06-30 10:04:00 +00:00
Edd Robinson
2e430ac7f0
refactor: remove table name from read_filter schema
2021-06-30 09:50:53 +01:00
Edd Robinson
62f274cc1b
refactor: remove table name from column_values
2021-06-30 09:46:54 +01:00
Edd Robinson
5737c9d962
refactor: remove table name from column_names
2021-06-30 09:43:41 +01:00
Marco Neumann
c4e054f909
feat: do NOT load preserved catalogs on late DB creation
...
When a DB is created AFTER the server is initialized, then we can assume
it is a new DB (because the rules file did not exist beforehand). We
shall treat it as a new DB with no data and should not try to load some
leftover / stale / whatever preserved catalog for it. How this catalog
came into existence we do not know and it was certainly not properly
managed by IOx. So we error if there is a catalog.
Furthermore the old implementation was kinda broken since it loaded the
preserved catalog "in-sync" with the gRPC call that issued the DB
creation (we only have a delayed init concept for DBs that are loaded on
instance startup). In production that would very likely provoke nasty
timeouts.
On top of that this new behavior will also be somewhat more sane when we
think about sequencer (e.g. Kafka) replays. We certainly do not wanna do
any replays for newly created DBs.
TLDR: New behavior for DBs created via gRPC is "new empty DB". This does
NOT affect DBs loaded on instance startup (aka existing DBs).
2021-06-30 10:12:38 +02:00
Marco Neumann
58310abfee
refactor: de-duplicate code in `server::db::load`
2021-06-30 10:08:25 +02:00
Marco Neumann
9d10ac9f6a
refactor: write parquet files w/o holding the transaction lock
...
This allows us to prepare writes per-tableXpartition before entering the
database-exclusive section that deals with catalog transactions.
Closes #1821 .
2021-06-29 14:23:06 +02:00
Marco Neumann
3ebb6a3037
refactor: do not capture txn-specific information in parquet files
...
This helps with #1821 .
2021-06-29 14:22:36 +02:00
Edd Robinson
a7198ea78b
refactor: use satisfies_predicate in apply_predicate
2021-06-29 11:58:28 +01:00
kodiakhq[bot]
eda9532eb2
Merge branch 'main' into crepererum/issue1821-cleanup-lock
2021-06-29 10:48:43 +00:00
Andrew Lamb
3ee96c4618
fix: Do not sequence local writes (avoid panic under load) ( #1826 )
...
* fix: Do not sequence local writes
* fix: Update server/src/db.rs
Co-authored-by: Edd Robinson <me@edd.io>
* fix: review comments
* fix: restore passing sequence information down to mutable buffer
* fix: store min/max times even when there are no sequence numbers
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-29 10:39:37 +00:00
Marco Neumann
2cd5ce98be
refactor: do not pass locks around for catalog cleanup
2021-06-29 10:21:41 +02:00
Marco Neumann
730a23faa3
refactor: improve locking around the parquet file cleanup
...
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).
This will help with #1821 .
2021-06-29 10:20:03 +02:00
Edd Robinson
12ae9b012a
refactor: clarify intent of
2021-06-28 17:39:48 +01:00
Carol (Nichols || Goulding)
0f7c47d10e
fix: Limit the number of errors per sequenced entry we'll collect
2021-06-28 09:29:17 -04:00
Carol (Nichols || Goulding)
1e171e2e9a
refactor: Organize `use` statements and let rustfmt manage order
2021-06-28 09:29:15 -04:00
Carol (Nichols || Goulding)
f3a3a9b267
fix: Try to write all partition_writes even if one fails, collect all errors and report at the end
2021-06-28 09:24:23 -04:00
Carol (Nichols || Goulding)
4d2954ec1d
test: Write a failing test for partition_writes being ignored after a failure
2021-06-28 09:24:23 -04:00
Marco Neumann
65e65412cc
refactor: move catalog loading code into its own module
2021-06-28 12:46:25 +02:00
Paul Dix
de236c5a6f
feat: update persistence windows to support late arrival less than 30 seconds
2021-06-25 15:34:11 -04:00
Paul Dix
435b4b6a94
feat: add persistence windows to partition and update on write
...
This brings the persistence windows into the catalog partition. It adds a helper method on TableBatch to get the min and max times for a given write. Finally, it adds this logic to the db to update persistence windows on every write while the partition write lock is being held.
2021-06-25 15:34:11 -04:00
Raphael Taylor-Davies
3046b1692c
chore: include table name in compaction log ( #1805 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-25 15:20:44 +00:00
Andrew Lamb
79446d45be
feat: Implement split_plans ( #1794 )
...
* feat: implement split plan / planner
* fix: Apply suggestions from code review
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
* fix: resolve merge conflicts
* fix: add values to panic
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-24 18:38:00 +00:00
Raphael Taylor-Davies
297fc12db8
feat: compact chunks ( #1776 )
...
* feat: compact chunks
* chore: review feedback
* chore: clippy lints
* chore: document sort key algorithm
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-24 16:49:10 +00:00
Carol (Nichols || Goulding)
c0c1c3fd8e
refactor: Extract a struct to hold all the arguments needed to make a Db
2021-06-23 16:31:38 -04:00
Carol (Nichols || Goulding)
f903b6eca8
fix: Create WriteBuffer outside of commit_db so committing can't fail
2021-06-23 13:56:50 -04:00
Carol (Nichols || Goulding)
51e72c8821
refactor: Extract a function for creating a write buffer from database rules
2021-06-23 13:33:29 -04:00
Carol (Nichols || Goulding)
57aee2f770
fix: Remove TODO that's a TODON'T
2021-06-23 13:18:28 -04:00
Carol (Nichols || Goulding)
6ec3c03b0a
fix: Handle failure to create a Kafka producer rather than panicking
2021-06-23 10:51:23 -04:00
Carol (Nichols || Goulding)
c66f9e5aeb
feat: Write entries to Kafka when configured as the write buffer
2021-06-23 10:48:18 -04:00
Carol (Nichols || Goulding)
08f0696890
refactor: Extract a type alias for the trait's error type
2021-06-23 10:48:18 -04:00
Carol (Nichols || Goulding)
250b9362a6
fix: Pass the database to the KafkaBuffer to use as the topic
2021-06-23 10:48:18 -04:00
Carol (Nichols || Goulding)
93881da016
feat: Make Write Buffer store_entry async
...
In preparation for the Kafka write buffer implementation needing to call
async functions.
2021-06-23 10:48:18 -04:00
kodiakhq[bot]
59993e8b8f
Merge branch 'main' into crepererum/issue1623
2021-06-23 12:40:05 +00:00
Marco Neumann
c395409b51
feat: include UUIDv4 into parquet file names
...
Change schema from
```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```
to
```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```
So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies the reasoning
about file references in the future and follows what other dataset
formats are usually doing (i.e. never replace files).
Also use `ChunkAddr` where it makes sense.
2021-06-23 14:30:28 +02:00
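As a small illustration of the new layout in the commit above, a path could be assembled like this. The function name and parameter types are assumptions made for the sketch, and the UUID is passed in as a string so the example has no external dependencies.

```rust
// Builds an object store path in the layout
// <server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
// `parquet_path` and its parameter types are hypothetical.
fn parquet_path(
    server_id: u32,
    db_name: &str,
    table_name: &str,
    part_key: &str,
    chunk_id: u32,
    uuid: &str,
) -> String {
    format!(
        "{}/{}/data/{}/{}/{}.{}.parquet",
        server_id, db_name, table_name, part_key, chunk_id, uuid
    )
}
```

Since every write gets a fresh UUID, two writes of the same chunk produce different paths and neither overwrites the other.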
kodiakhq[bot]
70817a474c
Merge branch 'main' into crepererum/issue1740-d
2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies
5cd911c74a
fix: correct row count for object store chunks ( #1789 )
2021-06-23 12:06:49 +00:00
kodiakhq[bot]
d94a9ea94a
Merge branch 'main' into crepererum/better_served_uninit_error
2021-06-23 08:54:48 +00:00
Marco Neumann
cf55df68b5
refactor: remove some `Arc`s around the in-mem catalog
...
This is for #1740 .
2021-06-23 10:51:22 +02:00
Marco Neumann
39eac62d5d
fix: improve "server not initialized" error
...
We've reported "databases not loaded" which is a bit confusing for
router nodes, so change the description to "server not initialized".
2021-06-23 10:47:51 +02:00
Marco Neumann
d2be641864
refactor: make checkpointing easier to use
...
Don't mix commit+checkpoint in a single call so that the caller has to
reason about the error type and which of the two operations has failed.
Splitting it also makes it easier to create the correct checkpoint data.
2021-06-23 10:25:05 +02:00
Marco Neumann
4a961694ec
refactor: make caller sync mem<>OS view during catalog transactions
...
This is for #1740 . Greatly simplifies the integration of the persisted
catalog into the DB.
2021-06-23 10:25:05 +02:00
kodiakhq[bot]
c3dbe4c571
Merge branch 'main' into crepererum/fix_auto_wipe
2021-06-22 13:50:53 +00:00
Marco Neumann
a98b10745f
fix: auto-wipe should still be enabled
...
Auto-wipe broken catalogs should be enabled until #1522 is closed.
2021-06-22 15:45:32 +02:00
kodiakhq[bot]
b77bff449b
Merge branch 'main' into crepererum/issue1740-b
2021-06-22 13:27:26 +00:00
Raphael Taylor-Davies
01b0fdabb7
feat: make lifecycle partition-aware ( #1767 )
...
* feat: make lifecycle partition-aware
* chore: further docs
* chore: rename to maybe_free_memory
* chore: fix logical conflicts
* chore: ensure only drops unpersisted chunks
* chore: clippy lints
* chore: fix doc
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 09:24:15 +00:00
Marco Neumann
d1db0dfaeb
refactor: remove type parameter from preserved catalog
...
For #1740 .
2021-06-22 10:53:10 +02:00
kodiakhq[bot]
799e2caa34
Merge branch 'main' into crepererum/issue1740-a
2021-06-22 07:19:27 +00:00
Andrew Lamb
5362c7c924
feat: enable query deduplication ( #1762 )
2021-06-21 18:49:04 +00:00
Marco Neumann
ff60627500
refactor: make preserved catalog NOT own the in-mem catalog
...
Works towards #1740 .
2021-06-21 18:39:43 +02:00
Marco Neumann
881729bd23
refactor: make caller responsible to create checkpoint data
...
This decouples the in-mem and preserved catalog a bit and works
towards #1740 .
2021-06-21 18:33:23 +02:00
Edd Robinson
7e3df17896
test: update benchmarks
2021-06-21 15:29:23 +01:00
Edd Robinson
ac54320821
refactor: update server with new chunk API
2021-06-21 15:12:17 +01:00
Raphael Taylor-Davies
ea04ce40dc
feat: transactional lifecycle API ( #1753 )
...
* feat: transactional lifecycle API
* chore: remove redundant upgrade
* feat: lifecycle error propagation
* chore: add usage doctest
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-21 13:09:53 +00:00
Marco Neumann
0d7c3ff279
docs: fix typos
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-21 13:18:20 +02:00
Marco Neumann
4d3432a1e0
docs: improve `server::config` docs
2021-06-21 10:06:50 +02:00
Marco Neumann
29bbc9a384
refactor: recoverable DB init
...
- store read and parsed DB rules even if the catalog is broken
- allow wiping the catalog for DBs w/ init failures
- try to bring the DB back online after successful wipes
Note that this does not yet allow updating rules for broken DBs or fixing
DBs w/ broken rule files. However this can be implemented easily on top
of this.
2021-06-21 09:31:23 +02:00
Marco Neumann
d17b5710a8
feat: add server functionality to wipe preserved catalogs
2021-06-21 09:31:23 +02:00
Marco Neumann
aba973a6e1
refactor: make catalog `wipe` a freestanding function
...
It does not interact with the `CatalogState` so users can call this
function without that type.
2021-06-21 09:31:23 +02:00
Andrew Lamb
258a6b1956
chore: remove more dead code ( #1760 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:28:22 +00:00
kodiakhq[bot]
1d8469951f
Merge branch 'main' into smaller-cache
2021-06-18 18:50:10 +00:00
Andrew Lamb
de67bd3efe
refactor: Remove PartitionChunk::table_schema ( #1756 )
...
* refactor: Remove PartitionChunk::table_schema
* docs: update comments
2021-06-18 16:13:16 +00:00
Andrew Lamb
9beeca3e7c
refactor: Unify schema handling in query crate ( #1755 )
...
* refactor: Unify schema handling in query crate
* fix: doclink
2021-06-18 14:10:57 +00:00
Andrew Lamb
1c13d676b4
refactor: Rename query::PartitionChunk --> query::QueryChunk ( #1754 )
2021-06-18 13:24:09 +00:00
Marko Mikulicic
b612c3af4e
chore: Switch to smaller cache dep
2021-06-18 09:43:28 +02:00
Andrew Lamb
ec43a87909
chore: Update itertools deps ( #1750 )
2021-06-17 17:56:44 +00:00
Raphael Taylor-Davies
f6dbc8d6f2
refactor: add ChunkAddr to describe location of chunk in catalog ( #1745 )
...
* refactor: add ChunkPath to describe location of chunk in catalog
* refactor: rename ChunkPath to ChunkAddr
* chore: further renames
* chore: even more renames
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-17 12:04:37 +00:00
Marco Neumann
87b2a1eaea
docs: add note about why we write parquets during transactions
2021-06-16 11:01:14 +02:00
Marco Neumann
e056d97cf6
test: always test transaction aborts
2021-06-16 11:01:14 +02:00
Marco Neumann
ec053f674c
feat: make DB catalog work w/ transaction aborts
2021-06-16 11:01:14 +02:00
Marco Neumann
caaf95c6ec
refactor: remove lock from `TestCatalogState`
2021-06-16 10:51:15 +02:00
Marco Neumann
c8c412f6fe
refactor: rework catalog state interface
...
This now allows not only for copy-based transaction handling but also
for eager exec and rollbacks. This will be useful to properly implement
transaction aborts for the "real" catalog.
2021-06-16 10:51:15 +02:00
Marco Neumann
2596de072e
feat: make sure DB catalog can correctly add and remove parquet files
...
Note that this does NOT yet allow it to correctly abort transactions.
2021-06-16 10:50:47 +02:00
Raphael Taylor-Davies
bf54ab51f2
refactor: split lifecycle into separate crate ( #1730 )
2021-06-15 15:57:47 +00:00
Raphael Taylor-Davies
f96e05d26a
refactor: traitify lifecycle policy ( #1729 )
...
* refactor: traitify lifecycle policy
* chore: docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 14:00:06 +00:00
Andrew Lamb
b756e09904
refactor: Rename parquet_file::Chunk --> ParquetChunk ( #1722 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 11:21:49 +00:00
kodiakhq[bot]
09f2ae1616
Merge branch 'main' into crepererum/issue1595
2021-06-15 11:12:01 +00:00
Marco Neumann
adc3a059ca
refactor: improve server background task logging
...
- rename `name` to `db_name`
- add `table_name` to error-detection logs
- use `Display` instead of `Debug` fmt for errors, which results in
nicer outputs and follows the rest of the stack
This is for #1725 .
2021-06-15 10:28:12 +02:00
Marco Neumann
dcfaa81969
feat: info-log server ID during init
...
Add an info log when the server ID is set. Because this is done where the
server ID is also stored, this automatically affects all ways to set it
(via CLI, via environment variable, via gRPC call).
Closes #1595 .
2021-06-15 10:09:53 +02:00
kodiakhq[bot]
19f684ee14
Merge branch 'main' into crepererum/issue1506
2021-06-15 07:36:49 +00:00
Marco Neumann
55fc5e564b
refactor: remove serverID and DB name args from catalog state
...
They are no longer required.
2021-06-15 09:35:41 +02:00
Marco Neumann
057c99d431
fix: tighten memory ordering
2021-06-14 17:34:57 +02:00
Marco Neumann
2ea24b6467
feat: allow to fail initializing a single DB
...
- keep errors encountered during DB init
- treat failed DB inits as existing DBs
- effectively poison failed DBs (there is no way to recover except by
restarting the server, yet)
2021-06-14 17:34:57 +02:00
Marco Neumann
0b5552f131
refactor: ensure that DBs are reserved before doing expensive IO
2021-06-14 17:34:57 +02:00
Marco Neumann
233235365a
refactor: de-couple DB rules commit from name reservation
...
This allows us to put DBs in a controlled error state when we try to
load rules from a file but the rules are somewhat broken.
2021-06-14 17:34:57 +02:00
Marco Neumann
318af9b801
feat: keep error that occurred during server init
2021-06-14 17:34:57 +02:00
Marco Neumann
bf0ba6ba6c
test: rename some server init tests to better reflect their nature
2021-06-14 17:34:57 +02:00
Marco Neumann
250ccdcdcd
refactor: use `IOxMetadata` instead of path parsing for parquet chunks
2021-06-14 16:24:50 +02:00
Marco Neumann
d51e7a127c
feat: include table name, partition key, and chunk ID in `IoxMetadata`
2021-06-14 16:24:50 +02:00
Andrew Lamb
a14e9ab27c
refactor: rename mutable_buffer::Chunk --> mutable_buffer::MBChunk ( #1711 )
...
* refactor: rename mutable_buffer::Chunk --> mutable_buffer::MBChunk
* fix: fmt
2021-06-14 13:35:20 +00:00
Andrew Lamb
856751deec
feat: Lifecycle manager unloads, rather than drop, chunks when soft limit is hit ( #1701 )
...
* feat: unload chunks from memory rather than dropping them
* docs: Update server/src/db/lifecycle.rs
Co-authored-by: Marco Neumann <marco@crepererum.net>
* docs: Update comment wording
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-14 13:14:39 +00:00
kodiakhq[bot]
fc1b5ea165
Merge branch 'main' into crepererum/parquet_metadata_wrapper
2021-06-14 11:20:39 +00:00
Andrew Lamb
9d1ca95a52
refactor: Rename catalog::Chunk --> catalog::CatalogChunk ( #1702 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-14 11:20:14 +00:00
Marco Neumann
518f7c6f15
refactor: wrap upstream parquet MD into struct + clean up interface
...
This prevents users of `parquet_file::metadata` from also depending on
`parquet` directly. Furthermore they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
Marco Neumann
665919786e
test: fix test
2021-06-14 10:52:23 +02:00
Marco Neumann
f4693e36c0
refactor: `catalog_checkpoint_interval` => `catalog_transactions_until_checkpoint`
2021-06-14 10:34:32 +02:00
Marco Neumann
898c638630
feat: wire up catalog checkpointing
...
Closes #1381 .
2021-06-14 10:08:32 +02:00
Marco Neumann
df866f72e0
refactor: store parquet metadata in chunk
...
This will be useful for #1381 .
At the moment we parse schema and stats eagerly and store them alongside
the parquet metadata in memory. Technically this is not required since
this is basically duplicate data. In the future we might trade-off some
of this memory against CPU consumption by parsing schema and stats on
demand.
2021-06-14 10:08:31 +02:00
Edd Robinson
ff19beb0ad
refactor: export rb chunk as RBChunk
2021-06-11 18:33:10 +01:00
kodiakhq[bot]
71e2a8fbaa
Merge branch 'main' into crepererum/inline_parquet_table_struct
2021-06-11 11:22:48 +00:00
Andrew Lamb
0cbe74dbde
fix: persistence to parquet by swapping order of arguments ( #1687 )
...
* fix: fix order of arguments
* test: for persistence
2021-06-11 10:55:40 +00:00
Marco Neumann
f8a518bbed
refactor: inline `Table` into `parquet_file::chunk::Chunk`
...
Note that the resulting size estimations are different because we were
double-counting `Table`. `mem::size_of::<Self>()` is recursive for
non-boxed types since the child will be part of the parent structure.
Issue: #1295 .
2021-06-11 11:54:31 +02:00
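The double-counting mentioned in the commit above follows from how `std::mem::size_of` treats inline fields. The sketch below shows the effect with hypothetical stand-in types (`Table`, `InlineChunk`, `BoxedChunk`); it is not the size estimation code from the repository.

```rust
use std::mem;

// Hypothetical stand-in for the former `Table` struct.
#[allow(dead_code)]
struct Table {
    row_count: u64,
    byte_size: u64,
}

// Chunk that embeds the table inline: the table's bytes are part of the chunk.
#[allow(dead_code)]
struct InlineChunk {
    id: u32,
    table: Table,
}

// Chunk that keeps the table behind a pointer: only the pointer is inline.
#[allow(dead_code)]
struct BoxedChunk {
    id: u32,
    table: Box<Table>,
}

// Correct estimate for the inline layout: the child is already included.
fn inline_size() -> usize {
    mem::size_of::<InlineChunk>()
}

// Buggy estimate for the inline layout: `Table` is counted twice.
fn double_counted_size() -> usize {
    mem::size_of::<InlineChunk>() + mem::size_of::<Table>()
}

// For the boxed layout the pointee has to be added explicitly.
fn boxed_size() -> usize {
    mem::size_of::<BoxedChunk>() + mem::size_of::<Table>()
}
```

Adding the child's size on top of the parent's is only correct when the child lives behind a `Box` or another pointer.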
Raphael Taylor-Davies
11b25b3aaf
refactor: swap order of partition and table in in-memory catalog ( #1678 )
...
* refactor: swap order of partition and table in in-memory catalog
* chore: review feedback
* chore: validate panic message
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 16:40:30 +00:00
Marco Neumann
13bb290a7c
chore: enforce `clippy::future_not_send` for `server` + top-level crate ( #1679 )
...
* chore: enforce `clippy::future_not_send` for `server`
* chore: enforce `clippy::future_not_send` for top-level crate
2021-06-10 15:01:12 +00:00
Marco Neumann
294c304491
feat: impl catalog checkpointing infrastructure
...
This implements a way to add checkpoints to the preserved catalog and
speed up replay.
Note: This leaves the "hook it up into the actual DB" for a future PR.
Issue: #1381 .
2021-06-10 15:42:21 +02:00
kodiakhq[bot]
3ba27bdbd9
Merge branch 'main' into crepererum/clippy_future_not_send_part1
2021-06-10 07:19:31 +00:00
kodiakhq[bot]
5f863a59fd
Merge branch 'main' into crepererum/extract_server_init
2021-06-10 07:14:57 +00:00
kodiakhq[bot]
44d8fb9472
Merge branch 'main' into crepererum/clippy_future_not_send_part1
2021-06-10 07:10:11 +00:00
kodiakhq[bot]
eed73a30c5
Merge branch 'main' into ntran/dedup_within_chunk
2021-06-09 18:19:17 +00:00
Nga Tran
c1c58018fc
refactor: address review comments
2021-06-09 14:17:47 -04:00
Marco Neumann
4fe2d7af9c
chore: enforce `clippy::future_not_send` for `parquet_file`
2021-06-09 18:18:27 +02:00
Marco Neumann
d9c38dfe88
refactor: extract server init code
...
This prepares for #1624 , so the end results looks a bit cleaner.
2021-06-09 16:53:11 +02:00
kodiakhq[bot]
b49abf9b02
Merge branch 'main' into crepererum/lazy_db_loading
2021-06-09 07:23:35 +00:00
Raphael Taylor-Davies
07c4277ca7
refactor: schema merge to give more control over field merging ( #1653 )
...
* refactor: schema merge to give more control over field merging
* chore: review feedback
2021-06-09 06:30:45 +00:00
Nga Tran
3e10351538
test: add tests for the sort plan
2021-06-08 21:40:46 -04:00
Nga Tran
68e3a2121f
feat: add SortExec
2021-06-08 15:04:31 -04:00
Andrew Lamb
fd8a87484e
feat: Hook up chunk grouping into provider
2021-06-08 14:42:37 -04:00
Nga Tran
edbf1b7d5e
Merge branch 'main' into ntran/dedup_within_chunk
2021-06-08 13:18:40 -04:00
Nga Tran
40cb4f741f
feat: initial implementation
2021-06-08 13:17:36 -04:00
Carol (Nichols || Goulding)
50a69a7f18
fix: Don't mention Kafka unless it's absolutely necessary
2021-06-07 13:01:04 -04:00
Carol (Nichols || Goulding)
2bb2c4ba47
docs: Add some doc comments about the WriteBuffer trait
2021-06-07 11:22:33 -04:00
Carol (Nichols || Goulding)
a8a4a5f29d
fix: Return the Sequence type from the write buffer, not vague WriteMetadata
2021-06-07 11:15:46 -04:00
Carol (Nichols || Goulding)
a63c12acfb
fix: Remove references to Kafka from db tests
2021-06-07 10:58:34 -04:00
Carol (Nichols || Goulding)
45a3547978
refactor: Take ownership of Entry and transform into SequencedEntry
...
Rather than cloning the data. The Entry is no longer used after this
point.
2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding)
8ab8544d4a
feat: Wire up a WriteBuffer trait implemented by a mock
...
With an unimplemented where the Kafka implementation will be.
2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding)
2418e91001
feat: Add a DatabaseRule field for an optional Kafka write buffer connection string
2021-06-07 09:56:23 -04:00
Carol (Nichols || Goulding)
b5fac8cd59
refactor: Rearrange database rule checks and SequencedEntry construction
...
There are going to be more cases here when the Kafka write buffer is
introduced that affect how the SequencedEntry is created and whether a
database being immutable is an error or not.
2021-06-07 09:37:22 -04:00
Carol (Nichols || Goulding)
7ff2c5c951
refactor: Rearrange reading of db rules and locking
2021-06-07 09:37:22 -04:00
Carol (Nichols || Goulding)
0139167c98
refactor: Extract a Sequence type
...
A sequencer id and sequence number should always go together, so convey
that with a type. Also, this removes lots of repetition of "sequence" 😅
2021-06-07 09:37:22 -04:00
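A minimal sketch of what grouping the sequencer id and sequence number into one type can look like, together with the ownership-taking constructor mentioned in the earlier `SequencedEntry` commit. The field names, integer widths, and the `Entry` payload are assumptions made for illustration only.

```rust
// A sequencer id and a sequence number always go together.
// Field names and integer widths are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Sequence {
    id: u32,
    number: u64,
}

// Hypothetical entry payload.
#[derive(Debug, PartialEq)]
struct Entry {
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct SequencedEntry {
    sequence: Sequence,
    entry: Entry,
}

impl SequencedEntry {
    // Takes `entry` by value, so the payload is moved rather than cloned.
    fn new(sequence: Sequence, entry: Entry) -> Self {
        Self { sequence, entry }
    }
}
```

Deriving `Ord` on `Sequence` orders by `id` first and `number` second, which follows the field declaration order.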
Carol (Nichols || Goulding)
4d6569583e
fix: Partially restore SequencedEntry as Entry+sequencer_id+sequence_num
2021-06-04 14:40:19 -04:00
Carol (Nichols || Goulding)
f4a9a5ae56
fix: Remove write buffer
2021-06-04 14:40:17 -04:00
Andrew Lamb
42f26b609b
refactor: Move `query_tests` and `server_benchmarks` into their own crate --> smaller `server` ( #1628 )
...
* refactor: Separate query_tests into its own crate
* fix: references
* refactor: break out server benchmarks
* fix: Update query_tests/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-04 17:31:19 +00:00
Andrew Lamb
ff3215e6a9
feat: Implement Chunk Pruning ( #1567 )
2021-06-04 13:05:22 +00:00
Marco Neumann
195644da04
docs: document semaphore design in server
2021-06-04 12:52:13 +02:00
kodiakhq[bot]
402ef0ebde
Merge branch 'main' into crepererum/limit_cleanup_amount
2021-06-04 10:47:33 +00:00
Marco Neumann
e06d65bb2a
refactor: migrate "DBs initialized" RPC to "server status"
2021-06-04 11:33:41 +02:00
Marco Neumann
b30d7e2821
feat: move DB loading into background worker
...
Before this change we loaded databases eagerly when a serverID was
passed on startup BEFORE starting up the gRPC server. Since loading
(esp. at its current state without checkpoints and with too many small
parquet files) can take very long, K8s thinks IOx is unhealthy. With
this change we are now loading databases in the server background worker
once a serverID is available. Until then we block all DB-related
interactions including adding new databases (since without inspecting
the object store there is no way we can check if the DB already
exists).
Furthermore we now load databases no matter if the serverID was passed on
startup (via CLI or environment variable) or was set later via gRPC
call. Before this change the latter case was somewhat forgotten.
2021-06-04 11:33:41 +02:00
Raphael Taylor-Davies
696ebdc4db
feat: recover failed lifecycle actions ( #1099 ) ( #1592 )
...
* feat: recover failed lifecycle actions (#1099 )
* chore: review feedback
* chore: fix logical conflicts
2021-06-03 15:46:33 +00:00
Marco Neumann
91df8a30e7
feat: limit number of files during storage cleanup
...
Since the number of parquet files can potentially be unbounded (aka very
very large) and we do not want to hold the transaction lock for too
long and also want to limit memory consumption of the cleanup routine,
let's limit the number of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
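The idea of capping the cleanup batch from the commit above can be sketched as follows. This is an assumption-laden illustration: `collect_for_cleanup`, the `referenced` set, and the use of plain strings for paths are hypothetical and not taken from the repository.

```rust
use std::collections::HashSet;

// Collects at most `limit` unreferenced files from a possibly very large
// listing. Iteration stops as soon as the limit is reached, which bounds
// both the memory used and the time spent while a lock is held.
fn collect_for_cleanup<I>(all_files: I, referenced: &HashSet<String>, limit: usize) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    all_files
        .into_iter()
        .filter(|path| !referenced.contains(path))
        .take(limit)
        .collect()
}
```

Files beyond the limit are simply left for a later cleanup run.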
Edd Robinson
e583e1fbda
Merge branch 'main' into er/feat/read_buffer/float_int
2021-06-03 14:48:36 +01:00
Andrew Lamb
eaa5b75437
refactor: Make it clear only partition_key and table name pruning happens in catalog ( #1608 )
...
* refactor: Make it clear only partition_key and table name pruning is happening in catalog
* fix: clippy
* fix: Update server/src/db/catalog.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: use TableNameFilter enum rather than Option
* docs: Add docstring to the `From` implementation
* fix: Update server/src/db/catalog/partition.rs
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Edd Robinson <me@edd.io>
2021-06-03 13:09:09 +00:00
Edd Robinson
65bfa4dd10
test: fix tests
2021-06-03 12:32:40 +01:00
Marco Neumann
27b9477aa4
test: fix flaky test
2021-06-03 11:23:29 +02:00
Marco Neumann
7b2663a38a
test: make tests faster
2021-06-03 11:23:29 +02:00
Marco Neumann
3c9fd81697
refactor: split overlong line
2021-06-03 11:23:29 +02:00
Marco Neumann
bbd73e59be
feat: jitter background clean-up job + wait on first job
2021-06-03 11:23:29 +02:00
Marco Neumann
ce412dbce2
fix: use structured error for background cleanup task reporting
2021-06-03 11:23:29 +02:00
kodiakhq[bot]
1c764c47a2
Merge branch 'main' into ntran/deduplicate
2021-06-02 17:42:36 +00:00
Nga Tran
40bd932fff
refactor: address Andrew's comment
2021-06-02 13:41:46 -04:00
Andrew Lamb
32c6ed1f34
refactor: More cleanup related to multi-table chunks ( #1604 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Nga Tran
e7a97f3ac1
test: merge main and add more tests for deduplicate work
2021-06-02 12:00:40 -04:00
Marco Neumann
80f4d84ce8
refactor: isolate DB loading and streamline error handling
...
There are no functional changes here (except that errors look slightly
different) but it should allow for an easier move of the DB loading into
a delayed task.
2021-06-02 13:42:24 +02:00
kodiakhq[bot]
0e09b20ca8
Merge branch 'main' into crepererum/issue1513-b
2021-06-02 07:08:29 +00:00
Nga Tran
40df7def0e
test: tests for the deduplicate work
2021-06-01 18:06:35 -04:00
Nga Tran
60ad929721
refactor: add macro to compare output of explains
2021-06-01 16:39:14 -04:00
Nga Tran
aa867601e5
chore: merge main with DF plan display fix
2021-06-01 16:17:41 -04:00
Nga Tran
0ad258bab3
refactor: remove comments since the time function predicates are pushed down after the recent constant folding fix in DF
2021-06-01 16:00:09 -04:00
Andrew Lamb
d8fbb7b410
refactor: Remove last vestiges of multi-table chunks from PartitionChunk API ( #1588 )
...
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API
* fix: remove test that can no longer fail
* fix: update tests + code review comments
* fix: clippy
* fix: clippy
* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Marco Neumann
714a082f3a
refactor: remove chunk state struct nesting
...
Inline structs that are only used for enum variants.
2021-06-01 18:00:16 +02:00
Marco Neumann
5a4562f1c9
test: test `Chunk::new_open`
2021-06-01 18:00:16 +02:00
Marco Neumann
f45e61f9ef
test: test chunk lifecycle action handling
2021-06-01 18:00:16 +02:00
Marco Neumann
50636ca011
refactor: rename `Chunk::{set_closed => freeze}` and add tests
...
This makes it clearer what is actually happening. Furthermore, freezing
frozen chunks is now a no-op.
2021-06-01 18:00:16 +02:00
kodiakhq[bot]
aafc8c4746
Merge branch 'main' into crepererum/fix_catalog_replay_logging
2021-06-01 15:59:42 +00:00
Marco Neumann
98c2963c28
fix: fix confusing log message during catalog replay
2021-06-01 17:58:38 +02:00
Andrew Lamb
d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files ( #1580 )
...
* refactor: use ParquetExec to read parquet files
* fix: test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb
64328dcf1c
feat: cache schema on catalog chunks too ( #1575 )
2021-06-01 12:42:46 +00:00
kodiakhq[bot]
4e7b754098
Merge branch 'main' into crepererum/issue1513-a
2021-06-01 08:23:01 +00:00
Raphael Taylor-Davies
6e07a735bd
feat: don't recompute chunk size on every iteration ( #1586 )
2021-05-31 16:19:11 +00:00
Andrew Lamb
73cedd2f88
chore: remove unused dependency ( #1587 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 14:22:11 +00:00
Marco Neumann
991314ebe8
docs: fix `set_writing_to_object_store` docstring
2021-05-31 15:44:29 +02:00
Marco Neumann
996ce833f1
chore: fix formatting
2021-05-31 15:42:13 +02:00
Andrew Lamb
162a808a8d
refactor: Remove `table_name` from PartitionChunk API ( #1584 )
...
* refactor: Remove `table_name` from PartitionChunk API
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 12:05:09 +00:00
Marco Neumann
c658a627ed
refactor: change state structure for chunks
...
This is the first step towards #1513 . However it leaves all consumers
basically unchanged and also does NOT touch state transitions. These
changes will follow in upcoming PRs.
2021-05-31 11:19:01 +02:00
Raphael Taylor-Davies
db432de137
feat: add distinct count to StatValues ( #1568 )
2021-05-28 17:41:34 +00:00
Raphael Taylor-Davies
d8f19348bf
feat: per-column dictionaries in MUB ( #1570 )
...
* feat: per-column dictionaries in MUB
* chore: fmt
* refactor: remove chunk-level dictionary
* chore: remove redundant sort
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-28 13:51:56 +00:00
kodiakhq[bot]
d70d7a63a2
Merge branch 'main' into crepererum/remove_invalid_chunk_state
2021-05-28 10:20:05 +00:00
Andrew Lamb
c6f42cf304
refactor: Remove unnecessary code ( #1573 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-28 10:12:47 +00:00
Marco Neumann
5cfede51f2
refactor: remove `ChunkState::Invalid`
...
This seems to only exist to fight the borrow checker and we can actually
live without it.
2021-05-28 11:16:06 +02:00
Andrew Lamb
3ae44a0375
refactor: Chunks can have at most one object store path ( #1574 )
...
* refactor: Chunk can have at most one path
* fix: update tests
2021-05-27 19:52:09 +00:00
Nga Tran
62147ff0d4
feat: add more explain tests
2021-05-27 12:19:41 -04:00
Andrew Lamb
f3bec93ef1
feat: Cache TableSummary in Catalog rather than computing it on demand ( #1569 )
...
* feat: Cache `TableSummary` in catalog Chunks
* refactor: use consistent table summary
2021-05-27 16:03:05 +00:00
Raphael Taylor-Davies
5d342d7779
feat: associate tracker with lifecycle action ( #1099 ) ( #1556 )
...
* feat: associate tracker with lifecycle action (#1099 )
* chore: docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-27 10:47:35 +00:00
Raphael Taylor-Davies
792bff07d1
feat: only store ChunkSnapshot in Closed state ( #1560 )
...
* feat: only store ChunkSnapshot in Closed state
* chore: review feedback
* feat: record MUB size as closed size
* chore: document column ordering assumption
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-27 10:36:47 +00:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature ( #1566 )
2021-05-27 10:28:14 +00:00
kodiakhq[bot]
efe077da8f
Merge branch 'main' into crepererum/issue1313
2021-05-26 14:46:18 +00:00
Marco Neumann
24ec1a472e
fix: do NOT delete parquet files that are reachable by time travel
2021-05-26 12:38:54 +02:00
Raphael Taylor-Davies
c03b8a3963
refactor: remove tables from ChunkSnapshot ( #1295 ) ( #1558 )
2021-05-26 10:37:40 +00:00
Marco Neumann
1fb6af2364
refactor: split DB background loop into lifecycle and cleanup
...
This should prevent one from blocking / stalling the other.
2021-05-26 11:09:30 +02:00
Marco Neumann
5983336366
refactor: rename `parquet_file::{utils => test_utils}`
2021-05-26 11:09:29 +02:00
Marco Neumann
dd6bbeec42
feat: add background task to clean up OS
...
Closes #1313 .
2021-05-26 11:04:56 +02:00
Marco Neumann
cc78b5317d
feat: add method to get all parquet files from catalog state
2021-05-26 11:02:40 +02:00
kodiakhq[bot]
166851d952
Merge branch 'main' into crepererum/in_file_metadata
2021-05-26 07:39:53 +00:00
Marko Mikulicic
bae5e5aee3
feat: Add simpler RoutingConfig
2021-05-25 21:51:54 +02:00
Marco Neumann
19a2733d30
feat: preserve transaction metadata in parquets
2021-05-25 09:56:12 +02:00
Marco Neumann
fe8e6301fe
refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata`
...
Let us pool all metadata handling in a single module, which makes it
easier to review.
2021-05-25 09:37:53 +02:00
Marko Mikulicic
a4215f0a56
fix: Fix 'active' jemalloc stat misreporting
2021-05-25 02:55:27 +02:00
Nga Tran
018e1e0246
chore: add a comment to trick github to check semantic
2021-05-24 17:25:14 -04:00
Nga Tran
40a5d7d4ba
chore: Merge branch 'main' into tran/pushdown_parquet
2021-05-24 16:31:06 -04:00
Nga Tran
e72ae81a8e
feat: support predicate pushdown for parquet files
2021-05-24 16:22:52 -04:00
kodiakhq[bot]
db96286ed7
Merge branch 'main' into er/refactor/scalar_comp
2021-05-24 17:02:14 +00:00
Andrew Lamb
c464ffadad
refactor: remove special case timestamp_range in parquet chunk ( #1543 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-24 16:19:44 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates ( #1546 )
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson
abe64c6edc
test: uncomment tests to fix
2021-05-24 16:18:53 +01:00
Carol (Nichols || Goulding)
5c5064bdac
fix: Set default line timestamp and default partition time to same value ( #1512 )
...
* refactor: Rearrange to allow injection of the current time in tests
* test: Failing test showing a point can be in the wrong partition
* fix: Only get the default time once per ShardedEntry creation, in router
2021-05-24 14:55:11 +00:00
Andrew Lamb
27e5b8fabf
refactor: Remove multiple table support from Parquet Chunk ( #1541 )
2021-05-24 08:40:31 -04:00
Nga Tran
1f70d1f9c8
chore: remove a couple more comments
2021-05-21 17:06:53 -04:00
Nga Tran
f113abacb5
feat: more unit & e2e tests plus cleanup and addressing review comments of Andrew and Edd
2021-05-21 16:48:43 -04:00
Nga Tran
1093542578
fix: now all tests pass. Next step is cleaning up and addressing review comments
2021-05-21 13:29:20 -04:00
Nga Tran
784ef88fcd
chore: merge main to branch and add more tests that expose a wrong result bug on unsigned int
2021-05-21 12:38:06 -04:00
Nga Tran
93afc9c213
chore: more tests
2021-05-21 11:39:12 -04:00
Raphael Taylor-Davies
5b619733d9
refactor: split lifecycle tracking from chunk state ( #1361 ) ( #1099 ) ( #1397 )
...
* refactor: split lifecycle tracking from chunk state (#1361 ) (#1099 )
* chore: namespace internal errors
* chore: fix logical conflict
* chore: don't remove moving chunk size metric
2021-05-21 09:27:44 +00:00
Nga Tran
e44a3a87db
feat: now predicate is actually pushed down to RUB but there are bugs and it is not working yet
2021-05-20 16:56:15 -04:00
kodiakhq[bot]
f028a356f4
Merge branch 'main' into crepererum/issue1382-c
2021-05-20 15:51:47 +00:00
kodiakhq[bot]
aac00d4fa6
Merge branch 'main' into crepererum/remove_snapshotting
2021-05-20 14:14:58 +00:00
Marco Neumann
0e37d500eb
feat: remove snapshot feature
...
The parquet files produced by this code path are only semi-specified and
will miss many important metadata aspects that we will require for data
lineage.
2021-05-20 14:59:04 +02:00
Marko Mikulicic
462a5590c6
fix: fmt
2021-05-20 14:58:50 +02:00
Marko Mikulicic
c908cf0f98
fix: review suggestion
...
Co-authored-by: Edd Robinson <me@edd.io>
2021-05-20 14:40:02 +02:00
Marko Mikulicic
aa90329c1f
feat: Add remote_template for simpler remote configuration
2021-05-20 12:45:08 +02:00
Marco Neumann
7e55544eef
fix: correctly track chunk ID counter during catalog replay
2021-05-20 10:32:40 +02:00
Marco Neumann
93251f22c7
feat: read preserved catalog during DB startup
...
Closes #1382 .
2021-05-20 10:28:31 +02:00
Marko Mikulicic
91d7189e6d
feat: Log cached connections
2021-05-20 10:27:20 +02:00
Raphael Taylor-Davies
37880ee89a
refactor: store chunk IDs only in catalog ( #1521 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Nga Tran
00dacb5394
feat: add tests to verify the correctness as well as the explain of the plan
2021-05-19 17:31:16 -04:00
Nga Tran
11561111d5
chore: merge main to branch
2021-05-19 15:11:15 -04:00
Nga Tran
087d61f229
feat: Part 1 of predicate push down - Send predicates to MUB, RUB, and Parquet File. Note that MUB has not handled predicates yet
2021-05-19 13:59:51 -04:00
Marko Mikulicic
ce2f8351be
fix: Cache outbound gRPC connections
2021-05-19 18:28:45 +02:00
Marco Neumann
8db26485a4
refactor: empty transaction during catalog creation
...
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:
- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test-creations of a Db are now async
This prepares for #1382 .
2021-05-18 17:42:07 +02:00
kodiakhq[bot]
c3cc58b2ff
Merge branch 'main' into crepererum/issue1382
2021-05-17 17:57:26 +00:00
Raphael Taylor-Davies
4f0e46bcd5
refactor: track ingest metrics in one place ( #1503 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-17 16:25:01 +00:00
Marco Neumann
18f0a7f614
docs: reference open issue
2021-05-17 14:01:51 +02:00
Marco Neumann
cdf0ada6a6
test: test preserved catalog <-> Db write wiring
2021-05-17 13:57:31 +02:00
Raphael Taylor-Davies
91a45fd380
feat: simplify shutdown ( #1502 )
...
* feat: simplify shutdown
* chore: fix lint
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-17 11:50:14 +00:00
Marco Neumann
4299371cf2
refactor: remove some code
2021-05-17 12:32:48 +02:00
Marco Neumann
840c11dab2
feat: wire up catalog preservation write path
...
Required a bit of refactoring:
- Add an extra layer between DB and catalog which is the "preserved
catalog" wrapper. This is required to make the ownership model
somewhat sane, because during the read operations the "preserved
catalog" is going to act on the in-mem catalog.
- Move "parquet file written" logic into binding `preserved catalog <->
catalog state`, so we have a single place where new parquet files are
announced. For now this only works for chunks that are already known
(i.e. the writing->written transition when coming from read buffer),
however in the next PR this will be extended to also handle totally
new parquet files during transaction playback.
**NOTE: This does NOT include the read path yet!**
Issue: #1382 .
2021-05-17 11:33:22 +02:00
Andrew Lamb
07db4932ee
refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs ( #1500 )
2021-05-15 10:18:01 +00:00
Raphael Taylor-Davies
f9178dbb5f
feat: push metrics into catalog ( #1488 )
...
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
kodiakhq[bot]
fdc8461c7f
Merge branch 'main' into cn/wb-clock
2021-05-14 13:00:06 +00:00
Marko Mikulicic
35c2ca17fc
fix: Add ingest_fields_total
...
ingest_lines_total counts lines (which apparently are the same as points, quite confusingly)
No yaks harmed in the making of this PR.
(NOTE: the code around metric, especially dealing with happy and error paths is very painful;
to be done in another PR)
2021-05-13 17:55:07 +02:00
Nga Tran
9583636748
feat: we now can read parquet files from all kinds of object stores
2021-05-12 18:05:34 -04:00
Carol (Nichols || Goulding)
8be95856ab
test: Add a test with multiple threads using a process clock
2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding)
cecb4afc58
docs: Add some documentation on the assumptions around this design
2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding)
b3fb61a0b3
refactor: Rename now_nanos to system_clock_now for clarity
2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding)
425aacc391
refactor: Extract ProcessClock into its own type
2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding)
b749353d21
refactor: Use a compare_exchange loop instead of Arc Mutex
2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding)
5dfd152549
test: Use the now_nanos helper function more in tests
2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding)
f28c9ae04c
docs: Add unit and semantic information about the process clock
2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding)
513d4731be
feat: Add a process clock to Db and use it for Sequenced Entries
...
Connects to #1157 .
2021-05-12 10:58:06 -04:00
Carol (Nichols || Goulding)
f98807936d
test: Some tests don't call await, so they don't need to be async
2021-05-12 10:57:05 -04:00
Edd Robinson
696e4e0cfd
fix: ensure metrics not overwriting
2021-05-11 20:57:31 +01:00
Raphael Taylor-Davies
4409d2c8af
feat: instrument catalog locks ( #1464 )
...
* feat: instrument catalog locks (#1355 )
* chore: add metrics test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-11 18:59:11 +00:00
Andrew Lamb
9d0c3a2b1a
refactor: Remove multi-table per chunk code in MUB ( #1471 )
...
* refactor: Remove multi-table per chunk code in MUB
* fix: clippy
* fix: bench build
* fix: merge conflicts
2021-05-11 17:49:07 +00:00
Raphael Taylor-Davies
d1da954fe4
feat: don't store encoded strings twice in RLE dictionaries ( #1469 )
2021-05-11 15:22:25 +00:00
Edd Robinson
3622a92c8b
feat: wire in rb column metrics
2021-05-11 13:00:52 +01:00
Marco Neumann
795f5bfcb7
refactor: make `StatValues::{min,max}` optional + handle NaNs
...
This will allow us to:
- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
anymore)
2021-05-10 17:12:25 +02:00
Andrew Lamb
f037c1281a
feat: Calculate all system tables "on demand" ( #1452 )
...
* feat: compute system.columns table on demand
* feat: compute system.chunk_columns on demand
* feat: compute system.operations on demand
* fix: fixup schemas
* fix: Log errors
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-10 14:43:55 +00:00
Marko Mikulicic
9f5350a6c5
fix: Load only databases for which a config exists
...
Closes #1450
2021-05-10 13:14:22 +02:00
Nga Tran
c6b933eb63
chore: merge main to branch
2021-05-07 18:40:17 -04:00
Nga Tran
971500681f
refactor: address Andrew's and Carol's comment
2021-05-07 17:33:19 -04:00
Nga Tran
ba015ee4df
refactor: clean up and add comments
2021-05-07 09:31:41 -04:00
Edd Robinson
eae3fec571
feat: wire up regex UDF as predicate filter expr
2021-05-07 13:44:51 +01:00
Andrew Lamb
b5ea71f45f
feat: Expose the storage usage for each column in system.chunk_columns ( #1441 )
...
* feat: Expose the storage usage for each column in system.chunk_columns
* fix: fixup logical conflicts
* refactor: move coalesce logic into the read buffer
* fix: Update system_tables to not use coalesce
* fix: Improve comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-07 12:36:49 +00:00
Raphael Taylor-Davies
9320f59de0
feat: add shard sink indirection ( #1447 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-07 11:04:51 +00:00
Andrew Lamb
d7253c72c0
feat: Only calculate system.chunks table "on demand" ( #1446 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-07 10:26:56 +00:00
Carol (Nichols || Goulding)
febc1538ff
chore: Update Rust version ( #1445 )
...
* chore: Update Rust version
* refactor: Make struct constructor field orderings consistent
Sometimes I changed the struct definition, sometimes changed the struct
construction instance, depending on consistency with code around each
(other similar structs, function argument orders, etc)
More info: https://rust-lang.github.io/rust-clippy/master/index.html#inconsistent_struct_constructor
* refactor: Use flatten where appropriate
One instance is a false positive with a clippy bug.
More info:
- https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_identity
- https://rust-lang.github.io/rust-clippy/master/index.html#manual_flatten
* refactor: Use Option map instead of match
More info: https://rust-lang.github.io/rust-clippy/master/index.html#manual_map
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 22:07:10 +00:00
Nga Tran
55bf848bd2
feat: Now we can query directly from files in object store
2021-05-06 18:02:17 -04:00
Raphael Taylor-Davies
7f6b11266d
feat: instrument catalog locks ( #1355 ) ( #1439 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 17:09:52 +00:00
Raphael Taylor-Davies
44de42906f
refactor: use Arc<str> instead of Arc<String> ( #1442 )
2021-05-06 17:05:08 +00:00
Raphael Taylor-Davies
49c0b8b90c
feat: pull-based metrics ( #1355 ) ( #1414 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 15:54:30 +00:00
Raphael Taylor-Davies
216903a949
refactor: move protobuf conversion logic to generated_types ( #1437 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 15:49:27 +00:00
Andrew Lamb
884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns ( #1415 )
...
* feat: add column_type and influxdb_column_type, remove row_count from system.columns
* fix: update tests
* fix: more test update
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Marko Mikulicic
578dc0db25
feat: Add more logs to shed light on the curious incident with missing metrics in the nighttime
2021-05-06 14:42:48 +02:00
Raphael Taylor-Davies
10f89a3e8d
refactor: split entry out into separate crate ( #1428 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 11:36:23 +00:00
Nga Tran
a5c92fae8a
chore: merge main to branch
2021-05-05 13:48:42 -04:00
Raphael Taylor-Davies
411cf134e9
refactor: explode arrow_deps ( #1425 )
...
* refactor: explode arrow_deps
* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
kodiakhq[bot]
4395ede244
Merge branch 'main' into debug-chunk-metrics
2021-05-05 15:43:32 +00:00