influxdb

Commit Graph

Author	SHA1	Message	Date
Raphael Taylor-Davies	1d00fa2fd8	refactor: track memory metrics in catalog (#1995 ) * refactor: track memory metrics in catalog * chore: update comment	2021-07-14 16:23:00 +00:00
Carol (Nichols \|\| Goulding)	8070065e2f	fix: Change RUB chunk table_summaries to table_summary Because chunks now have only one table. Connects to #1718, #1613, #1295	2021-07-14 11:18:02 -04:00
Carol (Nichols \|\| Goulding)	649b467adb	fix: CatalogChunk no longer needs to record a write when created from a MUB chunk	2021-07-14 10:28:12 -04:00
Carol (Nichols \|\| Goulding)	7ccbab8c90	feat: Make a TableSummaryAndTimes to use to slowly replace TableSummary And use TableSummaryAndTimes with the mutable buffer chunks when turning them into catalog chunks. It's proving too big to switch over everything using TableSummary at once, so this will let us switch over more incrementally.	2021-07-14 10:28:12 -04:00
Edd Robinson	4dedb657f2	Merge branch 'main' into alamb/go_go_go_go	2021-07-14 14:04:13 +01:00
Raphael Taylor-Davies	f1c1620c84	feat: make persistence windows interface harder to use incorrectly (#1977 ) * feat: make persistence windows interface harder to use incorrectly * chore: review feedback * chore: update comment Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-14 13:03:18 +00:00
Edd Robinson	0e5276ed20	Merge branch 'main' into alamb/go_go_go_go	2021-07-14 13:56:35 +01:00
Marco Neumann	9cb9ae0874	chore: move write buffer into its own crate	2021-07-14 14:09:18 +02:00
Marko Mikulicic	d427fed9dc	fix: Remove bad max.request.size config param	2021-07-14 13:54:18 +02:00
Nga Tran	8fd0df04f2	feat: continue building and using sort_key if available	2021-07-13 16:25:58 -04:00
Andrew Lamb	781c4fa666	fix: update server tests	2021-07-13 15:44:57 -04:00
Marko Mikulicic	239c931f26	fix: Raise max message to 10M And log message size on Kafka write error. Turns out the Kafka partition message size limit default is 1MB, and the client-side "max request size" default is also 1MB. The error message we get from our Kafka client is misleading: it says ``` KafkaError (Message production error: MessageSizeTooLarge (Broker: Message size too large)) } ``` which, because of the "Broker:" prefix, read to me as if the broker had said "Message size too large". That was a lie; I killed the broker and the client kept reporting the same error, which means it didn't even try to send the message out. TODO: make this a proper parameter. (but let's unblock)	2021-07-13 17:47:36 +02:00
kodiakhq[bot]	6a09678f34	Merge branch 'main' into crepererum/update_deps	2021-07-13 14:18:57 +00:00
Raphael Taylor-Davies	6c8b2b4fa7	feat: add integration test of compaction freezing (#1938 ) (#1975 ) * feat: add integration test of compaction freezing (#1938) * chore: update server/src/db/lifecycle/compact.rs Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-07-13 14:11:10 +00:00
Marco Neumann	157a0cc98c	chore: update flatbuffers to 2.0	2021-07-13 15:44:45 +02:00
Marko Mikulicic	bf20641d78	chore: Log whether the write buffer is enabled	2021-07-13 14:15:52 +02:00
Raphael Taylor-Davies	5a0caeab44	feat: skip over fully persisted partitions (#1962 ) (#1973 ) * feat: skip over fully persisted partitions (#1962) * chore: review feedback	2021-07-13 10:40:45 +00:00
Andrew Lamb	d35b74c226	fix: Fix doc build warnings (#1945 ) * fix: Fix doc build warnings * refactor: add deny bare_urls to crates Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-13 08:03:42 +00:00
Paul Dix	708aebaeb3	fix: freeze open chunk when compacting (#1971 ) Closes #1938. Unfortunately, this contains only a unit test to ensure that open chunks are frozen when set_compacting is called. It would be better to have a more end-to-end integration test that ensures this behavior, but I've confirmed by hand (with some sleeps and a hacked up end-to-end test) that this fixes it.	2021-07-13 07:44:02 +00:00
Nga Tran	5418a1fe6b	refactor: remove unused comments	2021-07-12 18:14:38 -04:00
Nga Tran	23895e6673	feat: Using sort_key to avoid resorts	2021-07-12 18:08:45 -04:00
Carol (Nichols \|\| Goulding)	6764a2d68e	fix: Write Buffer errors are known, not UnknownDatabaseErrors Fixes #1956.	2021-07-12 11:21:31 -04:00
Carol (Nichols \|\| Goulding)	3bd7486016	test: Rename a test type alias to not shadow super::Error	2021-07-12 10:46:29 -04:00
Andrew Lamb	670826daf9	refactor: make object_store construction interface consistent (#1944 ) * refactor: make object_store construction interface consistent * fix: benchmarks * fix: doc build Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-12 12:56:36 +00:00
Andrew Lamb	9534220035	feat: Add any lifecycle_action to system.chunks and API (#1947 )	2021-07-09 17:38:29 +00:00
Raphael Taylor-Davies	7af560aa99	feat: Persist lifecycle action (#1888 ) * feat: add split and persist operation * docs: Improve doc strings * refactor: use for loop rather than map * refactor: Make it clear that the lifecycle policy picks the split timestamp * fix: race condition * docs: improve comments * fix: logical merge conflict * fix: clippy Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2021-07-09 13:21:46 +00:00
Andrew Lamb	1a79bf7e99	refactor: Make aws/azure/gcs optional features and stop compiling 100 dependencies during dev (#1933 ) * feat: make aws, gcp, azure dependencies optional * fix: only run object store tests if the features are enabled * fix: clean up testing * fix: rename step * fix: add to list of jobs * fix: remove test with object store * fix: review comments	2021-07-09 11:38:30 +00:00
Andrew Lamb	3cb8f297b1	refactor: encapsulate the ObjectStore implementations in the object store crate (#1932 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-09 10:38:32 +00:00
Marco Neumann	bc958e2ff0	refactor: use Arcs to pass schemas around	2021-07-09 09:45:12 +02:00
Marco Neumann	09e611deb7	refactor: lift query schema generation up to caller No longer scan chunks during query planning to determine the schema (except for the lifetime jobs where we have a good reason to do so). Instead pass the schema down from whoever is triggering the query. For real SQL queries, we then just use the table-wide schemas introduced in #1913. Apart from avoiding schema merges we now also no longer crash when no chunks are left in the table (aka columns are present but all rows are gone). Fixes #1768. Fixes #1884.	2021-07-09 09:24:21 +02:00
kodiakhq[bot]	c37053ad46	Merge branch 'main' into cn/chunk-times	2021-07-08 20:58:54 +00:00
kodiakhq[bot]	a2726c7e92	Merge branch 'main' into cn/kafka-read-metrics-and-e2e-tests	2021-07-08 20:40:19 +00:00
Carol (Nichols \|\| Goulding)	22495dd355	fix: Take a TableBatch in the MBChunk constructor Thus ensuring all MBChunks will have data in them.	2021-07-08 16:39:35 -04:00
Carol (Nichols \|\| Goulding)	548c64539e	fix: Wrap lines at 100 chars	2021-07-08 16:39:33 -04:00
Carol (Nichols \|\| Goulding)	74c0a6cb00	fix: Arrange use statements so rustfmt can manage their order	2021-07-08 16:39:02 -04:00
kodiakhq[bot]	c8126784a8	Merge branch 'main' into ntran/avoid_sort_in_scan	2021-07-08 20:22:18 +00:00
Andrew Lamb	72928aab3d	refactor: Move ChunkLifecycleAction to the data_types crate (#1939 )	2021-07-08 20:18:33 +00:00
Andrew Lamb	dd3eff7748	refactor: Always use `row_count` for count of rows in system.* tables (#1937 )	2021-07-08 19:28:11 +00:00
Carol (Nichols \|\| Goulding)	c6bf0a26f4	feat: Add metrics for when ingesting from the write buffer fails So that we have some way of figuring out what might be going on.	2021-07-08 09:57:51 -04:00
Carol (Nichols \|\| Goulding)	80e1dcafe0	feat: Support reading from all Kafka partitions When reading from the Kafka write buffer, subscribe to all partitions in a topic and start from the smallest offset available, instead of assuming there will only be 1 partition per topic.	2021-07-08 09:30:59 -04:00
Carol (Nichols \|\| Goulding)	c90ef7b14b	fix: Create one consumer group per server+database This hasn't caused any problems for me yet, but seemed like a good idea because we want to be sure we don't get any of Kafka's consumer rebalancing if we have multiple partitions.	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	e5168936f5	feat: Better error messages through to gRPC API + e2e Kafka Read tests	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	c53ae41d57	fix: Remove unneeded Option from the reading mock	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	854c28c41a	feat: Stream messages from Kafka into the database	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	ee500f5bda	feat: Support configuring a write buffer for writing OR reading	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	63d26f6f3f	refactor: Rename KafkaBuffer to KafkaBufferProducer	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	e5de73133c	feat: Change write buffer connection rule to take either Writing or Reading connection info A database on one IOx server can, exclusively: - Not interact with Kafka at all - Send writes to Kafka - Read writes from Kafka Notably, a database on a particular server will never write and read from Kafka at the same time.	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	fd4bcc2fa5	refactor: Rename the WriteBuffer trait to be WriteBufferWriting	2021-07-08 09:28:34 -04:00
Carol (Nichols \|\| Goulding)	83e50cfba4	refactor: Rename field to not contain the type	2021-07-08 09:28:34 -04:00
kodiakhq[bot]	69e4786fc7	Merge branch 'main' into crepererum/str_arcs	2021-07-08 13:20:49 +00:00
Marco Neumann	18893e76e0	refactor: convert some table name and part. key String to Arcs This has the (somewhat nice) side effect that it shrinks the in-mem catalog a bit as well because now `ParquetChunk` is a bit smaller making the chunk stage enum smaller as well.	2021-07-08 14:34:28 +02:00
Edd Robinson	7ff8ae4ce5	refactor: tidy up sort key rep	2021-07-08 12:48:41 +01:00
Edd Robinson	f811bf1e5e	refactor: log compaction activity	2021-07-08 12:48:41 +01:00
Andrew Lamb	33bc85ad18	feat: Infrastructure for persistence (#1925 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-08 11:14:38 +00:00
Andrew Lamb	7602bde850	chore: Update datafusion deps (#1799 ) * chore: Update datafusion deps + rework code * refactor: remove workaround as it has been contributed upstream * fix: Update query/src/exec/split.rs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-08 10:58:32 +00:00
Marco Neumann	24056d7bfc	test: ensure that table schemas are recovered from pres. catalog	2021-07-08 10:01:42 +02:00
Marco Neumann	a746cd45c5	test: check for schema change errors	2021-07-08 09:51:49 +02:00
Marco Neumann	bd22dd38ea	docs: fix typos Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-07-08 09:18:09 +02:00
Marco Neumann	b528ac2b55	feat: store schemas per table This way we can: - check for schema matches even for writes going into different partitions - solve #1768 and #1884 in some future PR Closes #1897.	2021-07-08 09:18:09 +02:00
Marco Neumann	5ca9760c94	test: make partitioning in DB tests consistent w/ DB rules	2021-07-08 09:18:09 +02:00
Marco Neumann	ed3ebdcbd2	refactor: use sync locks w/ better metrics	2021-07-08 09:18:09 +02:00
Marco Neumann	5936452895	feat: add infra to check table-wide schemas	2021-07-08 09:18:09 +02:00
Nga Tran	5c722af0fa	fix: remove comments	2021-07-07 16:50:53 -04:00
Nga Tran	d3c4f8c249	fix: store sort key correctly in the schema. Update tests to reflect it	2021-07-07 15:55:23 -04:00
Paul Dix	cc350bb1ea	fix: don't update last write time on failed writes Fixes #1905	2021-07-07 14:50:03 -04:00
Andrew Lamb	e6d995cbd8	chore: Update to Rust 1.53.0 (#1922 ) * chore: Update to Rust 1.53.0 * fix: Update to latest clippy standards * fix: bad refactor * fix: Update escaping * test: update test output Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-07 18:02:03 +00:00
Andrew Lamb	957c6245e3	docs: Note that rollover_partition is not automatically called (#1910 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-07 12:14:31 +00:00
Marko Mikulicic	25e3a304ed	chore: Log partition rollover (#1907 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-07 11:48:16 +00:00
Nga Tran	8dfc3bb6bc	fix: Thanks Andrew for helping fix the compile problem and avoid using Arc<Mutex>	2021-07-06 18:05:59 -04:00
Nga Tran	76789e5902	feat: store sort key into the chunk schema of RUB	2021-07-06 17:00:35 -04:00
Marco Neumann	b6185982f7	refactor: make `ProviderBuilder` a build-time-checked builder It's safer and also avoids cloning / copying state around.	2021-07-06 18:20:05 +02:00
Marco Neumann	4f5fe62428	feat: add DB name to lifecycle logs (#1900 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-06 16:14:28 +00:00
Marco Neumann	09b7405b20	docs: spelling fixes Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-07-06 17:46:36 +02:00
Marco Neumann	3d644b63a1	feat: add `Replay` state to DB init	2021-07-06 14:24:39 +02:00
Marco Neumann	4ca2d3e148	chore: move persistence windows related code into own crate The entire persistence windows data structures (including the checkpoints) have nothing to do with the mutable buffer per se. So let's move them into their own crate. This also makes `parquet_file` no longer depend on `mutable_buffer`.	2021-07-05 10:23:58 +02:00
Marco Neumann	cdab1bed05	feat: persist part+db checkpoint in parquets and catalog This will be required for replay on server startup.	2021-07-05 09:42:46 +02:00
kodiakhq[bot]	bcf43a3de5	Merge branch 'main' into crepererum/db_state_in_grpc	2021-07-05 07:21:48 +00:00
Nga Tran	405a6a691b	feat: initial implementation of #1886: avoid resort if appropriate	2021-07-02 17:57:48 -04:00
Raphael Taylor-Davies	b4534883fe	refactor: remove table name from upsert_table (#1882 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-02 15:22:41 +00:00
Marco Neumann	54fbb60740	feat: expose DB state in gRPC interface	2021-07-02 11:24:36 +02:00
kodiakhq[bot]	404da38d6f	Merge branch 'main' into pd-remove-mb-size-limit-checks	2021-07-01 20:01:32 +00:00
Raphael Taylor-Davies	5b00bc69e6	refactor: use Arc<Db> in lifecycle actions (#1873 ) * refactor: use Arc<Db> in lifecycle actions * chore: review feedback	2021-07-01 19:56:33 +00:00
Paul Dix	61917c107f	chore: add test for can_move on row count	2021-07-01 15:49:44 -04:00
Paul Dix	91f5478012	feat: remove MUB size threshold Removes the MUB chunk close based on size. Also add a check in lifecycle policy to move if the MUB chunk crosses a default row count threshold.	2021-07-01 14:58:29 -04:00
Andrew Lamb	56c8c8d428	feat: Use separate executor for queries and compactions/moves (#1870 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:47:50 +00:00
Raphael Taylor-Davies	f1a100c6ae	refactor: remove now unused chunk sort order (#1854 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:39:45 +00:00
Andrew Lamb	07826306ed	fix: Always deduplicate data prior to insertion into the ReadBuffer (#1863 ) * fix: mark ReadBuffer as always deduplicated * fix: Use compact plans during merge * docs: Update server/src/db/chunk.rs Co-authored-by: Nga Tran <ntran@influxdata.com> Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com> Co-authored-by: Nga Tran <ntran@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:23:37 +00:00
Jacob Marble	0779b0d9bd	feat: add gRPC listener for new write protocol (#1842 ) * feat: add gRPC listener for new write protocol * chore: clippy happy * chore: lint * chore: cargo fmt --all * chore: cargo clippy * chore: protobuf-lint * chore: more formatting Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:15:12 +00:00
Marco Neumann	e1e3163752	refactor: rework DB init state machine Since adding new features like "sequencer replay" or init retries would make the current code too complex, a refactor is required: Config: The config struct now holds a `DatabaseState` which is a simple linear state machine representing the different stages of the database init. Init: The init module now has a fixpoint-loop which looks at the state, decides what to do based on it and repeats until either the DB is initialized or an error occurred. This also makes it easier to continue the init process "in the middle", e.g. when the preserved catalog is broken or the sequencer (e.g. Kafka) could not be reached.	2021-07-01 13:47:51 +02:00
Andrew Lamb	cfa06e1497	chore: Add query tests for compacted chunks (#1861 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-30 20:59:29 +00:00
Raphael Taylor-Davies	99a15cd452	refactor: single lifecycle error enumeration (#1859 ) * refactor: single lifecycle error enumeration * fix: fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2021-06-30 18:35:57 +00:00
Andrew Lamb	817a480cde	refactor: move lifecycle implementations out of db.rs and into their own modules (#1858 ) * refactor: move lifecycle implementations out of db.rs and into their own modules * fix: clippy	2021-06-30 17:24:04 +00:00
Andrew Lamb	9e1723620c	refactor: rename load_chunk_to_read_buffer to move_chunk_to_read_buffer (#1857 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-30 16:53:18 +00:00
Marco Neumann	043890369f	refactor: make `MinMaxSequence` safer to use	2021-06-30 16:37:48 +02:00
kodiakhq[bot]	983062f6fa	Merge branch 'main' into crepererum/no_catalog_on_db_creation	2021-06-30 10:04:00 +00:00
Edd Robinson	2e430ac7f0	refactor: remove table name from read_filter schema	2021-06-30 09:50:53 +01:00
Edd Robinson	62f274cc1b	refactor: remove table name from column_values	2021-06-30 09:46:54 +01:00
Edd Robinson	5737c9d962	refactor: remove table name from column_names	2021-06-30 09:43:41 +01:00
Marco Neumann	c4e054f909	feat: do NOT load preserved catalogs on late DB creation When a DB is created AFTER the server is initialized, then we can assume it is a new DB (because the rules file did not exist beforehand). We shall treat it as a new DB with no data and should not try to load some leftover / stale / whatever preserved catalog for it. How this catalog came into existence we do not know and it was certainly not properly managed by IOx. So we error if there is a catalog. Furthermore the old implementation was kinda broken since it loaded the preserved catalog "in-sync" with the gRPC call that issued the DB creation (we only have a delayed init concept for DBs that are loaded on instance startup). In production that would very likely provoke nasty timeouts. On top of that this new behavior will also be somewhat more sane when we think about sequencer (e.g. Kafka) replays. We certainly do not wanna do any replays for newly created DBs. TLDR: New behavior for DBs created via gRPC is "new empty DB". This does NOT affect DBs loaded on instance startup (aka existing DBs).	2021-06-30 10:12:38 +02:00
Marco Neumann	58310abfee	refactor: de-duplicate code in `server::db::load`	2021-06-30 10:08:25 +02:00
Marco Neumann	9d10ac9f6a	refactor: write parquet files w/o holding the transaction lock This allows preparing writes per-tableXpartition before entering the database-exclusive section that deals with catalog transactions. Closes #1821.	2021-06-29 14:23:06 +02:00
Marco Neumann	3ebb6a3037	refactor: do not capture txn-specific information in parquet files This helps with #1821.	2021-06-29 14:22:36 +02:00
Edd Robinson	a7198ea78b	refactor: use satisfies_predicate in apply_predicate	2021-06-29 11:58:28 +01:00
kodiakhq[bot]	eda9532eb2	Merge branch 'main' into crepererum/issue1821-cleanup-lock	2021-06-29 10:48:43 +00:00
Andrew Lamb	3ee96c4618	fix: Do not sequence local writes (avoid panic under load) (#1826 ) * fix: Do not sequence local writes * fix: Update server/src/db.rs Co-authored-by: Edd Robinson <me@edd.io> * fix: review comments * fix: restore passing sequence information down to mutable buffer * fix: store min/max times even when there are no sequence numbers Co-authored-by: Edd Robinson <me@edd.io> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-29 10:39:37 +00:00
Marco Neumann	2cd5ce98be	refactor: do not pass locks around for catalog cleanup	2021-06-29 10:21:41 +02:00
Marco Neumann	730a23faa3	refactor: improve locking around the parquet file cleanup Instead of (ab)using the transaction lock to prevent the cleanup job from removing just-written parquet files, use a dedicated lock. This will later allow us to write parquet files before starting a transaction (i.e. w/o holding the transaction lock). This will help with #1821.	2021-06-29 10:20:03 +02:00
Edd Robinson	12ae9b012a	refactor: clarify intent of	2021-06-28 17:39:48 +01:00
Carol (Nichols \|\| Goulding)	0f7c47d10e	fix: Limit the number of errors per sequenced entry we'll collect	2021-06-28 09:29:17 -04:00
Carol (Nichols \|\| Goulding)	1e171e2e9a	refactor: Organize `use` statements and let rustfmt manage order	2021-06-28 09:29:15 -04:00
Carol (Nichols \|\| Goulding)	f3a3a9b267	fix: Try to write all partition_writes even if one fails, collect all errors and report at the end	2021-06-28 09:24:23 -04:00
Carol (Nichols \|\| Goulding)	4d2954ec1d	test: Write a failing test for partition_writes being ignored after a failure	2021-06-28 09:24:23 -04:00
Marco Neumann	65e65412cc	refactor: move catalog loading code into its own module	2021-06-28 12:46:25 +02:00
Paul Dix	de236c5a6f	feat: update persistence windows to support late arrival less than 30 seconds	2021-06-25 15:34:11 -04:00
Paul Dix	435b4b6a94	feat: add persistence windows to partition and update on write This brings the persistence windows into the catalog partition. It adds a helper method on TableBatch to get the min and max times for a given write. Finally, it adds this logic to the db to update persistence windows on every write while the partition write lock is being held.	2021-06-25 15:34:11 -04:00
Raphael Taylor-Davies	3046b1692c	chore: include table name in compaction log (#1805 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-25 15:20:44 +00:00
Andrew Lamb	79446d45be	feat: Implement split_plans (#1794 ) * feat: implement split plan / planner * fix: Apply suggestions from code review Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com> * fix: resolve merge conflicts * fix: add values to panic Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2021-06-24 18:38:00 +00:00
Raphael Taylor-Davies	297fc12db8	feat: compact chunks (#1776 ) * feat: compact chunks * chore: review feedback * chore: clippy lints * chore: document sort key algorithm Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-24 16:49:10 +00:00
Carol (Nichols \|\| Goulding)	c0c1c3fd8e	refactor: Extract a struct to hold all the arguments needed to make a Db	2021-06-23 16:31:38 -04:00
Carol (Nichols \|\| Goulding)	f903b6eca8	fix: Create WriteBuffer outside of commit_db so committing can't fail	2021-06-23 13:56:50 -04:00
Carol (Nichols \|\| Goulding)	51e72c8821	refactor: Extract a function for creating a write buffer from database rules	2021-06-23 13:33:29 -04:00
Carol (Nichols \|\| Goulding)	57aee2f770	fix: Remove TODO that's a TODON'T	2021-06-23 13:18:28 -04:00
Carol (Nichols \|\| Goulding)	6ec3c03b0a	fix: Handle failure to create a Kafka producer rather than panicking	2021-06-23 10:51:23 -04:00
Carol (Nichols \|\| Goulding)	c66f9e5aeb	feat: Write entries to Kafka when configured as the write buffer	2021-06-23 10:48:18 -04:00
Carol (Nichols \|\| Goulding)	08f0696890	refactor: Extract a type alias for the trait's error type	2021-06-23 10:48:18 -04:00
Carol (Nichols \|\| Goulding)	250b9362a6	fix: Pass the database to the KafkaBuffer to use as the topic	2021-06-23 10:48:18 -04:00
Carol (Nichols \|\| Goulding)	93881da016	feat: Make Write Buffer store_entry async In preparation for the Kafka write buffer implementation needing to call async functions.	2021-06-23 10:48:18 -04:00
kodiakhq[bot]	59993e8b8f	Merge branch 'main' into crepererum/issue1623	2021-06-23 12:40:05 +00:00
Marco Neumann	c395409b51	feat: include UUIDv4 into parquet file names Change schema from ```text <server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet ``` to ```text <server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet ``` So parquet files will NEVER be overwritten. This is especially helpful when dealing with old catalog leftovers (i.e. a parquet file that belonged to an old but wiped catalog). It also simplifies the reasoning about file references in the future and follows what other dataset formats are usually doing (i.e. never replace files). Also use `ChunkAddr` where it makes sense.	2021-06-23 14:30:28 +02:00
kodiakhq[bot]	70817a474c	Merge branch 'main' into crepererum/issue1740-d	2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies	5cd911c74a	fix: correct row count for object store chunks (#1789 )	2021-06-23 12:06:49 +00:00
kodiakhq[bot]	d94a9ea94a	Merge branch 'main' into crepererum/better_served_uninit_error	2021-06-23 08:54:48 +00:00
Marco Neumann	cf55df68b5	refactor: remove some `Arc`s around the in-mem catalog This is for #1740.	2021-06-23 10:51:22 +02:00
Marco Neumann	39eac62d5d	fix: improve "server not initialized" error We've reported "databases not loaded" which is a bit confusing for router nodes, so change the description to "server not initialized".	2021-06-23 10:47:51 +02:00
Marco Neumann	d2be641864	refactor: make checkpointing easier to use Don't mix commit+checkpoint in a single call so that the caller has to reason about the error type and which of the two operations has failed. Splitting it also makes it easier to create the correct checkpoint data.	2021-06-23 10:25:05 +02:00
Marco Neumann	4a961694ec	refactor: make caller sync mem<>OS view during catalog transactions This is for #1740. Greatly simplifies the integration of the persisted catalog into the DB.	2021-06-23 10:25:05 +02:00
kodiakhq[bot]	c3dbe4c571	Merge branch 'main' into crepererum/fix_auto_wipe	2021-06-22 13:50:53 +00:00
Marco Neumann	a98b10745f	fix: auto-wipe should still be enabled Auto-wipe broken catalogs should be enabled until #1522 is closed.	2021-06-22 15:45:32 +02:00
kodiakhq[bot]	b77bff449b	Merge branch 'main' into crepererum/issue1740-b	2021-06-22 13:27:26 +00:00
Raphael Taylor-Davies	01b0fdabb7	feat: make lifecycle partition-aware (#1767 ) * feat: make lifecycle partition-aware * chore: further docs * chore: rename to maybe_free_memory * chore: fix logical conflicts * chore: ensure only drops unpersisted chunks * chore: clippy lints * chore: fix doc Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-22 09:24:15 +00:00
Marco Neumann	d1db0dfaeb	refactor: remove type parameter from preserved catalog For #1740.	2021-06-22 10:53:10 +02:00
kodiakhq[bot]	799e2caa34	Merge branch 'main' into crepererum/issue1740-a	2021-06-22 07:19:27 +00:00
Andrew Lamb	5362c7c924	feat: enable query deduplication (#1762 )	2021-06-21 18:49:04 +00:00
Marco Neumann	ff60627500	refactor: make preserved catalog NOT own the in-mem catalog Works towards #1740.	2021-06-21 18:39:43 +02:00
Marco Neumann	881729bd23	refactor: make caller responsible to create checkpoint data This decouples the in-mem and preserved catalog a bit and works towards #1740.	2021-06-21 18:33:23 +02:00
Edd Robinson	7e3df17896	test: update benchmarks	2021-06-21 15:29:23 +01:00
Edd Robinson	ac54320821	refactor: update server with new chunk API	2021-06-21 15:12:17 +01:00
Raphael Taylor-Davies	ea04ce40dc	feat: transactional lifecycle API (#1753 ) * feat: transactional lifecycle API * chore: remove redundant upgrade * feat: lifecycle error propagation * chore: add usage doctest Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-21 13:09:53 +00:00
Marco Neumann	0d7c3ff279	docs: fix typos Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-06-21 13:18:20 +02:00
Marco Neumann	4d3432a1e0	docs: improve `server::config` docs	2021-06-21 10:06:50 +02:00
Marco Neumann	29bbc9a384	refactor: recoverable DB init - store read and parsed DB rules even if the catalog is broken - allow wiping the catalog for DBs w/ init failures - try to bring the DB back online after successful wipes Note that this does not yet allow updating rules for broken DBs or fixing DBs w/ broken rule files. However this can be implemented easily on top of this.	2021-06-21 09:31:23 +02:00
Marco Neumann	d17b5710a8	feat: add server functionality to wipe preserved catalogs	2021-06-21 09:31:23 +02:00
Marco Neumann	aba973a6e1	refactor: make catalog `wipe` a freestanding function It does not interact with the `CatalogState` so users can call this function without that type.	2021-06-21 09:31:23 +02:00
Andrew Lamb	258a6b1956	chore: remove more dead code (#1760 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-18 21:28:22 +00:00
kodiakhq[bot]	1d8469951f	Merge branch 'main' into smaller-cache	2021-06-18 18:50:10 +00:00
Andrew Lamb	de67bd3efe	refactor: Remove PartitionChunk::table_schema (#1756 ) * refactor: Remove PartitionChunk::table_schema * docs: update comments	2021-06-18 16:13:16 +00:00
Andrew Lamb	9beeca3e7c	refactor: Unify schema handling in query crate (#1755 ) * refactor: Unify schema handling in query crate * fix: doclink	2021-06-18 14:10:57 +00:00
Andrew Lamb	1c13d676b4	refactor: Rename query::PartitionChunk --> query::QueryChunk (#1754 )	2021-06-18 13:24:09 +00:00
Marko Mikulicic	b612c3af4e	chore: Switch to smaller cache dep	2021-06-18 09:43:28 +02:00
Andrew Lamb	ec43a87909	chore: Update itertools deps (#1750 )	2021-06-17 17:56:44 +00:00
Raphael Taylor-Davies	f6dbc8d6f2	refactor: add ChunkAddr to describe location of chunk in catalog (#1745 ) * refactor: add ChunkPath to describe location of chunk in catalog * refactor: rename ChunkPath to ChunkAddr * chore: further renames * chore: even more renames Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-17 12:04:37 +00:00
Marco Neumann	87b2a1eaea	docs: add note about why we write parquets during transactions	2021-06-16 11:01:14 +02:00
Marco Neumann	e056d97cf6	test: always test transaction aborts	2021-06-16 11:01:14 +02:00
Marco Neumann	ec053f674c	feat: make DB catalog work w/ transaction aborts	2021-06-16 11:01:14 +02:00
Marco Neumann	caaf95c6ec	refactor: remove lock from `TestCatalogState`	2021-06-16 10:51:15 +02:00
Marco Neumann	c8c412f6fe	refactor: rework catalog state interface This now allows not only for copy-based transaction handling but also for eager exec and rollbacks. This will be useful to properly implement transaction aborts for the "real" catalog.	2021-06-16 10:51:15 +02:00
Marco Neumann	2596de072e	feat: make sure DB catalog can correctly add and remove parquet files Note that this does NOT yet allow it to correctly abort transactions.	2021-06-16 10:50:47 +02:00
Raphael Taylor-Davies	bf54ab51f2	refactor: split lifecycle into separate crate (#1730 )	2021-06-15 15:57:47 +00:00
Raphael Taylor-Davies	f96e05d26a	refactor: traitify lifecycle policy (#1729 ) * refactor: traitify lifecycle policy * chore: docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 14:00:06 +00:00
Andrew Lamb	b756e09904	refactor: Rename parquet_file::Chunk --> ParquetChunk (#1722 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 11:21:49 +00:00
kodiakhq[bot]	09f2ae1616	Merge branch 'main' into crepererum/issue1595	2021-06-15 11:12:01 +00:00
Marco Neumann	adc3a059ca	refactor: improve server background task logging - rename `name` to `db_name` - add `table_name` to error-detection logs - use `Display` instead of `Debug` fmt for errors, which results in nicer outputs and follows the rest of the stack This is for #1725.	2021-06-15 10:28:12 +02:00
Marco Neumann	dcfaa81969	feat: info-log server ID during init Add a info log when the server ID is set. Because this is done where the server ID is also stored, this automatically affects all ways to set it (via CLI, via environment variable, via gRPC call). Closes #1595.	2021-06-15 10:09:53 +02:00
kodiakhq[bot]	19f684ee14	Merge branch 'main' into crepererum/issue1506	2021-06-15 07:36:49 +00:00
Marco Neumann	55fc5e564b	refactor: remove serverID and DB name args from catalog state They are no longer required.	2021-06-15 09:35:41 +02:00
Marco Neumann	057c99d431	fix: tighten memory ordering	2021-06-14 17:34:57 +02:00
Marco Neumann	2ea24b6467	feat: allow to fail initializing a single DB - keep errors encountered during DB init - treat failed DB inits as existing DBs - effectively poison failed DBs (there is no way to recover except by restarting the server, yet)	2021-06-14 17:34:57 +02:00
Marco Neumann	0b5552f131	refactor: ensure that DBs are reserved before doing expensive IO	2021-06-14 17:34:57 +02:00
Marco Neumann	233235365a	refactor: de-couple DB rules commit from name reservation This allows us to put DBs in a controlled error state when we try to load rules from a file but the rules are somewhat broken.	2021-06-14 17:34:57 +02:00
Marco Neumann	318af9b801	feat: keep error that occurred during server init	2021-06-14 17:34:57 +02:00
Marco Neumann	bf0ba6ba6c	test: rename some server init tests to better reflect their nature	2021-06-14 17:34:57 +02:00
Marco Neumann	250ccdcdcd	refactor: use `IOxMetadata` instead of path parsing for parquet chunks	2021-06-14 16:24:50 +02:00
Marco Neumann	d51e7a127c	feat: include table name, partition key, and chunk ID in `IoxMetadata`	2021-06-14 16:24:50 +02:00
Andrew Lamb	a14e9ab27c	refactor: rename mutable_buffer::Chunk --> mutable_buffer::MBChunk (#1711 ) * refactor: rename mutable_buffer::Chunk --> mutable_buffer::MBChunk * fix: fmt	2021-06-14 13:35:20 +00:00
Andrew Lamb	856751deec	feat: Lifecycle manager unloads, rather than drop, chunks when soft limit is hit (#1701 ) * feat: unload chunks from memory rather than dropping them * docs: Update server/src/db/lifecycle.rs Co-authored-by: Marco Neumann <marco@crepererum.net> * docs: Update comment wording Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-14 13:14:39 +00:00
kodiakhq[bot]	fc1b5ea165	Merge branch 'main' into crepererum/parquet_metadata_wrapper	2021-06-14 11:20:39 +00:00
Andrew Lamb	9d1ca95a52	refactor: Rename catalog::Chunk --> catalog::CatalogChunk (#1702 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-14 11:20:14 +00:00
Marco Neumann	518f7c6f15	refactor: wrap upstream parquet MD into struct + clean up interface This prevents users from `parquet_file::metadata` to also depend on `parquet` directly. Furthermore they don't need to important dozend of functions and can instead just use `IoxParquetMetaData` directly.	2021-06-14 13:17:01 +02:00
Marco Neumann	665919786e	test: fix test	2021-06-14 10:52:23 +02:00
Marco Neumann	f4693e36c0	refactor: `catalog_checkpoint_interval` => `catalog_transactions_until_checkpoint`	2021-06-14 10:34:32 +02:00
Marco Neumann	898c638630	feat: wire up catalog checkpointing Closes #1381.	2021-06-14 10:08:32 +02:00
Marco Neumann	df866f72e0	refactor: store parquet metadata in chunk This will be useful for #1381. At the moment we parse schema and stats eagerly and store them alongside the parquet metadata in memory. Technically this is not required since this is basically duplicate data. In the future we might trade-off some of this memory against CPU consumption by parsing schema and stats on demand.	2021-06-14 10:08:31 +02:00
Edd Robinson	ff19beb0ad	refactor: export rb chunk as RBChunk	2021-06-11 18:33:10 +01:00
kodiakhq[bot]	71e2a8fbaa	Merge branch 'main' into crepererum/inline_parquet_table_struct	2021-06-11 11:22:48 +00:00
Andrew Lamb	0cbe74dbde	fix: persistence to parquet by swapping order of arguments (#1687 ) * fix: fix order of arguments * test: for persistence	2021-06-11 10:55:40 +00:00
Marco Neumann	f8a518bbed	refactor: inline `Table` into `parquet_file::chunk::Chunk` Note that the resulting size estimations are different because we were double-counting `Table`. `mem::size_of::<Self>()` is recursive for non-boxed types since the child will be part of the parent structure. Issue: #1295.	2021-06-11 11:54:31 +02:00
Raphael Taylor-Davies	11b25b3aaf	refactor: swap order of partition and table in in-memory catalog (#1678 ) * refactor: swap order of partition and table in in-memory catalog * chore: review feedback * chore: validate panic message * chore: review feedback Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-10 16:40:30 +00:00
Marco Neumann	13bb290a7c	chore: enforce `clippy::future_not_send` for `server` + top-level crate (#1679 ) * chore: enforce `clippy::future_not_send` for `server` * chore: enforce `clippy::future_not_send` for top-level crate	2021-06-10 15:01:12 +00:00
Marco Neumann	294c304491	feat: impl catalog checkpointing infrastructure This implements a way to add checkpoints to the preserved catalog and speed up replay. Note: This leaves the "hook it up into the actual DB" for a future PR. Issue: #1381.	2021-06-10 15:42:21 +02:00
kodiakhq[bot]	3ba27bdbd9	Merge branch 'main' into crepererum/clippy_future_not_send_part1	2021-06-10 07:19:31 +00:00
kodiakhq[bot]	5f863a59fd	Merge branch 'main' into crepererum/extract_server_init	2021-06-10 07:14:57 +00:00
kodiakhq[bot]	44d8fb9472	Merge branch 'main' into crepererum/clippy_future_not_send_part1	2021-06-10 07:10:11 +00:00
kodiakhq[bot]	eed73a30c5	Merge branch 'main' into ntran/dedup_within_chunk	2021-06-09 18:19:17 +00:00
Nga Tran	c1c58018fc	refactor: address review comments	2021-06-09 14:17:47 -04:00
Marco Neumann	4fe2d7af9c	chore: enforce `clippy::future_not_send` for `parquet_file`	2021-06-09 18:18:27 +02:00
Marco Neumann	d9c38dfe88	refactor: extract server init code This prepares for #1624, so the end results looks a bit cleaner.	2021-06-09 16:53:11 +02:00
kodiakhq[bot]	b49abf9b02	Merge branch 'main' into crepererum/lazy_db_loading	2021-06-09 07:23:35 +00:00
Raphael Taylor-Davies	07c4277ca7	refactor: schema merge to give more control over field merging (#1653 ) * refactor: schema merge to give more control over field merging * chore: review feedback	2021-06-09 06:30:45 +00:00
Nga Tran	3e10351538	test: add tests for the sort plan	2021-06-08 21:40:46 -04:00
Nga Tran	68e3a2121f	feat: add SortExec	2021-06-08 15:04:31 -04:00
Andrew Lamb	fd8a87484e	feat: Hook up chunk grouping into provider	2021-06-08 14:42:37 -04:00
Nga Tran	edbf1b7d5e	Merge branch 'main' into ntran/dedup_within_chunk	2021-06-08 13:18:40 -04:00
Nga Tran	40cb4f741f	feat: initial implementaton	2021-06-08 13:17:36 -04:00
Carol (Nichols \|\| Goulding)	50a69a7f18	fix: Don't mention Kafka unless it's absolutely necessary	2021-06-07 13:01:04 -04:00
Carol (Nichols \|\| Goulding)	2bb2c4ba47	docs: Add some doc comments about the WriteBuffer trait	2021-06-07 11:22:33 -04:00
Carol (Nichols \|\| Goulding)	a8a4a5f29d	fix: Return the Sequence type from the write buffer, not vague WriteMetadata	2021-06-07 11:15:46 -04:00
Carol (Nichols \|\| Goulding)	a63c12acfb	fix: Remove references to Kafka from db tests	2021-06-07 10:58:34 -04:00
Carol (Nichols \|\| Goulding)	45a3547978	refactor: Take ownership of Entry and transform into SequencedEntry Rather than cloning the data. The Entry is no longer used after this point.	2021-06-07 09:56:23 -04:00
Carol (Nichols \|\| Goulding)	8ab8544d4a	feat: Wire up a WriteBuffer trait implemented by a mock With an unimplemented where the Kafka implementation will be.	2021-06-07 09:56:23 -04:00
Carol (Nichols \|\| Goulding)	2418e91001	feat: Add a DatabaseRule field for an optional Kafka write buffer connection string	2021-06-07 09:56:23 -04:00
Carol (Nichols \|\| Goulding)	b5fac8cd59	refactor: Rearrange database rule checks and SequencedEntry construction There are going to be more cases here when the Kafka write buffer is introduced that affect how the SequencedEntry is created and whether a database being immutable is an error or not.	2021-06-07 09:37:22 -04:00
Carol (Nichols \|\| Goulding)	7ff2c5c951	refactor: Rearrange reading of db rules and locking	2021-06-07 09:37:22 -04:00
Carol (Nichols \|\| Goulding)	0139167c98	refactor: Extract a Sequence type A sequencer id and sequence number should always go together, so convey that with a type. Also, this removes lots of repetition of "sequence" 😅	2021-06-07 09:37:22 -04:00
Carol (Nichols \|\| Goulding)	4d6569583e	fix: Partially restore SequencedEntry as Entry+sequencer_id+sequence_num	2021-06-04 14:40:19 -04:00
Carol (Nichols \|\| Goulding)	f4a9a5ae56	fix: Remove write buffer	2021-06-04 14:40:17 -04:00
Andrew Lamb	42f26b609b	refactor: Move `query_tests` and `server_benchmarks` into their own crate --> smaller `server` (#1628 ) * refactor: Separate query_tests into its own crate * fix: references * refactor: break out server benchmarks * fix: Update query_tests/src/lib.rs Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2021-06-04 17:31:19 +00:00
Andrew Lamb	ff3215e6a9	feat: Implement Chunk Pruning (#1567 )	2021-06-04 13:05:22 +00:00
Marco Neumann	195644da04	docs: document semaphore design in server	2021-06-04 12:52:13 +02:00
kodiakhq[bot]	402ef0ebde	Merge branch 'main' into crepererum/limit_cleanup_amount	2021-06-04 10:47:33 +00:00
Marco Neumann	e06d65bb2a	refactor: migrate "DBs initialized" RPC to "server status"	2021-06-04 11:33:41 +02:00
Marco Neumann	b30d7e2821	feat: move DB loading into background worker Before this change we loaded databases eagerly when a serverID was passed on startup BEFORE starting up the gRPC server. Since loading (esp. at its current state without checkpoints and with too many small parquet files) can take very long, K8s thinks IOx is unhealthy. With this change we are now loading databases in the server background worker once a serverID is available. Until then we block all DB-related interactions including adding new databases (since without inspecting the object store there is now way we can check if the DB already exists). Furthermore we now load database no matter if the serverID was passed on startup (via CLI or environment variable) or was set later via gRPC call. Before this change the latter case was somewhat forgotten.	2021-06-04 11:33:41 +02:00
Raphael Taylor-Davies	696ebdc4db	feat: recover failed lifecycle actions (#1099 ) (#1592 ) * feat: recover failed lifecycle actions (#1099) * chore: review feedback * chore: fix logical conflicts	2021-06-03 15:46:33 +00:00
Marco Neumann	91df8a30e7	feat: limit number of files during storage cleanup Since the number of parquet files can potentially be unbound (aka very very large) and we do not want to hold the transaction lock for too long and also want to limit memory consumption of the cleanup routine, let's limit the number of files that we collect for cleanup.	2021-06-03 17:43:11 +02:00
Edd Robinson	e583e1fbda	Merge branch 'main' into er/feat/read_buffer/float_int	2021-06-03 14:48:36 +01:00
Andrew Lamb	eaa5b75437	refactor: Make it clear only partition_key and table name pruning happens in catalog (#1608 ) * refactor: Make it clear only partition_key and table name pruning is happening in catalog * fix: clippy * fix: Update server/src/db/catalog.rs Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: use TableNameFilter enum rather than Option * docs: Add docstring to the `From` implementation * fix: Update server/src/db/catalog/partition.rs Co-authored-by: Edd Robinson <me@edd.io> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Edd Robinson <me@edd.io>	2021-06-03 13:09:09 +00:00
Edd Robinson	65bfa4dd10	test: fix tests	2021-06-03 12:32:40 +01:00
Marco Neumann	27b9477aa4	test: fix flaky test	2021-06-03 11:23:29 +02:00
Marco Neumann	7b2663a38a	test: make tests faster	2021-06-03 11:23:29 +02:00
Marco Neumann	3c9fd81697	refactor: split overlong line	2021-06-03 11:23:29 +02:00
Marco Neumann	bbd73e59be	feat: jitter background clean-up job + wait on first job	2021-06-03 11:23:29 +02:00
Marco Neumann	ce412dbce2	fix: use structured error for background cleanup task reporting	2021-06-03 11:23:29 +02:00
kodiakhq[bot]	1c764c47a2	Merge branch 'main' into ntran/deduplicate	2021-06-02 17:42:36 +00:00
Nga Tran	40bd932fff	refactor: address Andrew's comment	2021-06-02 13:41:46 -04:00
Andrew Lamb	32c6ed1f34	refactor: More cleanup related to multi-table chunks (#1604 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-02 17:00:23 +00:00
Nga Tran	e7a97f3ac1	test: merge main and add more tests for deduplicate work	2021-06-02 12:00:40 -04:00
Marco Neumann	80f4d84ce8	refactor: isolate DB loading and streamline error handling There are not functional changes here (except that errors look slightly different) but it should allow for an easier move of the DB loading into a delayed task.	2021-06-02 13:42:24 +02:00
kodiakhq[bot]	0e09b20ca8	Merge branch 'main' into crepererum/issue1513-b	2021-06-02 07:08:29 +00:00
Nga Tran	40df7def0e	test: ttests for the deduplicate work	2021-06-01 18:06:35 -04:00
Nga Tran	60ad929721	refactor: add macro tto compare output of explains	2021-06-01 16:39:14 -04:00
Nga Tran	aa867601e5	chore: merge main with DF plan display fix	2021-06-01 16:17:41 -04:00
Nga Tran	0ad258bab3	refactor: remove comments since the time function predicates are pushed down after the recent constant folding fix in DF	2021-06-01 16:00:09 -04:00
Andrew Lamb	d8fbb7b410	refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588 ) * refactor: Remove last vestiges of multi-table chunks from PartitionChunk API * fix: remove test that can no longer fail * fix: update tests + code review comments * fix: clippy * fix: clippy * fix: restore test_measurement_fields_error test	2021-06-01 16:12:33 +00:00
Marco Neumann	714a082f3a	refactor: remove chunk state struct nesting Inline structs that are only used for enum variants.	2021-06-01 18:00:16 +02:00
Marco Neumann	5a4562f1c9	test: test `Chunk::new_open`	2021-06-01 18:00:16 +02:00
Marco Neumann	f45e61f9ef	test: test chunk lifecycle action handling	2021-06-01 18:00:16 +02:00
Marco Neumann	50636ca011	refactor: rename `Chunk::{set_closed => freeze}` and add tests This make it clearer what is actually happening. Furthermore, freezing frozen chunks is now a no-op.	2021-06-01 18:00:16 +02:00
kodiakhq[bot]	aafc8c4746	Merge branch 'main' into crepererum/fix_catalog_replay_logging	2021-06-01 15:59:42 +00:00
Marco Neumann	98c2963c28	fix: fix confusing log message during catalog replay	2021-06-01 17:58:38 +02:00
Andrew Lamb	d3711a5591	refactor: Use ParquetExec from DataFusion to read parquet files (#1580 ) * refactor: use ParquetExec to read parquet files * fix: test Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-01 14:44:07 +00:00
Andrew Lamb	64328dcf1c	feat: cache schema on catalog chunks too (#1575 )	2021-06-01 12:42:46 +00:00
kodiakhq[bot]	4e7b754098	Merge branch 'main' into crepererum/issue1513-a	2021-06-01 08:23:01 +00:00
Raphael Taylor-Davies	6e07a735bd	feat: don't recompute chunk size on every iteration (#1586 )	2021-05-31 16:19:11 +00:00
Andrew Lamb	73cedd2f88	chore: remove unused dependency (#1587 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-31 14:22:11 +00:00
Marco Neumann	991314ebe8	docs: fix `set_writing_to_object_store` docstring	2021-05-31 15:44:29 +02:00
Marco Neumann	996ce833f1	chore: fix formatting	2021-05-31 15:42:13 +02:00
Andrew Lamb	162a808a8d	refactor: Remove `table_name` from PartitionChunk API (#1584 ) * refactor: Remove `table_name` from PartitionChunk API * fix: clippy Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-31 12:05:09 +00:00
Marco Neumann	c658a627ed	refactor: change state structure for chunks This is the first step towards #1513. However it leaves all consumers bascially unchanged and also does NOT touch state transitions. These changes will follow in upcoming PRs.	2021-05-31 11:19:01 +02:00
Raphael Taylor-Davies	db432de137	feat: add distinct count to StatValues (#1568 )	2021-05-28 17:41:34 +00:00
Raphael Taylor-Davies	d8f19348bf	feat: per-column dictionaries in MUB (#1570 ) * feat: per-column dictionaries in MUB * chore: fmt * refactor: remove chunk-level dictionary * chore: remove redundant sort Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-28 13:51:56 +00:00
kodiakhq[bot]	d70d7a63a2	Merge branch 'main' into crepererum/remove_invalid_chunk_state	2021-05-28 10:20:05 +00:00
Andrew Lamb	c6f42cf304	refactor: Remove unnecessary code (#1573 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-28 10:12:47 +00:00
Marco Neumann	5cfede51f2	refactor: remove `ChunkState::Invalid` This seems to only exist to fight the borrow checker and we can actually live without it.	2021-05-28 11:16:06 +02:00
Andrew Lamb	3ae44a0375	refactor: Chunks can have at most one object store path (#1574 ) * refactor: Chunk can have at most one path * fix: update tests	2021-05-27 19:52:09 +00:00
Nga Tran	62147ff0d4	feat: add more explain tests	2021-05-27 12:19:41 -04:00
Andrew Lamb	f3bec93ef1	feat: Cache TableSummary in Catalog rather than computing it on demand (#1569 ) * feat: Cache `TableSummary` in catalog Chunks * refactor: use consistent table summary	2021-05-27 16:03:05 +00:00
Raphael Taylor-Davies	5d342d7779	feat: associate tracker with lifecycle action (#1099 ) (#1556 ) * feat: associate tracker with lifecycle action (#1099) * chore: docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-27 10:47:35 +00:00
Raphael Taylor-Davies	792bff07d1	feat: only store ChunkSnapshot in Closed state (#1560 ) * feat: only store ChunkSnapshot in Closed state * chore: review feedback * feat: record MUB size as closed size * chore: document column ordering assumption Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-27 10:36:47 +00:00
Raphael Taylor-Davies	4fcc04e6c9	chore: enable arrow prettyprint feature (#1566 )	2021-05-27 10:28:14 +00:00
kodiakhq[bot]	efe077da8f	Merge branch 'main' into crepererum/issue1313	2021-05-26 14:46:18 +00:00
Marco Neumann	24ec1a472e	fix: do NOT delete parquet files that are reachable by time travel	2021-05-26 12:38:54 +02:00
Raphael Taylor-Davies	c03b8a3963	refactor: remove tables from ChunkSnapshot (#1295 ) (#1558 )	2021-05-26 10:37:40 +00:00
Marco Neumann	1fb6af2364	refactor: split DB background loop into lifecycle and cleanup This should prevent one from blocking / stalling the other.	2021-05-26 11:09:30 +02:00
Marco Neumann	5983336366	refactor: rename `parquet_file::{utils => test_utils}`	2021-05-26 11:09:29 +02:00
Marco Neumann	dd6bbeec42	feat: add background task to clean up OS Closes #1313.	2021-05-26 11:04:56 +02:00
Marco Neumann	cc78b5317d	feat: add method to get all parquet files from catalog state	2021-05-26 11:02:40 +02:00
kodiakhq[bot]	166851d952	Merge branch 'main' into crepererum/in_file_metadata	2021-05-26 07:39:53 +00:00
Marko Mikulicic	bae5e5aee3	feat: Add simpler RoutingConfig	2021-05-25 21:51:54 +02:00
Marco Neumann	19a2733d30	feat: preserve transaction metadata in parquets	2021-05-25 09:56:12 +02:00
Marco Neumann	fe8e6301fe	refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata` Let us pool all metadata handling in a single module, which makes it easier to review.	2021-05-25 09:37:53 +02:00
Marko Mikulicic	a4215f0a56	fix: Fix 'acive' jemalloc stat misreporting	2021-05-25 02:55:27 +02:00
Nga Tran	018e1e0246	chore: add a comment to trick github to check semantic	2021-05-24 17:25:14 -04:00
Nga Tran	40a5d7d4ba	chore: Merge branch 'main' into tran/pushdown_parquet	2021-05-24 16:31:06 -04:00
Nga Tran	e72ae81a8e	feat: support predicate pushdown for parquet files	2021-05-24 16:22:52 -04:00
kodiakhq[bot]	db96286ed7	Merge branch 'main' into er/refactor/scalar_comp	2021-05-24 17:02:14 +00:00
Andrew Lamb	c464ffadad	refactor: remove special case timestamp_range in parquet chunk (#1543 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-24 16:19:44 +00:00
Andrew Lamb	14ba25f86d	chore: Update datafusion and use released version of arrow crates (#1546 ) * chore: Update datafusion and use released version of arrow crate * fix: Update for change in API	2021-05-24 15:37:22 +00:00
Edd Robinson	abe64c6edc	test: uncomment tests to fix	2021-05-24 16:18:53 +01:00
Carol (Nichols \|\| Goulding)	5c5064bdac	fix: Set default line timestamp and default partition time to same value (#1512 ) * refactor: Rearrange to allow injection of the current time in tests * test: Failing test showing a point can be in the wrong partition * fix: Only get the default time once per ShardedEntry creation, in router	2021-05-24 14:55:11 +00:00
Andrew Lamb	27e5b8fabf	refactor: Remove multiple table support from Parquet Chunk (#1541 )	2021-05-24 08:40:31 -04:00
Nga Tran	1f70d1f9c8	chore: remove a couple more comments	2021-05-21 17:06:53 -04:00

941 Commits (e3e801d29aa31b019b8e3ebaff6875617b9a01a6)