influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	246918feb6	feat: teach compactor to use sort_key_ids instead of sort_key (#8560 ) * feat: teach compactor to use sort_key_ids instead of sort_key * test: update the test output after chatting with Joe and know the reason of the chnanges	2023-08-24 16:16:12 +00:00
Nga Tran	3e98f7ea5c	feat: fill sort_key_ids when partition is inserted and updated (#8517 ) * feat: read null sort_key_ids * chore: clearer explanation about test strategy * chore: Apply suggestions from code review Co-authored-by: Marco Neumann <marco@crepererum.net> * test: tests that add partition with NULL sort_key_ids * feat: set sort_key_ids to empty array {} during partition insertion * feat: initial step to update sort_key_ids * chore: address review comments * chore: remove unecessary comments and tests * fix: typos * chore: remove unecessary tests * feat: continue the work of updating sort_key_ids * fix: chec duplicates for SortedColumnSet * test: tests for sort ley ids * test: fix a test * chore: remove unused comments * chore: address first half of review comments and removing tests of tests * chore: address review commnets for fetching colums in ingester --------- Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-21 14:26:57 +00:00
Nga Tran	5d17a99dbb	feat: read null sort_key_ids (#8489 ) * feat: read null sort_key_ids * chore: clearer explanation about test strategy * chore: Apply suggestions from code review Co-authored-by: Marco Neumann <marco@crepererum.net> * test: tests that add partition with NULL sort_key_ids * chore: address review comments * chore: remove unecessary comments and tests * fix: typos * chore: remove unecessary tests * fix: chec duplicates for SortedColumnSet --------- Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-18 14:15:27 +00:00
NGA-TRAN	9bf1c8c11c	chore: revert fill sort_key_ids	2023-08-11 11:36:27 -04:00
Nga Tran	da92a5c9e1	feat: fill catalog `sort_key_ids` for partitions with coming data (#8462 ) * feat: fill catalog sort_key_ids for partition with coming data * test: sort_key_ids has empty array for newly create partition * test: name of non-existing column * chore: add comments to ask Andrew about the code * chore: make comments clearer * chore: fix a comment to avoid failure in doc * chore: add comment for the panic if column name of sort key not found * fix: during import files the partition has to be created with empty sort key first. Then after its files are created, the partition will be uodated with sort key * chore: remove no longer needed comments after the bug in build_catalog test is fixed * chore: address review comments * refactor: Use ColumnSet type * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * chore: fix a clippy --------- Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2023-08-10 18:12:40 +00:00
Carol (Nichols || Goulding)	92ae8e4084	refactor: Extract a convenience constructor for Deterministic transition ids	2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding)	308d7f3d4b	feat: Use TransitionPartitionId everywhere in the querier	2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding)	4a9e76b8b7	feat: Make parquet_file.partition_id optional in the catalog (#8339 ) * feat: Make parquet_file.partition_id optional in the catalog This will acquire a short lock on the table in postgres, per: <https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads> This allows us to persist data for new partitions and associate the Parquet file catalog records with the partition records using only the partition hash ID, rather than both that are used now. * fix: Support transition partition ID in the catalog service * fix: Use transition partition ID in import/export This commit also removes support for the `--partition-id` flag of the `influxdb_iox remote store get-table` command, which Andrew approved. The `--partition-id` filter was getting the results of the catalog gRPC service's query for Parquet files of a table and then keeping only the files whose partition IDs matched. The gRPC query is no longer returning the partition ID from the Parquet file table, and really, this command should instead be using `GetParquetFilesByPartitionId` to only request what's needed rather than filtering. * feat: Support looking up Parquet files by either kind of Partition id Regardless of which is actually stored on the Parquet file record. That is, say there's a Partition in the catalog with: Partition { id: 3, hash_id: abcdefg, } and a Parquet file that has: ParquetFile { partition_hash_id: abcdefg, } calling `list_by_partition_not_to_delete(PartitionId(3))` should still return this Parquet file because it is associated with the partition that has ID 3. This is important for the compactor, which is currently only dealing in PartitionIds, and I'd like to keep it that way for now to avoid having to change Even More in this PR. * fix: Use and set new partition ID fields everywhere they want to be --------- Co-authored-by: Dom <dom@itsallbroken.com>	2023-07-31 12:40:56 +00:00
Carol (Nichols || Goulding)	c1e42651ec	feat: Abstract over which partition ID type we're using to compare and swap sort keys	2023-07-10 13:39:19 -04:00
Marco Neumann	ca31c1eade	feat: hook up tokio metrics (#8050 ) * feat: metrics for main tokio runtime * feat: instrument executor tokio runtime --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-29 11:11:44 +00:00
Carol (Nichols || Goulding)	bffb2f8f9f	fix: Specialize Partition constructors to clarify appropriate usage	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	41420cb920	fix: Borrow transition partition ID when possible	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	62ba18171a	feat: Add a new hash column on the partition and parquet file tables This will hold the deterministic ID for partitions. Until all existing partitions have this value, this is optional/nullable. The row ID still exists and is used as the main foreign key in the parquet_file and skipped_compaction tables. The hash_id has a unique index so that we can look up records based on it (if it's available). If the parquet file record has a partition_hash_id value, use that to generate the object storage path instead of the partition_id.	2023-06-22 09:01:22 -04:00
Marco Neumann	64f573c13f	feat: cache partition template in querier (#7987 ) * feat: impl `Eq` for `TablePartitionTemplateOverride` * feat: `TablePartitionTemplateOverride::size` * feat: cache partition template in querier Required for #7974.	2023-06-15 10:30:56 +00:00
Marko Mikulicic	d26ad8e079	feat: Allow passing service protection limits in create db gRPC call (#7941 ) * feat: Allow passing service protection limits in create db gRPC call * fix: Move the impl into the catalog namespace trait --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 14:28:32 +00:00
Carol (Nichols || Goulding)	bf699a8b60	fix: Remove partition ID from the metadata serialized into Parquet files (#7947 ) Nothing gets the partition ID out of the metadata. The parts of the code interacting with object storage that need the ID to create the object store path were using the partition ID from the metadata out of convenience, but I changed those places to pass in the partition ID in a separate argument instead. This will make the transition to deterministic partition IDs a bit smoother. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 14:03:21 +00:00
Marco Neumann	86a2c249ec	refactor: faster PG `ParquetFileRepo` (#7907 ) * refactor: remove `ParquetFileRepo::flag_for_delete` * refactor: batch update parquet files in catalog * refactor: avoid data roundtrips through postgres * refactor: do not return ID from PG when we do not need it --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-01 16:17:28 +00:00
Andrew Lamb	a48f681e56	feat(parquet): reduce and limit buffering when writing parquet files (#7880 ) * feat: limit buffering when writing parquet files ("combined solution") * chore: Run cargo hakari tasks --------- Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-31 13:27:32 +00:00
Andrew Lamb	1ff76b7bf2	chore: use workspace dependencies for `object_store`	2023-05-26 07:03:42 -04:00
Carol (Nichols || Goulding)	9c0faa66f0	feat: Set a table partition template explicitly or from the namespace And use the table partition template when partitioning writes to that table.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	afb3838437	feat: Optionally supply the namespace partition template when creating a namespace	2023-05-24 10:10:34 -04:00
Marco Neumann	103e814f22	refactor: clean up catalog `parquet_files` interface (#7853 ) * feat: `ParquetFileRepo::list_all` * refactor: remove `ParquetFileRepo::list_by_table` * refactor: simlify `ParquetFileRepo::list_by_table` * refactor: remove `ParquetFileRepo::count` * refactor: remove `ParquetFileRepo::update_compaction_level` * refactor: remove `ParquetFileRepo::exists` * fix: test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-24 09:15:03 +00:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Andrew Lamb	6344fe8c3f	chore: Add rationale for `clippy::future_not_send` (#7822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-18 16:58:56 +00:00
Carol (Nichols || Goulding)	7268ea5c29	refactor: Extract a test helper function to create a basic table	2023-05-15 14:31:24 -04:00
Kaya Gökalp	5fe8affb18	refactor: accept NamespaceName with Namespace create (#7774 ) Co-authored-by: Dom <dom@itsallbroken.com>	2023-05-15 10:03:55 +00:00
Carol (Nichols || Goulding)	cc41216382	fix: Undo the addition of a TableInfo type; store partition_template on TableSchema	2023-05-09 14:54:59 +02:00
Carol (Nichols || Goulding)	596673d515	refactor: Create a new ColumnsByName type to abstract over TableSchema columns And allow usage of just the columns when that's all that's needed without leaking the BTreeMap implementation detail everywhere	2023-05-09 14:54:58 +02:00
Carol (Nichols || Goulding)	3d5df5574a	fix: Remove vestiges of shards	2023-05-08 20:24:36 -04:00
Carol (Nichols || Goulding)	b0959667d5	fix: Move topic and query pool within iox catalog (#7734 ) Still insert them into the database and associate them with namespaces, but don't ever query them back out. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-04 13:45:56 +00:00
Carol (Nichols || Goulding)	621caab2e9	fix: Remove unused parquet_max_sequence_number metadata	2023-05-03 10:57:27 -04:00
Carol (Nichols || Goulding)	038f8e9ce0	fix: Move shard concepts into only the catalog This still inserts the shard id into the database, always set to the TRANSITION_SHARD_ID, but never reads it back out again.	2023-04-26 11:42:32 -04:00
Carol (Nichols || Goulding)	f1850c9234	fix: Remove unused level_1 function and TablePartition type	2023-04-17 19:28:50 -04:00
Carol (Nichols || Goulding)	a55e2e5fdb	fix: Remove unused level_0 function	2023-04-17 19:28:49 -04:00
Carol (Nichols || Goulding)	5e6dbec909	fix: Remove tombstones as they aren't functional currently	2023-04-14 13:36:08 -04:00
Carol (Nichols || Goulding)	a244e5b078	test: Add some tests for CatalogToCompactPartitionsSource's existing behavior	2023-04-12 11:07:43 -04:00
dependabot[bot]	66982f988b	chore(deps): Bump object_store from 0.5.5 to 0.5.6 (#7433 ) Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.5 to 0.5.6. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs/commits) --- updated-dependencies: - dependency-name: object_store dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-04-04 08:43:34 +00:00
dependabot[bot]	275dad704e	chore(deps): Bump futures from 0.3.27 to 0.3.28 (#7397 ) Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.27 to 0.3.28. - [Release notes](https://github.com/rust-lang/futures-rs/releases) - [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.27...0.3.28) --- updated-dependencies: - dependency-name: futures dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-31 10:44:23 +00:00
Nga Tran	f780aba353	test: set max_l0_created_at to reasonable values for the tests and al… (#7286 ) * test: set max_l0_created_at to reasonable values for the tests and also verify it using both test layout and catalog function * fix: typo --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-03-21 18:57:10 +00:00
dependabot[bot]	3a9ca8879b	chore(deps): Bump futures from 0.3.26 to 0.3.27 (#7193 ) Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.26 to 0.3.27. - [Release notes](https://github.com/rust-lang/futures-rs/releases) - [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.26...0.3.27) --- updated-dependencies: - dependency-name: futures dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-03-13 10:53:59 +00:00
dependabot[bot]	3256fcc72e	chore(deps): Bump object_store from 0.5.4 to 0.5.5 Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.4 to 0.5.5. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.4...object_store_0.5.5) --- updated-dependencies: - dependency-name: object_store dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-03-03 02:00:51 +00:00
Carol (Nichols || Goulding)	faae5eb438	chore: Rerun cargo hakari manage-deps	2023-02-27 11:56:15 +01:00
Marco Neumann	08578cded5	refactor: n_threads and n_target_partitions are non-zero (#7047 ) * refactor: n_threads and n_target_partitions are non-zero Zero values will just panic. Prevent that earlier. * fix: typo Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> --------- Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2023-02-23 16:57:00 +00:00
Nga Tran	f69c8adc7c	feat: Compact partition with many L0 files (#7007 ) * feat: initial implementation of the split * feat: split many L0 files in groups and compact them into new and fewer L0 files * test: remove iappropriate AllAtOnce test * refactor: move file classification for initial target to its own function * fix: pop the branch from start to end * chore: address review comments * feat: support splitting to many L1 files * feat: only add extra round to compact level-n files to same level-n files if their files plus overlapped level-n-plus-1 over limit * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <alamb@influxdata.com> * chore: final cleanup and address comments * chore: run fmt --------- Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-02-16 21:17:25 +00:00
Marco Neumann	f499022511	feat: add compaction level to commit metrics (#6985 ) * feat: add compaction level to commit metrics * test: more realism	2023-02-15 09:28:19 +00:00
Dom Dwyer	2d46a364dc	feat: namespace soft-delete support This commit adds initial support for "soft" namespace deletion, where the actual records & data remain, but are no longer queryable / writeable. Soft deletion is eventually consistent - users can expect to continue writing to and reading from a bucket after issuing a soft delete call, until the various components either restart, or have their caches flushed. The components treat soft-deleted namespaces differently: * router: ignore soft deleted namespaces * ingester: accept soft deleted namespaces * compactor: accept soft deleted namespaces * querier: ignore soft deleted namespaces * various gRPC services: ignore soft deleted namespaces This ensures that the ingester & compactor do not see rows "vanishing" from the database, and continue to make forward progress. Writes for the deleted namespace that are buffered in the ingester will be persisted as normal, allowing us to support "un-delete" operations where the system is restored to a the state at which the delete was issued (rather than loosing the buffered data). Follow-on work is required to ensure GC drops the orphaned parquet files after the configured GC time, and optimisations such as not compacting parquet from soft-deleted namespaces seems like a trivial win.	2023-02-13 12:01:35 +01:00
Andrew Lamb	779fb93ce7	refactor: move test builders out of compactor2 code (#6953 ) * refactor: move test builders out of compactor2 code * fix: docs	2023-02-10 18:28:09 +00:00
dependabot[bot]	0ecde75af5	chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900 ) Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4) --- updated-dependencies: - dependency-name: object_store dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-02-08 09:40:11 +00:00
Dom Dwyer	114bafe9a1	perf(router): cached table limit enforcement Use the namespace schema cache in the router to enforce the per-namespace table limit (service protection limit), adding O(1) overhead to the existing column limit evaluation logic. Prior to this commit, each request that would breach the table limit would be (potentially partially) applied to the catalog and return an error. Every subsequent request creating a new table continued to cause a catalog query, unnecessarily adding load proportional to request counts. After this commit, catalog requests are sent when the router instance can determine (to the best of it's ability, see below) that the request will not cause the namespace to exceed the table limit. Because this uses cached schemas, the actual state set of tables may have changed - this will cause inconsistent enforcement and spurious errors in the same way it currently does for the column limit. For more details (and to track a resolution) see: https://github.com/influxdata/influxdb_iox/issues/5957	2023-02-06 17:43:26 +01:00
dependabot[bot]	d0e6b16450	chore(deps): Bump bytes from 1.3.0 to 1.4.0 Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.3.0 to 1.4.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.3.0...v1.4.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-02-01 00:30:56 +00:00

182 Commits (196c589ef64f73677eb3e89e60b219f862bde19a)