influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	3e98f7ea5c	feat: fill sort_key_ids when partition is inserted and updated (#8517 ) * feat: read null sort_key_ids * chore: clearer explanation about test strategy * chore: Apply suggestions from code review Co-authored-by: Marco Neumann <marco@crepererum.net> * test: tests that add partition with NULL sort_key_ids * feat: set sort_key_ids to empty array {} during partition insertion * feat: initial step to update sort_key_ids * chore: address review comments * chore: remove unecessary comments and tests * fix: typos * chore: remove unecessary tests * feat: continue the work of updating sort_key_ids * fix: chec duplicates for SortedColumnSet * test: tests for sort ley ids * test: fix a test * chore: remove unused comments * chore: address first half of review comments and removing tests of tests * chore: address review commnets for fetching colums in ingester --------- Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-21 14:26:57 +00:00
Nga Tran	5d17a99dbb	feat: read null sort_key_ids (#8489 ) * feat: read null sort_key_ids * chore: clearer explanation about test strategy * chore: Apply suggestions from code review Co-authored-by: Marco Neumann <marco@crepererum.net> * test: tests that add partition with NULL sort_key_ids * chore: address review comments * chore: remove unecessary comments and tests * fix: typos * chore: remove unecessary tests * fix: chec duplicates for SortedColumnSet --------- Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-18 14:15:27 +00:00
dependabot[bot]	7094189004	chore(deps): Bump tokio from 1.31.0 to 1.32.0 (#8507 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.31.0 to 1.32.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.31.0...tokio-1.32.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-08-17 08:06:29 +00:00
dependabot[bot]	34b8585931	chore(deps): Bump tokio from 1.30.0 to 1.31.0 (#8482 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.30.0 to 1.31.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.30.0...tokio-1.31.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-08-14 06:32:34 +00:00
NGA-TRAN	9bf1c8c11c	chore: revert fill sort_key_ids	2023-08-11 11:36:27 -04:00
Nga Tran	da92a5c9e1	feat: fill catalog `sort_key_ids` for partitions with coming data (#8462 ) * feat: fill catalog sort_key_ids for partition with coming data * test: sort_key_ids has empty array for newly create partition * test: name of non-existing column * chore: add comments to ask Andrew about the code * chore: make comments clearer * chore: fix a comment to avoid failure in doc * chore: add comment for the panic if column name of sort key not found * fix: during import files the partition has to be created with empty sort key first. Then after its files are created, the partition will be uodated with sort key * chore: remove no longer needed comments after the bug in build_catalog test is fixed * chore: address review comments * refactor: Use ColumnSet type * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * chore: fix a clippy --------- Co-authored-by: Carol (Nichols \|\| Goulding) <carol.nichols@gmail.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2023-08-10 18:12:40 +00:00
dependabot[bot]	3675043585	chore(deps): Bump tokio from 1.29.1 to 1.30.0 (#8464 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.29.1 to 1.30.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.29.1...tokio-1.30.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-08-10 07:50:18 +00:00
Carol (Nichols \|\| Goulding)	4a9e76b8b7	feat: Make parquet_file.partition_id optional in the catalog (#8339 ) * feat: Make parquet_file.partition_id optional in the catalog This will acquire a short lock on the table in postgres, per: <https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads> This allows us to persist data for new partitions and associate the Parquet file catalog records with the partition records using only the partition hash ID, rather than both that are used now. * fix: Support transition partition ID in the catalog service * fix: Use transition partition ID in import/export This commit also removes support for the `--partition-id` flag of the `influxdb_iox remote store get-table` command, which Andrew approved. The `--partition-id` filter was getting the results of the catalog gRPC service's query for Parquet files of a table and then keeping only the files whose partition IDs matched. The gRPC query is no longer returning the partition ID from the Parquet file table, and really, this command should instead be using `GetParquetFilesByPartitionId` to only request what's needed rather than filtering. * feat: Support looking up Parquet files by either kind of Partition id Regardless of which is actually stored on the Parquet file record. That is, say there's a Partition in the catalog with: Partition { id: 3, hash_id: abcdefg, } and a Parquet file that has: ParquetFile { partition_hash_id: abcdefg, } calling `list_by_partition_not_to_delete(PartitionId(3))` should still return this Parquet file because it is associated with the partition that has ID 3. This is important for the compactor, which is currently only dealing in PartitionIds, and I'd like to keep it that way for now to avoid having to change Even More in this PR. * fix: Use and set new partition ID fields everywhere they want to be --------- Co-authored-by: Dom <dom@itsallbroken.com>	2023-07-31 12:40:56 +00:00
Carol (Nichols \|\| Goulding)	22c17fb970	feat: Abstract over which partition ID type we're using to list Parquet files	2023-07-10 13:40:01 -04:00
dependabot[bot]	b15c6062a9	chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-28 13:18:08 +00:00
Carol (Nichols \|\| Goulding)	62ba18171a	feat: Add a new hash column on the partition and parquet file tables This will hold the deterministic ID for partitions. Until all existing partitions have this value, this is optional/nullable. The row ID still exists and is used as the main foreign key in the parquet_file and skipped_compaction tables. The hash_id has a unique index so that we can look up records based on it (if it's available). If the parquet file record has a partition_hash_id value, use that to generate the object storage path instead of the partition_id.	2023-06-22 09:01:22 -04:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Andrew Lamb	6344fe8c3f	chore: Add rationale for `clippy::future_not_send` (#7822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-18 16:58:56 +00:00
Carol (Nichols \|\| Goulding)	7268ea5c29	refactor: Extract a test helper function to create a basic table	2023-05-15 14:31:24 -04:00
Carol (Nichols \|\| Goulding)	57bedb1c2d	refactor: Extract a test helper function to create a basic namespace	2023-05-15 14:20:38 -04:00
Kaya Gökalp	5fe8affb18	refactor: accept NamespaceName with Namespace create (#7774 ) Co-authored-by: Dom <dom@itsallbroken.com>	2023-05-15 10:03:55 +00:00
Carol (Nichols \|\| Goulding)	b0959667d5	fix: Move topic and query pool within iox catalog (#7734 ) Still insert them into the database and associate them with namespaces, but don't ever query them back out. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-04 13:45:56 +00:00
Carol (Nichols \|\| Goulding)	621caab2e9	fix: Remove unused parquet_max_sequence_number metadata	2023-05-03 10:57:27 -04:00
Carol (Nichols \|\| Goulding)	038f8e9ce0	fix: Move shard concepts into only the catalog This still inserts the shard id into the database, always set to the TRANSITION_SHARD_ID, but never reads it back out again.	2023-04-26 11:42:32 -04:00
Carol (Nichols \|\| Goulding)	53196870a5	test: Remove shard variation from more tests	2023-04-24 10:08:01 -04:00
Andrew Lamb	20e9c91866	refactor: Use workspace dependencies for `tonic`, `tonic-build`, etc (#7515 ) * refactor: Use workspace dependencies for `tonic`, `tonic-build`, etc * chore: Run cargo hakari tasks --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-04-12 16:07:19 +00:00
Carol (Nichols \|\| Goulding)	faae5eb438	chore: Rerun cargo hakari manage-deps	2023-02-27 11:56:15 +01:00
Carol (Nichols \|\| Goulding)	65ba208f88	fix: Remove shard_id from the Parquet File protobuf in the catalog service	2023-02-17 13:53:03 -05:00
Carol (Nichols \|\| Goulding)	20250d883e	fix: Remove shard_id from the catalog service Partition	2023-02-17 12:56:51 -05:00
Dom Dwyer	2d46a364dc	feat: namespace soft-delete support This commit adds initial support for "soft" namespace deletion, where the actual records & data remain, but are no longer queryable / writeable. Soft deletion is eventually consistent - users can expect to continue writing to and reading from a bucket after issuing a soft delete call, until the various components either restart, or have their caches flushed. The components treat soft-deleted namespaces differently: * router: ignore soft deleted namespaces * ingester: accept soft deleted namespaces * compactor: accept soft deleted namespaces * querier: ignore soft deleted namespaces * various gRPC services: ignore soft deleted namespaces This ensures that the ingester & compactor do not see rows "vanishing" from the database, and continue to make forward progress. Writes for the deleted namespace that are buffered in the ingester will be persisted as normal, allowing us to support "un-delete" operations where the system is restored to a the state at which the delete was issued (rather than loosing the buffered data). Follow-on work is required to ensure GC drops the orphaned parquet files after the configured GC time, and optimisations such as not compacting parquet from soft-deleted namespaces seems like a trivial win.	2023-02-13 12:01:35 +01:00
Nga Tran	b8a80869d4	feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692 ) * feat: introduce a new way of max_sequence_number for ingester, compactor and querier * chore: cleanup * feat: new column max_l0_created_at to order files for deduplication * chore: cleanup * chore: debug info for chnaging cpu.parquet * fix: update test parquet file Co-authored-by: Marco Neumann <marco@crepererum.net>	2023-01-26 10:52:47 +00:00
Carol (Nichols \|\| Goulding)	adc5c2bf06	feat: Add a gRPC API to the catalog service to get Parquet files by namespace Tests that write line protocol (that may contain writes to multiple tables) need to be able to see when new Parquet files are saved.	2023-01-11 11:41:09 -05:00
Nga Tran	49a9565240	feat: gRPC that creates namespace (#6103 ) * feat: create namespace API call in router Co-authored-by: Nga Tran <nga-tran@live.com> * chore: treat retention as ns except in CLI * fix: overflow in nanosecond calc * fix: retention test after changing it from hours to ns * chore: comment clarification in cli; better response type for error in ns API * fix: correct some rebase mistakes * chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const * fix: ns autocreation test was wrong after rebase * fix: mem catalog has default 1hr retention, accidently removed in rebase * chore: remove mem catalogs default 1hr retention; make it settable in sets & router Co-authored-by: Luke Bond <luke.n.bond@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-18 13:02:12 +00:00
Nga Tran	9c4266c503	refactor: first step to remove unused retention_duration (#6113 ) * refactor: first step to remove unused retention_duration * refactor: remove retenion_duration from update catalog Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-11 15:21:06 +00:00
Carol (Nichols \|\| Goulding)	dad1ad1318	feat: Add the catalog service to ingester, querier, and compactor So that `remote get` that uses the catalog service can work no matter what kind of server you contact.	2022-10-28 10:49:26 -04:00
Carol (Nichols \|\| Goulding)	ace497d47c	fix: Rename database to namespace in the commands I just added	2022-10-27 10:40:39 -04:00
Carol (Nichols \|\| Goulding)	de2ae6f557	feat: MVP of remote store get-table command	2022-10-26 13:50:03 -04:00
Carol (Nichols \|\| Goulding)	2e83e04eab	feat: Use workspace package metadata to reduce differences and repetition	2022-10-24 13:04:09 -04:00
Dom Dwyer	cd4087e00d	style: add no todo!() or dbg!() lints Some crates had theme, some not - lets be consistent and have the compiler spot dbg!() and todo!() macro calls - they should never be in prod code!	2022-09-29 13:10:07 +02:00
Andrew Lamb	d3278ea490	fix: Update service_grpc_catalog/src/lib.rs Co-authored-by: Marco Neumann <marco@crepererum.net>	2022-09-06 07:44:08 -04:00
Juul Christiaens	8b419ecd84	refactor: changed iox_shared to iox-shared changed io_shared to iox-shared in the following files: update_catalog.rs, partition.rs, lib.rs (in the service_grpc_catalog folder) and lib.rs (in the service_grpc_object_store folder).	2022-09-04 07:59:07 -04:00
Carol (Nichols \|\| Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols \|\| Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols \|\| Goulding)	698f1a47ff	refactor: Rename test structures from sequencer to shard where appropriate	2022-08-29 14:06:44 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Carol (Nichols \|\| Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954 ) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent * chore: add correspinding constants for the comapction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Marco Neumann	215f297162	refactor: parquet file metadata from catalog (#4949 ) * refactor: remove `ParquetFileWithMetadata` * refactor: remove `ParquetFileRepo::parquet_metadata` * refactor: parquet file metadata from catalog Closes #4124.	2022-06-27 15:38:39 +00:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Marco Neumann	0fbff981ec	chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894 ) Closes #4889. Closes #4890. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-17 10:28:28 +00:00
Andrew Lamb	005610b172	refactor: remove some `&` use in iox_catalog (#4862 ) * refactor: remove some `&` use in iox_catalog * fix: Update data_types/src/lib.rs	2022-06-15 11:31:49 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00

1 2

52 Commits (196c589ef64f73677eb3e89e60b219f862bde19a)