* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as we do now.
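A minimal sketch of the migration step, assuming sqlx with the postgres
feature (the names are illustrative, not the actual migration):

    use sqlx::PgPool;

    /// Dropping NOT NULL only updates the column's catalog entry, so
    /// Postgres takes a brief ACCESS EXCLUSIVE lock on `parquet_file`
    /// without rewriting the table.
    async fn make_partition_id_nullable(pool: &PgPool) -> Result<(), sqlx::Error> {
        sqlx::query("ALTER TABLE parquet_file ALTER COLUMN partition_id DROP NOT NULL")
            .execute(pool)
            .await?;
        Ok(())
    }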
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was taking the results of the catalog gRPC
service's query for a table's Parquet files and then keeping only the
files whose partition IDs matched. The gRPC query no longer returns the
partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to request only
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:

    Partition {
        id: 3,
        hash_id: abcdefg,
    }

and a Parquet file that has:

    ParquetFile {
        partition_hash_id: abcdefg,
    }

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
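A sketch of the intended semantics, using simplified stand-in types
rather than the real catalog types:

    #[derive(Clone, Copy, PartialEq)]
    struct PartitionId(i64);

    #[derive(Clone, PartialEq)]
    struct PartitionHashId(Vec<u8>);

    struct Partition {
        id: PartitionId,
        hash_id: Option<PartitionHashId>,
    }

    struct ParquetFile {
        partition_id: Option<PartitionId>,
        partition_hash_id: Option<PartitionHashId>,
    }

    /// A parquet file matches a partition if either identifier lines up,
    /// regardless of which one the file record actually stores.
    fn belongs_to(file: &ParquetFile, partition: &Partition) -> bool {
        file.partition_id == Some(partition.id)
            || (file.partition_hash_id.is_some()
                && file.partition_hash_id == partition.hash_id)
    }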
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:
https://github.com/rust-lang/cargo/issues/10801
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The new `hash_id` column will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
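That selection amounts to something like this sketch (field and function
names here are assumptions):

    /// Pick the partition segment of the object storage path: prefer the
    /// deterministic hash ID when the parquet file record carries one,
    /// and fall back to the row-based partition ID otherwise.
    fn partition_path_segment(partition_id: i64, partition_hash_id: Option<&str>) -> String {
        match partition_hash_id {
            Some(hash_id) => hash_id.to_string(),
            None => partition_id.to_string(),
        }
    }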
* feat(garbage-collector): batch parquet existence checks to catalog
The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was one catalog
query for each parquet file in object store. This can be a lot of
requests.
This PR instead batches up to 100 parquet file uuids into a single query
against the catalog. A hundred seems like a decent starting place.
A batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are split into batches of 100 for the catalog.
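Roughly, the batching loop looks like this sketch, assuming a tokio mpsc
channel fed by the lister thread (names are illustrative):

    use std::time::Duration;
    use tokio::{sync::mpsc, time::timeout};
    use uuid::Uuid;

    const MAX_BATCH: usize = 100;
    const RECV_TIMEOUT: Duration = Duration::from_millis(100);

    /// Collect up to MAX_BATCH uuids for one catalog query, flushing
    /// early if the lister goes quiet for RECV_TIMEOUT or the channel
    /// closes.
    async fn next_batch(rx: &mut mpsc::Receiver<Uuid>) -> Vec<Uuid> {
        let mut batch = Vec::with_capacity(MAX_BATCH);
        while batch.len() < MAX_BATCH {
            match timeout(RECV_TIMEOUT, rx.recv()).await {
                Ok(Some(id)) => batch.push(id),
                // Timed out or channel closed: flush what we have.
                Ok(None) | Err(_) => break,
            }
        }
        batch
    }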
Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
#7652 for watching for shutdown on the threads.
* fixes #7784
* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single field structs that are
just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids (see
the sketch after this list)
* comment clarifications
* adjust logging from debug to warn for expected rare events
Many thanks to Carol for help implementing this!
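For the sqlite blob-literal fix above, the generated query looks roughly
like this sketch (`hex` is the hex crate; the table layout here is
simplified):

    use uuid::Uuid;

    /// Render each uuid as a sqlite blob literal X'<hex>' so it compares
    /// correctly against the blob-typed object_store_id column.
    fn uuid_in_clause(ids: &[Uuid]) -> String {
        let literals: Vec<String> = ids
            .iter()
            .map(|id| format!("X'{}'", hex::encode(id.as_bytes())))
            .collect();
        format!(
            "SELECT object_store_id FROM parquet_file WHERE object_store_id IN ({})",
            literals.join(",")
        )
    }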
* fix(garbage-collector): just test parquet file existence
The GC, when checking files in object store against the catalog, only
cares if the parquet file for the given object store id exists in the
catalog. It doesn't need the full parquet file. Let's not transmit it
over the wire.
This PR uses a `SELECT 1` and a boolean to test whether the parquet file
exists.
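The shape of the check, as a sketch assuming sqlx with the postgres and
uuid features (the sqlite flavor is analogous):

    use sqlx::PgPool;
    use uuid::Uuid;

    /// Returns whether a parquet file with this object store id exists,
    /// transmitting a single boolean instead of the whole row.
    async fn parquet_file_exists(
        pool: &PgPool,
        object_store_id: Uuid,
    ) -> Result<bool, sqlx::Error> {
        sqlx::query_scalar(
            "SELECT EXISTS (SELECT 1 FROM parquet_file WHERE object_store_id = $1)",
        )
        .bind(object_store_id)
        .fetch_one(pool)
        .await
    }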
* helps #7784
* chore: use struct for from_row
* chore: satisfy clippy
* chore: fmt
This commit fixes loads of crates (47!) that had unused dependencies, or
mis-configured dependencies (test deps as normal deps).
I added the `unused_crate_dependencies` lint to all crates to help
prevent this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /tests or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
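The workaround looks something like this in an affected lib.rs
(criterion stands in for any dependency used only by /tests or /benches
binaries; the feature name is illustrative):

    // The lint only sees code compiled into the crate itself, so deps
    // used solely by test/bench binaries look unused. Importing them as
    // `_` brings no symbols into scope but satisfies the lint.
    #![warn(unused_crate_dependencies)]

    #[cfg(feature = "benches")]
    use criterion as _;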
* refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres
Restores code from https://github.com/influxdata/influxdb_iox/pull/7708
Revert "revert: PR #7708"
This reverts commit c9cfe05f8d.
* fix: merge
* fix: Update new test
This PR increases the batch/page size of list operations in the GC 10x,
to 10,000; it introduces a new CLI config for the sleep interval between
batches. Previously a single sleep config was used both between batches
and between entirely new list operations. This PR also moves some noisy
logging down to debug.
* tag to #7689
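A sketch of the split sleep configuration; the flag names and defaults
are assumptions, not the actual CLI:

    use clap::Parser;

    #[derive(Debug, Parser)]
    struct GcListerConfig {
        /// Seconds to sleep between 10,000-object batches within a
        /// single list operation.
        #[arg(long, default_value_t = 1)]
        sleep_interval_batch_seconds: u64,

        /// Seconds to sleep between entirely new list operations.
        #[arg(long, default_value_t = 1800)]
        sleep_interval_seconds: u64,
    }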
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
GC object store lister adjustments: the previous `take` wasn't being
respected because of where it was applied; a chunked list is now used
instead. The lister-specific errors will now display the source error
too.
* follow up to #7562
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore(garbage collector): backoff first connect to objectstore
This PR replaces the initial sleep added in #7562 with exponential
backoff retry of connecting to the object store. This avoids waiting out
the configured sleep, which can be quite long, in a world where the
service is getting redeployed often.
* for #7562
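The shape of the change, as a generic sketch (the delays and the cap are
illustrative):

    use std::time::Duration;

    /// Retry `connect` with exponential backoff instead of a fixed
    /// startup sleep, capping the delay so frequent redeploys don't
    /// wait long.
    async fn connect_with_backoff<F, Fut, T, E>(mut connect: F) -> T
    where
        F: FnMut() -> Fut,
        Fut: std::future::Future<Output = Result<T, E>>,
    {
        let mut delay = Duration::from_millis(100);
        loop {
            match connect().await {
                Ok(client) => return client,
                Err(_) => {
                    tokio::time::sleep(delay).await;
                    delay = (delay * 2).min(Duration::from_secs(30));
                }
            }
        }
    }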
* chore: rewrite lister::perform based on pr feedback
This commit redoes perform as a do..while-style loop, putting the call
to list from the object store at the top of the loop so the infinite
backoff retry can be used on each loop iteration - it might fail on more
than just the first call!
There are 3 selects because there are 3 wait stages, and each needs to
check for shutdown: the object store list, processing the list, and
sleeping on the poll interval.
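In outline, the loop looks like this sketch using tokio::select! and a
CancellationToken (the two helpers are stand-ins):

    use std::time::Duration;
    use tokio_util::sync::CancellationToken;

    async fn list_with_retry() -> Vec<String> {
        Vec::new() // stand-in for the backoff-wrapped object store list
    }

    async fn process(_listing: Vec<String>) {
        // stand-in for checking the listed files against the catalog
    }

    /// Each of the three wait stages checks for shutdown.
    async fn perform(shutdown: CancellationToken, poll_interval: Duration) {
        loop {
            let listing = tokio::select! {
                _ = shutdown.cancelled() => break,
                listing = list_with_retry() => listing,
            };
            tokio::select! {
                _ = shutdown.cancelled() => break,
                _ = process(listing) => {}
            }
            tokio::select! {
                _ = shutdown.cancelled() => break,
                _ = tokio::time::sleep(poll_interval) => {}
            }
        }
    }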
* chore: hoist cancellation check higher; limit listing to 1000 files
Responding to PR feedback.
* chore: add error info message
* chore: make build. :|
* chore: linter
* Don't print an info message for each deleted file. This can be 1000s
at a time and many more in total.
* Even if there are more files to delete, sleep the interval to decrease
catalog load.
* part of influxdata/idpe#17451
* fix: Garbage collector hangs indefinitely on shutdown
* style(garbage_collector): conform to linter and fmt
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`time` 0.1 suffers from [RUSTSEC-2020-0071] and many upstream crates
have tried to remove it for years. The last remaining dependency chain
is:
1. `chrono-english`
2. `chrono` (default features)
3. `chrono` (oldtime)
4. `time` 0.1
`chrono-english` doesn't seem to be super well maintained, but I
couldn't find a nice replacement for it. Luckily the master branch of
`chrono-english` is already fixed, so let's just directly use that.
[RUSTSEC-2020-0071]: https://rustsec.org/advisories/RUSTSEC-2020-0071
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This PR makes 3 improvements.
* It adds the configured sleep interval at the start of the object store
checker to avoid issues with making a remote list immediately at
startup. We see issues with the S3 API.
* the --dry-run flag was stopping deletes of objects from object store,
but the retention flagger was still making updates to the catalog.
These writes to the catalog are surprising when the --dry-run flag is
provided. Now, with --dry-run, the catalog is not updated; the logging
instead says how many records would be updated because of retention
(see the sketch after this list).
* It decreases logging in should_delete of the checker, as it would be
extremely noisy when reporting files it skips. An internal environment
has 3.8 million parquet files, most of which would be skipped.
* related to #7363
* fixes influxdata/idpe#17451
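For the --dry-run change, the gate is essentially this sketch (the
catalog update is a stand-in passed as a closure):

    /// With --dry-run, report what retention would do instead of
    /// writing to the catalog.
    async fn flag_expired<F, Fut>(dry_run: bool, expired_ids: Vec<i64>, flag_in_catalog: F)
    where
        F: Fn(i64) -> Fut,
        Fut: std::future::Future<Output = ()>,
    {
        if dry_run {
            println!(
                "dry run: {} records would be updated because of retention",
                expired_ids.len()
            );
            return;
        }
        for id in expired_ids {
            flag_in_catalog(id).await;
        }
    }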
* feat: introduce a new way of handling max_sequence_number for
ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc (see the sketch after this list)
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
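The overflow fix amounts to checked arithmetic on the hours-to-
nanoseconds conversion (a sketch; i64 nanoseconds overflow at roughly
2.56 million hours):

    /// Convert a retention period in hours to nanoseconds without
    /// silently wrapping; returns None on i64 overflow.
    fn retention_hours_to_ns(hours: i64) -> Option<i64> {
        hours.checked_mul(60 * 60)?.checked_mul(1_000_000_000)
    }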
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>