influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	743a59aa64	feat: use single per-migration txn when possible (#8373 ) * test: improve `test_step_sql_statement_no_transaction` * feat: also print number of steps in "applying migration step" * feat: use single per-migration txn when possible If all steps can (and want) to run in a transaction block, then wrap the migration bookkeeping and the migration script into a single transaction. This way we avoid the dirty state altogether because its now an "all or nothing" migration. Note that we still guarantee that there is only a single migration running at the same time due to the locking mechanism. Otherwise we would potentially run into nasty transaction failures during schema modifications. This is related to #7897 but only fixes / self-heals the "dirty" state for transaction that can run in transactions. For concurrent index migrations (which we need in prod) we need to be a bit smarter and this will be done in a follow-up. However I feel that not leaving half-done migrations for the cases where it's technically possible (e.g. adding columns) is already a huge step forward. * test: make `test_migrator_uses_single_transaction_when_possible` harder * test: explain test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-01 08:18:39 +00:00
Carol (Nichols \|\| Goulding)	4a9e76b8b7	feat: Make parquet_file.partition_id optional in the catalog (#8339 ) * feat: Make parquet_file.partition_id optional in the catalog This will acquire a short lock on the table in postgres, per: <https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads> This allows us to persist data for new partitions and associate the Parquet file catalog records with the partition records using only the partition hash ID, rather than both that are used now. * fix: Support transition partition ID in the catalog service * fix: Use transition partition ID in import/export This commit also removes support for the `--partition-id` flag of the `influxdb_iox remote store get-table` command, which Andrew approved. The `--partition-id` filter was getting the results of the catalog gRPC service's query for Parquet files of a table and then keeping only the files whose partition IDs matched. The gRPC query is no longer returning the partition ID from the Parquet file table, and really, this command should instead be using `GetParquetFilesByPartitionId` to only request what's needed rather than filtering. * feat: Support looking up Parquet files by either kind of Partition id Regardless of which is actually stored on the Parquet file record. That is, say there's a Partition in the catalog with: Partition { id: 3, hash_id: abcdefg, } and a Parquet file that has: ParquetFile { partition_hash_id: abcdefg, } calling `list_by_partition_not_to_delete(PartitionId(3))` should still return this Parquet file because it is associated with the partition that has ID 3. This is important for the compactor, which is currently only dealing in PartitionIds, and I'd like to keep it that way for now to avoid having to change Even More in this PR. * fix: Use and set new partition ID fields everywhere they want to be --------- Co-authored-by: Dom <dom@itsallbroken.com>	2023-07-31 12:40:56 +00:00
Marco Neumann	73339cfc57	fix: remove sqlx "used" metrics (#8336 ) PR #8327 introduced a bunch of metrics for the sqlx connection pool. One of the metrics was the "used" metrics that was supposed to count "currently in use" connection. In prod however this metric underflows to a very large integer. It seems that "acquire" callback is only used by sqlx for re-used connections (i.e. for the transition from "idle" to "used"). Now we could try to work around it but since there is no "close connection" callback, I doubt it it possible to do the accurately. Luckily though we don't really need that counter. sqlx already offers "active" (defined as idle + used) and "idle", so getting "used" is just the difference. I removed the "used" metric nevertheless because "active" and "idle" are read independently from each other (based on atomic integers) and are NOT guaranteed to be in-sync. Calculating the difference within IOx however would give the illusion that they are. So I leave this to the dashboard / alert / whatever, because there it is usually understood that metrics are samples and may be out of sync for a very short time. A nice side effect of this change is that it simplifies the code quite a bit. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-27 10:04:56 +00:00
Marco Neumann	b62e98cef1	feat: metrics for sqlx conn pools (#8327 ) To better gauge how many connections we use and especially if we hit the max connection limit, it would be helpful to actually have some metrics available for the pool usage. This change adds a few basic metrics.	2023-07-25 10:07:25 +00:00
Joe-Blount	629f9d20db	fix: update new_file_at following all compactions	2023-07-20 13:27:54 -05:00
Marco Neumann	4e88571142	feat: add batch partition getters (#8268 )	2023-07-19 15:05:41 +00:00
Marco Neumann	004b401a05	chore: upgrade to sqlx 0.7.1 (#8266 ) There are a bunch of dependencies in `Cargo.lock` that are related to mysql. These are NOT compiled at all, and are also not part of `cargo tree`. The reason for the inclusion is a bug in cargo: https://github.com/rust-lang/cargo/issues/10801 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-19 12:18:57 +00:00
Carol (Nichols \|\| Goulding)	22c17fb970	feat: Abstract over which partition ID type we're using to list Parquet files	2023-07-10 13:40:01 -04:00
Carol (Nichols \|\| Goulding)	c1e42651ec	feat: Abstract over which partition ID type we're using to compare and swap sort keys	2023-07-10 13:39:19 -04:00
Carol (Nichols \|\| Goulding)	eec31b7f00	feat: Abstract over which partition ID type we're using to get a partition from the catalog	2023-07-10 10:43:20 -04:00
Marco Neumann	9c65185068	refactor: normalize catalog metric names (#8152 ) Use the same prefix for all metrics of the same repo type. This makes reading dashboards way easier. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-05 09:18:39 +00:00
Carol (Nichols \|\| Goulding)	60d0858381	feat: Add catalog method for looking up partitions by their hash ID (#8018 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-23 14:42:50 +00:00
Carol (Nichols \|\| Goulding)	62ab8d21c2	fix: Eliminate need to have 2 separate insert statements depending on presence of hash ID I figured out that the reason inserting `Option<PartitionHashId>` was giving me a compiler error that `Encode` wasn't implemented was because I only implemented `Encode` for `&PartitionHashId` and sqlx only implements `Encode` for `Option<T: Encode>`, not `Option<T> where &T: Encode`. Using `as_ref` makes this work and gets rid of the `match` that created two different queries (one of which was wrong!) Also add tests that we can insert Parquet file records for partitions that don't have hash IDs to ensure we don't break ingest of new data for old-style partitions.	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	bffb2f8f9f	fix: Specialize Partition constructors to clarify appropriate usage	2023-06-22 09:01:22 -04:00
Carol (Nichols \|\| Goulding)	62ba18171a	feat: Add a new hash column on the partition and parquet file tables This will hold the deterministic ID for partitions. Until all existing partitions have this value, this is optional/nullable. The row ID still exists and is used as the main foreign key in the parquet_file and skipped_compaction tables. The hash_id has a unique index so that we can look up records based on it (if it's available). If the parquet file record has a partition_hash_id value, use that to generate the object storage path instead of the partition_id.	2023-06-22 09:01:22 -04:00
Phil Bracikowski	0c9cc2c1ed	chore(garbage-collector): increase batch size for clearing deleted files (#8009 ) This pr increases the const for the number of parquet files to remove from the catalog that are soft deleted and older than a configurable cutoff. * closes #8008 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-16 15:22:13 +00:00
Phil Bracikowski	e34ec77e8d	feat(garbage-collector): batch parquet existence checks to catalog (#7964 ) * feat(garbage-collector): batch parquet existence checks to catalog The core feature of this PR is batching the existence checks of parquet files in object store against the catalog. Before, there was 1 catalog query per each parquet file in object store. This can be a lot of requests. This PR can perform one query of at most 100 parquet file uuids against the catalog in one query. A hundred seems like a decent starting place. The batch may not reach 100 because there is also a timeout on receiving object store meta objects from the object store lister thread. That timeout is set to 100 milliseconds. If more than 100 are received, they are batched into 100 for the catalog. Additionally, this PR includes surrounding code changes to make it more idiomatic (but not perfect). It follows up some suggested work from #7652 for watching for shutdown on the threads. * fixes #7784 * use hashset instead of vec to test for contains * chore: add test for db failure path * remove ParquetFileExistsByOSID and other single field structs that are just for sql deserialization; map to uuid explicitly * fix the sqlite query by using a blob literal X'<hex>' for uuids * comment clarifications * adjust loggings to warn from debug for expected rare events Many thanks to Carol for help implementing this!	2023-06-14 07:59:00 -07:00
Carol (Nichols \|\| Goulding)	566ec68c58	refactor: Extract a test helper method for creating ParquetFileParams (#7959 ) Co-authored-by: Dom <dom@itsallbroken.com>	2023-06-09 09:44:30 +00:00
Marko Mikulicic	d26ad8e079	feat: Allow passing service protection limits in create db gRPC call (#7941 ) * feat: Allow passing service protection limits in create db gRPC call * fix: Move the impl into the catalog namespace trait --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 14:28:32 +00:00
Phil Bracikowski	92a83270f3	fix(garbage-collector): just test parquet file exists (#7948 ) * fix(garbage-collector): just test parquet file existence The GC, when checking files in object store against the catalog, only cares if the parquet file for the given object store id exists in the catalog. It doesn't need the full parquet file. Let's not transmit it over the wire. This PR uses a SELECT 1 and boolean to test for parquet file existing. * helps #7784 * chore: use struct for from_row * chore: satisfy clippy * chore: fmt	2023-06-07 15:12:48 -07:00
Carol (Nichols \|\| Goulding)	2becc950e1	fix: Use expect rather than returning error in a theoretically impossible case	2023-06-07 11:38:12 -04:00
Carol (Nichols \|\| Goulding)	ac26ceef91	feat: Make a place to do partition template validation - Create data_types::partition_template::ValidationError - Make creation of NamespacePartitionTemplateOverride and TablePartitionTemplateOverride fallible - Move SerializationWrapper into a module to make its inner field private to force creation through one fallible constructor; this is where the validation logic will go to be shared among all uses of partition templates	2023-06-07 11:38:12 -04:00
Carol (Nichols \|\| Goulding)	7f1d4a5bcd	fix: Create tag columns used in table partition template in a transaction with table create	2023-06-05 11:21:33 -04:00
Carol (Nichols \|\| Goulding)	bf69f17b3f	test: Add checks for tag columns being added by table creation to the catalog tests These currently fail because the implementation still only exists in the table grpc service.	2023-06-05 10:24:45 -04:00
Marco Neumann	86a2c249ec	refactor: faster PG `ParquetFileRepo` (#7907 ) * refactor: remove `ParquetFileRepo::flag_for_delete` * refactor: batch update parquet files in catalog * refactor: avoid data roundtrips through postgres * refactor: do not return ID from PG when we do not need it --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-01 16:17:28 +00:00
Marco Neumann	e1c1908a0b	refactor: add `parquet_file` PG index for querier (#7842 ) * refactor: add `parquet_file` PG index for querier Currently the `list_by_table_not_to_delete` catalog query is somewhat expensive: ```text iox_catalog_prod=> select table_id, sum((to_delete is NULL)::int) as n from parquet_file group by table_id order by n desc limit 5; table_id \| n ----------+------ 1489038 \| 7221 1489037 \| 7019 1491534 \| 5793 1491951 \| 5522 1513377 \| 5339 (5 rows) iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id, min_time, max_time, to_delete, file_size_bytes, row_count, compaction_level, created_at, column_set, max_l0_created_at FROM parquet_file WHERE table_id = 1489038 AND to_delete IS NULL; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on parquet_file (cost=46050.91..47179.26 rows=283 width=200) (actual time=464.368..472.514 rows=7221 loops=1) Recheck Cond: ((table_id = 1489038) AND (to_delete IS NULL)) Heap Blocks: exact=7152 -> BitmapAnd (cost=46050.91..46050.91 rows=283 width=0) (actual time=463.341..463.343 rows=0 loops=1) -> Bitmap Index Scan on parquet_file_table_idx (cost=0.00..321.65 rows=22545 width=0) (actual time=1.674..1.674 rows=7221 loops=1) Index Cond: (table_id = 1489038) -> Bitmap Index Scan on parquet_file_deleted_at_idx (cost=0.00..45728.86 rows=1525373 width=0) (actual time=460.717..460.717 rows=4772117 loops=1) Index Cond: (to_delete IS NULL) Planning Time: 0.092 ms Execution Time: 472.907 ms (10 rows) ``` I think this may also be because PostgreSQL kinda chooses the wrong strategy, because it could just look at the existing index and filter from there: ```text iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id, min_time, max_time, to_delete, file_size_bytes, row_count, compaction_level, created_at, column_set, max_l0_created_at FROM parquet_file WHERE table_id = 1489038; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using parquet_file_table_idx on parquet_file (cost=0.57..86237.78 rows=22545 width=200) (actual time=0.057..6.994 rows=7221 loops=1) Index Cond: (table_id = 1489038) Planning Time: 0.094 ms Execution Time: 7.297 ms (4 rows) ``` However PostgreSQL doesn't know the cardinalities well enough. So let's add a dedicated index to make the querier faster. * feat: new migration system * docs: explain dirty migrations	2023-05-31 10:56:32 +00:00
Dom Dwyer	9e0570f2bf	refactor: explicit submod for partition_template Move the import into the submodule itself, rather than re-exporting it at the crate level. This will make it possible to link to the specific module/logic.	2023-05-30 15:13:20 +02:00
Dom Dwyer	2094b45c10	refactor(catalog): mark metrics() as test only This method is used to enable tests - it's never intended to be used in production code to access the underlying metric registry. The Catalog trait is responsible for Catalog things, not acting as a dependency injection for metrics. The only current use of this is in test code, so no changes needed.	2023-05-24 17:38:10 +02:00
Carol (Nichols \|\| Goulding)	d91b75526f	fix: Clarify that the expect is on the Option, not the Result	2023-05-24 10:36:52 -04:00
Carol (Nichols \|\| Goulding)	efc817c2a8	fix: Remove From impl, leaving TablePartitionTemplateOverride::new as only creation mechanism This makes it clearer that you do or do not have a custom table override (in the first argument to `new`).	2023-05-24 10:36:52 -04:00
Carol (Nichols \|\| Goulding)	46f7e3e48a	fix: Handle potential for data race in catalog table insertion by re-fetching if detected	2023-05-24 10:36:52 -04:00
Carol (Nichols \|\| Goulding)	90cb4b6ed9	refactor: Extract a function for handling a table missing from the namespace cache	2023-05-24 10:36:52 -04:00
Carol (Nichols \|\| Goulding)	73b09d895f	feat: Store and handle NULL partition_template database values Treat them as the default partition template in the application, but save space and avoid having to backfill the tables by having the database values be NULL when no custom template has been specified.	2023-05-24 10:36:52 -04:00
Carol (Nichols \|\| Goulding)	a22d809cdf	test: Create an overridden namespace, and create a table from it (no override), read it back and assert the expected partitioning scheme is derived	2023-05-24 10:34:30 -04:00
Carol (Nichols \|\| Goulding)	2ab3ea03b8	test: Create a default (not overridden) namespace, read it back, assert the expected partitioning scheme is derived	2023-05-24 10:34:30 -04:00
Carol (Nichols \|\| Goulding)	9c0faa66f0	feat: Set a table partition template explicitly or from the namespace And use the table partition template when partitioning writes to that table.	2023-05-24 10:34:30 -04:00
Carol (Nichols \|\| Goulding)	604bab9508	fix: Make Table create_or_get be only create	2023-05-24 10:34:30 -04:00
Carol (Nichols \|\| Goulding)	afb3838437	feat: Optionally supply the namespace partition template when creating a namespace	2023-05-24 10:10:34 -04:00
Marco Neumann	103e814f22	refactor: clean up catalog `parquet_files` interface (#7853 ) * feat: `ParquetFileRepo::list_all` * refactor: remove `ParquetFileRepo::list_by_table` * refactor: simlify `ParquetFileRepo::list_by_table` * refactor: remove `ParquetFileRepo::count` * refactor: remove `ParquetFileRepo::update_compaction_level` * refactor: remove `ParquetFileRepo::exists` * fix: test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-24 09:15:03 +00:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Andrew Lamb	6344fe8c3f	chore: Add rationale for `clippy::future_not_send` (#7822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-18 16:58:56 +00:00
Andrew Lamb	1ff11d0856	refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres (#7736 ) * refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres Restores code from https://github.com/influxdata/influxdb_iox/pull/7708 Revert "revert: PR #7708" This reverts commit `c9cfe05f8d`. * fix: merge * fix: Update new test	2023-05-17 13:36:25 +00:00
Jeffrey Smith II	45628ba4d3	chore: Merge remote-tracking branch 'origin/main' into smith/remove-transactions-main	2023-05-16 08:49:25 -04:00
Carol (Nichols \|\| Goulding)	7268ea5c29	refactor: Extract a test helper function to create a basic table	2023-05-15 14:31:24 -04:00
Carol (Nichols \|\| Goulding)	57bedb1c2d	refactor: Extract a test helper function to create a basic namespace	2023-05-15 14:20:38 -04:00
Jeffrey Smith II	09dc622078	chore: Merge remote-tracking branch 'origin/main' into smith/remove-transactions-main	2023-05-15 10:06:11 -04:00
Kaya Gökalp	5fe8affb18	refactor: accept NamespaceName with Namespace create (#7774 ) Co-authored-by: Dom <dom@itsallbroken.com>	2023-05-15 10:03:55 +00:00
Jeffrey Smith II	e5ee47c32e	chore: code cleanup	2023-05-11 13:21:15 -05:00
Jeffrey Smith II	a9ee1bae6c	chore: Merge remote-tracking branch 'origin/main' into smith/remove-transactions-main	2023-05-09 13:26:02 -05:00
Carol (Nichols \|\| Goulding)	e8a480f5f6	fix: Give up ownership of Column when adding to a table To enable reuse of existing allocations rather than borrowing, creating new allocations, then dropping them.	2023-05-09 14:55:00 +02:00

1 2 3 4 5 ...

346 Commits (dac0db21960c871c298924269d198a8b01849724)