Commit Graph

443 Commits (dac0db21960c871c298924269d198a8b01849724)

Author SHA1 Message Date
Nga Tran dac0db2196
feat: add sort_key_ids into sqlite catalog (#8384) 2023-08-01 20:15:27 +00:00
Nga Tran 73f38077b6
feat: add sort_key_ids as array of bigints into catalog partition (#8375)
* feat: add sort_key_ids as array of bigints into catalog partition

* chore: add comments

* chore: remove comments to avoid changing them in the future due to checksum requirement

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 14:28:30 +00:00
Marco Neumann 743a59aa64
feat: use single per-migration txn when possible (#8373)
* test: improve `test_step_sql_statement_no_transaction`

* feat: also print number of steps in "applying migration step"

* feat: use single per-migration txn when possible

If all steps can (and want to) run in a transaction block, then wrap the
migration bookkeeping and the migration script into a single
transaction. This way we avoid the dirty state altogether because it's
now an "all or nothing" migration.

Note that we still guarantee that there is only a single migration
running at the same time due to the locking mechanism. Otherwise we
would potentially run into nasty transaction failures during schema
modifications.

This is related to #7897 but only fixes / self-heals the "dirty" state
for migrations that can run in transactions. For concurrent index
migrations (which we need in prod) we need to be a bit smarter and this
will be done in a follow-up. However I feel that not leaving half-done
migrations for the cases where it's technically possible (e.g. adding
columns) is already a huge step forward.

* test: make `test_migrator_uses_single_transaction_when_possible` harder

* test: explain test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 08:18:39 +00:00
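The "all or nothing" behaviour described above can be sketched with a toy migrator (a hypothetical illustration in Python/sqlite3, not the actual sqlx migration code): the bookkeeping row and the migration steps commit or roll back together, so a failed step cannot leave the dirty state behind.

```python
import sqlite3

def apply_migration(conn, name, steps):
    # Run every step plus the bookkeeping insert in ONE transaction: if a
    # step fails, the rollback also discards the bookkeeping row and any
    # schema changes, so no half-applied "dirty" state is left behind.
    try:
        conn.execute("BEGIN")
        for sql in steps:
            conn.execute(sql)
        conn.execute("INSERT INTO _migrations (name) VALUES (?)", (name,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

# autocommit mode, so the explicit BEGIN/COMMIT above are the only
# transaction control in play
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE _migrations (name TEXT PRIMARY KEY)")
apply_migration(conn, "add_table_t", ["CREATE TABLE t (id INTEGER)"])
```

A failing step rolls back both the schema change and the bookkeeping row, which is exactly why the single-transaction variant self-heals.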
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
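The "either kind of partition ID" lookup described above can be illustrated with a toy query (hypothetical schema in Python/sqlite3, not the real catalog code): joining through the partition table lets a Parquet file match on whichever identifier it stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partitions (id INTEGER PRIMARY KEY, hash_id TEXT UNIQUE);
CREATE TABLE parquet_file (
    id INTEGER PRIMARY KEY,
    partition_id INTEGER,       -- nullable: new records may only carry the hash
    partition_hash_id TEXT,
    to_delete INTEGER
);
INSERT INTO partitions VALUES (3, 'abcdefg');
INSERT INTO parquet_file VALUES (1, 3, NULL, NULL);         -- old style: row ID
INSERT INTO parquet_file VALUES (2, NULL, 'abcdefg', NULL); -- new style: hash ID
""")

def list_by_partition_not_to_delete(partition_id):
    # Join through the partition so a file matches on EITHER its numeric
    # partition_id OR the partition's hash_id, whichever it stored.
    rows = conn.execute(
        """
        SELECT pf.id FROM parquet_file pf
        JOIN partitions p ON pf.partition_id = p.id
                          OR pf.partition_hash_id = p.hash_id
        WHERE p.id = ? AND pf.to_delete IS NULL
        ORDER BY pf.id
        """,
        (partition_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Both files come back for `PartitionId(3)`, so a caller (like the compactor) can keep dealing only in numeric IDs.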
Marco Neumann 73339cfc57
fix: remove sqlx "used" metrics (#8336)
PR #8327 introduced a bunch of metrics for the sqlx connection pool. One
of the metrics was the "used" metric that was supposed to count
"currently in use" connections. In prod however this metric underflows to
a very large integer. It seems that the "acquire" callback is only used by
sqlx for re-used connections (i.e. for the transition from "idle" to
"used"). We could try to work around it, but since there is no "close
connection" callback, I doubt it is possible to do this accurately.

Luckily though we don't really need that counter. sqlx already offers
"active" (defined as idle + used) and "idle", so getting "used" is just
the difference. I removed the "used" metric nevertheless because
"active" and "idle" are read independently from each other (based on atomic
integers) and are NOT guaranteed to be in-sync. Calculating the
difference within IOx however would give the illusion that they are. So
I leave this to the dashboard / alert / whatever, because there it is
usually understood that metrics are samples and may be out of sync for a
very short time.

A nice side effect of this change is that it simplifies the code quite a
bit.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-27 10:04:56 +00:00
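Why an independently-sampled difference can underflow, as a toy illustration (hypothetical pool bookkeeping, not sqlx's actual internals): two counters updated one at a time can be observed mid-update.

```python
# Toy pool counters: "active" (idle + in-use) and "idle" are two separate
# atomics, updated one at a time rather than as a consistent pair.
active, idle = 10, 10          # all ten connections are currently idle

# An idle connection gets closed; the pool decrements the two counters
# in sequence, and a metrics sample lands between the two updates:
active -= 1                    # active = 9, idle still = 10
used = active - idle           # sampler computes "used": 9 - 10 = -1
idle -= 1                      # idle only catches up afterwards

# On an unsigned 64-bit counter, that -1 wraps to a very large integer,
# matching the underflow seen in prod:
wrapped = used % (1 << 64)
```

Leaving the subtraction to the dashboard layer avoids baking this illusion of consistency into IOx itself.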
Marco Neumann b62e98cef1
feat: metrics for sqlx conn pools (#8327)
To better gauge how many connections we use and especially if we hit the
max connection limit, it would be helpful to actually have some metrics
available for the pool usage. This change adds a few basic metrics.
2023-07-25 10:07:25 +00:00
dependabot[bot] faa8d44492
chore(deps): Bump thiserror from 1.0.43 to 1.0.44 (#8315)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.43 to 1.0.44.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.43...1.0.44)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:18:44 +00:00
dependabot[bot] cd31492e5b
chore(deps): Bump async-trait from 0.1.71 to 0.1.72 (#8317)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.71 to 0.1.72.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.71...0.1.72)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:07:18 +00:00
Joe-Blount 629f9d20db fix: update new_file_at following all compactions 2023-07-20 13:27:54 -05:00
Fraser Savage e894ea73f7
refactor(catalog): Allow kafka columns to be nullable 2023-07-20 11:18:02 +01:00
Marco Neumann 4e88571142
feat: add batch partition getters (#8268) 2023-07-19 15:05:41 +00:00
Marco Neumann 004b401a05
chore: upgrade to sqlx 0.7.1 (#8266)
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:

https://github.com/rust-lang/cargo/issues/10801

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-19 12:18:57 +00:00
dependabot[bot] e33a078128
chore(deps): Bump paste from 1.0.13 to 1.0.14 (#8244)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.13 to 1.0.14.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.13...1.0.14)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-17 16:10:02 +00:00
Carol (Nichols || Goulding) f20e9e6368
fix: Add index on parquet_file.partition_hash_id for lookup perf 2023-07-10 13:40:03 -04:00
Carol (Nichols || Goulding) 22c17fb970
feat: Abstract over which partition ID type we're using to list Parquet files 2023-07-10 13:40:01 -04:00
Carol (Nichols || Goulding) c1e42651ec
feat: Abstract over which partition ID type we're using to compare and swap sort keys 2023-07-10 13:39:19 -04:00
Carol (Nichols || Goulding) eec31b7f00
feat: Abstract over which partition ID type we're using to get a partition from the catalog 2023-07-10 10:43:20 -04:00
Joe-Blount c2442c31f3 chore: create partition table index for created_at 2023-07-07 16:27:05 -05:00
dependabot[bot] 8b000862e1
chore(deps): Bump pretty_assertions from 1.3.0 to 1.4.0 (#8182)
Bumps [pretty_assertions](https://github.com/rust-pretty-assertions/rust-pretty-assertions) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/rust-pretty-assertions/rust-pretty-assertions/releases)
- [Changelog](https://github.com/rust-pretty-assertions/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-pretty-assertions/rust-pretty-assertions/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 09:35:18 +00:00
dependabot[bot] 057ee40cb9
chore(deps): Bump thiserror from 1.0.41 to 1.0.43 (#8181)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.41 to 1.0.43.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.41...1.0.43)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 09:25:12 +00:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
dependabot[bot] 3827257f94
chore(deps): Bump thiserror from 1.0.40 to 1.0.41 (#8149)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.40 to 1.0.41.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.40...1.0.41)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-05 09:25:14 +00:00
Marco Neumann 9c65185068
refactor: normalize catalog metric names (#8152)
Use the same prefix for all metrics of the same repo type. This makes
reading dashboards way easier.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:18:39 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
dependabot[bot] 9a03d9c9fe
chore(deps): Bump paste from 1.0.12 to 1.0.13 (#8139)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-04 07:57:41 +00:00
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
Carol (Nichols || Goulding) 60d0858381
feat: Add catalog method for looking up partitions by their hash ID (#8018)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 14:42:50 +00:00
Carol (Nichols || Goulding) 62ab8d21c2
fix: Eliminate need to have 2 separate insert statements depending on presence of hash ID
I figured out that inserting `Option<PartitionHashId>` gave a compiler
error that `Encode` wasn't implemented because I only implemented
`Encode` for `&PartitionHashId`, and sqlx only implements `Encode` for
`Option<T: Encode>`, not `Option<T> where &T: Encode`. Using `as_ref`
makes this work and gets rid of the `match` that created two different
queries (one of which was wrong!).

Also add tests that we can insert Parquet file records for partitions
that don't have hash IDs to ensure we don't break ingest of new data for
old-style partitions.
2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) bffb2f8f9f
fix: Specialize Partition constructors to clarify appropriate usage 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
2023-06-22 09:01:22 -04:00
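The path-generation rule in the last paragraph can be sketched as follows (hypothetical path layout and function name for illustration, not the exact IOx format):

```python
def object_store_path(namespace_id, table_id, partition_id,
                      partition_hash_id, object_store_id):
    # Prefer the deterministic hash ID when the parquet file record has
    # one; fall back to the numeric partition_id for old-style partitions
    # that were created before the hash column existed.
    part = partition_hash_id if partition_hash_id is not None else str(partition_id)
    return f"{namespace_id}/{table_id}/{part}/{object_store_id}.parquet"
```

Because the hash ID is deterministic, new-style paths stay stable even if catalog rows are ever re-created with different row IDs.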
Phil Bracikowski 0c9cc2c1ed
chore(garbage-collector): increase batch size for clearing deleted files (#8009)
This PR increases the constant for the number of parquet files to remove
from the catalog that are soft deleted and older than a configurable
cutoff.

* closes #8008

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 15:22:13 +00:00
Phil Bracikowski e34ec77e8d
feat(garbage-collector): batch parquet existence checks to catalog (#7964)
* feat(garbage-collector): batch parquet existence checks to catalog

The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was 1 catalog
query per each parquet file in object store. This can be a lot of
requests.

This PR can perform one catalog query for a batch of at most 100 parquet
file uuids. A hundred seems like a decent starting place.

The batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are split into batches of 100 for the catalog.

Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
 #7652 for watching for shutdown on the threads.

* fixes #7784

* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single field structs that are
  just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids
* comment clarifications
* adjust logging to warn from debug for expected rare events

Many thanks to Carol for help implementing this!
2023-06-14 07:59:00 -07:00
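The batching described above can be sketched like this (a hypothetical Python/sqlite3 illustration, not the garbage collector's actual Rust code), including binding the UUIDs as blobs as the sqlite fix above mentions:

```python
import sqlite3
import uuid

BATCH_SIZE = 100  # matches the batch size chosen above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parquet_file (object_store_id BLOB PRIMARY KEY)")
known = [uuid.uuid4() for _ in range(150)]
conn.executemany("INSERT INTO parquet_file VALUES (?)",
                 [(u.bytes,) for u in known])

def existing_ids(candidates):
    # One catalog query per batch of at most BATCH_SIZE uuids, instead of
    # one query per object store file. UUIDs are bound as raw blobs,
    # mirroring the X'<hex>' blob-literal fix for sqlite.
    found = set()
    for i in range(0, len(candidates), BATCH_SIZE):
        batch = candidates[i:i + BATCH_SIZE]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            "SELECT object_store_id FROM parquet_file "
            f"WHERE object_store_id IN ({placeholders})",
            [u.bytes for u in batch],
        )
        found.update(uuid.UUID(bytes=r[0]) for r in rows)
    return found
```

Checking 155 candidates costs two queries here instead of 155, which is the whole point of the change.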
Carol (Nichols || Goulding) 566ec68c58
refactor: Extract a test helper method for creating ParquetFileParams (#7959)
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-09 09:44:30 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Phil Bracikowski 92a83270f3
fix(garbage-collector): just test parquet file exists (#7948)
* fix(garbage-collector): just test parquet file existence

The GC, when checking files in object store against the catalog, only
cares if the parquet file for the given object store id exists in the
catalog. It doesn't need the full parquet file. Let's not transmit it
over the wire.

This PR uses a `SELECT 1` and a boolean to test whether the parquet file exists.

* helps #7784

* chore: use struct for from_row

* chore: satisfy clippy

* chore: fmt
2023-06-07 15:12:48 -07:00
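The existence check amounts to the following (a hypothetical Python/sqlite3 sketch of the `SELECT 1` pattern, not the GC's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parquet_file "
             "(object_store_id TEXT PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO parquet_file VALUES ('abc', zeroblob(1024))")

def parquet_file_exists(object_store_id):
    # SELECT 1 ... LIMIT 1 returns at most one tiny row; the full record
    # (payload included) never has to cross the wire.
    row = conn.execute(
        "SELECT 1 FROM parquet_file WHERE object_store_id = ? LIMIT 1",
        (object_store_id,),
    ).fetchone()
    return row is not None
```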
Carol (Nichols || Goulding) 2becc950e1
fix: Use expect rather than returning error in a theoretically impossible case 2023-06-07 11:38:12 -04:00
Carol (Nichols || Goulding) ac26ceef91
feat: Make a place to do partition template validation
- Create data_types::partition_template::ValidationError
- Make creation of NamespacePartitionTemplateOverride and
  TablePartitionTemplateOverride fallible
- Move SerializationWrapper into a module to make its inner field
  private to force creation through one fallible constructor; this is
  where the validation logic will go to be shared among all uses of
  partition templates
2023-06-07 11:38:12 -04:00
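The "fallible constructor with a private inner field" pattern described above can be sketched in Python (names borrowed from the commit, with a hypothetical validation rule for illustration):

```python
class ValidationError(ValueError):
    """Raised when a partition template fails validation."""

class TablePartitionTemplateOverride:
    # The only way to obtain an instance is this fallible constructor, so
    # every instance that exists is known-valid -- mirroring the private
    # inner field forcing creation through one validated code path.
    def __init__(self, parts):
        if not parts:
            raise ValidationError("partition template must not be empty")
        self._parts = tuple(parts)

    @property
    def parts(self):
        return self._parts
```

Centralizing validation in the constructor means every consumer of partition templates shares the same checks.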
Carol (Nichols || Goulding) 7f1d4a5bcd
fix: Create tag columns used in table partition template in a transaction with table create 2023-06-05 11:21:33 -04:00
Carol (Nichols || Goulding) bf69f17b3f
test: Add checks for tag columns being added by table creation to the catalog tests
These currently fail because the implementation still only exists in the
table grpc service.
2023-06-05 10:24:45 -04:00
Marco Neumann 86a2c249ec
refactor: faster PG `ParquetFileRepo` (#7907)
* refactor: remove `ParquetFileRepo::flag_for_delete`

* refactor: batch update parquet files in catalog

* refactor: avoid data roundtrips through postgres

* refactor: do not return ID from PG when we do not need it

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-01 16:17:28 +00:00
Marco Neumann 551e838db3
refactor: remove unused PG indices (#7905)
Similar to #7859. To test index usage, execute the following query on
the writer replica:

```sql
SELECT
    n.nspname                                      AS namespace_name,
    t.relname                                      AS table_name,
    pg_size_pretty(pg_relation_size(t.oid))        AS table_size,
    t.reltuples::bigint                            AS num_rows,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_index i
    INNER JOIN pg_class t               ON t.oid = i.indrelid
    INNER JOIN pg_namespace n           ON n.oid = t.relnamespace
    INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```

Data for eu-west-1 at `2023-05-31T16:30:00Z`:

```text
namespace_name |  table_name  | table_size | num_rows  |            index_name             | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+-----------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_deleted_at_idx       | 6442 MB    | N      |      1693534991 | 21602734184385 |    21694365037
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_partition_delete_idx | 20 MB      | N      |        17854904 |     3087700816 |      384603858
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_partition_idx        | 2325 MB    | N      |      1627977474 | 12604272924323 | 11088781876397
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_pkey                 | 8290 MB    | Y      |       480767174 |      481021514 |      480733966
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_table_delete_idx     | 174 MB     | N      |         1006563 |    24687617719 |      385132581
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_table_idx            | 1905 MB    | N      |         9288042 |   351240529272 |          27551
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_location_unique           | 6076 MB    | Y      |       385294957 |         109448 |         109445
```

and at `2023-06-01T13:00:00Z`:

```text
 namespace_name |  table_name  | table_size | num_rows  |            index_name             | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+-----------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_deleted_at_idx       | 6976 MB    | N      |      1693535032 | 21602834620294 |    21736731439
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_partition_delete_idx | 21 MB      | N      |        31468423 |     7397141567 |      677909956
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_partition_idx        | 2464 MB    | N      |      1627977474 | 12604272924323 | 11088781876397
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_pkey                 | 8785 MB    | Y      |       492762975 |      493017342 |      492729691
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_table_delete_idx     | 241 MB     | N      |         1136317 |    24735561304 |      429892231
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_table_idx            | 2058 MB    | N      |         9288042 |   351240529272 |          27551
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_location_unique           | 6776 MB    | Y      |       399142416 |         124810 |         124807
```

Due to #7842 and #7894, the following indices are no longer used:

- `parquet_file_partition_idx`
- `parquet_file_table_idx`
2023-06-01 13:45:05 +00:00
Marco Neumann e14305ac33
feat: add index for compactor (#7894)
* fix: migration name

* feat: add index for compactor
2023-05-31 12:29:00 +00:00
Marco Neumann e1c1908a0b
refactor: add `parquet_file` PG index for querier (#7842)
* refactor: add `parquet_file` PG index for querier

Currently the `list_by_table_not_to_delete` catalog query is somewhat
expensive:

```text
iox_catalog_prod=> select table_id, sum((to_delete is NULL)::int) as n from parquet_file group by table_id order by n desc limit 5;
 table_id |  n
----------+------
  1489038 | 7221
  1489037 | 7019
  1491534 | 5793
  1491951 | 5522
  1513377 | 5339
(5 rows)

iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
       min_time, max_time, to_delete, file_size_bytes,
       row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038 AND to_delete IS NULL;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on parquet_file  (cost=46050.91..47179.26 rows=283 width=200) (actual time=464.368..472.514 rows=7221 loops=1)
   Recheck Cond: ((table_id = 1489038) AND (to_delete IS NULL))
   Heap Blocks: exact=7152
   ->  BitmapAnd  (cost=46050.91..46050.91 rows=283 width=0) (actual time=463.341..463.343 rows=0 loops=1)
         ->  Bitmap Index Scan on parquet_file_table_idx  (cost=0.00..321.65 rows=22545 width=0) (actual time=1.674..1.674 rows=7221 loops=1)
               Index Cond: (table_id = 1489038)
         ->  Bitmap Index Scan on parquet_file_deleted_at_idx  (cost=0.00..45728.86 rows=1525373 width=0) (actual time=460.717..460.717 rows=4772117 loops=1)
               Index Cond: (to_delete IS NULL)
 Planning Time: 0.092 ms
 Execution Time: 472.907 ms
(10 rows)
```

I think this may also be because PostgreSQL chooses the wrong strategy
here; it could just scan the existing index and filter from there:

```text
iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
       min_time, max_time, to_delete, file_size_bytes,
       row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using parquet_file_table_idx on parquet_file  (cost=0.57..86237.78 rows=22545 width=200) (actual time=0.057..6.994 rows=7221 loops=1)
   Index Cond: (table_id = 1489038)
 Planning Time: 0.094 ms
 Execution Time: 7.297 ms
(4 rows)
```

However PostgreSQL doesn't know the cardinalities well enough. So
let's add a dedicated index to make the querier faster.

* feat: new migration system

* docs: explain dirty migrations
2023-05-31 10:56:32 +00:00
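The dedicated index might look something like the following (a hypothetical shape demonstrated with Python/sqlite3; the actual PG migration may differ): a partial index on `table_id` restricted to undeleted rows serves the hot query directly, so the planner no longer ANDs two huge single-column bitmaps together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parquet_file "
             "(id INTEGER PRIMARY KEY, table_id INTEGER, to_delete INTEGER)")
conn.executemany("INSERT INTO parquet_file VALUES (?, ?, ?)",
                 [(1, 1489038, None), (2, 1489038, 123), (3, 7, None)])

# Partial index covering exactly the hot predicate (hypothetical name).
conn.execute("CREATE INDEX parquet_file_table_delete_idx "
             "ON parquet_file (table_id) WHERE to_delete IS NULL")

ids = [r[0] for r in conn.execute(
    "SELECT id FROM parquet_file "
    "WHERE table_id = 1489038 AND to_delete IS NULL")]
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM parquet_file "
    "WHERE table_id = 1489038 AND to_delete IS NULL").fetchall()
```

The query plan should show the dedicated index being searched instead of a bitmap-AND over two broad single-column indices.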
Dom Dwyer 9e0570f2bf
refactor: explicit submod for partition_template
Move the import into the submodule itself, rather than re-exporting it
at the crate level.

This will make it possible to link to the specific module/logic.
2023-05-30 15:13:20 +02:00
Dom Dwyer 2094b45c10
refactor(catalog): mark metrics() as test only
This method is used to enable tests - it's never intended to be used in
production code to access the underlying metric registry. The Catalog
trait is responsible for Catalog things, not acting as a dependency
injection for metrics.

The only current use of this is in test code, so no changes needed.
2023-05-24 17:38:10 +02:00
Carol (Nichols || Goulding) d91b75526f
fix: Clarify that the expect is on the Option, not the Result 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) efc817c2a8
fix: Remove From impl, leaving TablePartitionTemplateOverride::new as only creation mechanism
This makes it clearer that you do or do not have a custom table override
(in the first argument to `new`).
2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 46f7e3e48a
fix: Handle potential for data race in catalog table insertion by re-fetching if detected 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 90cb4b6ed9
refactor: Extract a function for handling a table missing from the namespace cache 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 73b09d895f
feat: Store and handle NULL partition_template database values
Treat them as the default partition template in the application, but
save space and avoid having to backfill the tables by having the
database values be NULL when no custom template has been specified.
2023-05-24 10:36:52 -04:00