I figured out why inserting `Option<PartitionHashId>` was giving me a
compiler error that `Encode` wasn't implemented: I had only implemented
`Encode` for `&PartitionHashId`, and sqlx only implements `Encode` for
`Option<T: Encode>`, not for `Option<T> where &T: Encode`. Using
`as_ref` makes this work and gets rid of the `match` that created two
different queries (one of which was wrong!).
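A minimal standalone sketch of the trait-bound situation, with `Encode`, `Thing`, and `bind` as stand-ins for the sqlx trait, `PartitionHashId`, and the query bind call (not the real sqlx types):

```rust
// `Encode` is only implemented for `&Thing`, so `Option<Thing>` does not
// satisfy the blanket `Option<T: Encode>` impl -- but `Option<&Thing>`,
// produced by `as_ref()`, does.
trait Encode {}

struct Thing;

impl<'a> Encode for &'a Thing {}
impl<T: Encode> Encode for Option<T> {}

fn bind<T: Encode>(_value: T) {}

fn main() {
    let maybe_thing: Option<Thing> = Some(Thing);
    // bind(maybe_thing);       // error: `Encode` is not implemented for `Thing`
    bind(maybe_thing.as_ref()); // ok: `Option<&Thing>: Encode`
}
```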
Also add tests that insert Parquet file records for partitions that
don't have hash IDs, to ensure we don't break ingest of new data for
old-style partitions.
This will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
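A minimal sketch of that fallback (identifiers are illustrative, not the actual path-generation code):

```rust
// Prefer the deterministic partition hash ID carried on the parquet file
// record; fall back to the old row-ID-based path segment otherwise.
fn partition_path_segment(partition_hash_id: Option<&str>, partition_id: i64) -> String {
    match partition_hash_id {
        Some(hash_id) => hash_id.to_owned(),
        None => partition_id.to_string(),
    }
}

fn main() {
    assert_eq!(partition_path_segment(Some("abc123"), 42), "abc123");
    assert_eq!(partition_path_segment(None, 42), "42");
}
```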
This PR increases the const controlling the number of parquet files to
remove from the catalog that are soft deleted and older than a
configurable cutoff.
* closes #8008
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(garbage-collector): batch parquet existence checks to catalog
The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was one catalog
query for each parquet file in object store, which can be a lot of
requests.
This PR checks at most 100 parquet file UUIDs against the catalog in a
single query. A hundred seems like a decent starting place. A batch may
not reach 100 because there is also a timeout on receiving object store
meta objects from the object store lister thread. That timeout is set to
100 milliseconds. If more than 100 are received, they are split into
batches of 100 for the catalog.
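A rough sketch of that batching loop (channel and helper names are assumptions, not the actual garbage collector code), assuming tokio and the `uuid` crate:

```rust
use std::time::Duration;

use tokio::sync::mpsc::Receiver;
use tokio::time::timeout;
use uuid::Uuid;

const MAX_BATCH: usize = 100;
const RECV_TIMEOUT: Duration = Duration::from_millis(100);

async fn run_batches(mut ids: Receiver<Uuid>) {
    let mut done = false;
    while !done {
        let mut batch = Vec::with_capacity(MAX_BATCH);
        // Fill the batch until it is full, the lister shuts down, or the
        // receive timeout fires.
        while batch.len() < MAX_BATCH {
            match timeout(RECV_TIMEOUT, ids.recv()).await {
                Ok(Some(id)) => batch.push(id),
                Ok(None) => {
                    done = true; // lister thread shut down; flush and stop
                    break;
                }
                Err(_) => break, // timed out; flush a partial batch
            }
        }
        if !batch.is_empty() {
            // One catalog round trip for the whole batch.
            check_batch_against_catalog(&batch).await;
        }
    }
}

async fn check_batch_against_catalog(_ids: &[Uuid]) {
    // Hypothetical helper: a single query along the lines of
    // SELECT object_store_id FROM parquet_file WHERE object_store_id = ANY($1)
}
```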
Additionally, this PR includes surrounding code changes to make the code
more idiomatic (but not perfect). It follows up on some work suggested in
#7652 for watching for shutdown on the threads.
* fixes #7784
* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single-field structs that are
just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids (see
the sketch below)
* comment clarifications
* adjust logging from debug to warn for expected rare events
Many thanks to Carol for help implementing this!
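As a sketch of that blob-literal fix (illustrative only; the PR's actual query construction may differ), assuming the `uuid` crate's `simple()` formatter:

```rust
use uuid::Uuid;

// Render a UUID as a SQLite blob literal X'<hex>' so it can be compared
// against a BLOB column; `simple()` prints 32 hex characters, no hyphens.
fn blob_literal(object_store_id: &Uuid) -> String {
    format!("X'{}'", object_store_id.simple())
}

fn main() {
    let id = Uuid::parse_str("67e55044-10b1-426f-9247-bb680e5fe0c8").unwrap();
    assert_eq!(blob_literal(&id), "X'67e5504410b1426f9247bb680e5fe0c8'");
}
```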
* feat: Allow passing service protection limits in create db gRPC call
* fix: Move the impl into the catalog namespace trait
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix(garbage-collector): just test parquet file existence
The GC, when checking files in object store against the catalog, only
cares if the parquet file for the given object store id exists in the
catalog. It doesn't need the full parquet file. Let's not transmit it
over the wire.
This PR uses a `SELECT 1` and a boolean to test whether the parquet file
exists.
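A hedged sketch of such an existence check (table and column names are taken from the surrounding text; the exact SQL in the PR may differ), assuming sqlx with the postgres and uuid features:

```rust
use sqlx::PgPool;
use uuid::Uuid;

// Return a plain bool instead of deserializing a full parquet_file row.
async fn parquet_file_exists(
    pool: &PgPool,
    object_store_id: Uuid,
) -> Result<bool, sqlx::Error> {
    sqlx::query_scalar(
        "SELECT EXISTS (SELECT 1 FROM parquet_file WHERE object_store_id = $1)",
    )
    .bind(object_store_id)
    .fetch_one(pool)
    .await
}
```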
* helps #7784
* chore: use struct for from_row
* chore: satisfy clippy
* chore: fmt
- Create data_types::partition_template::ValidationError
- Make creation of NamespacePartitionTemplateOverride and
TablePartitionTemplateOverride fallible
- Move SerializationWrapper into a module to make its inner field
private to force creation through one fallible constructor; this is
where the validation logic will go to be shared among all uses of
partition templates
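An illustrative sketch of that pattern (not the real data_types code): the private field forces every construction through the single fallible constructor, which is where the shared validation lives.

```rust
mod serialization {
    #[derive(Debug)]
    pub struct ValidationError(pub String);

    pub struct SerializationWrapper {
        // Private: code outside this module cannot build the wrapper directly.
        inner: String,
    }

    impl SerializationWrapper {
        pub fn try_new(inner: String) -> Result<Self, ValidationError> {
            if inner.is_empty() {
                return Err(ValidationError("template must not be empty".into()));
            }
            Ok(Self { inner })
        }

        pub fn inner(&self) -> &str {
            &self.inner
        }
    }
}

fn main() {
    let wrapper = serialization::SerializationWrapper::try_new("%Y-%m-%d".to_string());
    assert_eq!(wrapper.unwrap().inner(), "%Y-%m-%d");
    assert!(serialization::SerializationWrapper::try_new(String::new()).is_err());
}
```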
* refactor: remove `ParquetFileRepo::flag_for_delete`
* refactor: batch update parquet files in catalog
* refactor: avoid data roundtrips through postgres
* refactor: do not return ID from PG when we do not need it
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: add `parquet_file` PG index for querier
Currently the `list_by_table_not_to_delete` catalog query is somewhat
expensive:
```text
iox_catalog_prod=> select table_id, sum((to_delete is NULL)::int) as n from parquet_file group by table_id order by n desc limit 5;
table_id | n
----------+------
1489038 | 7221
1489037 | 7019
1491534 | 5793
1491951 | 5522
1513377 | 5339
(5 rows)
iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
min_time, max_time, to_delete, file_size_bytes,
row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038 AND to_delete IS NULL;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on parquet_file (cost=46050.91..47179.26 rows=283 width=200) (actual time=464.368..472.514 rows=7221 loops=1)
Recheck Cond: ((table_id = 1489038) AND (to_delete IS NULL))
Heap Blocks: exact=7152
-> BitmapAnd (cost=46050.91..46050.91 rows=283 width=0) (actual time=463.341..463.343 rows=0 loops=1)
-> Bitmap Index Scan on parquet_file_table_idx (cost=0.00..321.65 rows=22545 width=0) (actual time=1.674..1.674 rows=7221 loops=1)
Index Cond: (table_id = 1489038)
-> Bitmap Index Scan on parquet_file_deleted_at_idx (cost=0.00..45728.86 rows=1525373 width=0) (actual time=460.717..460.717 rows=4772117 loops=1)
Index Cond: (to_delete IS NULL)
Planning Time: 0.092 ms
Execution Time: 472.907 ms
(10 rows)
```
I think this may also be because PostgreSQL kinda chooses the wrong
strategy here: it could just look at the existing index and filter
from there:
```text
iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
min_time, max_time, to_delete, file_size_bytes,
row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using parquet_file_table_idx on parquet_file (cost=0.57..86237.78 rows=22545 width=200) (actual time=0.057..6.994 rows=7221 loops=1)
Index Cond: (table_id = 1489038)
Planning Time: 0.094 ms
Execution Time: 7.297 ms
(4 rows)
```
However, PostgreSQL doesn't know the cardinalities well enough, so
let's add a dedicated index to make the querier faster.
* feat: new migration system
* docs: explain dirty migrations
Move the import into the submodule itself, rather than re-exporting it
at the crate level.
This will make it possible to link to the specific module/logic.
This method is used to enable tests; it's never intended to be used in
production code to access the underlying metric registry. The Catalog
trait is responsible for Catalog things, not for acting as a dependency
injection mechanism for metrics.
The only current use of this is in test code, so no changes needed.
Treat them as the default partition template in the application, but
save space and avoid having to backfill the tables by having the
database values be NULL when no custom template has been specified.
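A minimal sketch of that convention (the types here are illustrative, not the actual override types): the database stores NULL when no custom template was specified, and the application substitutes the default on read.

```rust
#[derive(Debug, Clone, PartialEq, Default)]
struct TablePartitionTemplate(String);

// NULL in the database maps to None here; treat it as the default template.
fn effective_template(stored: Option<TablePartitionTemplate>) -> TablePartitionTemplate {
    stored.unwrap_or_default()
}

fn main() {
    let custom = effective_template(Some(TablePartitionTemplate("%Y-%m".into())));
    assert_eq!(custom, TablePartitionTemplate("%Y-%m".into()));

    // Row stored as NULL: falls back to the default template.
    assert_eq!(effective_template(None), TablePartitionTemplate::default());
}
```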
Remove unused Postgres indices. This lowers database load but also gives
us room to install actually useful indices (see #7842).
To detect which indices are used, I've used the following query (on the
actual write/master replica in eu-central-1):
```sql
SELECT
n.nspname AS namespace_name,
t.relname AS table_name,
pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
t.reltuples::bigint AS num_rows,
psai.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
psai.idx_scan AS number_of_scans,
psai.idx_tup_read AS tuples_read,
psai.idx_tup_fetch AS tuples_fetched
FROM
pg_index i
INNER JOIN pg_class t ON t.oid = i.indrelid
INNER JOIN pg_namespace n ON n.oid = t.relnamespace
INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```
At `2023-05-23T16:00:00Z`:
```text
namespace_name | table_name | table_size | num_rows | index_name | index_size | unique | number_of_scans | tuples_read | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_deleted_at_idx | 5398 MB | N | 1693383413 | 21036174283392 | 21336337964
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_partition_created_idx | 11 GB | N | 34190874 | 4749070532 | 61934212
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_partition_idx | 2032 MB | N | 1612961601 | 9935669905489 | 8611676799872
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_pkey | 7135 MB | Y | 453927041 | 454181262 | 453894565
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_shard_compaction_delete_created_idx | 14 GB | N | 0 | 0 | 0
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_shard_compaction_delete_idx | 8767 MB | N | 2 | 30717 | 4860
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_table_idx | 1602 MB | N | 9136844 | 341839537275 | 27551
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_location_unique | 4989 MB | Y | 332341872 | 3123 | 3123
```
At `2023-05-24T09:50:00Z` (i.e. nearly 18h later):
```text
namespace_name | table_name | table_size | num_rows | index_name | index_size | unique | number_of_scans | tuples_read | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_deleted_at_idx | 5448 MB | N | 1693485804 | 21409285169862 | 21364369704
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_partition_created_idx | 11 GB | N | 34190874 | 4749070532 | 61934212
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_partition_idx | 2044 MB | N | 1615214409 | 10159380553599 | 8811036969123
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_pkey | 7189 MB | Y | 455128165 | 455382386 | 455095624
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_shard_compaction_delete_created_idx | 14 GB | N | 0 | 0 | 0
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_shard_compaction_delete_idx | 8849 MB | N | 2 | 30717 | 4860
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_table_idx | 1618 MB | N | 9239071 | 348304417343 | 27551
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_location_unique | 5043 MB | Y | 343484617 | 3123 | 3123
```
The cluster currently is under load and all components are running.
Conclusion:
- `parquet_file_deleted_at_idx`: Used, likely by the GC. We could
probably shrink this index by binning `deleted_at` (within the index,
not within the actual database table), but let's do this in a later PR.
- `parquet_file_partition_created_idx`: Unused and huge (`created_at` is
NOT binned). So let's remove it.
- `parquet_file_partition_idx`: Used, likely by the compactor and
querier because we currently don't have a better index (see #7842 as
well). This includes deleted files as well which is somewhat
pointless. May become obsolete after #7842, not touching for now.
- `parquet_file_pkey`: Primary key. We should probably use the object
store UUID as a primary key BTW, which would also make the GC faster.
Not touching for now.
- `parquet_file_shard_compaction_delete_created_idx`: Huge unused index.
Shards don't exist anymore. Delete it.
- `parquet_file_shard_compaction_delete_idx`: Same as
`parquet_file_shard_compaction_delete_created_idx`.
- `parquet_file_table_idx`: Used but is somewhat too large because it
contains deleted files. Might become obsolete after #7842, don't touch
for now.
- `parquet_location_unique`: See the note on `parquet_file_pkey`; it's
pointless to have two IDs here. Not touching for now, but this is a
potential future improvement.
So we remove:
- `parquet_file_partition_created_idx`
- `parquet_file_shard_compaction_delete_created_idx`
- `parquet_file_shard_compaction_delete_idx`
This commit fixes loads of crates (47!) that had unused dependencies or
misconfigured dependencies (test deps listed as normal deps).
I added the `unused_crate_dependencies` lint to all crates to help
prevent this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround:
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
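A sketch of that workaround as it might look in a crate's lib.rs (the dependency and feature names are made up):

```rust
// Enable the lint at the crate root.
#![warn(unused_crate_dependencies)]

// `criterion` is only used by the benchmarks in /benches, which the lint
// cannot see, so reference it here when the bench feature is enabled to
// avoid a false positive.
#[cfg(feature = "benches")]
use criterion as _;
```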
- the table is unused
- there are no foreign keys or triggers based on this table
- the design is generally not scalable (N*M entries); tombstones should
rather have a timestamp so we can check whether a parquet file includes
that information or not (or some other form of serialization mechanism)
- it's currently empty in prod (and never was filled w/ data in any
cluster)
* refactor: Change catalog configuration so it is entirely DSN-based / support end-to-end testing without postgres
Restores code from https://github.com/influxdata/influxdb_iox/pull/7708
Revert "revert: PR #7708"
This reverts commit c9cfe05f8d.
* fix: merge
* fix: Update new test