This method is used to enable tests - it's never intended to be used in
production code to access the underlying metric registry. The Catalog
trait is responsible for Catalog things, not for acting as a dependency
injection mechanism for metrics.
The only current use of this is in test code, so no changes needed.
Treat them as the default partition template in the application, but
save space and avoid having to backfill the tables by having the
database values be NULL when no custom template has been specified.
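
A minimal sketch of the NULL-as-default mapping described above. `PartitionTemplate`, its default value, and `effective_template` are hypothetical names for illustration, not the actual catalog types:

```rust
/// Hypothetical stand-in for the application's partition template type.
#[derive(Debug, Clone, PartialEq)]
struct PartitionTemplate(String);

impl Default for PartitionTemplate {
    fn default() -> Self {
        // Assumed default value; the real one is defined by the application.
        Self("%Y-%m-%d".to_string())
    }
}

/// Maps a nullable database column to the template the application uses.
/// `None` models a NULL column, i.e. no custom template was specified.
fn effective_template(db_value: Option<String>) -> PartitionTemplate {
    db_value.map(PartitionTemplate).unwrap_or_default()
}
```

With this shape, existing rows need no backfill: a NULL column simply resolves to the default at read time.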
Move the application types to the top, which puts all the sqlx
conversion gobbledygook at the end because it's an internal
implementation detail I'm about to refactor.
Git probably isn't going to display this in a super obvious way, but
this commit is only moving code around, not changing any of it.
Remove unused Postgres indices. This lowers database load but also gives
us room to install actually useful indices (see #7842).
To detect which indices are used, I've used the following query (on the
actual write/master replica in eu-central-1):
```sql
SELECT
n.nspname AS namespace_name,
t.relname AS table_name,
pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
t.reltuples::bigint AS num_rows,
psai.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
psai.idx_scan AS number_of_scans,
psai.idx_tup_read AS tuples_read,
psai.idx_tup_fetch AS tuples_fetched
FROM
pg_index i
INNER JOIN pg_class t ON t.oid = i.indrelid
INNER JOIN pg_namespace n ON n.oid = t.relnamespace
INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```
At `2023-05-23T16:00:00Z`:
```text
namespace_name | table_name | table_size | num_rows | index_name | index_size | unique | number_of_scans | tuples_read | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_deleted_at_idx | 5398 MB | N | 1693383413 | 21036174283392 | 21336337964
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_partition_created_idx | 11 GB | N | 34190874 | 4749070532 | 61934212
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_partition_idx | 2032 MB | N | 1612961601 | 9935669905489 | 8611676799872
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_pkey | 7135 MB | Y | 453927041 | 454181262 | 453894565
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_shard_compaction_delete_created_idx | 14 GB | N | 0 | 0 | 0
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_shard_compaction_delete_idx | 8767 MB | N | 2 | 30717 | 4860
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_file_table_idx | 1602 MB | N | 9136844 | 341839537275 | 27551
iox_catalog | parquet_file | 31 GB | 120985000 | parquet_location_unique | 4989 MB | Y | 332341872 | 3123 | 3123
```
At `2023-05-24T09:50:00Z` (i.e. nearly 18h later):
```text
namespace_name | table_name | table_size | num_rows | index_name | index_size | unique | number_of_scans | tuples_read | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_deleted_at_idx | 5448 MB | N | 1693485804 | 21409285169862 | 21364369704
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_partition_created_idx | 11 GB | N | 34190874 | 4749070532 | 61934212
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_partition_idx | 2044 MB | N | 1615214409 | 10159380553599 | 8811036969123
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_pkey | 7189 MB | Y | 455128165 | 455382386 | 455095624
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_shard_compaction_delete_created_idx | 14 GB | N | 0 | 0 | 0
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_shard_compaction_delete_idx | 8849 MB | N | 2 | 30717 | 4860
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_file_table_idx | 1618 MB | N | 9239071 | 348304417343 | 27551
iox_catalog | parquet_file | 31 GB | 123869328 | parquet_location_unique | 5043 MB | Y | 343484617 | 3123 | 3123
```
The cluster currently is under load and all components are running.
Conclusion:
- `parquet_file_deleted_at_idx`: Used, likely by the GC. We could
probably shrink this index by binning `deleted_at` (within the index,
not within the actual database table), but let's do this in a later PR.
- `parquet_file_partition_created_idx`: Unused and huge (`created_at` is
NOT binned). So let's remove it.
- `parquet_file_partition_idx`: Used, likely by the compactor and
querier because we currently don't have a better index (see #7842 as
well). This includes deleted files as well which is somewhat
pointless. May become obsolete after #7842, not touching for now.
- `parquet_file_pkey`: Primary key. We should probably use the object
store UUID as a primary key BTW, which would also make the GC faster.
Not touching for now.
- `parquet_file_shard_compaction_delete_created_idx`: Huge unused index.
Shards don't exist anymore. Delete it.
- `parquet_file_shard_compaction_delete_idx`: Same as
`parquet_file_shard_compaction_delete_created_idx`.
- `parquet_file_table_idx`: Used but is somewhat too large because it
contains deleted files. Might become obsolete after #7842, don't touch
for now.
- `parquet_location_unique`: See the note on `parquet_file_pkey`, it's
pointless to have two IDs here. Not touching for now but this is a
potential future improvement.
So we remove:
- `parquet_file_partition_created_idx`
- `parquet_file_shard_compaction_delete_created_idx`
- `parquet_file_shard_compaction_delete_idx`
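
The comparison between the two snapshots can be sketched as follows. This is only an illustration of the reasoning, not tooling that exists in the repo: `unused_indices` and the two snapshot helpers are hypothetical names, and the numbers are the `number_of_scans` values copied from the tables above. An index whose scan counter did not advance over the ~18h window is flagged as a removal candidate.

```rust
/// `(index_name, number_of_scans)` pairs taken at 2023-05-23T16:00:00Z.
fn snapshot_before() -> Vec<(&'static str, u64)> {
    vec![
        ("parquet_file_deleted_at_idx", 1693383413),
        ("parquet_file_partition_created_idx", 34190874),
        ("parquet_file_partition_idx", 1612961601),
        ("parquet_file_pkey", 453927041),
        ("parquet_file_shard_compaction_delete_created_idx", 0),
        ("parquet_file_shard_compaction_delete_idx", 2),
        ("parquet_file_table_idx", 9136844),
        ("parquet_location_unique", 332341872),
    ]
}

/// `(index_name, number_of_scans)` pairs taken at 2023-05-24T09:50:00Z.
fn snapshot_after() -> Vec<(&'static str, u64)> {
    vec![
        ("parquet_file_deleted_at_idx", 1693485804),
        ("parquet_file_partition_created_idx", 34190874),
        ("parquet_file_partition_idx", 1615214409),
        ("parquet_file_pkey", 455128165),
        ("parquet_file_shard_compaction_delete_created_idx", 0),
        ("parquet_file_shard_compaction_delete_idx", 2),
        ("parquet_file_table_idx", 9239071),
        ("parquet_location_unique", 343484617),
    ]
}

/// Returns the indices whose scan counter is identical in both snapshots.
fn unused_indices<'a>(before: &[(&'a str, u64)], after: &[(&'a str, u64)]) -> Vec<&'a str> {
    let mut unused = Vec::new();
    for &(name, scans_after) in after {
        // Look up the same index in the earlier snapshot.
        let scans_before = before.iter().find(|(n, _)| *n == name).map(|(_, s)| *s);
        if scans_before == Some(scans_after) {
            unused.push(name);
        }
    }
    unused
}
```

A zero delta only shows that the index was not scanned during the observed window, which is why the per-index notes above still matter for the final decision.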
* refactor: consolidate pruning code
Let's have a single chunk pruning implementation in our code, not two.
Also removes a bit of cruft from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning, and bits into `iox_query_influxrpc` for
some RPC shenanigans).
* test: regression test for incident
* fix: chunk pruning
* docs: add some test notes
* fix(ingester): re-transmit schema over flight if it changes
Fixes https://github.com/influxdata/idpe/issues/17408 .
So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is almost always the case. However, when there's a persist job
running (I think), it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if they contain any data). The current implementation munches
all these snapshots into a single stream and hands them over to arrow
flight, which has a high-perf encode routine (i.e. it does not re-check
every single schema): it sends the schema once and then sends only the
data for every batch (schema data is NOT repeated). On the receiver
side (= querier) we decode that data and get confused why on earth some
batches have a different column count compared to the schema.
For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:
- remove the stream that isn't really a stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
according to the schema (because this check is likely cheaper than
re-transmitting the schema for every `RecordBatch`)
- adjust a bunch of testing code to cope with this
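
The chunking step can be sketched roughly like this. It is a simplified model, not the ingester code: `Batch` is a hypothetical stand-in for an Arrow `RecordBatch` that only tracks column names, and grouping *consecutive* batches with the same schema is an assumption about how the chunking is done.

```rust
/// Simplified stand-in for a `RecordBatch`: only the column names matter here.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<String>,
    rows: usize,
}

/// Convenience constructor for the sketch.
fn batch(cols: &[&str], rows: usize) -> Batch {
    Batch {
        schema: cols.iter().map(|c| c.to_string()).collect(),
        rows,
    }
}

/// Groups consecutive batches that share a schema. Each group can then be
/// sent as "schema once, followed by the data of every batch in the group".
fn chunk_by_schema(batches: Vec<Batch>) -> Vec<(Vec<String>, Vec<Batch>)> {
    let mut groups: Vec<(Vec<String>, Vec<Batch>)> = Vec::new();
    for b in batches {
        // Comparing schemas is assumed to be cheaper than re-sending them.
        let same_schema = groups
            .last()
            .map_or(false, |(schema, _)| *schema == b.schema);
        if same_schema {
            if let Some((_, group)) = groups.last_mut() {
                group.push(b);
            }
        } else {
            // Schema changed (or first batch): start a new group.
            groups.push((b.schema.clone(), vec![b]));
        }
    }
    groups
}
```

In the common case all batches of a partition share one schema, so this yields a single group and the schema is still transmitted only once.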
* refactor: nicify code
* test: adjust test
This commit fixes loads of crates (47!) that had unused dependencies or
mis-configured dependencies (test deps as normal deps).
I added the `unused_crate_dependencies` lint to all crates to help prevent
this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false positives when specifying
dev-dependencies for test/bench binaries - these are files in /tests or
/benches (not normal unit tests). This commit includes a workaround:
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!