Includes the whole migration set again:
1. **create helper index:** This may or may not exist in prod depending
on when #8540 landed. This is OK, spend a bit of work here.
2. **actual migration:** The bit that originally failed, but now w/ an
extra step to migrate empty keys (because the `unnest`+`array_agg`
logic only works for non-empty keys and is a no-op for empty ones).
This is idempotent (i.e. a no-op) if we already finished the
migration.
3. **remove helper index:** Remove the helper index from (1).
4. **enfore `NOT NULL`:** This is a no-op if this was already applied.
Closes#8542.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: fill catalog sort_key_ids for partition with coming data
* test: sort_key_ids has empty array for newly create partition
* test: name of non-existing column
* chore: add comments to ask Andrew about the code
* chore: make comments clearer
* chore: fix a comment to avoid failure in doc
* chore: add comment for the panic if column name of sort key not found
* fix: during import files the partition has to be created with empty sort key first. Then after its files are created, the partition will be uodated with sort key
* chore: remove no longer needed comments after the bug in build_catalog test is fixed
* chore: address review comments
* refactor: Use ColumnSet type
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: fix a clippy
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: make concurrent index creations idempotent
`CREATE INDEX CONCURRENTLY` may fail and leave an invalid index behind.
See:
<https://www.postgresql.org/docs/current/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY>
Invalid indexes will NOT be used for queries, however they is also useless and
technically the migration wasn't successful. Since #8407 we detect this
situation. #8394 allows us to fix these migrations. #8434 makes sure
that we don't mess this up (esp. that the "other checksum" pragmas are
correct).
For #7897.
* fix: multi-line processing in `check_linear_migrations.sh`
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We need to fix our non-idempotent `CREATE INDEX CONCURRENTLY` migrations
so they are self-healing. This is an essential part of #7897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* catalog.get_in_skipped_compaction() should handle for multiple partitions
* add the ability to perform transformation on sets of partitions (rather than filtering one by one). Start with the transformation to remove skipped partitions, in the scheduler.
* move the env var and cli flag setting, for when to ignore skipped partitions, to the scheduler config.
* refactor: use `Box<[...]>` instead of `Vec<...>`
We are not planning to modify the vector, so storing a capacity and a
length is somewhat pointless.
* feat: add printout test for PG migrations
* refactor: use dedicated checksum type
* feat: checksum string roundtrips
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Forgot to update the docs in #8373.
This will be updated again after I've finished #7897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add sort_key_ids as array of bigints into catalog partition
* chore: add comments
* chore: remove comments to avoid changing them in the future due to checksum requirement
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: improve `test_step_sql_statement_no_transaction`
* feat: also print number of steps in "applying migration step"
* feat: use single per-migration txn when possible
If all steps can (and want) to run in a transaction block, then wrap the
migration bookkeeping and the migration script into a single
transaction. This way we avoid the dirty state altogether because its
now an "all or nothing" migration.
Note that we still guarantee that there is only a single migration
running at the same time due to the locking mechanism. Otherwise we
would potentially run into nasty transaction failures during schema
modifications.
This is related to #7897 but only fixes / self-heals the "dirty" state
for transaction that can run in transactions. For concurrent index
migrations (which we need in prod) we need to be a bit smarter and this
will be done in a follow-up. However I feel that not leaving half-done
migrations for the cases where it's technically possible (e.g. adding
columns) is already a huge step forward.
* test: make `test_migrator_uses_single_transaction_when_possible` harder
* test: explain test
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
Partition {
id: 3,
hash_id: abcdefg,
}
and a Parquet file that has:
ParquetFile {
partition_hash_id: abcdefg,
}
calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
PR #8327 introduced a bunch of metrics for the sqlx connection pool. One
of the metrics was the "used" metrics that was supposed to count
"currently in use" connection. In prod however this metric underflows to
a very large integer. It seems that "acquire" callback is only used by sqlx for
re-used connections (i.e. for the transition from "idle" to "used").
Now we could try to work around it but since there is no "close
connection" callback, I doubt it it possible to do the accurately.
Luckily though we don't really need that counter. sqlx already offers
"active" (defined as idle + used) and "idle", so getting "used" is just
the difference. I removed the "used" metric nevertheless because
"active" and "idle" are read independently from each other (based on atomic
integers) and are NOT guaranteed to be in-sync. Calculating the
difference within IOx however would give the illusion that they are. So
I leave this to the dashboard / alert / whatever, because there it is
usually understood that metrics are samples and may be out of sync for a
very short time.
A nice side effect of this change is that it simplifies the code quite a
bit.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
To better gauge how many connections we use and especially if we hit the
max connection limit, it would be helpful to actually have some metrics
available for the pool usage. This change adds a few basic metrics.
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:
https://github.com/rust-lang/cargo/issues/10801
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>