* chore(cli): add `--partition-template` to namespace create
* chore: fix typo in doc for `PartitionTemplateConfig`
* chore: document the maximum of 8 parts for a partition template
* chore: add e2e tests
* chore: fmt
* chore: add more e2e tests for namespace create with partition template
* chore: show doc comments in the CLI help interface (see the sketch below)
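The help text comes from clap's derive support for doc comments: the
comments on the argument struct and its fields are rendered as the
`--help` output. A minimal sketch of the pattern (the struct and field
below are illustrative, not IOx's actual CLI definitions):

```rust
use clap::Parser;

/// Create a new namespace.
///
/// This doc comment becomes the command's `--help` description.
#[derive(Debug, Parser)]
struct NamespaceCreate {
    /// Partition template to apply to the namespace, as JSON.
    ///
    /// May contain at most 8 template parts.
    #[clap(long = "partition-template")]
    partition_template: Option<String>,
}

fn main() {
    let args = NamespaceCreate::parse();
    println!("{args:?}");
}
```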
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add `sort_key_ids` as an array of bigints to the catalog `partition` table (sketched below)
* chore: add comments
* chore: remove comments to avoid changing them in the future due to checksum requirement
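For illustration, the new column amounts to a migration of this shape
(a sketch only, not the checksummed IOx migration file):

```rust
// Illustrative only: an array-of-bigint column on the `partition`
// table holding the IDs of the columns making up the sort key.
const ADD_SORT_KEY_IDS: &str =
    "ALTER TABLE partition ADD COLUMN sort_key_ids BIGINT[];";
```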
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This allows routers to be configured to mark downstreams as
healthy/unhealthy, requiring a configurable number of probe requests
to be collected before the health checker's circuit state transitions
to healthy or unhealthy.
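A minimal sketch of the probe-counting transitions (type and field
names are illustrative, not the router's actual implementation):

```rust
/// Circuit state of a single downstream.
#[derive(Debug, PartialEq)]
enum CircuitState {
    Healthy,
    Unhealthy,
}

/// Counts consecutive probe outcomes and flips the circuit state once
/// the configured number of probes has been collected.
struct HealthChecker {
    state: CircuitState,
    consecutive_ok: usize,
    consecutive_err: usize,
    healthy_threshold: usize,
    unhealthy_threshold: usize,
}

impl HealthChecker {
    fn observe(&mut self, probe_ok: bool) {
        if probe_ok {
            self.consecutive_ok += 1;
            self.consecutive_err = 0;
            if self.consecutive_ok >= self.healthy_threshold {
                self.state = CircuitState::Healthy;
            }
        } else {
            self.consecutive_err += 1;
            self.consecutive_ok = 0;
            if self.consecutive_err >= self.unhealthy_threshold {
                self.state = CircuitState::Unhealthy;
            }
        }
    }
}
```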
* test: improve `test_step_sql_statement_no_transaction`
* feat: also print number of steps in "applying migration step"
* feat: use single per-migration txn when possible
If all steps can (and want to) run in a transaction block, then wrap
the migration bookkeeping and the migration script in a single
transaction. This way we avoid the dirty state altogether, because it
is now an "all or nothing" migration.
Note that we still guarantee that there is only a single migration
running at the same time due to the locking mechanism. Otherwise we
would potentially run into nasty transaction failures during schema
modifications.
This is related to #7897 but only fixes / self-heals the "dirty" state
for migrations that can run in transactions. For concurrent index
migrations (which we need in prod) we need to be a bit smarter, and
this will be done in a follow-up. However, I feel that not leaving
half-done migrations in the cases where it's technically possible
(e.g. adding columns) is already a huge step forward.
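A sketch of the idea, assuming sqlx-style APIs; the `migrations`
bookkeeping table and column names below are illustrative:

```rust
use sqlx::{PgPool, Postgres, Transaction};

/// Run a migration step and its bookkeeping in one transaction, so a
/// failure rolls back both and no "dirty" state can be observed.
async fn apply_migration_atomically(
    pool: &PgPool,
    version: i64,
    script: &str, // a single transactional statement, e.g. ADD COLUMN
) -> sqlx::Result<()> {
    let mut txn: Transaction<'_, Postgres> = pool.begin().await?;

    // The migration step itself.
    sqlx::query(script).execute(&mut *txn).await?;

    // Bookkeeping in the same transaction (illustrative table).
    sqlx::query("INSERT INTO migrations (version, success) VALUES ($1, TRUE)")
        .bind(version)
        .execute(&mut *txn)
        .await?;

    // All or nothing: either both are visible, or neither is.
    txn.commit().await
}
```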
* test: make `test_migrator_uses_single_transaction_when_possible` harder
* test: explain test
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Update the implementation of the MOVING_AVERAGE function to be a
user-defined window function, allowing the values to be calculated
for the entire window in one go.
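In sketch form (plain Rust, not the actual window-function code), the
difference is computing every row's value in a single pass over the
window rather than re-aggregating per row:

```rust
/// Moving average over `values` with window size `n`, computed for the
/// entire window in one pass; `None` until a full window is available.
fn moving_average(values: &[f64], n: usize) -> Vec<Option<f64>> {
    assert!(n > 0);
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0;
    for (i, v) in values.iter().enumerate() {
        sum += v;
        if i >= n {
            // Slide the window: drop the value that fell out of it.
            sum -= values[i - n];
        }
        out.push((i + 1 >= n).then(|| sum / n as f64));
    }
    out
}
```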
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit breaks the ordered list of persisting buffers into its own
type (PersistingList) for clarity, and implements within it a cache of
the merged set of schemas across all persisting buffer FSMs, along
with row/timestamp summaries.
This cleans up the code and prevents N persisting schemas from being
merged at query time (for every query!); instead, schemas and
statistics are maintained incrementally, pushing the computation to
persist time rather than query time.
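A sketch of the incremental maintenance (types and fields are
illustrative, not the actual ingester code):

```rust
use std::collections::BTreeSet;

/// A buffer queued for persistence (illustrative).
struct Buffer {
    columns: Vec<String>,
    rows: usize,
    min_ts: i64,
    max_ts: i64,
}

/// Ordered list of persisting buffers plus cached summaries, updated
/// on push so queries never merge N schemas on the fly.
#[derive(Default)]
struct PersistingList {
    buffers: Vec<Buffer>,
    merged_columns: BTreeSet<String>,
    row_count: usize,
    min_timestamp: Option<i64>,
    max_timestamp: Option<i64>,
}

impl PersistingList {
    fn push(&mut self, b: Buffer) {
        // Fold the new buffer into the cached summaries, paying the
        // cost at persist time rather than at query time.
        self.merged_columns.extend(b.columns.iter().cloned());
        self.row_count += b.rows;
        self.min_timestamp = Some(self.min_timestamp.map_or(b.min_ts, |m| m.min(b.min_ts)));
        self.max_timestamp = Some(self.max_timestamp.map_or(b.max_ts, |m| m.max(b.max_ts)));
        self.buffers.push(b);
    }
}
```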
* fix: selectively merge L1 to L2 when L0s still exist
* fix: avoid grouping files that undo previous splits
* chore: add test case for new fixes
* chore: insta test churn
* chore: lint cleanup
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
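The schema change itself is just dropping the NOT NULL constraint,
which is why the lock is short: Postgres treats it as a metadata-only
change and does not rewrite or scan the table. Illustratively (not the
exact IOx migration):

```rust
// Illustrative statement for making the column nullable.
const MAKE_PARTITION_ID_NULLABLE: &str =
    "ALTER TABLE parquet_file ALTER COLUMN partition_id DROP NOT NULL;";
```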
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query no longer returns
the partition ID from the Parquet file table, and really, this command
should instead use `GetParquetFilesByPartitionId` to request only
what's needed rather than filtering client-side.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
    Partition {
        id: 3,
        hash_id: abcdefg,
    }

and a Parquet file that has:

    ParquetFile {
        partition_hash_id: abcdefg,
    }
calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing
in PartitionIds, and I'd like to keep it that way for now to avoid
having to change even more in this PR.
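One way the lookup can honor both identifiers, as an illustrative
query (not the exact catalog SQL):

```rust
// Illustrative: match files by the integer partition ID directly, or
// by the hash ID belonging to the partition with that integer ID.
const LIST_BY_PARTITION_NOT_TO_DELETE: &str = "
    SELECT parquet_file.*
    FROM parquet_file
    LEFT JOIN partition ON partition.hash_id = parquet_file.partition_hash_id
    WHERE (parquet_file.partition_id = $1 OR partition.id = $1)
      AND parquet_file.to_delete IS NULL;
";
```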
* fix: Use and set the new partition ID fields everywhere they need to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
Allows the router to optionally enable and start the gossip subsystem
(disabled by default).
No code uses the gossip system, so no application-level messages are
exchanged, but this allows the gossip subsystem to run and exchange
control frames / perform discovery / etc.
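In sketch form, "optional and disabled by default" means the subsystem
is only constructed when configured (names below are illustrative, not
the actual gossip API):

```rust
use std::net::SocketAddr;

/// Illustrative config; the real flags and fields may differ.
struct GossipConfig {
    bind_addr: SocketAddr,
    seed_peers: Vec<SocketAddr>,
}

/// With `None` the gossip subsystem never starts; with a config it
/// binds and begins exchanging control frames / performing discovery,
/// while no application-level messages are sent.
fn maybe_start_gossip(config: Option<GossipConfig>) {
    if let Some(cfg) = config {
        println!(
            "gossip enabled on {} ({} seed peers)",
            cfg.bind_addr,
            cfg.seed_peers.len()
        );
    }
}
```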
Allows the ingester to optionally enable and start the gossip subsystem
(disabled by default).
No code uses the gossip system, so no application-level messages are
exchanged, but this allows the gossip subsystem to run and exchange
control frames / perform discovery / etc.