chore(doc): fix typos and format
chore(doc): fix format
chore(doc): rename file
chore(doc): add new file to doc README
chore(doc): format
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: use `Box<[...]>` instead of `Vec<...>`
We are not planning to modify the vector, so storing a capacity and a
length is somewhat pointless.
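For illustration, the swap is essentially the following sketch: `into_boxed_slice` shrinks the allocation to fit, so the owner stores only a pointer and a length instead of pointer, length, and capacity.

```rust
// Sketch: converting a Vec into a boxed slice drops the unused
// capacity, leaving a two-word (ptr + len) owner.
fn freeze(v: Vec<u32>) -> Box<[u32]> {
    // `into_boxed_slice` shrinks the allocation to fit before boxing.
    v.into_boxed_slice()
}

fn sizes() -> (usize, usize) {
    (
        std::mem::size_of::<Vec<u32>>(),   // ptr + len + capacity
        std::mem::size_of::<Box<[u32]>>(), // ptr + len
    )
}
```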
* feat: add printout test for PG migrations
* refactor: use dedicated checksum type
* feat: checksum string roundtrips
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This benchmark covers two axes of performance for calls to the
namespace cache's `put_schema()` stack. These are the cost of adding
varying numbers of new columns to an existing table in the namespace, as
well as adding new tables with their own set of columns to an existing
namespace.
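As a rough sketch of the two axes, with a plain map standing in for the real namespace cache (`Namespace`, `add_columns`, and `add_table` are illustrative names, not the actual `put_schema()` API):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the namespace cache: table name -> columns.
type Namespace = HashMap<String, Vec<String>>;

// Axis 1: add `n` new columns to an existing table.
fn add_columns(ns: &mut Namespace, table: &str, n: usize) {
    let cols = ns.entry(table.to_string()).or_default();
    cols.extend((0..n).map(|i| format!("new_col_{i}")));
}

// Axis 2: add a new table with its own set of columns.
fn add_table(ns: &mut Namespace, table: &str, n_cols: usize) {
    ns.insert(
        table.to_string(),
        (0..n_cols).map(|i| format!("col_{i}")).collect(),
    );
}
```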
Not all use cases will involve sending messages (some will only want to
subscribe to messages), which might result in someone dropping the
handle they're not expecting to use.
Forgot to update the docs in #8373.
This will be updated again after I've finished #7897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds the supporting types required to integrate the generic gossip crate
into a schema-specific broadcast primitive.
This commit implements the two "halves":
* GossipMessageDispatcher: async processing of incoming gossip msgs
* Handle: the send-side handle for async sending of gossip msgs
These types are responsible for converting between the serialised bytes
sent over the gossip primitive and the application-level / protobuf
types.
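A rough sketch of the two halves, using an in-process channel and a string payload in place of the real gossip transport and protobuf encoding (`SchemaEvent`, `Handle::broadcast`, and `dispatch` are illustrative names):

```rust
use std::sync::mpsc;

// Hypothetical application-level message; the real code uses protobuf.
#[derive(Debug, PartialEq)]
struct SchemaEvent(String);

// Send-side handle: serialises and hands bytes to the gossip layer.
struct Handle {
    tx: mpsc::Sender<Vec<u8>>,
}

impl Handle {
    fn broadcast(&self, msg: &SchemaEvent) {
        // Stand-in for protobuf encoding.
        self.tx.send(msg.0.clone().into_bytes()).unwrap();
    }
}

// Receive side (the dispatcher half): decodes gossip bytes back into
// application types.
fn dispatch(bytes: Vec<u8>) -> SchemaEvent {
    SchemaEvent(String::from_utf8(bytes).unwrap())
}
```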
* chore(cli): add `--partition-template` to namespace create
* chore: fix typo in doc for `PartitionTemplateConfig`
* chore: add max limit 8 for partition template in doc
* chore: add e2e tests
* chore: fmt
* chore: add more e2e tests for namespace create with partition template
* chore: show doc comments in cli help interface
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add sort_key_ids as array of bigints into catalog partition
* chore: add comments
* chore: remove comments to avoid changing them in the future due to checksum requirement
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This allows routers to be configured to mark downstreams as healthy/
unhealthy, with a configurable number of probe requests that must be
collected before the health checker's circuit state transitions to
healthy/unhealthy.
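A minimal sketch of such a probe-counting circuit (the names and the consecutive-streak policy are assumptions for illustration, not the router's actual implementation):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Health {
    Healthy,
    Unhealthy,
}

// Hypothetical circuit: `threshold` consecutive contradicting probe
// results flip the state.
struct Circuit {
    state: Health,
    threshold: usize,
    streak: usize, // consecutive probes contradicting the current state
}

impl Circuit {
    fn new(threshold: usize) -> Self {
        Self { state: Health::Healthy, threshold, streak: 0 }
    }

    fn observe(&mut self, probe_ok: bool) {
        // A probe contradicts the state if it disagrees with it.
        let contradicts = matches!(self.state, Health::Healthy) != probe_ok;
        if contradicts {
            self.streak += 1;
            if self.streak >= self.threshold {
                self.state = match self.state {
                    Health::Healthy => Health::Unhealthy,
                    Health::Unhealthy => Health::Healthy,
                };
                self.streak = 0;
            }
        } else {
            self.streak = 0;
        }
    }
}
```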
The generated code emits types that depend on prost (through `Message`
derives), and therefore all users of generated_types already depend on
prost.
It would be wrong for users of the generated_types crate to use a
different version of prost than the one used in generated_types.
By re-exporting prost, users can just depend on generated_types and
always use the right prost version.
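The pattern, sketched with a stand-in module in place of the real prost crate:

```rust
// Stand-in for the generated_types crate.
mod generated_types {
    // The real crate does `pub use prost;`, so downstream users always
    // get the exact prost version generated_types was built against.
    pub mod prost {
        pub fn version() -> &'static str {
            "re-exported"
        }
    }
}

// Downstream code names prost through generated_types, never directly,
// so the two can never disagree on the version.
fn prost_version() -> &'static str {
    generated_types::prost::version()
}
```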
Define the gossip message types used to disseminate schema changes to
other peers.
Currently there are two types defined: an initial "create" operation,
intended to be followed by "update" operations where appropriate. Both
messages are trivial CRDTs in that they are effectively add-only column
sets (a monotonic type) with other fields required to be immutable (as
they currently are in IOx).
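The add-only column set behaves like a grow-only set: merging is a set union, so applying updates in any order, any number of times, converges to the same state. A minimal sketch of that property (the `ColumnSet` type here is illustrative):

```rust
use std::collections::BTreeSet;

// Sketch of an add-only (grow-only) column set. Columns are only ever
// added, never removed, so merge is a simple union.
#[derive(Debug, Clone, PartialEq)]
struct ColumnSet(BTreeSet<String>);

impl ColumnSet {
    fn merge(&mut self, other: &ColumnSet) {
        self.0.extend(other.0.iter().cloned());
    }
}
```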
* test: improve `test_step_sql_statement_no_transaction`
* feat: also print number of steps in "applying migration step"
* feat: use single per-migration txn when possible
If all steps can (and want to) run in a transaction block, then wrap the
migration bookkeeping and the migration script into a single
transaction. This way we avoid the dirty state altogether because it's
now an "all or nothing" migration.
Note that we still guarantee that there is only a single migration
running at the same time due to the locking mechanism. Otherwise we
would potentially run into nasty transaction failures during schema
modifications.
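The all-or-nothing wrapping could be sketched like this (the step-detection heuristic and SQL strings are illustrative, not the migrator's real logic):

```rust
// Sketch: if every step can run inside a transaction, emit one script
// wrapping the steps and the bookkeeping in a single BEGIN/COMMIT, so a
// failure leaves no dirty state. Returns None if any step cannot be
// wrapped (e.g. a concurrent index build, which refuses to run inside
// a transaction block in Postgres).
fn single_txn_script(steps: &[&str], bookkeeping: &str) -> Option<String> {
    if steps.iter().any(|s| s.contains("CONCURRENTLY")) {
        return None;
    }
    let mut sql = String::from("BEGIN;\n");
    for step in steps {
        sql.push_str(step);
        sql.push('\n');
    }
    sql.push_str(bookkeeping);
    sql.push_str("\nCOMMIT;");
    Some(sql)
}
```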
This is related to #7897 but only fixes / self-heals the "dirty" state
for migrations that can run in transactions. For concurrent index
migrations (which we need in prod) we need to be a bit smarter, and this
will be done in a follow-up. However, I feel that not leaving half-done
migrations in the cases where it's technically possible (e.g. adding
columns) is already a huge step forward.
* test: make `test_migrator_uses_single_transaction_when_possible` harder
* test: explain test
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Update the implementation of the MOVING_AVERAGE function to be a
user-defined window function allowing the values to be calculated
for the entire window in one go.
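A one-pass whole-window computation could look like this sketch (plain Rust with a running sum, not the actual DataFusion user-defined window function wiring; assumes window width `n` >= 1):

```rust
// Sketch: compute the moving average for the entire window of values in
// one pass, rather than re-evaluating per row. `n` is the window width;
// output starts once `n` values have been seen.
fn moving_average(values: &[f64], n: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0;
    for (i, v) in values.iter().enumerate() {
        sum += v;
        if i >= n {
            // Slide the window: drop the value that fell out of it.
            sum -= values[i - n];
        }
        if i + 1 >= n {
            out.push(sum / n as f64);
        }
    }
    out
}
```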
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit breaks the ordered list of persisting buffers out into its
own type (PersistingList) for clarity, and implements within it a cache
of the merged schema across all persisting buffer FSMs, along with
row/timestamp summaries.
This cleans up the code and prevents N persisting schemas from being
merged at query time (for every query!); instead, schemas and statistics
are incrementally maintained, pushing the computation to persist time
rather than query time.
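The incremental-maintenance idea, sketched with a hypothetical row-count summary (the real list also maintains merged schemas and timestamp summaries):

```rust
// Sketch: maintain the merged summary incrementally as buffers are
// pushed, so queries read a cached value instead of merging N entries.
#[derive(Default)]
struct PersistingList {
    row_counts: Vec<usize>,   // one entry per persisting buffer
    cached_total_rows: usize, // incrementally maintained summary
}

impl PersistingList {
    fn push(&mut self, rows: usize) {
        self.row_counts.push(rows);
        self.cached_total_rows += rows; // pay the cost at persist time
    }

    // Query time: an O(1) read of the cached summary, regardless of
    // how many buffers are persisting.
    fn total_rows(&self) -> usize {
        self.cached_total_rows
    }
}
```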
* fix: selectively merge L1 to L2 when L0s still exist
* fix: avoid grouping files that undo previous splits
* chore: add test case for new fixes
* chore: insta test churn
* chore: lint cleanup
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:

    Partition {
        id: 3,
        hash_id: abcdefg,
    }

and a Parquet file that has:

    ParquetFile {
        partition_hash_id: abcdefg,
    }

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
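The either-ID lookup could be sketched like so (the types and the `belongs_to` helper are hypothetical simplifications of the catalog query, not its actual implementation):

```rust
// Sketch: a partition carries both identifiers, while a Parquet file
// record may store only one of them. Resolving membership goes through
// the partition so either form matches.
struct Partition {
    id: i64,
    hash_id: Option<String>,
}

struct ParquetFile {
    partition_id: Option<i64>,
    partition_hash_id: Option<String>,
}

fn belongs_to(file: &ParquetFile, partition: &Partition) -> bool {
    // Match on the numeric ID if the file stores one...
    file.partition_id == Some(partition.id)
        // ...otherwise fall back to comparing hash IDs (guarding
        // against both sides being absent).
        || (file.partition_hash_id.is_some()
            && file.partition_hash_id == partition.hash_id)
}
```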
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>