* chore: test changes and additions in preparation for functional changes
* feat: move vertical splitting to RoundInfo calculation, align splits to L1 files
* chore: insta test churn
* feat: detect non-linear data distribution in vertical splitting
* chore: add tests for non-linear data distribution
* chore: insta churn
* chore: cleanup & comment additions
* chore: some variable renaming
* feat: fill catalog sort_key_ids for partitions with incoming data
* test: sort_key_ids has empty array for newly created partition
* test: name of non-existing column
* chore: add comments to ask Andrew about the code
* chore: make comments clearer
* chore: fix a comment to avoid failure in doc
* chore: add comment for the panic if column name of sort key not found
* fix: during file import the partition has to be created with an empty sort key first. Then, after its files are created, the partition will be updated with the sort key
* chore: remove no longer needed comments after the bug in build_catalog test is fixed
* chore: address review comments
* refactor: Use ColumnSet type
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: fix a clippy lint
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
This commit implements a PartitionProvider decorator that uses a fast,
space-efficient, compressed bloom filter to probabilistically determine
whether a partition is an "old-style" row-addressed partition created
prior to #7963, or a "new-style" hash-addressed partition created after
it.
If a partition is identified as a new-style, hash-addressed partition,
the PartitionData is immediately initialised using the deterministic
hash ID without performing a catalog query at all.
If a partition is identified as an old-style, row-addressed partition,
a catalog query is performed to resolve the row ID, as it would be
without this filter.
A new-style, hash-addressed partition may sometimes be incorrectly
identified as a row-addressed partition, causing a spurious catalog
query, after which it is correctly identified as a hash-addressed
partition.
This is tuned to happen roughly 0.1% to 1% of the time, eliminating 99%
to 99.9% of unnecessary catalog queries.
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
    Partition {
        id: 3,
        hash_id: abcdefg,
    }
and a Parquet file that has:
    ParquetFile {
        partition_hash_id: abcdefg,
    }
calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
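Below is a simplified, in-memory sketch of these lookup semantics; the
struct shapes and the use of a plain string for the hash ID are
illustrative stand-ins for the real catalog types and SQL query.

```rust
#[derive(Clone, PartialEq)]
struct PartitionHashId(String);

struct Partition {
    id: i64,
    hash_id: Option<PartitionHashId>,
}

struct ParquetFile {
    partition_id: Option<i64>,
    partition_hash_id: Option<PartitionHashId>,
    to_delete: bool,
}

/// List files for the partition with row ID `partition_id`, matching on
/// either identifier so that records carrying only the hash ID are still
/// returned.
fn list_by_partition_not_to_delete<'a>(
    partitions: &[Partition],
    files: &'a [ParquetFile],
    partition_id: i64,
) -> Vec<&'a ParquetFile> {
    // Resolve the hash ID of the requested partition, if it has one.
    let hash_id = partitions
        .iter()
        .find(|p| p.id == partition_id)
        .and_then(|p| p.hash_id.clone());

    files
        .iter()
        .filter(|f| !f.to_delete)
        .filter(|f| {
            f.partition_id == Some(partition_id)
                || (hash_id.is_some() && f.partition_hash_id == hash_id)
        })
        .collect()
}
```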
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
This adds some computational overhead during the merging of new
namespace schema with what's in the router's local cache, but will allow
gossiping of changes.
Cache the row count & timestamp min/max values within the partition FSM
/ buffer, and make them available through the Queryable trait.
This allows the PartitionData to read the row count of a buffer (either
"hot" for writes, a "snapshot" of immutable RecordBatch, or "persisting"
for in-flight persisting data).
These values will enable early partition pruning.
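A rough sketch of the kind of interface this describes follows; the
trait and type names mirror the text but are simplified, not the actual
ingester definitions.

```rust
/// Statistics exposed by every stage of buffered data so the query path
/// can prune partitions before touching the data itself.
trait Queryable {
    /// Number of rows currently held by this stage.
    fn rows(&self) -> usize;

    /// Inclusive min/max of the time column across those rows, if any.
    fn timestamp_min_max(&self) -> Option<(i64, i64)>;
}

/// Stand-in for one buffered stage (hot buffer, snapshot, or persisting
/// data).
struct Snapshot {
    row_count: usize,
    ts_min: i64,
    ts_max: i64,
}

impl Queryable for Snapshot {
    fn rows(&self) -> usize {
        self.row_count
    }

    fn timestamp_min_max(&self) -> Option<(i64, i64)> {
        (self.row_count > 0).then_some((self.ts_min, self.ts_max))
    }
}
```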
Time has a special meaning and can be partitioned on using the strftime
formatter. It should not be used as a tag value part in a custom
partitioning template.
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:
https://github.com/rust-lang/cargo/issues/10801
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Although callers could manually extend the sequence number set by continually
adding in an iterator loop or a fold expression, this enables other
combinator patterns when dealing with collections of sequence number
sets.
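As a hedged illustration of that combinator pattern, a `FromIterator`
implementation lets a collection of sets be merged with a single
`collect()`; the types below are simplified stand-ins (the real set
wraps a compressed bitmap, not a `BTreeSet`).

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SequenceNumber(u64);

#[derive(Default)]
struct SequenceNumberSet(BTreeSet<SequenceNumber>);

impl SequenceNumberSet {
    fn add(&mut self, n: SequenceNumber) {
        self.0.insert(n);
    }
}

/// Merge many sets into one via `collect()` rather than an explicit loop
/// or fold expression.
impl FromIterator<SequenceNumberSet> for SequenceNumberSet {
    fn from_iter<T: IntoIterator<Item = SequenceNumberSet>>(iter: T) -> Self {
        let mut out = Self::default();
        for set in iter {
            out.0.extend(set.0);
        }
        out
    }
}

fn merge(per_partition: Vec<SequenceNumberSet>) -> SequenceNumberSet {
    // e.g. collapsing the sets reported by each partition after persist.
    per_partition.into_iter().collect()
}
```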
Now that sequence numbers are internal to the ingester and the WAL,
there's no need for them to be a signed integer. As noted in
[#7260](https://github.com/influxdata/influxdb_iox/issues/7260), this
was a quirk of the Kafka-based IOx and of the fact that Postgres only
supports signed integers.
This will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
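A small sketch of that branch follows; the path layout and field names
here are assumptions for illustration, not the exact object store path
format.

```rust
struct ParquetFileRecord {
    partition_id: i64,
    partition_hash_id: Option<String>,
    object_store_id: String,
}

/// Prefer the deterministic hash ID when the record carries one; older
/// records without it keep using the row-addressed partition ID.
fn object_store_path(namespace_id: i64, table_id: i64, f: &ParquetFileRecord) -> String {
    let partition = f
        .partition_hash_id
        .clone()
        .unwrap_or_else(|| f.partition_id.to_string());

    format!(
        "{namespace_id}/{table_id}/{partition}/{}.parquet",
        f.object_store_id
    )
}
```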
* feat(garbage-collector): batch parquet existence checks to catalog
The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was one catalog
query per parquet file in the object store. This can be a lot of
requests.
This PR performs one catalog query for a batch of at most 100 parquet
file uuids. A hundred seems like a decent starting place.
The batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are split into batches of 100 for the catalog.
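A simplified, synchronous sketch of that batching loop follows, using a
std mpsc channel with a receive timeout in place of the real async
lister machinery; the constant names and use of the `uuid` crate are
illustrative.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

use uuid::Uuid;

const MAX_BATCH: usize = 100;
const RECV_TIMEOUT: Duration = Duration::from_millis(100);

/// Drain up to MAX_BATCH parquet file object store IDs from the lister
/// channel, flushing early if nothing arrives for RECV_TIMEOUT, so each
/// catalog existence check covers many files instead of one.
fn next_batch(rx: &Receiver<Uuid>) -> Option<Vec<Uuid>> {
    let mut batch = Vec::with_capacity(MAX_BATCH);
    loop {
        match rx.recv_timeout(RECV_TIMEOUT) {
            Ok(id) => {
                batch.push(id);
                if batch.len() == MAX_BATCH {
                    return Some(batch); // full batch: query the catalog now
                }
            }
            // Timed out waiting: flush whatever has accumulated so far.
            Err(RecvTimeoutError::Timeout) => return Some(batch),
            // Lister thread shut down: emit a final partial batch, then stop.
            Err(RecvTimeoutError::Disconnected) => {
                return (!batch.is_empty()).then_some(batch);
            }
        }
    }
}
```

Each returned batch then feeds a single catalog existence query instead
of one query per file.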
Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
#7652 for watching for shutdown on the threads.
* fixes #7784
* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single field structs that are
just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids
* comment clarifications
* adjust logging from debug to warn for expected rare events
Many thanks to Carol for help implementing this!
The template length should always return a value > 0 because templates
must have at least one part. Before this change, `len` would have
returned 0 if there was no override because of the `unwrap_or_default`.
Instead, use the `parts` method, which takes care of the fallback to the
hardcoded default template, whose len will always be 1.
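A condensed sketch of the before/after behaviour described here, with
simplified types and an assumed single time-format default part:

```rust
#[derive(Clone)]
enum TemplatePart {
    TagValue(String),
    TimeFormat(String),
}

struct TablePartitionTemplateOverride(Option<Vec<TemplatePart>>);

impl TablePartitionTemplateOverride {
    /// Parts of the override, falling back to the hardcoded default
    /// template (assumed here to be a single time-format part) when no
    /// override is set.
    fn parts(&self) -> Vec<TemplatePart> {
        self.0
            .clone()
            .unwrap_or_else(|| vec![TemplatePart::TimeFormat("%Y-%m-%d".to_string())])
    }

    /// Always >= 1. The earlier, buggy version was effectively
    /// `self.0.clone().unwrap_or_default().len()`, which reported 0 when
    /// there was no override.
    #[allow(clippy::len_without_is_empty)] // a template must never be empty
    fn len(&self) -> usize {
        self.parts().len()
    }
}
```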
Partition templates should not contain more than 8 parts, which when
combined with a per-part byte limit, bounds the maximum size of a
partition key.
This commit causes the router to refuse to service a write request that
contains > 8 parts in the template - this causes a panic, as it's a
broken system invariant and should be an unreachable state. Templates
are pre-validated at creation time to contain no more than 8 parts, and
are immutable:
https://github.com/influxdata/influxdb_iox/pull/7930
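A minimal sketch of that invariant check, with hypothetical constant and
function names:

```rust
/// Hypothetical name for the limit discussed above.
const MAX_TEMPLATE_PARTS: usize = 8;

/// Templates are validated at creation time and immutable, so a template
/// with more parts reaching the router's write path is a broken
/// invariant: panic rather than attempt to recover from an unreachable
/// state.
fn assert_template_within_bounds(part_count: usize) {
    assert!(
        part_count <= MAX_TEMPLATE_PARTS,
        "partition template has {part_count} parts, exceeding the maximum of {MAX_TEMPLATE_PARTS}"
    );
}
```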
Allow a table template override to report the number of template parts
within it.
This ignores the lint wanting an "is_empty()" method too, because it's
misleading and redundant - a template MUST never be empty.
This commit ensures all partition key parts are less than or equal to
200 bytes long.
If a string exceeds the 200 byte limit, it is truncated (avoiding
splitting unicode code-points or graphemes) and then a single "#"
sentinel value is appended. When column values are recovered from the
partition key string, truncated values are marked as suitable for
prefix matching only - a property that is encoded into the type system.
This commit takes a conservative approach of not splitting graphemes as
outlined in the module documentation, but this could be relaxed in the
future if needed.
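A rough sketch of code-point-safe truncation under these rules, assuming
the sentinel byte counts against the 200 byte limit; the constant names
are illustrative and, as noted above, the real implementation also
avoids splitting grapheme clusters (which needs a Unicode segmentation
crate, omitted here).

```rust
const MAX_PART_BYTES: usize = 200;
const TRUNCATION_SENTINEL: char = '#';

/// Bound a partition key part to MAX_PART_BYTES bytes. Oversized values
/// are cut back to a char boundary and marked with the sentinel,
/// signalling that the stored value is a prefix of the original
/// (prefix matching only).
fn bound_part(value: &str) -> String {
    if value.len() <= MAX_PART_BYTES {
        return value.to_string();
    }

    // Leave room for the sentinel, then walk back to a valid char
    // boundary so no UTF-8 code point is split.
    let mut cut = MAX_PART_BYTES - TRUNCATION_SENTINEL.len_utf8();
    while !value.is_char_boundary(cut) {
        cut -= 1;
    }

    let mut out = value[..cut].to_string();
    out.push(TRUNCATION_SENTINEL);
    out
}
```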