597 Commits (be9064c75fc912c93859f180a17afa260c6ac242)

Author SHA1 Message Date
dependabot[bot] 77f340f964
chore(deps): Bump thiserror from 1.0.47 to 1.0.48 (#8658)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.47 to 1.0.48.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.47...1.0.48)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-04 09:21:22 +00:00
Nga Tran 93f3ec6999
feat: teach querier to use sort_key_ids (#8604)
* feat: teach querier to use sort_key_ids

* chore: add an assert to capture bugs

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-01 14:45:42 +00:00
Marco Neumann 1f3ee8bf91
refactor: prep work for #8349 (#8626)
* refactor: make projection masks unsigned

* fix: buffer alignment

* feat: more precise serialization error

* refactor: make `client_util` tower helper public

This can be used for #8349 to set tracing headers.

* fix: impl `Eq` for `TimestampMinMax`

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-31 16:12:50 +00:00
Fraser Savage 1007989cd8
refactor(proto): Impl From<Table> to proto Table 2023-08-31 13:19:09 +01:00
Dom b68d108baf
Merge branch 'main' into dom/gossip-parquet-proto 2023-08-30 10:55:46 +01:00
Dom Dwyer 4c2945719a
feat: proto serialisation of ParquetFile
Adds conversion functions to serialise a ParquetFile into a protobuf
representation, and back again.

Adds randomised testing to assert round-trip equality.
2023-08-29 12:35:45 +02:00
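The round-trip property asserted by the ParquetFile serialisation commit above can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: `FileRecord`, its fields, the byte layout and the small `XorShift64` generator are hypothetical stand-ins, not the real `ParquetFile`, its protobuf encoding, or the property-testing setup used in the repository.

```rust
/// Hypothetical stand-in for a catalog file record (NOT the real `ParquetFile`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileRecord {
    pub id: u64,
    pub row_count: u64,
    pub path: String,
}

/// Illustrative layout: id (8 bytes LE) | row_count (8 bytes LE) |
/// path length (4 bytes LE) | path bytes (UTF-8).
pub fn encode(rec: &FileRecord) -> Vec<u8> {
    let mut out = Vec::with_capacity(20 + rec.path.len());
    out.extend_from_slice(&rec.id.to_le_bytes());
    out.extend_from_slice(&rec.row_count.to_le_bytes());
    out.extend_from_slice(&(rec.path.len() as u32).to_le_bytes());
    out.extend_from_slice(rec.path.as_bytes());
    out
}

/// Bounds-checked sub-slice helper.
fn take(buf: &[u8], at: usize, len: usize) -> Option<&[u8]> {
    buf.get(at..at.checked_add(len)?)
}

/// Decode a record, returning `None` for truncated or malformed input.
pub fn decode(buf: &[u8]) -> Option<FileRecord> {
    let mut id = [0u8; 8];
    id.copy_from_slice(take(buf, 0, 8)?);
    let mut rows = [0u8; 8];
    rows.copy_from_slice(take(buf, 8, 8)?);
    let mut len = [0u8; 4];
    len.copy_from_slice(take(buf, 16, 4)?);
    let len = u32::from_le_bytes(len) as usize;
    // The declared path length must account for exactly the remaining bytes.
    if buf.len() != 20usize.checked_add(len)? {
        return None;
    }
    let path = String::from_utf8(buf[20..].to_vec()).ok()?;
    Some(FileRecord {
        id: u64::from_le_bytes(id),
        row_count: u64::from_le_bytes(rows),
        path,
    })
}

/// Tiny deterministic pseudo-random generator (xorshift), used here only to
/// avoid external dependencies.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift64(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generate a record with random numeric fields and a short lowercase path.
pub fn arbitrary_record(rng: &mut XorShift64) -> FileRecord {
    let len = (rng.next_u64() % 16) as usize;
    let path: String = (0..len)
        .map(|_| (b'a' + (rng.next_u64() % 26) as u8) as char)
        .collect();
    FileRecord {
        id: rng.next_u64(),
        row_count: rng.next_u64(),
        path,
    }
}

/// The property: decoding an encoded record yields the original record.
pub fn round_trip_holds(seed: u64, cases: usize) -> bool {
    let mut rng = XorShift64::new(seed);
    (0..cases).all(|_| {
        let rec = arbitrary_record(&mut rng);
        decode(&encode(&rec)) == Some(rec)
    })
}
```

Calling `round_trip_holds(42, 1_000)` checks the property over a thousand generated records; a property-testing library would additionally shrink failing inputs, which this sketch does not attempt.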
Dom Dwyer bd4a3fbbb8
refactor: impl Hash for NamespaceSchema
Allow the NamespaceSchema to be hashed (including the underlying proto
types it contains).
2023-08-29 12:19:41 +02:00
Dom Dwyer fc694effda
refactor: AsRef bytes for NamespaceName
The NamespaceName is a wrapper over a str (conceptually), which allows
for cheap use of the underlying bytes.
2023-08-29 12:18:33 +02:00
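As a rough illustration of the `AsRef` commit above, the sketch below implements `AsRef<[u8]>` for a string newtype. The `Name` type and the `byte_len` helper are hypothetical; the real `NamespaceName` carries validation and an internal representation that are not shown in this log.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a namespace name wrapper (NOT the real type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Name(Arc<str>);

impl Name {
    pub fn new(s: &str) -> Self {
        Name(Arc::from(s))
    }
}

impl AsRef<[u8]> for Name {
    /// Borrow the underlying UTF-8 bytes without copying or allocating.
    fn as_ref(&self) -> &[u8] {
        self.0.as_bytes()
    }
}

/// Any byte-oriented consumer (hashing, encoding, ...) can now accept the
/// wrapper directly. This helper is illustrative only.
pub fn byte_len<T: AsRef<[u8]>>(value: T) -> usize {
    value.as_ref().len()
}
```

Because the conversion is a borrow, passing a `Name` to byte-oriented code costs no more than passing the `str` it wraps.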
Carol (Nichols || Goulding) 12b8095c46
feat: Upgrade to Rust 1.72.0 (#8589)
* feat: Upgrade to Rust 1.72.0

* fix: Allow a warning about an error we're intentionally creating

This is a test for an error. This lint warns that this code will cause
an error. Thanks lint, that's what we wanted!

* chore: rustfmt 1.72

* fix: Remove unnecessary hashes in raw string literals

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/needless_raw_string_hashes

Note that there are a number of false negatives with this lint; see
https://github.com/rust-lang/rust-clippy/issues/11420

* fix: Remove unnecessary explicit iteration

Looks like clippy::explicit_iter_loop was improved.
https://rust-lang.github.io/rust-clippy/master/index.html#/explicit_iter_loop

* fix: Allow clippy::manual_try_fold in a few places

Some of these might not be possible to rewrite with try_fold, or at
least not trivially. I don't feel confident enough to change these, in
any case. I think the lint is good to have on for future code though, so
that new code can be written with try_fold.

* fix: Remove useless creation of vectors when an array will do

Mostly in tests. Also fix some long lines.

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/useless_vec

* fix: Allow a single range in a vec init, which is actually what we want

Looks like Clippy's trying to catch a common mistake here, but for realz
we actually want `Vec<Range<usize>>` not `Vec<usize>`

https://rust-lang.github.io/rust-clippy/master/index.html#/single_range_in_vec_init

* fix: Remove a useless conversion

This looks like removing explicit iteration, but it's actually caught by
useless_conversion.

https://rust-lang.github.io/rust-clippy/master/index.html#/useless_conversion

* fix: Remove redundant pattern matching

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_pat

* fix: Allow an unwrap on a literal None in a test

This matches with the other tests better, and also when I tried to
remove the `unwrap_or_default` it changed the JSON sent from something
with an empty value to `null`, so I think the `or_default` part is
actually changing from one `None` to another `None`.

https://rust-lang.github.io/rust-clippy/master/index.html#/unnecessary_literal_unwrap
2023-08-29 05:57:38 +00:00
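The `single_range_in_vec_init` item in the upgrade commit above turns on the difference between a vector holding one range and a vector holding the numbers in that range. A minimal sketch of the two shapes follows; the function names are made up for illustration.

```rust
use std::ops::Range;

/// One element: the range itself. This is the shape the commit says it
/// actually wants, so the lint is allowed rather than "fixed".
#[allow(clippy::single_range_in_vec_init)]
pub fn one_range() -> Vec<Range<usize>> {
    vec![0..5]
}

/// Five elements: the shape the lint assumes was probably intended.
pub fn expanded() -> Vec<usize> {
    (0..5).collect()
}
```

Here `one_range()` has length 1 while `expanded()` has length 5, which is why the lint flags the first form as a likely mistake.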
Joe-Blount 1df5948c97
feat: Add Compaction Regions (#8559)
* feat: add CompactRanges RoundInfo type

* chore: insta test updates for adding CompactRange

* feat: simplify/improve ManySmallFiles logic, now that its problem set is simpler

* chore: insta test updates for ManySmallFiles improvement

* chore: upgrade files more aggressively

* chore: insta updates from more aggressive file upgrades

* chore: addressing review comments
2023-08-28 12:59:12 +00:00
Nga Tran 2eb74ddb87
chore: revert teaching compactor to use sort_key_ids (#8574) 2023-08-25 13:21:12 +00:00
Nga Tran 246918feb6
feat: teach compactor to use sort_key_ids instead of sort_key (#8560)
* feat: teach compactor to use sort_key_ids instead of sort_key

* test: update the test output after chatting with Joe and learning the reason for the changes
2023-08-24 16:16:12 +00:00
Dom Dwyer 17b8eaef0f
test: assert error conditions for proto decoding
Test error cases too.
2023-08-24 11:04:19 +02:00
Dom Dwyer e932132946
feat: proto serialisation of TransitionPartitionId
Adds serialisation and deserialisation of TransitionPartitionId to the
protobuf representation, and a randomised round-trip property test.
2023-08-23 15:52:02 +02:00
Dom Dwyer 6a68b6edf0
test(partition): less proptest discards
Generate non-empty strings as inputs to proptest tests instead of
generating random strings and filtering.
2023-08-23 15:52:01 +02:00
Dom Dwyer 55631a4f83
refactor: ColumnsByName::from_iter()
Allow a ColumnsByName to be constructed by collecting a set of (name,
column_schema) tuples.
2023-08-22 12:45:21 +02:00
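Collecting `(name, column_schema)` tuples into a map-like wrapper, as the commit above describes, is typically done by implementing `FromIterator`. The sketch below assumes a `BTreeMap`-backed wrapper and a one-field `ColumnSchema`; both are simplified stand-ins that reuse the names from the commit, not the real definitions.

```rust
use std::collections::BTreeMap;
use std::iter::FromIterator;

/// Simplified stand-in; the real `ColumnSchema` has more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColumnSchema {
    pub id: i64,
}

/// Simplified stand-in for a name -> schema collection.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ColumnsByName(BTreeMap<String, ColumnSchema>);

impl FromIterator<(String, ColumnSchema)> for ColumnsByName {
    fn from_iter<I: IntoIterator<Item = (String, ColumnSchema)>>(iter: I) -> Self {
        // Delegate to the map's own `FromIterator`; a later duplicate name
        // overwrites an earlier one.
        ColumnsByName(iter.into_iter().collect())
    }
}

impl ColumnsByName {
    pub fn get(&self, name: &str) -> Option<&ColumnSchema> {
        self.0.get(name)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

With this in place, any iterator of tuples can end in `.collect::<ColumnsByName>()` instead of a manual insertion loop.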
Joe-Blount 53915f0653
feat: move vertical splitting & detect non-linear data (#8506)
* chore: test changes and additions in preparation for functional changes

* feat: move vertical splitting to RoundInfo calculation, align splits to L1 files

* chore: insta test churn

* feat: detect non-linear data distribution in vertical splitting

* chore: add tests for non-linear data distribution

* chore: insta churn

* chore: cleanup & comment additions

* chore: some variable renaming
2023-08-21 18:22:25 +00:00
Nga Tran 3e98f7ea5c
feat: fill sort_key_ids when partition is inserted and updated (#8517)
* feat: read null sort_key_ids

* chore: clearer explanation about test strategy

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* test: tests that add partition with NULL sort_key_ids

* feat: set sort_key_ids to empty array {} during partition insertion

* feat: initial step to update sort_key_ids

* chore: address review comments

* chore: remove unnecessary comments and tests

* fix: typos

* chore: remove unnecessary tests

* feat: continue the work of updating sort_key_ids

* fix: check duplicates for SortedColumnSet

* test: tests for sort key ids

* test: fix a test

* chore: remove unused comments

* chore: address first half of review comments and removing tests of tests

* chore: address review comments for fetching columns in ingester

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-21 14:26:57 +00:00
Nga Tran 5d17a99dbb
feat: read null sort_key_ids (#8489)
* feat: read null sort_key_ids

* chore: clearer explanation about test strategy

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* test: tests that add partition with NULL sort_key_ids

* chore: address review comments

* chore: remove unnecessary comments and tests

* fix: typos

* chore: remove unnecessary tests

* fix: check duplicates for SortedColumnSet

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-18 14:15:27 +00:00
dependabot[bot] d2c71bfe67
chore(deps): Bump thiserror from 1.0.46 to 1.0.47 (#8519)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.46 to 1.0.47.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.46...1.0.47)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-18 09:02:48 +00:00
dependabot[bot] fff313b80c
chore(deps): Bump thiserror from 1.0.44 to 1.0.46 (#8496)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.44 to 1.0.46.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.44...1.0.46)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-16 10:54:47 +00:00
NGA-TRAN 9bf1c8c11c chore: revert fill sort_key_ids 2023-08-11 11:36:27 -04:00
Nga Tran da92a5c9e1
feat: fill catalog `sort_key_ids` for partitions with coming data (#8462)
* feat: fill catalog sort_key_ids for partition with coming data

* test: sort_key_ids has empty array for newly created partition

* test: name of non-existing column

* chore: add comments to ask Andrew about the code

* chore: make comments clearer

* chore: fix a comment to avoid failure in doc

* chore: add comment for the panic if column name of sort key not found

* fix: during file import, the partition has to be created with an empty sort key first; after its files are created, the partition is updated with the sort key

* chore: remove no longer needed comments after the bug in build_catalog test is fixed

* chore: address review comments

* refactor: Use ColumnSet type

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: fix a clippy

---------

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-08-10 18:12:40 +00:00
Dom Dwyer 03f7025211
perf: minimise partition catalog queries
This commit implements a PartitionProvider decorator that
probabilistically determines if a partition is going to be an
"old-style" row-addressed partition created prior to #7963, or a
"new-style" hash-addressed partition created after using a fast,
space-efficient, compressed bloom filter.

If a partition is identified as a new-style, hash-addressed partition,
the PartitionData is immediately initialised using the deterministic
hash ID without performing a catalog query at all.

If a partition is identified as an old-style, row-addressed partition, a
catalog query is performed to resolve the row ID as it would without
this filter.

A new-style, hash-addressed partition may sometimes be incorrectly
identified as a row-addressed partition, causing a spurious catalog
query, which is then correctly identified as a hash-addressed partition.
This is tuned to happen roughly 0.1% to 1% of the time, eliminating 99% to 99.9% of
unnecessary catalog queries.
2023-08-03 16:40:38 +02:00
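The decision logic described in the commit above can be sketched with a plain Bloom filter that holds the keys of old-style partitions. This is a sketch under stated assumptions, not the repository's implementation: the commit mentions a compressed filter, whereas this one is an uncompressed bit vector, and `BloomFilter`, `Resolution` and `resolve` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Plain (uncompressed) Bloom filter over string keys.
pub struct BloomFilter {
    bits: Vec<bool>,
    hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, hashes: u64) -> Self {
        assert!(num_bits > 0 && hashes > 0);
        BloomFilter {
            bits: vec![false; num_bits],
            hashes,
        }
    }

    /// Derive the i-th bit position for `key` by hashing (i, key).
    fn index(&self, key: &str, i: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        i.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, key: &str) {
        for i in 0..self.hashes {
            let idx = self.index(key, i);
            self.bits[idx] = true;
        }
    }

    /// `false` means "definitely absent"; `true` means "possibly present".
    pub fn maybe_contains(&self, key: &str) -> bool {
        (0..self.hashes).all(|i| self.bits[self.index(key, i)])
    }
}

/// How a partition should be resolved (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    /// Not in the filter: initialise from the deterministic hash ID,
    /// no catalog query needed.
    HashAddressed,
    /// Possibly old-style: fall back to a catalog query, which settles it.
    QueryCatalog,
}

/// `old_style` is assumed to contain the key of every row-addressed partition.
pub fn resolve(old_style: &BloomFilter, partition_key: &str) -> Resolution {
    if old_style.maybe_contains(partition_key) {
        Resolution::QueryCatalog
    } else {
        Resolution::HashAddressed
    }
}
```

A Bloom filter has no false negatives, so an old-style partition is never mistaken for a new-style one; the only error mode is a false positive, which costs one extra catalog query, matching the behaviour the commit message describes.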
Carol (Nichols || Goulding) 641324b261
docs: Explain TransitionPartitionId more thoroughly
Co-authored-by: Dom <dom@itsallbroken.com>
2023-08-02 10:17:24 -04:00
Carol (Nichols || Goulding) 92ae8e4084
refactor: Extract a convenience constructor for Deterministic transition ids 2023-08-02 10:17:23 -04:00
Dom Dwyer 6ea8c99c01
refactor: accessor for table partition proto
Allow the Table partition template protobuf to be accessed (if
specified).
2023-08-02 13:36:35 +02:00
Dom Dwyer e3ec091881
refactor: accessor for namespace partition proto
Allow the Namespace partition template protobuf to be accessed (if
specified).
2023-08-02 13:36:34 +02:00
Dom Dwyer 2ebd2e2236
feat: ColumnSchema instantiation from gossip
Implement converting a Column received via gossip into a ColumnSchema.
2023-08-02 13:36:24 +02:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
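The lookup rule from the commit above (a file belongs to a partition if it references either the partition's row ID or its hash ID) can be sketched over in-memory slices. The struct fields mirror the example in the commit message; the `name` field, the `files_for_partition` function and the in-memory representation are assumptions made for illustration, and the real catalog performs this lookup in SQL.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Partition {
    pub id: i64,
    pub hash_id: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFile {
    /// Hypothetical field, used only to tell files apart in this sketch.
    pub name: String,
    pub partition_id: Option<i64>,
    pub partition_hash_id: Option<String>,
}

/// Return every file associated with the partition whose row ID is
/// `partition_id`, whichever kind of identifier the file record stores.
pub fn files_for_partition<'a>(
    partitions: &[Partition],
    files: &'a [ParquetFile],
    partition_id: i64,
) -> Vec<&'a ParquetFile> {
    // Resolve the hash ID (if any) that belongs to the requested row ID.
    let hash_id = partitions
        .iter()
        .find(|p| p.id == partition_id)
        .and_then(|p| p.hash_id.as_deref());

    files
        .iter()
        .filter(|f| {
            f.partition_id == Some(partition_id)
                || (hash_id.is_some() && f.partition_hash_id.as_deref() == hash_id)
        })
        .collect()
}
```

With a partition `{ id: 3, hash_id: "abcdefg" }`, a file that stores only `partition_hash_id: "abcdefg"` is still returned when querying by row ID 3, which is the case the commit calls out for the compactor.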
Fraser Savage 5453ad8ba4
feat(router): Include table/column diff for namespace schema cache update
This adds some computational overhead during the merging of new
namespace schema with what's in the router's local cache, but will allow
gossiping of changes.
2023-07-27 13:37:47 +01:00
Dom Dwyer b4b7822f2b
perf: cache summary statistics in partition FSM
Cache the row count & timestamp min/max values within the partition FSM
/ buffer, and make them available through the Queryable trait.

This allows the PartitionData to read the row count of a buffer (either
"hot" for writes, a "snapshot" of immutable RecordBatch, or "persisting"
for in-flight persisting data).

These values will enable early partition pruning.
2023-07-25 14:44:37 +02:00
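The idea in the commit above, maintaining the row count and timestamp range at write time so that reads are constant-time, can be sketched as follows. `Buffer` and its methods are hypothetical, and `TimestampMinMax` here is a two-field stand-in for the type of the same name; the real partition FSM and `Queryable` trait are not reproduced.

```rust
/// Stand-in for the timestamp range summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimestampMinMax {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical write buffer that caches summary statistics as data arrives.
#[derive(Debug, Default)]
pub struct Buffer {
    timestamps: Vec<i64>,
    row_count: usize,
    ts_range: Option<TimestampMinMax>,
}

impl Buffer {
    /// Append rows, updating the cached statistics incrementally.
    pub fn write(&mut self, ts: &[i64]) {
        for &t in ts {
            self.ts_range = Some(match self.ts_range {
                Some(r) => TimestampMinMax {
                    min: r.min.min(t),
                    max: r.max.max(t),
                },
                None => TimestampMinMax { min: t, max: t },
            });
        }
        self.row_count += ts.len();
        self.timestamps.extend_from_slice(ts);
    }

    /// O(1): no scan of the buffered data is needed.
    pub fn rows(&self) -> usize {
        self.row_count
    }

    /// O(1): `None` for an empty buffer.
    pub fn timestamp_stats(&self) -> Option<TimestampMinMax> {
        self.ts_range
    }

    /// Early pruning: can this buffer hold rows in the inclusive range
    /// `[start, end]`? Answered from the cached statistics alone.
    pub fn may_overlap(&self, start: i64, end: i64) -> bool {
        match self.ts_range {
            Some(r) => r.min <= end && r.max >= start,
            None => false,
        }
    }
}
```

A query for a time range can call `may_overlap` first and skip the buffer entirely when it returns `false`, which is the kind of early pruning the commit says these values enable.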
Fraser Savage c834ec171f
test(router): Custom partition template API create using `time` tag value is rejected
This removes the double negative from the error message and adds
coverage at the router's gRPC API level for the rejection of the bad
TagValue value.
2023-07-24 13:07:04 +01:00
Fraser Savage aac4166bf0
fix: Reject `time` as a tag value for custom partition templates
Time has a special meaning and can be partitioned on by the strftime
formatter. It should not be used as a tag value part in a custom
partitioning template.
2023-07-24 12:49:13 +01:00
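A minimal sketch of the validation rule from the commit above: a template part that uses `time` as a tag value is rejected, while time-based partitioning goes through a strftime-style format part. `TemplatePart`, `ValidationError` and `validate` are hypothetical names chosen for this sketch, not the repository's API.

```rust
/// Hypothetical partition template part.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TemplatePart {
    /// Partition on the value of the named tag.
    TagValue(String),
    /// Partition on time, formatted with a strftime-style pattern.
    TimeFormat(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    /// `time` has a special meaning and cannot be used as a tag value part.
    TimeTagValue,
}

/// Reject templates that try to use `time` as a tag value.
pub fn validate(parts: &[TemplatePart]) -> Result<(), ValidationError> {
    for part in parts {
        if let TemplatePart::TagValue(name) = part {
            if name.as_str() == "time" {
                return Err(ValidationError::TimeTagValue);
            }
        }
    }
    Ok(())
}
```

Modelling the two kinds of part as separate enum variants keeps the check to a single comparison and leaves the time format path unaffected.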
dependabot[bot] faa8d44492
chore(deps): Bump thiserror from 1.0.43 to 1.0.44 (#8315)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.43 to 1.0.44.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.43...1.0.44)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:18:44 +00:00
Marco Neumann 004b401a05
chore: upgrade to sqlx 0.7.1 (#8266)
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:

https://github.com/rust-lang/cargo/issues/10801

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-19 12:18:57 +00:00
dependabot[bot] e33a078128
chore(deps): Bump paste from 1.0.13 to 1.0.14 (#8244)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.13 to 1.0.14.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.13...1.0.14)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-17 16:10:02 +00:00
Carol (Nichols || Goulding) cf046d0b3e
refactor: Extract a from implementation for creating TransitionPartitionId 2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding) c2606ff3ac
test: Add and use methods creating arbitrary TransitionPartitionId and PartitionHashIds 2023-07-17 09:56:55 -04:00
Carol (Nichols || Goulding) 158c5119d1
fix: Make TransitionPartitionId and PartitionHashId sortable 2023-07-17 09:56:55 -04:00
kodiakhq[bot] 5fa861abab
Merge branch 'main' into savage/individually-sequence-partitions-within-writes 2023-07-10 12:48:37 +00:00
dependabot[bot] 057ee40cb9
chore(deps): Bump thiserror from 1.0.41 to 1.0.43 (#8181)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.41 to 1.0.43.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.41...1.0.43)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 09:25:12 +00:00
Fraser Savage 54a8f7d007
feat(data_types): Add `Extend<SequenceNumberSet>` for `SequenceNumberSet`
Although callers could manually extend the sequence number set by continually
adding in an iterator loop or a fold expression, this enables other
combinator patterns when dealing with collections of sequence number
sets.
2023-07-05 14:23:18 +01:00
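The combinator pattern enabled by the commit above can be sketched by implementing `Extend` for a set type over iterators of that same set type. The `BTreeSet<u64>` backing store and the helper methods are assumptions for illustration; the internals of the real `SequenceNumberSet` are not shown in this log.

```rust
use std::collections::BTreeSet;

/// Stand-in for a set of sequence numbers, backed by a `BTreeSet` for
/// simplicity (an assumption of this sketch).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SequenceNumberSet(BTreeSet<u64>);

impl SequenceNumberSet {
    pub fn add(&mut self, n: u64) {
        self.0.insert(n);
    }

    pub fn contains(&self, n: u64) -> bool {
        self.0.contains(&n)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

impl Extend<SequenceNumberSet> for SequenceNumberSet {
    /// Union every set yielded by `iter` into `self`.
    fn extend<T: IntoIterator<Item = SequenceNumberSet>>(&mut self, iter: T) {
        for set in iter {
            self.0.extend(set.0);
        }
    }
}

/// Example consumer: merge a collection of sets without a manual loop
/// or fold at the call site.
pub fn union_all(sets: Vec<SequenceNumberSet>) -> SequenceNumberSet {
    let mut out = SequenceNumberSet::default();
    out.extend(sets);
    out
}
```

Because `Extend` accepts any `IntoIterator`, the same call works with vectors, options, or iterator adaptor chains that yield sets.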
kodiakhq[bot] 70a6e60415
Merge branch 'main' into savage/use-u64-for-sequence-number 2023-07-05 12:55:44 +00:00
Marco Neumann 35d93f9475
fix: include `PartitionHashId` in size estimations (#8153)
As for the other types: size estimations are conservative, so we assume
the value behind the `Arc` is owned by the estimating party.
2023-07-05 10:42:39 +00:00
dependabot[bot] 3827257f94
chore(deps): Bump thiserror from 1.0.40 to 1.0.41 (#8149)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.40 to 1.0.41.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.40...1.0.41)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-05 09:25:14 +00:00
dependabot[bot] 9a03d9c9fe
chore(deps): Bump paste from 1.0.12 to 1.0.13 (#8139)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-04 07:57:41 +00:00
dependabot[bot] 647541fc12
chore(deps): Bump croaring from 0.8.1 to 0.9.0 (#8088)
Bumps [croaring](https://github.com/saulius/croaring-rs) from 0.8.1 to 0.9.0.
- [Release notes](https://github.com/saulius/croaring-rs/releases)
- [Commits](https://github.com/saulius/croaring-rs/commits)

---
updated-dependencies:
- dependency-name: croaring
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 08:10:39 +00:00
Fraser Savage 62cb6594c8
refactor(ingester): Use unsigned sequence number, remove its `Sqlx::Type`
Now that sequence numbers are internal to the ingester and the WAL,
there's no need for them to be a signed integer. As noted by
[#7260](https://github.com/influxdata/influxdb_iox/issues/7260) this was
a quirk related to the kafka-based IOx and Postgres only supported
signed integers.
2023-06-23 16:39:11 +01:00
Carol (Nichols || Goulding) 0d9f89ae48
test: Add verification of deterministic and collision-resistant properties of PartitionHashId 2023-06-22 09:01:22 -04:00