Commit Graph

12607 Commits (614ff35998623026bd78b682dbce79827942e639)

Author SHA1 Message Date
Joe-Blount 614ff35998
Merge pull request #7958 from influxdata/jrb_46_compactor_scratchpad_cleaning
fix: clean compaction output from scratchpad
2023-06-12 12:05:42 -05:00
Joe-Blount 45099fa526
Merge branch 'main' into jrb_46_compactor_scratchpad_cleaning 2023-06-12 12:00:06 -05:00
kodiakhq[bot] f58e647d3c
Merge pull request #7962 from influxdata/savage/inspect-wal-contents
feat(cli): Add `influxdb_iox debug wal inspect` command
2023-06-12 14:27:41 +00:00
kodiakhq[bot] c2c614c765
Merge branch 'main' into savage/inspect-wal-contents 2023-06-12 14:22:05 +00:00
Fraser Savage 51f45fd710
refactor(cli): Simplify and improve sequence range prop test
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-12 15:14:08 +01:00
Marco Neumann 453a361d3c
feat: catalog parquet file cache TTL (#7975)
Prevent the querier from accessing files that were flagged for deletion
a long time ago. This would happen if the following conditions hold:

- we have very long-running querier pods (e.g. over holidays)
- the table doesn't receive any writes (or the partition if we ever
  change the cache granularity), hence the querier is never informed
  that its state is out-of-date
- a compactor runs a cold compaction, and by doing so flags a file for
  deletion
- the GC finally wants to delete it

This is mostly a safety measure to prevent weird internal server errors
that should nearly never happen. On the other hand I do not want to hunt
Heisenbugs.
2023-06-12 14:02:47 +00:00
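A minimal sketch of the TTL idea from commit 453a361d3c above, assuming a plain in-memory map. `TtlCache`, `insert_at`, and `get_at` are hypothetical names for illustration and are not the querier's actual cache types. The clock is passed in explicitly so expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical cache entry: the cached value plus the instant it was stored.
struct Entry<V> {
    value: V,
    stored_at: Instant,
}

/// Minimal TTL cache sketch (illustrative only, not the IOx implementation).
pub struct TtlCache<K, V> {
    ttl: Duration,
    entries: HashMap<K, Entry<V>>,
}

impl<K: Hash + Eq, V: Clone> TtlCache<K, V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores `value` under `key`, stamped with the caller-supplied time.
    pub fn insert_at(&mut self, key: K, value: V, now: Instant) {
        self.entries.insert(key, Entry { value, stored_at: now });
    }

    /// Returns the cached value unless it has reached the TTL. An expired
    /// entry is dropped, so the caller has to refetch fresh state even if
    /// no write ever invalidated the entry.
    pub fn get_at(&mut self, key: &K, now: Instant) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some(entry) => now.saturating_duration_since(entry.stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            None
        } else {
            self.entries.get(key).map(|entry| entry.value.clone())
        }
    }
}
```

With a 60 second TTL, a lookup 59 seconds after insertion still hits, while a lookup at 60 seconds misses and evicts the entry.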
dependabot[bot] 792b991778
chore(deps): Bump rustix from 0.37.19 to 0.37.20 (#7969)
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.19 to 0.37.20.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.37.19...v0.37.20)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-12 11:37:11 +00:00
Fraser Savage 71e47b59ab
refactor(wal): Make more use of combinators for WAL segment reading logic 2023-06-12 12:27:20 +01:00
Marco Neumann bfc3c0d934
chore: update CI and prod image to Debian 12 / bookworm (#7972)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-12 09:33:23 +00:00
dependabot[bot] 41386078fe
chore(deps): Bump chrono from 0.4.24 to 0.4.26 (#7971)
* chore(deps): Bump chrono from 0.4.24 to 0.4.26

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.24 to 0.4.26.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.24...v0.4.26)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-12 09:12:50 +00:00
Fraser Savage e809aadfe5
chore(cli): Fix error typo in `wal inspect` test
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-06-12 10:12:18 +01:00
dependabot[bot] 5db690f214
chore(deps): Bump clap from 4.3.2 to 4.3.3 (#7968)
Bumps [clap](https://github.com/clap-rs/clap) from 4.3.2 to 4.3.3.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.3.2...v4.3.3)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-12 08:57:03 +00:00
dependabot[bot] 19b0fbc81c
chore(deps): Bump log from 0.4.18 to 0.4.19 (#7966)
Bumps [log](https://github.com/rust-lang/log) from 0.4.18 to 0.4.19.
- [Release notes](https://github.com/rust-lang/log/releases)
- [Changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/log/compare/0.4.18...0.4.19)

---
updated-dependencies:
- dependency-name: log
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-12 08:38:44 +00:00
Joe-Blount 59fd5cd7b4 chore: retain written files in shadow mode 2023-06-09 13:22:49 -05:00
Fraser Savage 73c0c28bd0
feat(cli): Add `influxdb_iox debug wal inspect` command
This commit adds an `inspect` command to read through the sequenced
operations in a WAL file and debug pretty print their contents to
stdout, optionally filtering by a sequence number range.
2023-06-09 18:16:57 +01:00
Fraser Savage fa69994358
refactor(wal): Implement `Iterator` for ClosedSegmentFileReader
The ClosedSegmentFileReader is pretty much an iterator anyway; this
just enables using all the juicy combinators with it more easily.
2023-06-09 17:30:53 +01:00
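The two commits above (fa69994358 and 73c0c28bd0) describe implementing `Iterator` for a segment reader and filtering operations by a sequence number range. The sketch below illustrates that pattern under loud assumptions: `SequencedOp`, `SegmentReader`, the one-entry-per-line text format, and `ops_in_range` are all hypothetical stand-ins, not the real WAL types or on-disk format.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for a sequenced WAL operation.
#[derive(Debug, Clone, PartialEq)]
pub struct SequencedOp {
    pub sequence_number: u64,
    pub payload: String,
}

/// Hypothetical reader over a text "segment" with one `<seq> <payload>`
/// entry per line. The real WAL format is binary; this is only a stand-in.
pub struct SegmentReader<'a> {
    lines: std::str::Lines<'a>,
}

impl<'a> SegmentReader<'a> {
    pub fn new(data: &'a str) -> Self {
        Self { lines: data.lines() }
    }
}

/// Parses a single entry, reporting malformed input as an error string.
fn parse_line(line: &str) -> Result<SequencedOp, String> {
    let (seq, payload) = line
        .split_once(' ')
        .ok_or_else(|| format!("malformed entry: {}", line))?;
    let sequence_number = seq
        .parse::<u64>()
        .map_err(|e| format!("bad sequence number {:?}: {}", seq, e))?;
    Ok(SequencedOp { sequence_number, payload: payload.to_string() })
}

// Implementing `Iterator` is what unlocks `filter`, `collect`, `count`, etc.
impl<'a> Iterator for SegmentReader<'a> {
    type Item = Result<SequencedOp, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let line = self.lines.next()?;
        Some(parse_line(line))
    }
}

/// Collects the operations whose sequence number falls inside `range`.
/// Errors are kept by the filter so that `collect` stops at the first one.
pub fn ops_in_range(
    reader: SegmentReader<'_>,
    range: RangeInclusive<u64>,
) -> Result<Vec<SequencedOp>, String> {
    reader
        .filter(|res| match res {
            Ok(op) => range.contains(&op.sequence_number),
            Err(_) => true,
        })
        .collect()
}
```

Collecting into `Result<Vec<_>, _>` short-circuits on the first read error, which keeps the range filter to a single combinator chain.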
Martin Hilton c8a7a8ec91
fix(service_grpc_flight): invalid token status code (#7960)
Historically the authz crate didn't distinguish between an invalid
token and a valid token without the required permissions. Recently
errors were added to distinguish these cases. This means that the
service now returns an "Internal" error if supplied an invalid
token. Detect this case and return a "Permission Denied" error,
which is the error type that was previously returned in this case.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-09 12:09:51 +00:00
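The mapping described in commit c8a7a8ec91 above can be sketched as a single `match`. The enum and function names here (`AuthzError`, `StatusCode`, `status_for`) are hypothetical and do not reflect the actual authz crate or gRPC library types; the point is only that the invalid-token case is mapped explicitly rather than falling through to a catch-all internal error.

```rust
/// Hypothetical authorization errors that distinguish an invalid token
/// from a valid token lacking the required permissions.
#[derive(Debug, PartialEq)]
pub enum AuthzError {
    InvalidToken,
    Forbidden,
    /// Any other failure, e.g. the authorization backend being unreachable.
    Verification(String),
}

/// Hypothetical subset of gRPC status codes.
#[derive(Debug, PartialEq)]
pub enum StatusCode {
    PermissionDenied,
    Internal,
}

/// Maps authorization failures to status codes. Both token-related cases
/// yield "Permission Denied", matching the behaviour clients saw before
/// the two cases were split into separate errors.
pub fn status_for(err: &AuthzError) -> StatusCode {
    match err {
        AuthzError::InvalidToken | AuthzError::Forbidden => StatusCode::PermissionDenied,
        AuthzError::Verification(_) => StatusCode::Internal,
    }
}
```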
Carol (Nichols || Goulding) 566ec68c58
refactor: Extract a test helper method for creating ParquetFileParams (#7959)
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-09 09:44:30 +00:00
Carol (Nichols || Goulding) 9524e7e478
docs: Remove TODO comment that's TODONE (#7956)
* docs: Remove TODO comment that's TODONE

* docs: Oops, turns out the TODO comment was this enum's documentation

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 19:22:22 +00:00
Andrew Lamb fd8855fc98
chore: use upstream FlightSQL metadata implementation (#7949)
* chore: use upstream FlightSQL metadata implementation

* fix: update doc strings

* fix: Remove incorrect comment about table_type
2023-06-08 18:39:11 +00:00
Dom 5bce4477b7
Merge pull request #7953 from influxdata/dom/partition-key-dedupe
perf: partition key generation dedupe
2023-06-08 16:34:41 +01:00
Dom 93fe5949e9
Merge branch 'main' into dom/partition-key-dedupe 2023-06-08 16:12:31 +01:00
Joe-Blount 9171e1521f fix: clean compaction output from scratchpad 2023-06-08 09:35:33 -05:00
kodiakhq[bot] 64fa17b3be
Merge pull request #7937 from influxdata/savage/sequence-per-partition
refactor(wal): Associate sequence numbers to table ID in `SequencedWalOp`s
2023-06-08 14:34:16 +00:00
kodiakhq[bot] e7effc62b5
Merge branch 'main' into savage/sequence-per-partition 2023-06-08 14:28:44 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Dom 4ab407ef1b
Merge pull request #7957 from influxdata/dom/remove-replication
refactor: remove unused replication proto
2023-06-08 15:22:49 +01:00
Fraser Savage 309310ac4c
refactor(ingester): Make line protocol in `wal_sink` test more readable
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 15:13:59 +01:00
Dom Dwyer ee4f633dba
refactor: remove unused replication proto
This was from an earlier design.
2023-06-08 16:04:49 +02:00
Carol (Nichols || Goulding) bf699a8b60
fix: Remove partition ID from the metadata serialized into Parquet files (#7947)
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.

This will make the transition to deterministic partition IDs a bit
smoother.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:03:21 +00:00
Dom ddc9966127
Merge pull request #7954 from influxdata/dom/no-format-panic-2
fix: panic when using %#z time formatter
2023-06-08 13:45:08 +01:00
Dom Dwyer 60d3ae403f
fix: panic when using %#z time formatter
Props to proptesting for this one - the prop_arbitrary_strftime_format()
randomly generated the formatting sequence "%#z" which turns out to be
an undocumented way of causing a panic in chrono:

    088b69372e/src/format/mod.rs (L673)

In fact, the docs actually list it as a usable sequence!
2023-06-08 14:28:03 +02:00
Fraser Savage fad34c375e
refactor(wal): Use TableId type for look-aside map key
This adds a little extra layer of type safety and should be optimised
by the compiler. This commit also makes sure the ingester's WAL sink
tests assert the behaviour for partitioned sequence numbering on an
operation that hits multiple tables & thus partitions.
2023-06-08 11:39:23 +01:00
Fraser Savage 6daec564d0
Merge branch 'main' into savage/sequence-per-partition 2023-06-08 10:24:50 +01:00
dependabot[bot] 5acca21f12
chore(deps): Bump serde from 1.0.163 to 1.0.164 (#7951)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.163 to 1.0.164.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.163...v1.0.164)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 09:20:19 +00:00
Dom Dwyer 08ecb7fba3
perf: partition key generation dedupe
This commit changes the partitioner to skip generating partition keys
for successive rows that would generate identical partition keys.

Often successive rows in a batch will map to the same partition key -
for example, if multiple measurements are taken at the same time, then
the strftime formatter will output the same partition key part for each
row.

This commit changes the partitioner to only generate the first key
string in such a batch of identical keys. This is cheap to pre-compute,
as we only allow tag & time columns to be partitioned, both of which are
64-bit integers (dictionary key & timestamp respectively), making it
cheaper to check equality than to allocate & generate the partition key
string and check that.

Combined with the default YYYY-MM-DD precision reduction optimisation in
a prior commit, this optimisation is particularly effective for writes
with timestamps that span a single day (the typical case).

This change doubles the rows/s throughput for a modest 1,000 line batch,
with improvements across the board. I'd expect the performance benefit
to increase as the batch size increases, and/or as more partition
template parts are added.
2023-06-08 11:18:51 +02:00
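The dedupe described in commit 08ecb7fba3 above can be sketched roughly as follows, assuming day-granularity time partitioning and a single tag column. `Row`, `partition_key_runs`, and the `render` callback are hypothetical names for illustration, not the partitioner's real API. The key string is rendered only when the cheap integer inputs differ from the previous row's.

```rust
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical row: a dictionary key for the tag value and a nanosecond
/// timestamp. Both are 64-bit integers, so equality checks are cheap.
#[derive(Clone, Copy)]
pub struct Row {
    pub tag_key: i64,
    pub timestamp_ns: i64,
}

/// Returns run-length encoded partition keys: each entry is a rendered key
/// and the number of successive rows that share it. `render` (the expensive
/// string-building step) is called once per run rather than once per row.
pub fn partition_key_runs<F>(rows: &[Row], mut render: F) -> Vec<(String, usize)>
where
    F: FnMut(i64, i64) -> String,
{
    let mut runs: Vec<(String, usize)> = Vec::new();
    let mut last: Option<(i64, i64)> = None;

    for row in rows {
        // Precision reduction: only the day matters for a YYYY-MM-DD key.
        let day = row.timestamp_ns.div_euclid(NANOS_PER_DAY);
        let current = (row.tag_key, day);

        if last == Some(current) {
            // Same inputs as the previous row: extend the run, skip rendering.
            if let Some(run) = runs.last_mut() {
                run.1 += 1;
            }
        } else {
            runs.push((render(row.tag_key, day), 1));
            last = Some(current);
        }
    }
    runs
}
```

Comparing two integer pairs avoids allocating a string per row; how much that saves in practice depends on batch size and how often successive rows share a key.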
Fraser Savage d1031c5ec6
docs(wal): Explicitly call out transitive relation between table and partition in a write
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 10:17:47 +01:00
Dom Dwyer 60cbf53087
refactor: strftime last value equality matcher
Allows the StrftimeFormatter to perform an equality match against a
timestamp and the last rendered timestamp, potentially after applying
the precision reduction optimisation if appropriate.
2023-06-08 11:15:13 +02:00
Dom c791a179a8
Merge pull request #7950 from influxdata/dependabot/cargo/hashbrown-0.14.0
chore(deps): Bump hashbrown from 0.13.2 to 0.14.0
2023-06-08 10:14:45 +01:00
CircleCI[bot] 01732fd7b0 chore: Run cargo hakari tasks 2023-06-08 08:45:49 +00:00
dependabot[bot] 4531be115d
chore(deps): Bump hashbrown from 0.13.2 to 0.14.0
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.13.2 to 0.14.0.
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.13.2...v0.14.0)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-08 08:44:40 +00:00
Dom 3b91cc1e15
Merge pull request #7931 from influxdata/dependabot/cargo/hashbrown-0.14.0
chore(deps): Bump hashbrown from 0.13.2 to 0.14.0
2023-06-08 09:42:06 +01:00
Dom be75ba23e0
Merge branch 'main' into dependabot/cargo/hashbrown-0.14.0 2023-06-08 09:32:01 +01:00
Phil Bracikowski 92a83270f3
fix(garbage-collector): just test parquet file exists (#7948)
* fix(garbage-collector): just test parquet file existence

The GC, when checking files in object store against the catalog, only
cares if the parquet file for the given object store id exists in the
catalog. It doesn't need the full parquet file. Let's not transmit it
over the wire.

This PR uses a SELECT 1 and a boolean to test whether the parquet file exists.

* helps #7784

* chore: use struct for from_row

* chore: satisfy clippy

* chore: fmt
2023-06-07 15:12:48 -07:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Carol (Nichols || Goulding) 6f75053712
fix: Update hashbrown drain_filter to extract_if
See <https://github.com/rust-lang/hashbrown/pull/374>
2023-06-07 11:52:37 -04:00
dependabot[bot] fcd9d9e3e6
chore(deps): Bump hashbrown from 0.13.2 to 0.14.0
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.13.2 to 0.14.0.
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.13.2...v0.14.0)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-07 11:52:16 -04:00
kodiakhq[bot] 202101c719
Merge pull request #7930 from influxdata/cn/custom-partition-validation
feat: Validate custom partition templates on their creation
2023-06-07 15:48:20 +00:00
Carol (Nichols || Goulding) 2becc950e1
fix: Use expect rather than returning error in a theoretically impossible case 2023-06-07 11:38:12 -04:00
Carol (Nichols || Goulding) d0db1194e2
feat: Validate custom partition templates on their creation
Make sure custom partition templates have:

- At least one part
- No more than 8 parts
- Only nonempty, valid strftime formats
2023-06-07 11:38:12 -04:00
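The three rules listed in commit d0db1194e2 above can be sketched as a small validation function. Everything named here is hypothetical (`TemplatePart`, `ValidationError`, `validate`), and `is_valid_strftime` is a deliberately simplified stand-in that accepts only a short allow-list of specifiers; the actual change presumably relies on a real strftime parser. The non-empty check is applied only to time formats, which is an assumption about how the third rule is scoped.

```rust
const MAX_PARTS: usize = 8;

/// Hypothetical template part: either a tag column name or a time format.
#[derive(Debug, PartialEq)]
pub enum TemplatePart {
    TagValue(String),
    TimeFormat(String),
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    NoParts,
    TooManyParts { got: usize },
    EmptyFormat,
    InvalidStrftime(String),
}

/// Simplified stand-in for strftime validation (an assumption): every `%`
/// must be followed by one of a few known specifier characters.
fn is_valid_strftime(fmt: &str) -> bool {
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        if c == '%' {
            match chars.next() {
                Some(spec) if "YmdHMSjyeUWz%".contains(spec) => {}
                _ => return false,
            }
        }
    }
    true
}

/// Checks the rules from the commit message: at least one part, no more
/// than eight parts, and only non-empty, valid strftime formats.
pub fn validate(parts: &[TemplatePart]) -> Result<(), ValidationError> {
    if parts.is_empty() {
        return Err(ValidationError::NoParts);
    }
    if parts.len() > MAX_PARTS {
        return Err(ValidationError::TooManyParts { got: parts.len() });
    }
    for part in parts {
        if let TemplatePart::TimeFormat(fmt) = part {
            if fmt.is_empty() {
                return Err(ValidationError::EmptyFormat);
            }
            if !is_valid_strftime(fmt) {
                return Err(ValidationError::InvalidStrftime(fmt.clone()));
            }
        }
    }
    Ok(())
}
```

Validating at creation time means a bad template is rejected once, up front, instead of surfacing later on the write path.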