Commit Graph

9603 Commits (3fcca070f01652f2e8ebca7f02c05f5fbe29d862)

Author SHA1 Message Date
Dom Dwyer afcb96ae47 perf(ingester): deferred sort key lookup queries
This commit carries the SortKey in the PartitionData, and configures the
ingester to use deferred sort key lookups, smearing the lookups across a
fixed period of time after initialising the PartitionData, instead of
querying for the sort key at persist time.

This allows large numbers of PartitionData to be initialised without
causing an equally large spike in catalog load to resolve the sort key -
instead this load is spread out randomly to reduce peak query rps.
2022-10-06 16:39:54 +02:00
Dom Dwyer c022ab6786 feat: deferred partition sort key fetcher
Adds a new DeferredSortKey type that fetches a partition's sort key from
the catalog in the background, or on-demand if not yet pre-fetched.

From the caller's perspective, little has changed compared to reading it
from the catalog directly - the sort key is always returned when calling
get(), regardless of the mechanism, and retries are handled
transparently. Internally the sort key MAY have been pre-fetched in the
background between the DeferredSortKey being initialised, and the call
to get().

The background task waits a (uniformly) random duration of time before
issuing the catalog query to pre-fetch the sort key. This (randomly)
smears the lookup queries of a large number of DeferredSortKey over a
long period of time, so many can be initialised in a short period
without creating an equally large spike in queries against the catalog
in the same time period.
2022-10-06 16:37:04 +02:00
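
A minimal sketch of the deferred fetch described in c022ab6786 above, assuming a plain OS thread instead of an async runtime. It is an illustration, not the commit's implementation. The names `DeferredValue`, `Fetcher` and `jitter` are hypothetical, the catalog query is replaced by a closure, and retries are left out. `jitter` borrows randomness from `RandomState` only to avoid a dependency, so the delay is only approximately uniform.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the catalog query that resolves a sort key.
pub type Fetcher = Arc<dyn Fn() -> String + Send + Sync>;

/// Picks a pseudo-random delay in `[0, max)`.
pub fn jitter(max: Duration) -> Duration {
    if max.is_zero() {
        return Duration::ZERO;
    }
    // Each RandomState is randomly keyed, so an empty hash is a cheap random u64.
    let r = RandomState::new().build_hasher().finish() as u128;
    Duration::from_nanos((r % max.as_nanos()) as u64)
}

pub struct DeferredValue {
    value: Arc<Mutex<Option<String>>>,
    fetch: Fetcher,
}

impl DeferredValue {
    /// Spawns a background thread that pre-fetches the value after `delay`.
    pub fn new(fetch: Fetcher, delay: Duration) -> Self {
        let value: Arc<Mutex<Option<String>>> = Arc::new(Mutex::new(None));
        let (bg_value, bg_fetch) = (Arc::clone(&value), Arc::clone(&fetch));
        thread::spawn(move || {
            thread::sleep(delay);
            let mut slot = bg_value.lock().unwrap();
            // Skip the query if an on-demand get() already resolved the value.
            if slot.is_none() {
                *slot = Some(bg_fetch());
            }
        });
        Self { value, fetch }
    }

    /// Returns the value, fetching it on demand if the pre-fetch has not run yet.
    pub fn get(&self) -> String {
        let mut slot = self.value.lock().unwrap();
        if slot.is_none() {
            *slot = Some((self.fetch)());
        }
        (*slot).clone().unwrap()
    }
}
```

A caller would construct each value with `DeferredValue::new(fetch, jitter(max_smear))`, where `max_smear` is the window over which lookups should be spread. Holding the lock while fetching keeps the sketch to at most one query per value, at the cost of blocking a concurrent `get()` until the fetch completes.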
kodiakhq[bot] ace30b9d1d
Merge pull request #5798 from influxdata/dom/namespace-name
refactor: carry namespace name in NamespaceData
2022-10-06 14:06:37 +00:00
kodiakhq[bot] ffa1704d96
Merge branch 'main' into dom/namespace-name 2022-10-06 13:58:47 +00:00
Marco Neumann c4c83e0840
fix: query error propagation (#5801)
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
  semantics

Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.

Fixes #5799.
2022-10-06 08:54:01 +00:00
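
The idea behind c4c83e0840 above (typed errors instead of an opaque `Box<dyn Error>`, so the gRPC layer can choose a sensible status code) can be sketched roughly as follows. `QueryError`, `GrpcCode` and `status_code` are hypothetical names for illustration, not the types used in the commit. The numeric values are the standard gRPC status codes, and mapping planning errors to "invalid argument" is an assumption.

```rust
use std::fmt;

/// Hypothetical query error with a few typed variants plus an opaque fallback.
#[derive(Debug)]
pub enum QueryError {
    ResourcesExhausted(String),
    Plan(String),
    Other(Box<dyn std::error::Error + Send + Sync>),
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ResourcesExhausted(msg) => write!(f, "resources exhausted: {msg}"),
            Self::Plan(msg) => write!(f, "planning error: {msg}"),
            Self::Other(err) => write!(f, "{err}"),
        }
    }
}

impl std::error::Error for QueryError {}

/// Subset of the standard gRPC status codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrpcCode {
    InvalidArgument = 3,
    ResourceExhausted = 8,
    Internal = 13,
}

/// Because the variant survives up to the API boundary, the status code can
/// be derived from it rather than defaulting every failure to `Internal`.
pub fn status_code(err: &QueryError) -> GrpcCode {
    match err {
        QueryError::ResourcesExhausted(_) => GrpcCode::ResourceExhausted,
        QueryError::Plan(_) => GrpcCode::InvalidArgument, // assumed mapping
        QueryError::Other(_) => GrpcCode::Internal,
    }
}
```

An error that has already been boxed into an opaque type can only land in the `Other` arm, which is why preserving the variant through conversions matters.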
Dom Dwyer abb9122e2c refactor: carry namespace name in NamespaceData
Changes the ingester's NamespaceData to carry a ref-counted string
identifier as well as the ID.

The backing storage for the name in NamespaceData is shared with the
index map in ShardData, so it is effectively free!
2022-10-05 13:03:16 +02:00
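
A small sketch of why the shared name in abb9122e2c above is "effectively free", assuming the ref-counted string is an `Arc<str>` and the index map is a `HashMap`. The field layout and method names here are illustrative assumptions, not the ingester's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub struct NamespaceData {
    pub id: i64,
    /// Ref-counted name; cloning it copies a pointer, not the string bytes.
    pub name: Arc<str>,
}

#[derive(Default)]
pub struct ShardData {
    /// The map key and `NamespaceData::name` point at the same allocation.
    namespaces: HashMap<Arc<str>, Arc<NamespaceData>>,
}

impl ShardData {
    pub fn insert(&mut self, id: i64, name: &str) -> Arc<NamespaceData> {
        // The only string allocation happens here.
        let name: Arc<str> = Arc::from(name);
        let ns = Arc::new(NamespaceData { id, name: Arc::clone(&name) });
        self.namespaces.insert(name, Arc::clone(&ns));
        ns
    }

    /// `Arc<str>` implements `Borrow<str>`, so lookups take a plain `&str`.
    pub fn get(&self, name: &str) -> Option<Arc<NamespaceData>> {
        self.namespaces.get(name).cloned()
    }
}
```

After one `insert`, `Arc::strong_count` on the name reports two owners (the map key and the `NamespaceData` field) backed by a single heap allocation.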
kodiakhq[bot] e81dad972f
Merge pull request #5791 from influxdata/dom/remove-partition-queries
refactor: reference buffer tree nodes by ID
2022-10-05 10:54:19 +00:00
Dom c48aef27b4
Merge branch 'main' into dom/remove-partition-queries 2022-10-05 11:46:33 +01:00
dependabot[bot] c9a2445fd4
chore(deps): Bump handlebars from 4.3.4 to 4.3.5 (#5797)
* chore(deps): Bump handlebars from 4.3.4 to 4.3.5

Bumps [handlebars](https://github.com/sunng87/handlebars-rust) from 4.3.4 to 4.3.5.
- [Release notes](https://github.com/sunng87/handlebars-rust/releases)
- [Changelog](https://github.com/sunng87/handlebars-rust/blob/v4.3.5/CHANGELOG.md)
- [Commits](https://github.com/sunng87/handlebars-rust/compare/v4.3.4...v4.3.5)

---
updated-dependencies:
- dependency-name: handlebars
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-05 09:48:11 +00:00
Dom cc238c6a8f
Merge branch 'main' into dom/remove-partition-queries 2022-10-05 10:40:31 +01:00
dependabot[bot] 9bbbf86116
chore(deps): Bump sqlparser from 0.24.0 to 0.25.0 (#5795)
Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/sqlparser-rs/sqlparser-rs/releases)
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-05 09:04:18 +00:00
Andrew Lamb a11aafe25b
chore: Update SQL repl to refer to `namespace` rather than `database` (#5788) 2022-10-04 12:53:17 +00:00
Dom Dwyer 1a7eb47b81 refactor: persist() passes all necessary IDs
This commit changes the persist() call so that it passes through all
relevant IDs so that the impl can locate the partition in the buffer
tree - this will enable elimination of many queries against the catalog
in the future.

This commit also cleans up the persist() impl, deferring queries until
the result will be used to avoid unnecessary load, improves logging &
error handling, and documents a TOCTOU bug in code:

    https://github.com/influxdata/influxdb_iox/issues/5777
2022-10-04 14:28:01 +02:00
Dom Dwyer f9bf86927d refactor: ref PartitionData by key & ID
Changes the TableData to hold a map of partition key -> PartitionData,
and partition ID -> PartitionData simultaneously. This allows for cheap
lookups when the caller holds an ID.

This commit also manages to internalise the partition map within the
TableData - one less pub / peeking!

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
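
The dual-index layout described in f9bf86927d above might look roughly like the sketch below. It assumes both maps hold a shared handle to the same `PartitionData`; the field names, the `Arc<Mutex<..>>` wrapper and the `buffered_rows` counter are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug)]
pub struct PartitionData {
    pub partition_id: i64,
    pub partition_key: String,
    pub buffered_rows: usize,
}

/// Both maps are private, so callers cannot peek at (or desynchronise) them.
#[derive(Default)]
pub struct TableData {
    key_index: HashMap<String, Arc<Mutex<PartitionData>>>,
    id_index: HashMap<i64, Arc<Mutex<PartitionData>>>,
}

impl TableData {
    /// Registers the partition under both its key and its ID.
    pub fn insert(&mut self, partition: PartitionData) {
        let id = partition.partition_id;
        let key = partition.partition_key.clone();
        let shared = Arc::new(Mutex::new(partition));
        self.key_index.insert(key, Arc::clone(&shared));
        self.id_index.insert(id, shared);
    }

    pub fn by_key(&self, key: &str) -> Option<Arc<Mutex<PartitionData>>> {
        self.key_index.get(key).cloned()
    }

    pub fn by_id(&self, id: i64) -> Option<Arc<Mutex<PartitionData>>> {
        self.id_index.get(&id).cloned()
    }
}
```

Because both indexes point at the same value, a change made through `by_id` is visible through `by_key`. Neither lookup needs ordered iteration, which is consistent with the switch from `BTreeMap` to `HashMap`.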
Dom Dwyer 0847cc5458 refactor: PartitionData::id() -> partition_id()
Consistent naming is consistent - all the others are thing_id().
2022-10-04 14:28:01 +02:00
Dom Dwyer 66e05b5ea7 refactor: ref NamespaceData by name & ID
Changes the ShardData to hold a map of namespace name -> NamespaceData,
and namespace ID -> NamespaceData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 9c0e4e98c4 refactor: ref TableData by name & ID
Changes the NamespaceData to hold a map of table name -> TableData, and
table ID -> TableData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
kodiakhq[bot] 75178e4591
Merge pull request #5786 from influxdata/dom/fix-mem-counting
fix(ingester): incorrect memory tracking of failed writes
2022-10-03 10:31:43 +00:00
Dom Dwyer 7efd81a63a docs: comment write record ordering 2022-10-03 12:23:30 +02:00
Dom Dwyer b23ad31711 fix: spurious memory accounting for failed write
Fixes a case where the ingester may incorrectly record a write as having
been buffered in memory, when in fact the buffering failed.

This could cause the effective buffer size to be reduced over time as
more and more data is spuriously "added" to the buffer, but never
released back to the memory tracker as it is never persisted.
2022-10-03 12:13:43 +02:00
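
The class of bug fixed in b23ad31711 above can be illustrated with a toy buffer that only updates its memory accounting once the buffering step has succeeded. `Buffer`, `BufferError` and the "empty line is rejected" failure condition are hypothetical stand-ins, not the ingester's code.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BufferError {
    Rejected,
}

#[derive(Debug, Default)]
pub struct Buffer {
    rows: Vec<String>,
    /// Bytes the memory tracker believes are currently buffered.
    tracked_bytes: usize,
}

impl Buffer {
    /// Fallible buffering step; returns the number of bytes actually stored.
    fn try_buffer(&mut self, line: &str) -> Result<usize, BufferError> {
        if line.is_empty() {
            return Err(BufferError::Rejected);
        }
        self.rows.push(line.to_string());
        Ok(line.len())
    }

    pub fn write(&mut self, line: &str) -> Result<(), BufferError> {
        // `?` returns before the accounting line runs, so a failed write is
        // never recorded as buffered.
        let stored = self.try_buffer(line)?;
        self.tracked_bytes += stored;
        Ok(())
    }

    /// Drops the buffered rows and releases their bytes back to the tracker.
    pub fn persist(&mut self) -> usize {
        self.rows.clear();
        std::mem::take(&mut self.tracked_bytes)
    }

    pub fn tracked_bytes(&self) -> usize {
        self.tracked_bytes
    }
}
```

Had the accounting line run before the fallible call, every rejected write would add bytes that `persist()` can never release, shrinking the effective buffer over time as the commit message describes.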
Dom Dwyer 20451921d0 test: MockLifecycleHandle captures calls
Changes the NoopLifecycleHandle to MockLifecycleHandle, and adds code
causing it to record all calls made to the log_write() method.

This will allow tests to assert calls and their values in DML buffering
tests.
2022-10-03 12:13:43 +02:00
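
A call-capturing mock of the kind described in 20451921d0 above could be sketched like this. The trait, the two-argument `log_write()` signature and the `buffer_op` helper are simplified assumptions for illustration; the real handle may take different parameters.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified lifecycle trait.
pub trait LifecycleHandle {
    /// Returns true if the caller should pause ingest.
    fn log_write(&self, partition_id: i64, bytes_written: usize) -> bool;
}

/// One recorded invocation of `log_write()`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoggedWrite {
    pub partition_id: i64,
    pub bytes_written: usize,
}

/// Test double that records every call instead of silently ignoring it.
#[derive(Debug, Default)]
pub struct MockHandle {
    recorded: Mutex<Vec<LoggedWrite>>,
}

impl MockHandle {
    /// Snapshot of all calls observed so far, in call order.
    pub fn calls(&self) -> Vec<LoggedWrite> {
        self.recorded.lock().unwrap().clone()
    }
}

impl LifecycleHandle for MockHandle {
    fn log_write(&self, partition_id: i64, bytes_written: usize) -> bool {
        self.recorded
            .lock()
            .unwrap()
            .push(LoggedWrite { partition_id, bytes_written });
        false // the mock never asks the caller to pause
    }
}

/// Stand-in for the code under test: reports each buffered payload.
pub fn buffer_op(handle: &impl LifecycleHandle, partition_id: i64, payload: &[u8]) -> bool {
    handle.log_write(partition_id, payload.len())
}
```

A test can then drive `buffer_op` and compare `mock.calls()` against the expected sequence of partition IDs and byte counts.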
dependabot[bot] e7a3254378
chore(deps): Bump ordered-float from 3.1.0 to 3.2.0 (#5784)
Bumps [ordered-float](https://github.com/reem/rust-ordered-float) from 3.1.0 to 3.2.0.
- [Release notes](https://github.com/reem/rust-ordered-float/releases)
- [Commits](https://github.com/reem/rust-ordered-float/compare/v3.1.0...v3.2.0)

---
updated-dependencies:
- dependency-name: ordered-float
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-03 09:09:00 +00:00
Dom 12231654ce
Merge pull request #5785 from influxdata/dependabot/cargo/smallvec-1.10.0
chore(deps): Bump smallvec from 1.9.0 to 1.10.0
2022-10-03 09:59:58 +01:00
dependabot[bot] 3ff48152c9
chore(deps): Bump smallvec from 1.9.0 to 1.10.0
Bumps [smallvec](https://github.com/servo/rust-smallvec) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/servo/rust-smallvec/releases)
- [Commits](https://github.com/servo/rust-smallvec/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: smallvec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-03 01:56:00 +00:00
Stuart Carnie b862ae6476
feat: `EXPLAIN` statement (#5763) 2022-10-03 00:38:35 +00:00
Andrew Lamb 82d5c7f336
feat: support parallel, chunked upload via `influxdb_iox write` of line protocol, gzip'd line protocol, and parquet (#5757)
* feat: Upload in small chunks and in parallel

* fix: doclink

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Update influxdb_iox_client/src/client/write.rs

* fix: fixup error handling and fmt

* fix: Make default chunk sizes the same and add docs

* fix: clippy

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-09-30 21:02:38 +00:00
Jake Goulding 627f617284 fix: Skip creating an ArgGroup for the All-in-One `Config`
As this type is flattened into other types also called `Config`, the
reused name would cause a conflict.
2022-09-30 16:59:29 -04:00
Jake Goulding b2377a117a fix: Restore --help flag 2022-09-30 16:59:28 -04:00
Jake Goulding d0f8f0aa60 fix: Remove variadic space-separated support for --write-buffer-connection-config
We don't actually appear to use the space support anywhere, preferring
the comma version.
2022-09-30 16:59:28 -04:00
Carol (Nichols || Goulding) 576d629ce4 fix: Remove leading `--` from long option names 2022-09-30 16:59:28 -04:00
Carol (Nichols || Goulding) c94c7eabbc fix: Replace deprecated parse(try_from_str(..)) with value_parser
See <https://github.com/clap-rs/clap/pull/3742>
2022-09-30 16:59:03 -04:00
Carol (Nichols || Goulding) 50f84906e2 fix: Remove multiple_values = true; it's now implied because of Vec
See <https://docs.rs/clap/4.0.2/clap/_derive/index.html#arg-types>

> clap assumes some intent based on the type used:
>
> ...
>
> Vec<T> | 0.. occurrences of argument | .action(ArgAction::Append).required(false).num_args(1..)
2022-09-30 16:59:03 -04:00
Carol (Nichols || Goulding) 73d7105f20 fix: Update from clap ArgEnum to ValueEnum
See <https://github.com/clap-rs/clap/pull/4127>
2022-09-30 16:59:03 -04:00
Carol (Nichols || Goulding) 3c11d3640f fix: Update use of clap::StructOpt to clap::Parser
StructOpt is now fully part of Clap.

https://docs.rs/clap/latest/clap/_faq/index.html#how-does-clap-compare-to-structopt
2022-09-30 16:59:03 -04:00
Carol (Nichols || Goulding) e76b93d47b fix: Remove deprecated takes_value attribute
See <https://github.com/clap-rs/clap/issues/2688>
2022-09-30 16:48:26 -04:00
dependabot[bot] 199e47721a chore(deps): Bump clap from 3.2.22 to 4.0.7
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.22 to 4.0.7.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.22...v4.0.7)
2022-09-30 16:46:56 -04:00
kodiakhq[bot] fc0634792b
Merge pull request #5780 from influxdata/dom/test-cleanup
test: refactor ingester helpers
2022-09-30 16:43:18 +00:00
kodiakhq[bot] 27d98479f1
Merge branch 'main' into dom/test-cleanup 2022-09-30 16:35:16 +00:00
Nga Tran 2f08a64f16
feat: not split output files in the first step of cold compaction (#5781)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 16:08:03 +00:00
Nga Tran d171697fd7
feat: always pick cold partitions in next cycle even if it has been pa… (#5772)
* fix: always pick cold partitions in next cycle even if it has been partially compacted recently

* fix: comment

* fix: test output

* refactor: using var instead of literal

* fix: consider deleted L0s for recent writes

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 15:54:00 +00:00
Dom Dwyer 7dd28f4230 test: simplify PartitionProvider mock
The PartitionKey is now part of the PartitionData, so there is no need
to specify the redundant ID when configuring the mock.
2022-09-30 16:32:39 +02:00
Dom Dwyer c33499764d test: share populate_catalog() across tests
Parametrises test_util::populate_catalog() and exports for re-use in
ingester tests.
2022-09-30 16:32:37 +02:00
Dom Dwyer fc47f6ab8f test: re-use test_utils::make_op
Share the make_op helper across all tests in the Ingester.
2022-09-30 16:32:36 +02:00
Dom Dwyer f0885612e9 test: shared mock LifecycleHandle impl
Moves the NoopLifecycleHandle to the Ingester's test_utils to share it
across multiple components.
2022-09-30 16:32:34 +02:00
kodiakhq[bot] d7677c1b1d
Merge pull request #5778 from influxdata/dom/lifecycle-ids
refactor: LifecycleStats tracks Namespace/TableId
2022-09-30 13:56:21 +00:00
Dom 4c3697c5c7
Merge branch 'main' into dom/lifecycle-ids 2022-09-30 14:31:07 +01:00
Dom Dwyer e84186763f refactor: LifecycleStats tracks Namespace/TableId
Changes the lifecycle handle to also track the namespace + table ID in
addition to the existing shard ID.

Adds asserts to ensure the values never vary for a given partition.
2022-09-30 15:29:39 +02:00
dependabot[bot] b1390368fb
chore(deps): Bump tikv-jemalloc-sys (#5774)
Bumps [tikv-jemalloc-sys](https://github.com/tikv/jemallocator) from 0.5.1+5.3.0-patched to 0.5.2+5.3.0-patched.
- [Release notes](https://github.com/tikv/jemallocator/releases)
- [Changelog](https://github.com/tikv/jemallocator/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tikv/jemallocator/commits)

---
updated-dependencies:
- dependency-name: tikv-jemalloc-sys
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 11:22:38 +00:00
dependabot[bot] c187392be6
chore(deps): Bump libc from 0.2.133 to 0.2.134 (#5773)
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.133 to 0.2.134.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.133...0.2.134)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-30 11:14:18 +00:00
Andrew Lamb 04ae0aee80
refactor: Remove protobuf based write service (#5750)
* refactor: Remove grpc WriteService

* fix: update end to end test

* fix: Update generated_types/protos/influxdata/pbdata/v1/influxdb_pb_data_protocol.proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 10:55:03 +00:00