173 Commits (65ba208f88bec8d8b4addb2163c366b1d829ae72)

Author SHA1 Message Date
Carol (Nichols || Goulding) fb5aa25c5b
fix: Separate most_recent_n into filtering by shard and not 2023-02-17 12:56:51 -05:00
Nga Tran ae58831467
test: add a test that has over 2 times max limit files per plan (#7017)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-17 10:42:31 +00:00
Carol (Nichols || Goulding) 1d4f8d2c8d
test: Ingester integration tests that can have a little internal state
As a treat.
2023-02-16 11:06:44 -05:00
Carol (Nichols || Goulding) 2fe9d9647f
refactor: Change the types returned from the IngesterRpcInterface 2023-02-16 10:02:17 -05:00
Dom Dwyer 2d46a364dc
feat: namespace soft-delete support
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.

Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.

The components treat soft-deleted namespaces differently:

    * router: ignore soft deleted namespaces
    * ingester: accept soft deleted namespaces
    * compactor: accept soft deleted namespaces
    * querier: ignore soft deleted namespaces
    * various gRPC services: ignore soft deleted namespaces

This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.

Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).

Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
2023-02-13 12:01:35 +01:00
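
The per-component behaviour listed in the soft-delete commit above can be pictured as a filter applied whenever a component looks up namespaces. The following is a minimal sketch only: `SoftDeletedRows`, `Namespace`, `deleted_at` and `visible` are hypothetical names chosen for illustration, not the actual catalog API.

```rust
/// How a component treats soft-deleted namespaces (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SoftDeletedRows {
    /// See every row, including soft-deleted ones (ingester, compactor).
    AllRows,
    /// Hide soft-deleted rows (router, querier, gRPC services).
    ExcludeDeleted,
}

/// Simplified stand-in for a catalog namespace record (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Namespace {
    name: String,
    /// Set when the namespace is soft-deleted; the record itself remains.
    deleted_at: Option<i64>,
}

/// Returns the namespaces a component should see under the given filter.
fn visible(namespaces: &[Namespace], filter: SoftDeletedRows) -> Vec<&Namespace> {
    namespaces
        .iter()
        .filter(|ns| match filter {
            SoftDeletedRows::AllRows => true,
            SoftDeletedRows::ExcludeDeleted => ns.deleted_at.is_none(),
        })
        .collect()
}
```

Under this sketch the ingester and compactor would pass `AllRows`, so rows never appear to vanish, while the router and querier would pass `ExcludeDeleted`.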
dependabot[bot] 0cbd9f6a82
chore(deps): Bump tokio-util from 0.7.5 to 0.7.7 (#6964)
---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-13 10:10:53 +00:00
dependabot[bot] c0c9b51b9e
chore(deps): Bump tokio-util from 0.7.4 to 0.7.5 (#6941)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.4 to 0.7.5.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.4...tokio-util-0.7.5)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-10 09:42:00 +00:00
Dom d44b6d412f
Merge branch 'main' into dom/always-requeue 2023-02-09 10:21:32 +00:00
dependabot[bot] 0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Dom Dwyer 776ec41384
fix(ingester2): always maintain persist interval
This fixes an issue where persistence that does not ever complete blocks
the periodic enqueuing of persist tasks - this leads to the amount of
buffered data in the buffer tree increasing, and the persist queue depth
stays the same instead of draining the buffer.

This is an issue as the queue depth is designed to act as the
back-pressure of the ingester - once the depth exceeds a configurable
limit, further writes are rejected until the queue has drained
sufficiently (50%).

After this commit, stalled persistence (e.g. object store outage) will
not prevent the queue depth from growing, which should enable the
saturation protection to kick in.
2023-02-07 17:48:07 +01:00
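
The back-pressure behaviour described in the commit above (reject writes past a depth limit, resume once the queue has drained to 50%) might be sketched as follows. The `Saturation` type and its methods are hypothetical, and treating "depth reaches the limit" as the trigger is an assumption; the real ingester2 implementation may differ.

```rust
/// Derives a write-rejection flag from the persist queue depth.
/// Hypothetical type for illustration only.
#[derive(Debug)]
struct Saturation {
    /// Configured maximum queue depth.
    limit: usize,
    /// Current number of outstanding persist jobs.
    depth: usize,
    /// True while writes are being rejected.
    saturated: bool,
}

impl Saturation {
    fn new(limit: usize) -> Self {
        Self { limit, depth: 0, saturated: false }
    }

    /// Record a newly enqueued persist job.
    fn enqueue(&mut self) {
        self.depth += 1;
        // Assumption: reaching the limit marks the ingester as saturated.
        if self.depth >= self.limit {
            self.saturated = true;
        }
    }

    /// Record a completed persist job.
    fn complete(&mut self) {
        self.depth = self.depth.saturating_sub(1);
        // Recover only once the queue has drained to half the limit.
        if self.saturated && self.depth <= self.limit / 2 {
            self.saturated = false;
        }
    }

    fn accepts_writes(&self) -> bool {
        !self.saturated
    }
}
```

The point of the fix is that `enqueue` keeps being called on every persist interval even when `complete` never is, so the depth can actually reach the limit during an outage.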
Dom Dwyer 4ffd7fcc68
test: persist timer & wal rotation
Adds a unit test covering WAL rotation, buffer persistence & WAL file
deletion.
2023-02-07 15:52:11 +01:00
Raphael Taylor-Davies d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756)
* chore: update DataFusion

* fix: test

* chore: format

* chore: clippy

* chore: update arrow

* chore: arrow upgrade fallout

* chore: Run cargo hakari tasks

* chore: remove failing warm compaction test

* fix: flight error propagation

* chore: update parquet size

* fix: Update error message

* chore: Update parquet metadata test

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding) ae944668c1
fix: Remove unneeded let underscore for awaited future
See <https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_future>

This might be a false positive of the lint because we are awaiting the
future, but it's not needed as nothing is must_use here, so we can avoid
the lint by removing this.
2023-02-03 13:06:19 -05:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
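
For readers unfamiliar with the change in the commit above, this is the kind of rewrite `cargo clippy --fix` performs when moving variables into format strings (presumably via the `uninlined_format_args` lint; the commit does not name it). The function and variable names are made up for illustration.

```rust
/// Builds the same message with positional and with inlined format arguments.
fn describe(table_id: i64, rows: usize) -> (String, String) {
    // Before: arguments passed positionally after the format string.
    let before = format!("table {} has {} rows", table_id, rows);
    // After: variables captured directly inside the format string.
    let after = format!("table {table_id} has {rows} rows");
    (before, after)
}
```

Both forms produce identical output; only plain identifiers can be captured inline, so expressions such as `foo.bar` or `f(x)` still need the positional form.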
Dom Dwyer 67903a4bf2
feat(metrics): ingester2 WAL replay
Adds two metrics:

    * Number of files replayed (counted at the start of replay, not at completion)
    * Number of applied ops

This will help identify when WAL replay is happening (an indication of
an ungraceful shutdown & potential temporary read unavailability).
2023-02-02 14:52:09 +01:00
dependabot[bot] 1a9c27cd9a
chore(deps): Bump uuid from 1.2.2 to 1.3.0
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.2.2 to 1.3.0.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](https://github.com/uuid-rs/uuid/compare/1.2.2...1.3.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 09:32:39 +00:00
dependabot[bot] d0e6b16450
chore(deps): Bump bytes from 1.3.0 to 1.4.0
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 00:30:56 +00:00
dependabot[bot] 875b6a3e99
chore(deps): Bump futures from 0.3.25 to 0.3.26 (#6766)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.25 to 0.3.26.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.25...0.3.26)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-31 11:33:50 +00:00
Dom Dwyer a029de95f4
refactor: panic with unresolvable table IDs
Include the table ID in the panic message.
2023-01-31 11:47:33 +01:00
Dom Dwyer 6e540bc8d6
refactor: panic with unresolvable namespace IDs
Include the namespace ID in the panic message.
2023-01-31 11:46:40 +01:00
Dom Dwyer 0d9b773693
refactor: panic with unresolvable partition IDs
Include the partition ID in the panic message.
2023-01-31 11:43:32 +01:00
dependabot[bot] 6f032b1d57
chore(deps): Bump async-trait from 0.1.63 to 0.1.64 (#6769)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.63 to 0.1.64.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.63...0.1.64)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-31 10:18:27 +00:00
dependabot[bot] ed7d02a225
chore(deps): Bump tokio from 1.24.2 to 1.25.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 01:57:27 +00:00
Marko Mikulicic 0bc7d90ee3 chore: Avoid defining transition shard numbers in multiple crates 2023-01-27 18:30:34 +01:00
Marko Mikulicic aa9789049a
fix(iox): Use a transition shard id that doesn't overlap with legacy (#6733) 2023-01-27 14:23:40 +00:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for changing cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
kodiakhq[bot] 98c60f9dc5
Merge branch 'main' into cn/one-test 2023-01-25 15:49:51 +00:00
Carol (Nichols || Goulding) 4658510102
fix: For Ingester2, persist a particular namespace on demand and share MiniClusters
This should hopefully help keep CI from running out of Postgres
connections 😬

The old architecture will still need to be non-shared and persist
everything.
2023-01-25 10:36:56 -05:00
Dom Dwyer df87ca3f17
refactor: appropriate queue wait histogram buckets
Changes the bucket values for the queue wait duration metric to be more
appropriately scaled.
2023-01-25 16:31:49 +01:00
Dom Dwyer 7b69c84ceb
feat: export persist config metrics
Export the configured maximum persist parallelism, and the maximum queue
depth, so they can be used to compute % saturation in alerts /
dashboards.
2023-01-25 14:57:09 +01:00
Dom Dwyer b775288c92
refactor: fix duration metric units in description
It's seconds, not nanoseconds.
2023-01-24 15:49:16 +01:00
Dom Dwyer d198756a29
feat(metrics): instrument DmlSink::apply()
Record latency histograms for DmlSink::apply() calls, configuring
ingester2 to report the overall write path latency, and separately the
buffer apply latency.
2023-01-24 15:07:17 +01:00
Dom Dwyer 28d575d90f
feat(tracing): emit spans for write path
Emit tracing spans for each component of the write path in ingester2.
2023-01-24 15:07:16 +01:00
Dom Dwyer c9a1c7435b
feat(metrics): instrumented query execution
Instrument the query path in ingester2, capturing the query latency +
counts, broken down by success/error.
2023-01-24 15:07:16 +01:00
Dom Dwyer 3541243fcb
feat(metrics): persist duration histograms
Adds metrics to track the distribution of time spent actively
persisting a batch of partition data (compacting, generating parquet,
uploading, DB entries, etc) and another tracking the duration of time an
entry spent in the persist queue.

Together these provide a measurement of the latency of persist requests,
and as they contain event counters, they also provide the throughput and
number of outstanding jobs.
2023-01-24 15:05:56 +01:00
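
To illustrate how a pair of duration histograms can also yield job counts, as the commit above notes, here is a minimal sketch. `DurationHistogram` and `in_flight` are hypothetical, and the sketch assumes the queue-wait histogram is updated when a job leaves the queue and the persist histogram when a job finishes; the actual metric types in the codebase are not shown in this log.

```rust
/// A fixed-bucket duration histogram (durations in seconds).
struct DurationHistogram {
    /// Inclusive upper bound of each bucket, in ascending order.
    bounds: Vec<f64>,
    /// Observations per bucket; the final slot counts values above every bound.
    buckets: Vec<u64>,
    /// Total number of observations (the event counter).
    count: u64,
}

impl DurationHistogram {
    fn new(bounds: Vec<f64>) -> Self {
        let buckets = vec![0; bounds.len() + 1];
        Self { bounds, buckets, count: 0 }
    }

    /// Record one observation in the first bucket whose bound covers it.
    fn record(&mut self, seconds: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|&b| seconds <= b)
            .unwrap_or(self.bounds.len());
        self.buckets[idx] += 1;
        self.count += 1;
    }
}

/// Jobs that have left the queue but have not yet finished persisting.
fn in_flight(queue_wait: &DurationHistogram, persist: &DurationHistogram) -> u64 {
    queue_wait.count.saturating_sub(persist.count)
}
```

Throughput follows the same way: the rate of change of either `count` over a time window gives jobs per second.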
Dom Dwyer 0637540aad
feat(metrics): cumulative persist job count
Tracks the cumulative number of persist jobs enqueued on a single
ingester (the total amount, so including now-completed jobs).
2023-01-24 15:05:56 +01:00
dependabot[bot] 0114e7ee50
chore(deps): Bump async-trait from 0.1.61 to 0.1.63 (#6660)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.61 to 0.1.63.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.61...0.1.63)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-23 08:41:27 +00:00
Andrew Lamb 5b6d261396
refactor: remove iox_arrow_flight use in ingester2 (#6623)
* refactor: remove iox_arrow_flight use in ingester2

* fix: Update ingester2/src/server/grpc/query.rs

Co-authored-by: Dom <dom@itsallbroken.com>

* chore: remove unused Error enums

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-19 15:27:23 +00:00
Andrew Lamb 8410998408
chore: Update datafusion to Jan 17, 2023 (2 / 2) and arrow/parquet `30.0.1` (#6604)
* chore: Update datafusion to Jan 9, 2023 (2 / 2) and arrow/parquet `30.0.1`

* chore: Update for changes in arrow ipc

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-18 15:51:24 +00:00
Dom Dwyer 4b3a5c0c2b
refactor(persist): pluggable completion observer
Changes the persist system to call into an abstract
PersistCompletionObserver after the persist task has completed, but
before releasing the job permit / notifying the enqueuer.

This call happens synchronously, driven by the persist worker to
completion. A sync construct can easily be made async (by enqueuing work
into a channel), but not the other way around, so this gives the best
flexibility.

This trait allows pluggable logic to be inserted into the persist
system, without tightly coupling it to the implementer's logic (for
example, replication). One or more observers may be chained together to
construct an arbitrary sequence of actors.

This commit uses a no-op observer, causing no functional change to the
system.
2023-01-17 11:28:32 +01:00
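
The observer design in the commit above (a synchronous callback, chainable, and convertible to async by enqueuing into a channel) could look roughly like this. Only the `PersistCompletionObserver` name comes from the commit message; the method name, the `CompletedPersist` payload, `NopObserver` and `ChannelObserver` are assumptions made for this sketch.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for the details of a completed persist job.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CompletedPersist {
    partition_id: i64,
}

/// Called synchronously by the persist worker after a job completes.
trait PersistCompletionObserver {
    fn persist_complete(&self, note: &CompletedPersist);
}

/// A no-op observer: plugging it in causes no functional change.
struct NopObserver;

impl PersistCompletionObserver for NopObserver {
    fn persist_complete(&self, _note: &CompletedPersist) {}
}

/// Hands the notification to a channel for later processing,
/// then delegates to the next observer in the chain.
struct ChannelObserver<T> {
    tx: Sender<CompletedPersist>,
    inner: T,
}

impl<T: PersistCompletionObserver> PersistCompletionObserver for ChannelObserver<T> {
    fn persist_complete(&self, note: &CompletedPersist) {
        // Ignore send errors: the receiving side may already have shut down.
        let _ = self.tx.send(note.clone());
        self.inner.persist_complete(note);
    }
}
```

Nesting observers through the `inner` field is one way to build the chain the commit describes; a consumer on the receiving end of the channel can then process notifications without blocking the persist worker.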
Dom Dwyer b4c1980e58
chore: remove old TODOs
These have been done!
2023-01-16 18:55:17 +01:00
Dom Dwyer fa62c00002
test: persist concurrent sort key catalog updates
Adds an integration test of the persist system, covering:

    * Node A starts a persist operation
    * Node B starts a persist operation for the same partition
    * Node A completes, setting the catalog sort key to a new value
    * Node B attempts to update the catalog, observing the new sort key
    * Node B re-compacts the data, re-uploads, and drives to completion

This scenario is/was tracked in:

    https://github.com/influxdata/influxdb_iox/issues/6439
2023-01-16 18:55:17 +01:00
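
The scenario in the commit above resembles a compare-and-swap retry loop. The sketch below is a simplified model under that assumption: `Catalog`, `cas_sort_key` and `persist` are hypothetical, an in-memory `Mutex` stands in for the catalog, and the compaction and upload steps are reduced to a comment.

```rust
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for a partition's sort key in the catalog.
struct Catalog {
    sort_key: Mutex<Vec<String>>,
}

impl Catalog {
    /// Compare-and-swap: update only if the stored key still equals `expected`.
    /// On conflict, return the key that is currently stored.
    fn cas_sort_key(&self, expected: &[String], new: Vec<String>) -> Result<(), Vec<String>> {
        let mut guard = self.sort_key.lock().unwrap();
        if guard.as_slice() == expected {
            *guard = new;
            Ok(())
        } else {
            Err(guard.clone())
        }
    }
}

/// Persist loop for one node. Returns the number of attempts it took.
fn persist(catalog: &Catalog, mut observed: Vec<String>, wanted: Vec<String>) -> usize {
    let mut attempts = 0;
    loop {
        attempts += 1;
        // (Compaction and upload based on `observed` would happen here.)
        match catalog.cas_sort_key(&observed, wanted.clone()) {
            Ok(()) => return attempts,
            // Another node won the race: adopt its key and go around again.
            Err(current) => observed = current,
        }
    }
}
```

In this model, Node B starts with a stale view of the sort key, fails its first update, observes Node A's value, and succeeds on the second attempt.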
Dom Dwyer 091b428d4f
refactor(persist): decouple Context & worker logic
The persist::Context struct carries the data to be persisted, a
reference to the partition from which it came, and various cached fields
to avoid re-acquiring the partition read lock all the time.

Prior to this commit, the Context also had the full persist logic as
methods, invoked by the persist worker. This tightly couples the data &
logic - it's fairly clear a worker should implement the work, and
operate on the data - not commingling the two. I even knew the mess I
was making when I wrote it, but effectively copy-pasted it from
ingester1 because deadlines.

This commit decouples the persist logic from the Context.
2023-01-16 18:36:17 +01:00
Dom Dwyer 1f5294c096
test: persistence system integration test
This test ensures the persistence system as a whole works in the happy
path.
2023-01-16 13:34:33 +01:00
Dom Dwyer 6413362c72
refactor: use system-wide ingester ID
The query API exposes a unique-per-instance UUID to allow callers to
detect a crash of the ingester process - this was initialised directly
in the query RPC handler.

This commit turns the bare UUID into a type, and initialises it in the
top-level initialisation of the ingester, plumbing it down into the
query RPC handler.

This allows the UUID to be reused by other components/handlers.
2023-01-13 16:46:38 +01:00
Dom Dwyer 8dc18a9838
perf: remove double-ref Partition map
The ingester no longer needs to access a specific PartitionData by ID
(they are addressed either via an iterator over the BufferTree, or
shared by Arc reference).

This allows us to remove the extra map maintaining ID -> PartitionData
references, and the shared access lock protecting it.
2023-01-13 14:05:30 +01:00
Dom f7ff877582
Merge branch 'main' into cn/ingester-persist-tick 2023-01-13 12:31:45 +00:00
Carol (Nichols || Goulding) 02c7ed58a2
fix: Use up the first interval tick so we wait first and then persist 2023-01-12 14:55:58 -05:00
Carol (Nichols || Goulding) 0554194923
docs: Explain persist-on-demand use case and potential limitations 2023-01-12 11:52:11 -05:00
Carol (Nichols || Goulding) e1395f4f35
fix: Move PersistNow to grpc/PersistHandler 2023-01-12 11:09:33 -05:00