Commit Graph

8024 Commits (6b6dbb02865869bbff02ab314f79727c474ebcb4)

Author SHA1 Message Date
Dom Dwyer 6b6dbb0286 build: remove iox_gitops_adapter from build
Broken release builds since:

    https://github.com/influxdata/influxdb_iox/pull/4675
2022-05-24 16:30:19 +01:00
Marco Neumann 9c1ffc2b0d
test: panic handling, add compactor to end to end test harness (#4677)
* feat: add test gRPC client

* test: start compactor in mini cluster

* test: assert panic handling

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 14:55:26 +00:00
kodiakhq[bot] df7b3c3a88
Merge pull request #4678 from influxdata/dom/streaming-compaction
feat: streaming compaction
2022-05-24 14:48:18 +00:00
kodiakhq[bot] 8b1c704a82
Merge branch 'main' into dom/streaming-compaction 2022-05-24 14:42:18 +00:00
Andrew Lamb 52a50c4a14
fix: use large circleci executor for docs job (#4680) 2022-05-24 14:26:49 +00:00
Andrew Lamb 4d8ece5524
feat: Add `Tombstone` to querier cache (#4663)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 13:21:23 +00:00
Dom Dwyer 8ff1a73797 revert: fix: compaction deadlock
This reverts commit 00b5c1b296.

This change reverts the StreamSplitExec plan to using bounded, blocking
channels, with the possibility of deadlock added to the docs.

This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
2022-05-24 14:12:00 +01:00
Dom Dwyer c885b845dc refactor: concurrent StreamSplitExec execution
Changes the compactor to consume both StreamSplitExec output partitions
concurrently.

Practically speaking this means both Parquet files will be generated
concurrently, and uploaded to object store concurrently.
2022-05-24 14:10:46 +01:00
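The deadlock concern behind c885b845dc and 8ff1a73797 can be illustrated with a small sketch. It is not the StreamSplitExec or compactor code: the function names, the integer payload and the split rule are all hypothetical, and plain threads with std bounded channels stand in for the async streams. One producer feeds two bounded outputs, and both outputs are drained at the same time.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Splits `input` into two bounded channels: values below `split_at` go to
/// the first output, everything else to the second (hypothetical rule).
fn split_into_bounded(
    input: Vec<i64>,
    split_at: i64,
    capacity: usize,
) -> (Receiver<i64>, Receiver<i64>) {
    let (tx_a, rx_a) = sync_channel(capacity);
    let (tx_b, rx_b) = sync_channel(capacity);
    thread::spawn(move || {
        for v in input {
            // `send` blocks while the target channel is full.
            let sent = if v < split_at { tx_a.send(v) } else { tx_b.send(v) };
            if sent.is_err() {
                break; // a receiver has gone away
            }
        }
        // Both senders are dropped here, which closes both outputs.
    });
    (rx_a, rx_b)
}

/// Drains both outputs at the same time, so a full channel on one side
/// cannot stall the producer while the other side is waiting for data.
fn consume_concurrently(rx_a: Receiver<i64>, rx_b: Receiver<i64>) -> (Vec<i64>, Vec<i64>) {
    let a = thread::spawn(move || rx_a.into_iter().collect::<Vec<_>>());
    let b = thread::spawn(move || rx_b.into_iter().collect::<Vec<_>>());
    (a.join().unwrap(), b.join().unwrap())
}
```

Draining the first output to completion before touching the second could hang in this sketch: the producer blocks once the second channel is full, so the first channel is never closed. Calling `consume_concurrently(rx_a, rx_b)` on the pair returned by `split_into_bounded` avoids that ordering problem.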
Dom Dwyer 8f05250c96 feat: streaming compaction
This commit changes the Compactor::compact() method to stream the
RecordBatch instances directly to the Parquet serialiser, with the
resulting file then uploaded directly to object storage.
2022-05-24 14:09:10 +01:00
Dom Dwyer 6aa626ef84 refactor: retry object store upload
Changes the Storage::upload() method to endlessly retry uploading the
generated Parquet file.
2022-05-24 11:29:42 +01:00
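A minimal sketch of the "retry endlessly" behaviour described in 6aa626ef84, assuming a synchronous operation and a fixed delay between attempts. The helper name `retry_forever`, the fixed delay and the attempt counter are assumptions for illustration and are not taken from Storage::upload().

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds, sleeping `delay` between attempts.
/// Returns the success value and the number of attempts made.
fn retry_forever<T, E>(mut op: impl FnMut() -> Result<T, E>, delay: Duration) -> (T, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match op() {
            Ok(value) => return (value, attempts),
            // Hypothetical policy: every error is treated as transient.
            Err(_) => thread::sleep(delay),
        }
    }
}
```

The loop has no exit other than success, so a permanently failing operation blocks the caller forever. That trade-off is only reasonable when the failure is expected to clear eventually.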
Luke Bond b76a0080d5
chore: remove unused iox_gitops_adapter (#4675)
* chore: remove unused iox_gitops_adapter

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-24 10:28:43 +00:00
Marco Neumann a3dab68f3f
fix: actually log error (#4672)
While logging all the helpful information to replicate failing
querier->ingester requests via CLI, I totally forgot to log the error
message itself.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 08:44:35 +00:00
dependabot[bot] ca49820a0f
chore(deps): Bump console-subscriber from 0.1.5 to 0.1.6 (#4670)
Bumps [console-subscriber](https://github.com/tokio-rs/console) from 0.1.5 to 0.1.6.
- [Release notes](https://github.com/tokio-rs/console/releases)
- [Commits](https://github.com/tokio-rs/console/compare/console-subscriber-v0.1.5...console-subscriber-v0.1.6)

---
updated-dependencies:
- dependency-name: console-subscriber
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:24:12 +00:00
dependabot[bot] 76f7043417
chore(deps): Bump once_cell from 1.11.0 to 1.12.0 (#4666)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:14:03 +00:00
Andrew Lamb e877a64462
feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661)
* feat: Add `ParquetFiles` cache

* fix: Apply suggestions from code review

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* fix: remove commented out debugging println

* refactor: Improve size calculation

* fix: mark `ParquetFileCache::clear` test only

* fix: assert on metric count

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-05-23 17:11:38 +00:00
Dom 5239417925
Merge pull request #4662 from influxdata/dom/meta-remove-row-count
refactor: do not embed row count & min/max timestamps in IOxMetadata
2022-05-23 17:00:19 +01:00
Dom 9cd1286051
Merge branch 'main' into dom/meta-remove-row-count 2022-05-23 16:39:38 +01:00
Marco Neumann 2029bd16ba
feat: enable debugging of failed querier->ingester requests (#4659)
* feat: enable debugging of failed querier->ingester requests

- extend `query-ingester` CLI to allow usage of predicates
- on failed requests: log all information that is required for the CLI
- test the "ingester fails" scenario

* test: explain

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: move b64 pred. serde into a single crate

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-23 15:37:31 +00:00
Dom Dwyer 2e6c49be83 refactor: remove IoxMetadata min & max timestamp
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.

These values are redundant as they already exist within the Parquet
column statistics, and precluded streaming serialisation as these
removed min/max values were needed before serialising the file.
2022-05-23 16:27:08 +01:00
Dom Dwyer a142a9eb57 refactor: remove row_count from IoxMetadata
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.

The reasoning is twofold:

    * The Parquet file's native metadata already contains a row count
    * Needing to know the number of rows up-front precludes streaming
2022-05-23 16:18:35 +01:00
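The streaming argument in a142a9eb57 can be sketched with a toy format (not Parquet, and not the IOx writer): when the row count lives in a trailer, the writer can emit each batch as it arrives and only needs the total once the input is exhausted. The function name and the byte layout below are hypothetical.

```rust
use std::io::{self, Write};

/// Writes each batch as it arrives, then appends a trailer holding the
/// total row count. Returns the number of rows written.
fn write_streaming<W: Write>(
    mut out: W,
    batches: impl Iterator<Item = Vec<u32>>,
) -> io::Result<u64> {
    let mut rows = 0u64;
    for batch in batches {
        for value in &batch {
            out.write_all(&value.to_le_bytes())?;
        }
        rows += batch.len() as u64;
    }
    // The count is only known here, after the last batch has been written.
    out.write_all(&rows.to_le_bytes())?;
    Ok(rows)
}
```

Had the format required the count before the first row, the writer would have to buffer every batch, or be told the total in advance, before emitting a single byte.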
Dom Dwyer 71555ee55c test: Parquet metadata integration test
Adds two integration tests covering validation of the embedded IOx
metadata within the Parquet file metadata, and validation of the derived
ParquetFileParams metadata used to populate the catalog.
2022-05-23 16:17:56 +01:00
kodiakhq[bot] 1fccee841b
Merge pull request #4649 from influxdata/dom/codec-object-store
perf: streaming RecordBatch -> parquet encoder
2022-05-23 14:45:35 +00:00
Dom f0d0f1ba0c
Merge branch 'main' into dom/codec-object-store 2022-05-23 15:39:54 +01:00
Andrew Lamb a64b2b1d0b
feat: Add `SharedBackend` to cache system (#4652)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 14:24:24 +00:00
kodiakhq[bot] d752991a25
Merge pull request #4638 from influxdata/cn/last-available
feat: Add ingester CLI and env option to skip to oldest available WB seq num
2022-05-23 13:14:23 +00:00
kodiakhq[bot] a06746c715
Merge branch 'main' into cn/last-available 2022-05-23 13:08:19 +00:00
Marco Neumann 47347bef9f
test: add query test scenario w/ missing columns in different chunks (#4656)
* test: do NOT filter out query test scenarios w/ unordered stages in different partitions

It should be possible to have two chunks in different partitions where
both are in the ingester stage or the first one is in the parquet stage
and the 2nd one in the ingester stage.

* test: add query test scenario w/ missing columns in different chunks

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 12:13:41 +00:00
Dom Dwyer af6d3f4d48 docs: remove clone ref comment 2022-05-23 11:46:06 +01:00
dependabot[bot] 5c033b462e
chore(deps): Bump regex from 1.5.5 to 1.5.6 (#4655)
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.5 to 1.5.6.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.5...1.5.6)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:39:01 +00:00
dependabot[bot] 292f71759e
chore(deps): Bump http-body from 0.4.4 to 0.4.5 (#4654)
Bumps [http-body](https://github.com/hyperium/http-body) from 0.4.4 to 0.4.5.
- [Release notes](https://github.com/hyperium/http-body/releases)
- [Changelog](https://github.com/hyperium/http-body/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/http-body/compare/v0.4.4...v0.4.5)

---
updated-dependencies:
- dependency-name: http-body
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:30:49 +00:00
dependabot[bot] 1bc02b1487
chore(deps): Bump regex-syntax from 0.6.25 to 0.6.26 (#4653)
Bumps [regex-syntax](https://github.com/rust-lang/regex) from 0.6.25 to 0.6.26.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/commits)

---
updated-dependencies:
- dependency-name: regex-syntax
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:21:39 +00:00
Carol (Nichols || Goulding) 05bd9de4d3
test: Add a test for the sequence number skipping metric
Ok, so... this needed lots of... channels. Channels everywhere.

The stream method on TestWriteBufferStreamHandler previously assumed it
would only be called once. In a test where reset_to_earliest is called,
stream might be called again to get the reset stream.

We want to be able to control which of the streams gets which
operations, so that's why the macro now takes a vec of vec of
operations-- one vec of operations per expected call to stream, and the
stream will send all the operations in its vec.

The test thread needs to wait for the handler stream to consume the last
item from the last receiver stream, so when the
TestWriteBufferStreamHandler has set up the last expected call to
stream, pass back the last transmitter and have it wait until it's at
full expected capacity (which means all operations have been consumed by
the receiver).
2022-05-20 20:50:02 -04:00
Carol (Nichols || Goulding) bda231051a
feat: Record metrics when resetting the write buffer and skipping sequence numbers 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) e5e08e5b16
test: Add a test of reset_to_earliest for all write buffer implementations
This is the basic test case; I've filed #4651 for the more complex test
needing deletion of records from the write buffer.
2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) bcbf7b4f46
refactor: Move error handling logic to be all together 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) 549dd497ea
refactor: Extract an ingester verification function 2022-05-20 20:48:16 -04:00
kodiakhq[bot] b79db5f609
Merge pull request #4645 from influxdata/cn/update-rustc
chore: Update to Rust 1.61
2022-05-21 00:47:20 +00:00
kodiakhq[bot] f6b3296136
Merge branch 'main' into cn/update-rustc 2022-05-21 00:41:42 +00:00
Carol (Nichols || Goulding) 2aa76622c3
refactor: Extract a test setup function 2022-05-20 11:51:57 -04:00
Carol (Nichols || Goulding) ab72c93a5e
docs: Updating wrapping, content, and grammar of comments 2022-05-20 10:51:07 -04:00
Carol (Nichols || Goulding) c811bebdb7
feat: Add ingester CLI option to skip to oldest available WB seq num
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.

Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.

Fixes #4624.
2022-05-20 10:51:07 -04:00
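The decision described in c811bebdb7 can be sketched as a small pure function. This is an illustration rather than the ingester's code: the function name and parameters are hypothetical, and the default case returns an error where the commit message says the ingester panics.

```rust
/// Picks the sequence number to start reading from.
///
/// `min_unpersisted` is the value recorded in the catalog and
/// `oldest_available` is the oldest entry the write buffer still retains.
fn start_sequence_number(
    min_unpersisted: u64,
    oldest_available: u64,
    skip_to_oldest_available: bool,
) -> Result<u64, String> {
    if min_unpersisted >= oldest_available {
        // The write buffer still holds the wanted entry.
        return Ok(min_unpersisted);
    }
    if skip_to_oldest_available {
        // Opt-in behaviour: accept the gap and resume from what is left.
        Ok(oldest_available)
    } else {
        // Default behaviour: refuse to continue (the ingester panics here).
        Err(format!(
            "sequence number {min_unpersisted} is no longer available, oldest is {oldest_available}"
        ))
    }
}
```

For example, `start_sequence_number(3, 5, true)` yields `Ok(5)`, while the same call with `false` yields an error.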
Carol (Nichols || Goulding) b3f97bdb9d
test: Capture existing behavior for unknown sequence number 2022-05-20 10:51:06 -04:00
Jake Goulding 359046f3f2 ci: give the doc builder more memory 2022-05-20 10:44:06 -04:00
Dom Dwyer 00dc95829d style: enable more lints
Enable more lints on the parquet_file crate to keep it a little cleaner
- adds the following:

    clippy::clone_on_ref_ptr,
    unreachable_pub,
    missing_docs,
    clippy::todo,
    clippy::dbg_macro

This commit includes fixes for any new lint failures.
2022-05-20 15:17:40 +01:00
Dom Dwyer 7df7c4844c refactor: remove redundant ParquetChunk errors
Eliminates unused errors and refactors away unnecessary ones in the
parquet::chunk module.
2022-05-20 15:17:40 +01:00
Dom Dwyer 661f8599a6 refactor: internalise Parquet path generation
Derive the ParquetFilePath from the IoxMetadata within the
ParquetStorage::read_filter() call.

This prevents the "put/get RecordBatches" abstraction from leaking out
the object store path generation concern - an implementation detail of
the ParquetStorage layer.
2022-05-20 15:17:40 +01:00
Dom Dwyer cdb341d45a test: ParquetStorage upload() and read_filter()
Adds tests for the previously untested (directly at least) Parquet
(de)serialisation & persistence layer, provided by the ParquetStorage
type.
2022-05-20 15:17:40 +01:00
Dom Dwyer 302301659e refactor: derive ParquetFilePath from IoxMetadata
Allow directly converting an IoxMetadata to a ParquetFilePath.
2022-05-20 15:17:40 +01:00
Dom Dwyer b9a745d42d feat: RecordBatch stream to Parquet file upload
Implements an upload() method on the ParquetStorage type, consuming a
stream of RecordBatch, serialising the Parquet file, and uploading the
result to object storage. Returns the IOx-specific file metadata.

Currently, while the upload() method accepts a stream of RecordBatch, the
actual resulting Parquet file is buffered in memory before uploading to
object store, due to lack of streaming upload functionality in the
ObjectStore abstraction - this isn't the end of the world, as the files
tend to be relatively small with our current usage.

This impl should be easily modified to be fully streaming once streaming
object store puts are implemented:

    https://github.com/influxdata/object_store_rs/issues/9
2022-05-20 15:17:40 +01:00
Dom Dwyer 76e08d14a3 perf: IoxParquetMetaData direct from file metadata
Construct an IoxParquetMetaData instance directly from the FileMetaData
instance returned by the ArrowWriter.

This change will allow us to avoid the inefficient impl currently in
use:

    * Serialise batches into memory
    * Wrap buffer in arrow cursor
    * Read parquet metadata with arrow file reader
    * Serialise schema with thrift
    * Serialise each row group's metadata with thrift
    * Construct our own FileMetaData instance
    * Serialise FileMetaData with thrift
    * zstd encode resulting thrift bytes
    * Wrap in IoxParquetMetaData

Now we "only":

    * Stream batches into opaque Write impl
    * Serialise FileMetaData with thrift
    * zstd encode resulting thrift bytes
    * Wrap in IoxParquetMetaData

Then accessing any data within the IoxParquetMetaData (as before this
change) requires deserialising it first.

There are still a number of easy performance improvements to be had
w.r.t the metadata handling.
2022-05-20 15:17:40 +01:00