This reverts commit 00b5c1b296.
This change reverts the StreamSplitExec plan to using bounded, blocking
channels, and documents the resulting possibility of deadlock.
This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
Changes the compactor to consume both StreamSplitExec output partitions
concurrently.
Practically speaking this means both Parquet files will be generated
concurrently, and uploaded to object store concurrently.
This commit changes the Compactor::compact() method to stream the
RecordBatch instances directly into the parquet serialiser, with the
resulting files then uploaded directly to object storage.
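Roughly, that flow looks like the sketch below (assumed DataFusion signatures; compact_both_partitions and serialise_and_upload are hypothetical stand-ins for the compactor internals): both split-plan output partitions are executed and drained concurrently, so neither side of the bounded channel inside StreamSplitExec can stall the other, and each partition's batches flow straight into their own Parquet file.

```rust
use std::sync::Arc;

use datafusion::{
    error::DataFusionError,
    execution::TaskContext,
    physical_plan::{ExecutionPlan, SendableRecordBatchStream},
};
use futures::{future, StreamExt};

async fn compact_both_partitions(
    plan: Arc<dyn ExecutionPlan>,
    ctx: Arc<TaskContext>,
) -> Result<(), DataFusionError> {
    // StreamSplitExec produces exactly two output partitions.
    let upper = plan.execute(0, Arc::clone(&ctx))?;
    let lower = plan.execute(1, Arc::clone(&ctx))?;

    // Drive both partitions to completion at the same time.
    future::try_join(serialise_and_upload(upper), serialise_and_upload(lower)).await?;
    Ok(())
}

async fn serialise_and_upload(
    mut batches: SendableRecordBatchStream,
) -> Result<(), DataFusionError> {
    while let Some(batch) = batches.next().await {
        // Each RecordBatch would be handed straight to the Parquet serialiser,
        // and the finished file uploaded to object storage.
        let _batch = batch?;
    }
    Ok(())
}
```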
While logging all the helpful information to replicate failing
querier->ingester requests via CLI, I totally forgot to log the error
message itself.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: enable debugging of failed querier->ingester requests
- extend `query-ingester` CLI to allow usage of predicates
- on failed requests: log all information that is required for the CLI
- test the "ingester fails" scenario
* test: explain
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: move b64 pred. serde into a single crate
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
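As a rough illustration of the shared predicate serde, a round-trip might look like the sketch below (hypothetical helper names and crate choices, not the real module layout): protobuf-encode the predicate, base64 it so it can be logged and pasted into `query-ingester`, and reverse the process on the CLI side.

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use prost::Message;

/// Encode a protobuf message (e.g. the predicate) as a base64 string
/// suitable for logging and for passing to the CLI.
fn encode_predicate<P: Message>(predicate: &P) -> String {
    STANDARD.encode(predicate.encode_to_vec())
}

/// Decode the base64 string back into the protobuf message on the CLI side.
fn decode_predicate<P: Message + Default>(b64: &str) -> Result<P, Box<dyn std::error::Error>> {
    let bytes = STANDARD.decode(b64)?;
    Ok(P::decode(bytes.as_slice())?)
}
```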
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.
These values are redundant, as they already exist within the Parquet
column statistics, and they precluded streaming serialisation because
the min/max values had to be known before the file could be serialised.
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.
The reasoning is twofold:
* The Parquet file's native metadata already contains a row count
* Needing to know the number of rows up-front precludes streaming serialisation
Adds two integration tests covering validation of the embedded IOx
metadata within the Parquet file metadata, and validation of the derived
ParquetFileParams metadata used to populate the catalog.
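For illustration, the removed values can already be read back from the Parquet file's own metadata; a minimal sketch using the parquet crate's reader API (assumed to be the relevant surface, not the IOx code itself):

```rust
use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn print_native_metadata(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let reader = SerializedFileReader::new(File::open(path)?)?;
    let metadata = reader.metadata();

    // The row count is already tracked by Parquet itself.
    println!("rows: {}", metadata.file_metadata().num_rows());

    // Min/max values live in the per-column statistics of each row group,
    // so they do not need to be duplicated inside IoxMetadata.
    for row_group in metadata.row_groups() {
        for column in row_group.columns() {
            if let Some(stats) = column.statistics() {
                println!("{:?}: {:?}", column.column_path(), stats);
            }
        }
    }
    Ok(())
}
```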
* test: do NOT filter out query test scenarios w/ unordered stages in different partitions
It should be possible to have two chunks in different partitions where
both are in the ingester stage, or where the first one is in the parquet
stage and the second one is in the ingester stage.
* test: add query test scenario w/ missing columns in different chunks
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Ok, so... this needed lots of... channels. Channels everywhere.
The stream method on TestWriteBufferStreamHandler previously assumed it
would only be called once. In a test where reset_to_earliest is called,
stream might be called again to get the reset stream.
We want to be able to control which of the streams gets which
operations, so the macro now takes a vec of vecs of operations: one vec
of operations per expected call to stream, and the stream will send all
the operations in its vec.
The test thread needs to wait for the handler stream to consume the last
item from the last receiver stream, so when the
TestWriteBufferStreamHandler has set up the last expected call to
stream, it passes back the last transmitter and the test waits until
that channel is back at its full expected capacity (which means all
operations have been consumed by the receiver).
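A condensed sketch of that shape (simplified stand-in types, not the real TestWriteBufferStreamHandler): one vec of operations is handed out per call to stream, and the kept transmitter lets the test detect when the bounded channel is back at full capacity, i.e. fully drained.

```rust
use std::collections::VecDeque;

use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;

/// Stand-in for a write buffer operation.
#[derive(Clone, Debug)]
struct Op(String);

struct TestStreamHandler {
    /// One batch of operations per expected call to `stream()`.
    batches: VecDeque<Vec<Op>>,
    /// Transmitter of the most recently created stream, kept so the test can
    /// wait for the receiver to drain every operation.
    last_tx: Option<mpsc::Sender<Op>>,
}

impl TestStreamHandler {
    fn new(batches: Vec<Vec<Op>>) -> Self {
        Self {
            batches: batches.into(),
            last_tx: None,
        }
    }

    /// Each call hands out the next batch as a stream; panics if called more
    /// often than batches were provided.
    async fn stream(&mut self) -> ReceiverStream<Op> {
        let ops = self.batches.pop_front().expect("unexpected call to stream()");
        let (tx, rx) = mpsc::channel(ops.len().max(1));
        for op in ops {
            tx.send(op).await.expect("receiver dropped");
        }
        self.last_tx = Some(tx);
        ReceiverStream::new(rx)
    }

    /// Wait until the receiver has consumed the last stream: the bounded
    /// channel is back at full capacity once every queued op was taken.
    async fn wait_for_drain(&self) {
        if let Some(tx) = &self.last_tx {
            while tx.capacity() < tx.max_capacity() {
                tokio::task::yield_now().await;
            }
        }
    }
}
```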
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.
Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.
Fixes #4624.
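The decision boils down to something like the sketch below (hypothetical function and argument names; the real logic lives in the ingester's write buffer handling):

```rust
/// Sketch only: decide where to begin consuming the write buffer, given the
/// catalog's minimum unpersisted sequence number and the oldest sequence
/// number the write buffer still retains.
fn resolve_start_sequence_number(
    min_unpersisted: u64,
    oldest_available: u64,
    skip_to_oldest_available: bool,
) -> u64 {
    if min_unpersisted >= oldest_available {
        // The catalog's minimum unpersisted sequence number is still retained.
        min_unpersisted
    } else if skip_to_oldest_available {
        // Retention evicted it; resume from the oldest offset that exists.
        oldest_available
    } else {
        panic!(
            "sequence number {min_unpersisted} is no longer available in the write buffer \
             (oldest available: {oldest_available}); pass --skip-to-oldest-available to skip ahead"
        );
    }
}
```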
Enable more lints on the parquet_file crate to keep it a little cleaner
- adds the following:
clippy::clone_on_ref_ptr,
unreachable_pub,
missing_docs,
clippy::todo,
clippy::dbg_macro
This commit includes fixes for any new lint failures.
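In lib.rs this amounts to a crate-level attribute roughly like the following (the exact warn/deny levels are an assumption):

```rust
#![warn(
    missing_docs,
    unreachable_pub,
    clippy::clone_on_ref_ptr,
    clippy::todo,
    clippy::dbg_macro
)]
```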
Derive the ParquetFilePath from the IoxMetadata within the
ParquetStorage::read_filter() call.
This prevents the "put/get RecordBatches" abstraction from leaking out
the object store path generation concern - an implementation detail of
the ParquetStorage layer.
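A self-contained sketch of the layering idea, with simplified stand-in types (the real IoxMetadata and ParquetFilePath carry more fields): the storage layer derives the object store path from the file's own metadata, so callers of read_filter() never build paths themselves.

```rust
/// Simplified stand-in for the IOx metadata embedded in each Parquet file.
struct IoxMetadata {
    namespace_id: i64,
    table_id: i64,
    partition_id: i64,
    object_store_id: String, // a UUID in the real type
}

/// Simplified stand-in for the object store path of a Parquet file.
struct ParquetFilePath(String);

impl ParquetFilePath {
    /// The path is a pure function of the file's metadata, so it can be
    /// derived inside ParquetStorage::read_filter() rather than passed in.
    fn from_metadata(meta: &IoxMetadata) -> Self {
        Self(format!(
            "{}/{}/{}/{}.parquet",
            meta.namespace_id, meta.table_id, meta.partition_id, meta.object_store_id
        ))
    }
}
```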
Implements an upload() method on the ParquetStorage type, consuming a
stream of RecordBatch, serialising the Parquet file, and uploading the
result to object storage. Returns the IOx-specific file metadata.
While the upload() method accepts a stream of RecordBatch, the resulting
Parquet file is currently buffered in memory before being uploaded to
object store, due to the lack of streaming upload functionality in the
ObjectStore abstraction. This isn't the end of the world, as the files
tend to be relatively small with our current usage.
This impl should be easily modified to be fully streaming once streaming
object store puts are implemented:
https://github.com/influxdata/object_store_rs/issues/9
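A minimal sketch of that shape, with assumed signatures rather than the real ParquetStorage::upload(): drain the stream of RecordBatch through an ArrowWriter into an in-memory buffer, then put() the buffer to object storage.

```rust
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use futures::{Stream, StreamExt};
use object_store::{path::Path, ObjectStore};
use parquet::arrow::ArrowWriter;

async fn upload(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    mut batches: impl Stream<Item = RecordBatch> + Unpin,
) -> Result<(), Box<dyn std::error::Error>> {
    let first = batches.next().await.ok_or("empty RecordBatch stream")?;

    // The whole file is buffered in memory for now; a streaming put would
    // remove this buffer once the ObjectStore abstraction supports it.
    let mut buffer = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buffer, first.schema(), None)?;
    writer.write(&first)?;
    while let Some(batch) = batches.next().await {
        writer.write(&batch)?;
    }

    // close() hands back the Parquet FileMetaData, which is what the
    // IOx-specific file metadata is derived from.
    let _file_metadata = writer.close()?;

    store.put(location, buffer.into()).await?;
    Ok(())
}
```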
Construct an IoxParquetMetaData instance directly from the FileMetaData
instance returned by the ArrowWriter.
This change will allow us to avoid the inefficient impl currently in
use:
* Serialise batches into memory
* Wrap buffer in arrow cursor
* Read parquet metadata with arrow file reader
* Serialise schema with thrift
* Serialise each row group's metadata with thrift
* Construct our own FileMetaData instance
* Serialise FileMetaData with thrift
* zstd encode resulting thrift bytes
* Wrap in IoxParquetMetaData
Now we "only":
* Stream batches into opaque Write impl
* Serialise FileMetaData with thrift
* zstd encode resulting thrift bytes
* Wrap in IoxParquetMetaData
Accessing any data within the IoxParquetMetaData still requires
deserialising it first, as before this change.
There are still a number of easy performance improvements to be had
w.r.t. the metadata handling.
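Roughly, the "serialise FileMetaData with thrift" and "zstd encode" steps amount to the sketch below (assumed thrift/zstd APIs; the real constructor lives in IoxParquetMetaData):

```rust
use parquet::format::FileMetaData;
use parquet::thrift::TSerializable;
use thrift::protocol::{TCompactOutputProtocol, TOutputProtocol};

/// Thrift-encode the FileMetaData returned by ArrowWriter::close() and
/// zstd-compress the bytes; these are the bytes IoxParquetMetaData wraps.
fn encode_metadata(file_meta: &FileMetaData) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let mut thrift_bytes = Vec::new();
    {
        let mut protocol = TCompactOutputProtocol::new(&mut thrift_bytes);
        file_meta.write_to_out_protocol(&mut protocol)?;
        protocol.flush()?;
    }
    Ok(zstd::encode_all(&thrift_bytes[..], 0)?)
}
```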