8045 Commits (04531e77dd1f57188b1a4045fb8cd713b265496a)

Author SHA1 Message Date
Carol (Nichols || Goulding) 04531e77dd
feat: Implement get on ReadBufferCache 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 25b8260b72
feat: Implement ReadBufferCache::new 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) ab9010d9a6
refactor: Rename QuerierParquetChunk::new_parquet to new 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) df10452e2e
refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 4a90d0af32
refactor: Remove ChunkStorage enum; inline into QuerierParquetChunk instead 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) b2c62c6808
refactor: Rename QuerierChunk to QuerierParquetChunk 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 66823522f3
docs: Fix comment wrapping while reading through 2022-05-25 17:19:10 -04:00
Nga Tran 6cc767efcc
feat: teach compactor to compact smaller number of files (#4671)
* refactor: split compact_partition into two functions to handle concurrency better

* feat: limit number of files to compact

* test: add test for limit num files

* chore: fix clippy

* feat: split group if over max size

* fix: split the overlapped group to limit size or file num

* chore: reduce config values

* test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files

* chore: more comments

* chore: cleanup
2022-05-25 19:54:34 +00:00
Marco Neumann 31d1b37d73
refactor: de-duplicate low-level arrow code (#4697)
It seems that during prototyping NG we've copied low level code (w/o
tests!) and never cleaned up. Let's not have this functionality twice.
2022-05-25 16:24:28 +00:00
Marko Mikulicic 9ddb0a816e
fix: Return panic message in internal error (#4693)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 15:11:17 +00:00
kodiakhq[bot] eb815f2b30
Merge pull request #4664 from influxdata/cn/fix-metric-names
fix: Make Kafka Partition related metric names potentially less confusing
2022-05-25 14:33:58 +00:00
Carol (Nichols || Goulding) 788e6eaf69
docs: Fix a comment that was very confused about what a Kafka partition means 2022-05-25 10:04:40 -04:00
Carol (Nichols || Goulding) 6ce6a38094
fix: Make metric names potentially less confusing 2022-05-25 10:04:39 -04:00
Marco Neumann a08a91c5ba
fix: ensure querier cache is refreshed for partition sort key (#4660)
* test: call `maybe_start_logging` in auto-generated cases

* fix: ensure querier cache is refreshed for partition sort key

Fixes #4631.

* docs: explain querier sort key handling and test

* test: test another version of issue 4631

* fix: correctly invalidate partition sort keys

* fix: fix `table_not_found_on_ingester`
2022-05-25 10:44:42 +00:00
Marko Mikulicic cdbe546e50
fix: return gRPC error on panic (#4686) 2022-05-25 07:06:25 +00:00
dependabot[bot] 24ee251080
chore(deps): Bump prost from 0.10.3 to 0.10.4 (#4688)
Bumps [prost](https://github.com/tokio-rs/prost) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](https://github.com/tokio-rs/prost/compare/v0.10.3...v0.10.4)

---
updated-dependencies:
- dependency-name: prost
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 06:07:05 +00:00
dependabot[bot] a8d3fe619c
chore(deps): Bump prost-build from 0.10.3 to 0.10.4 (#4687)
Bumps [prost-build](https://github.com/tokio-rs/prost) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](https://github.com/tokio-rs/prost/compare/v0.10.3...v0.10.4)

---
updated-dependencies:
- dependency-name: prost-build
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-25 06:00:27 +00:00
Andrew Lamb 935743b525
refactor: Implement `new_querier_chunk` and `new_querier_chunk_from_file_with_metadata` (#4685) 2022-05-24 21:58:27 +00:00
Andrew Lamb a8d5f7f5f7
test: add debug output to test (#4684) 2022-05-24 19:57:11 +00:00
Andrew Lamb 95e6a8ed46
chore: Update datafusion (again) (#4679)
* chore: Update datafusion deps

* fix: fix for changes in ScalarValue

* fix: fix for using TableSource rather than TableProvider

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 15:54:39 +00:00
kodiakhq[bot] 8c537c9467
Merge pull request #4682 from influxdata/dom/fix-build
build: remove iox_gitops_adapter from build
2022-05-24 15:37:31 +00:00
Dom Dwyer 6b6dbb0286 build: remove iox_gitops_adapter from build
Broken release builds since:

    https://github.com/influxdata/influxdb_iox/pull/4675
2022-05-24 16:30:19 +01:00
Marco Neumann 9c1ffc2b0d
test: panic handling, add compactor to end to end test harness (#4677)
* feat: add test gRPC client

* test: start compactor in mini cluster

* test: assert panic handling

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 14:55:26 +00:00
kodiakhq[bot] df7b3c3a88
Merge pull request #4678 from influxdata/dom/streaming-compaction
feat: streaming compaction
2022-05-24 14:48:18 +00:00
kodiakhq[bot] 8b1c704a82
Merge branch 'main' into dom/streaming-compaction 2022-05-24 14:42:18 +00:00
Andrew Lamb 52a50c4a14
fix: use large circleci executor for docs job (#4680) 2022-05-24 14:26:49 +00:00
Andrew Lamb 4d8ece5524
feat: Add `Tombstone` to querier cache (#4663)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 13:21:23 +00:00
Dom Dwyer 8ff1a73797 revert: fix: compaction deadlock
This reverts commit 00b5c1b296.

This change reverts the StreamSplitExec plan to using bounded, blocking
channels, with the possibility of deadlock added to the docs.

This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
2022-05-24 14:12:00 +01:00
Dom Dwyer c885b845dc refactor: concurrent StreamSplitExec execution
Changes the compactor to consume both StreamSplitExec output partitions
concurrently.

Practically speaking this means both Parquet files will be generated
concurrently, and uploaded to object store concurrently.
2022-05-24 14:10:46 +01:00
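Illustrative note (not part of the commit): the sketch below shows why draining both outputs of a splitter concurrently avoids the deadlock that bounded, blocking channels can otherwise cause. It uses only the Rust standard library. The function name `split_and_consume`, the even/odd split rule, and the use of threads are assumptions made for this example; they are not the StreamSplitExec or compactor code.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for a two-way stream split: even values go to
/// output A, odd values to output B, each over a bounded blocking channel.
/// Both outputs are drained on their own thread, so a full channel on one
/// side can never stall the producer forever.
pub fn split_and_consume(input: Vec<u64>, capacity: usize) -> (Vec<u64>, Vec<u64>) {
    let (tx_a, rx_a) = sync_channel::<u64>(capacity);
    let (tx_b, rx_b) = sync_channel::<u64>(capacity);

    // Producer: blocks in `send` whenever the target channel is full.
    let producer = thread::spawn(move || {
        for v in input {
            let tx = if v % 2 == 0 { &tx_a } else { &tx_b };
            tx.send(v).expect("consumer hung up");
        }
        // Both senders are dropped here, which closes the channels.
    });

    // One consumer thread per output partition.
    let drain = |rx: Receiver<u64>| thread::spawn(move || rx.into_iter().collect::<Vec<u64>>());
    let a = drain(rx_a);
    let b = drain(rx_b);

    producer.join().expect("producer panicked");
    (
        a.join().expect("consumer A panicked"),
        b.join().expect("consumer B panicked"),
    )
}
```

If output A were drained to completion before output B was read at all, the producer would block as soon as channel B filled up, channel A would never be closed, and the consumer would wait forever. Reading both sides at once removes that cycle.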
Dom Dwyer 8f05250c96 feat: streaming compaction
This commit changes the Compactor::compact() method to stream the
RecordBatch instances directly to the parquet serialiser, before being
uploaded directly to object storage.
2022-05-24 14:09:10 +01:00
Dom Dwyer 6aa626ef84 refactor: retry object store upload
Changes the Storage::upload() method to endlessly retry uploading the
generated Parquet file.
2022-05-24 11:29:42 +01:00
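Illustrative note (not part of the commit): an endless retry loop of the kind described above might look roughly like the following. This is a minimal sketch using only the standard library; `retry_forever`, its parameters, and the doubling backoff are assumptions for the example and do not reflect the actual Storage::upload() implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` until it returns `Ok`, sleeping between
/// attempts with a doubling backoff capped at `max_backoff`.
/// Returns the successful value together with the number of attempts made.
pub fn retry_forever<T, E, F>(
    mut op: F,
    initial_backoff: Duration,
    max_backoff: Duration,
) -> (T, u32)
where
    F: FnMut() -> Result<T, E>,
    E: std::fmt::Debug,
{
    let mut backoff = initial_backoff;
    let mut attempts: u32 = 0;
    loop {
        attempts = attempts.saturating_add(1);
        match op() {
            Ok(value) => return (value, attempts),
            Err(e) => {
                // Report the failure, wait, then try again.
                eprintln!("attempt {attempts} failed: {e:?}; retrying in {backoff:?}");
                sleep(backoff);
                backoff = (backoff * 2).min(max_backoff);
            }
        }
    }
}
```

A caller would pass the upload as the closure. Because the loop never gives up, it only terminates once the operation succeeds, so it suits failures that are expected to be transient.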
Luke Bond b76a0080d5
chore: remove unused iox_gitops_adapter (#4675)
* chore: remove unused iox_gitops_adapter

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-24 10:28:43 +00:00
Marco Neumann a3dab68f3f
fix: actually log error (#4672)
While logging all the helpful information to replicate failing
querier->ingester requests via CLI, I totally forgot to log the error
message itself.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 08:44:35 +00:00
dependabot[bot] ca49820a0f
chore(deps): Bump console-subscriber from 0.1.5 to 0.1.6 (#4670)
Bumps [console-subscriber](https://github.com/tokio-rs/console) from 0.1.5 to 0.1.6.
- [Release notes](https://github.com/tokio-rs/console/releases)
- [Commits](https://github.com/tokio-rs/console/compare/console-subscriber-v0.1.5...console-subscriber-v0.1.6)

---
updated-dependencies:
- dependency-name: console-subscriber
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:24:12 +00:00
dependabot[bot] 76f7043417
chore(deps): Bump once_cell from 1.11.0 to 1.12.0 (#4666)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:14:03 +00:00
Andrew Lamb e877a64462
feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661)
* feat: Add `ParquetFiles` cache

* fix: Apply suggestions from code review

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* fix: remove commented out debugging println

* refactor: Improve size calculation

* fix: mark `ParquetFileCache::clear` test only

* fix: assert on metric count

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-05-23 17:11:38 +00:00
Dom 5239417925
Merge pull request #4662 from influxdata/dom/meta-remove-row-count
refactor: do not embed row count & min/max timestamps in IOxMetadata
2022-05-23 17:00:19 +01:00
Dom 9cd1286051
Merge branch 'main' into dom/meta-remove-row-count 2022-05-23 16:39:38 +01:00
Marco Neumann 2029bd16ba
feat: enable debugging of failed querier->ingester requests (#4659)
* feat: enable debugging of failed querier->ingester requests

- extend `query-ingester` CLI to allow usage of predicates
- on failed requests: log all information that is required for the CLI
- test the "ingester fails" scenario

* test: explain

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: move b64 pred. serde into a single crate

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-23 15:37:31 +00:00
Dom Dwyer 2e6c49be83 refactor: remove IoxMetadata min & max timestamp
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.

These values are redundant as they already exist within the Parquet
column statistics, and precluded streaming serialisation as these
removed min/max values were needed before serialising the file.
2022-05-23 16:27:08 +01:00
Dom Dwyer a142a9eb57 refactor: remove row_count from IoxMetadata
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.

The reasoning is twofold:

    * The Parquet file's native metadata already contains a row count
    * Needing to know the number of rows up-front precludes streaming
2022-05-23 16:18:35 +01:00
Dom Dwyer 71555ee55c test: Parquet metadata integration test
Adds two integration tests covering validation of the embedded IOx
metadata within the Parquet file metadata, and validation of the derived
ParquetFileParams metadata used to populate the catalog.
2022-05-23 16:17:56 +01:00
kodiakhq[bot] 1fccee841b
Merge pull request #4649 from influxdata/dom/codec-object-store
perf: streaming RecordBatch -> parquet encoder
2022-05-23 14:45:35 +00:00
Dom f0d0f1ba0c
Merge branch 'main' into dom/codec-object-store 2022-05-23 15:39:54 +01:00
Andrew Lamb a64b2b1d0b
feat: Add `SharedBackend` to cache system (#4652)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 14:24:24 +00:00
kodiakhq[bot] d752991a25
Merge pull request #4638 from influxdata/cn/last-available
feat: Add ingester CLI and env option to skip to oldest available WB seq num
2022-05-23 13:14:23 +00:00
kodiakhq[bot] a06746c715
Merge branch 'main' into cn/last-available 2022-05-23 13:08:19 +00:00
Marco Neumann 47347bef9f
test: add query test scenario w/ missing columns in different chunks (#4656)
* test: do NOT filter out query test scenarios w/ unordered stages in different partitions

It should be possible to have two chunks in different partitions where
both are in the ingester stage or the first one is in the parquet stage
and the 2nd one in the ingester stage.

* test: add query test scenario w/ missing columns in different chunks

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 12:13:41 +00:00
Dom Dwyer af6d3f4d48 docs: remove clone ref comment 2022-05-23 11:46:06 +01:00
dependabot[bot] 5c033b462e
chore(deps): Bump regex from 1.5.5 to 1.5.6 (#4655)
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.5 to 1.5.6.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.5...1.5.6)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:39:01 +00:00