Carol (Nichols || Goulding)
04531e77dd
feat: Implement get on ReadBufferCache
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
25b8260b72
feat: Implement ReadBufferCache::new
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
ab9010d9a6
refactor: Rename QuerierParquetChunk::new_parquet to new
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
df10452e2e
refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
4a90d0af32
refactor: Remove ChunkStorage enum; inline into QuerierParquetChunk instead
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
b2c62c6808
refactor: Rename QuerierChunk to QuerierParquetChunk
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
66823522f3
docs: Fix comment wrapping while reading through
2022-05-25 17:19:10 -04:00
Nga Tran
6cc767efcc
feat: teach compactor to compact smaller number of files ( #4671 )
...
* refactor: split compact_partition into two functions to handle concurrency better
* feat: limit number of files to compact
* test: add test for limit num files
* chore: fix clippy
* feat: split group if over max size
* fix: split the overlapped group to limit size or file num
* chore: reduce config values
* test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files
* chore: more comments
* chore: cleanup
2022-05-25 19:54:34 +00:00
Marco Neumann
31d1b37d73
refactor: de-duplicate low-level arrow code ( #4697 )
...
It seems that during prototyping NG we've copied low level code (w/o
tests!) and never cleaned up. Let's not have this functionality twice.
2022-05-25 16:24:28 +00:00
Marko Mikulicic
9ddb0a816e
fix: Return panic message in internal error ( #4693 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 15:11:17 +00:00
kodiakhq[bot]
eb815f2b30
Merge pull request #4664 from influxdata/cn/fix-metric-names
...
fix: Make Kafka Partition related metric names potentially less confusing
2022-05-25 14:33:58 +00:00
Carol (Nichols || Goulding)
788e6eaf69
docs: Fix a comment that was very confused about what a Kafka partition means
2022-05-25 10:04:40 -04:00
Carol (Nichols || Goulding)
6ce6a38094
fix: Make metric names potentially less confusing
2022-05-25 10:04:39 -04:00
Marco Neumann
a08a91c5ba
fix: ensure querier cache is refreshed for partition sort key ( #4660 )
...
* test: call `maybe_start_logging` in auto-generated cases
* fix: ensure querier cache is refreshed for partition sort key
Fixes #4631.
* docs: explain querier sort key handling and test
* test: test another version of issue 4631
* fix: correctly invalidate partition sort keys
* fix: fix `table_not_found_on_ingester`
2022-05-25 10:44:42 +00:00
Marko Mikulicic
cdbe546e50
fix: return gRPC error on panic ( #4686 )
2022-05-25 07:06:25 +00:00
dependabot[bot]
24ee251080
chore(deps): Bump prost from 0.10.3 to 0.10.4 ( #4688 )
...
Bumps [prost](https://github.com/tokio-rs/prost) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](https://github.com/tokio-rs/prost/compare/v0.10.3...v0.10.4)
---
updated-dependencies:
- dependency-name: prost
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 06:07:05 +00:00
dependabot[bot]
a8d3fe619c
chore(deps): Bump prost-build from 0.10.3 to 0.10.4 ( #4687 )
...
Bumps [prost-build](https://github.com/tokio-rs/prost) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/tokio-rs/prost/releases)
- [Commits](https://github.com/tokio-rs/prost/compare/v0.10.3...v0.10.4)
---
updated-dependencies:
- dependency-name: prost-build
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-25 06:00:27 +00:00
Andrew Lamb
935743b525
refactor: Implement `new_querier_chunk` and `new_querier_chunk_from_file_with_metadata` ( #4685 )
2022-05-24 21:58:27 +00:00
Andrew Lamb
a8d5f7f5f7
test: add debug output to test ( #4684 )
2022-05-24 19:57:11 +00:00
Andrew Lamb
95e6a8ed46
chore: Update datafusion (again) ( #4679 )
...
* chore: Update datafusion deps
* fix: fix for changes in ScalarValue
* fix: fix for using TableSource rather than TableProvider
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 15:54:39 +00:00
kodiakhq[bot]
8c537c9467
Merge pull request #4682 from influxdata/dom/fix-build
...
build: remove iox_gitops_adapter from build
2022-05-24 15:37:31 +00:00
Dom Dwyer
6b6dbb0286
build: remove iox_gitops_adapter from build
...
Broken release builds since:
https://github.com/influxdata/influxdb_iox/pull/4675
2022-05-24 16:30:19 +01:00
Marco Neumann
9c1ffc2b0d
test: panic handling, add compactor to end to end test harness ( #4677 )
...
* feat: add test gRPC client
* test: start compactor in mini cluster
* test: assert panic handling
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 14:55:26 +00:00
kodiakhq[bot]
df7b3c3a88
Merge pull request #4678 from influxdata/dom/streaming-compaction
...
feat: streaming compaction
2022-05-24 14:48:18 +00:00
kodiakhq[bot]
8b1c704a82
Merge branch 'main' into dom/streaming-compaction
2022-05-24 14:42:18 +00:00
Andrew Lamb
52a50c4a14
fix: use large circleci executor for docs job ( #4680 )
2022-05-24 14:26:49 +00:00
Andrew Lamb
4d8ece5524
feat: Add `Tombstone` to querier cache ( #4663 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 13:21:23 +00:00
Dom Dwyer
8ff1a73797
revert: fix: compaction deadlock
...
This reverts commit 00b5c1b296.
This change reverts the StreamSplitExec plan to using bounded, blocking
channels, with the possibility of deadlock added to the docs.
This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
2022-05-24 14:12:00 +01:00
Dom Dwyer
c885b845dc
refactor: concurrent StreamSplitExec execution
...
Changes the compactor to consume both StreamSplitExec output partitions
concurrently.
Practically speaking this means both Parquet files will be generated
concurrently, and uploaded to object store concurrently.
2022-05-24 14:10:46 +01:00
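The two commit messages above explain that bounded, blocking channels can deadlock unless both output partitions are consumed concurrently. The following is a minimal sketch of that idea using only the Rust standard library. It is not the StreamSplitExec or compactor code; the function name `split_and_consume`, the `pivot` parameter, and the channel capacity of 1 are all assumptions made for illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical illustration: route values below `pivot` to one bounded
/// channel and the rest to another, then drain both channels concurrently.
fn split_and_consume(input: Vec<i64>, pivot: i64) -> (Vec<i64>, Vec<i64>) {
    // Capacity 1 stands in for a small bounded, blocking channel.
    let (tx_a, rx_a) = sync_channel::<i64>(1);
    let (tx_b, rx_b) = sync_channel::<i64>(1);

    // The producer blocks on `send` whenever the target channel is full.
    let producer = thread::spawn(move || {
        for v in input {
            let tx = if v < pivot { &tx_a } else { &tx_b };
            tx.send(v).expect("receiver dropped");
        }
        // Both senders are dropped here, which closes both channels.
    });

    // One consumer per output. If a single consumer drained `rx_a` to
    // completion before touching `rx_b`, the producer could block forever
    // on a full `tx_b` and never close `tx_a`: a deadlock.
    let consumer_a = thread::spawn(move || rx_a.iter().collect::<Vec<_>>());
    let consumer_b = thread::spawn(move || rx_b.iter().collect::<Vec<_>>());

    producer.join().unwrap();
    (consumer_a.join().unwrap(), consumer_b.join().unwrap())
}
```

Calling `split_and_consume(vec![1, 10, 2, 11, 3], 5)` returns the two halves in their original relative order, because each channel is FIFO and has a single producer.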
Dom Dwyer
8f05250c96
feat: streaming compaction
...
This commit changes the Compactor::compact() method to stream the
RecordBatch instances directly to the parquet serialiser, before being
uploaded directly to object storage.
2022-05-24 14:09:10 +01:00
Dom Dwyer
6aa626ef84
refactor: retry object store upload
...
Changes the Storage::upload() method to endlessly retry uploading the
generated Parquet file.
2022-05-24 11:29:42 +01:00
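As a rough illustration of the "endlessly retry" behaviour described in the commit above, here is a standard-library-only sketch. It does not reproduce Storage::upload(); the helper name `retry_forever` and the fixed backoff are assumptions, and a real implementation would likely be async and use a growing backoff.

```rust
use std::fmt::Debug;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `op` until it succeeds, sleeping for
/// `backoff` after every failure. It never gives up.
fn retry_forever<T, E, F>(mut op: F, backoff: Duration) -> T
where
    F: FnMut() -> Result<T, E>,
    E: Debug,
{
    loop {
        match op() {
            Ok(value) => return value,
            Err(e) => {
                // Report the failure, wait, then try again.
                eprintln!("operation failed, retrying: {:?}", e);
                thread::sleep(backoff);
            }
        }
    }
}
```

The trade-off of retrying without a limit is that a permanent failure blocks the caller indefinitely, so this pattern only suits operations whose errors are expected to be transient.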
Luke Bond
b76a0080d5
chore: remove unused iox_gitops_adapter ( #4675 )
...
* chore: remove unused iox_gitops_adapter
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-24 10:28:43 +00:00
Marco Neumann
a3dab68f3f
fix: actually log error ( #4672 )
...
While logging all the helpful information to replicate failing
querier->ingester requests via CLI, I totally forgot to log the error
message itself.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 08:44:35 +00:00
dependabot[bot]
ca49820a0f
chore(deps): Bump console-subscriber from 0.1.5 to 0.1.6 ( #4670 )
...
Bumps [console-subscriber](https://github.com/tokio-rs/console) from 0.1.5 to 0.1.6.
- [Release notes](https://github.com/tokio-rs/console/releases)
- [Commits](https://github.com/tokio-rs/console/compare/console-subscriber-v0.1.5...console-subscriber-v0.1.6)
---
updated-dependencies:
- dependency-name: console-subscriber
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:24:12 +00:00
dependabot[bot]
76f7043417
chore(deps): Bump once_cell from 1.11.0 to 1.12.0 ( #4666 )
...
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.11.0...v1.12.0)
---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:14:03 +00:00
Andrew Lamb
e877a64462
feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata ( #4661 )
...
* feat: Add `ParquetFiles` cache
* fix: Apply suggestions from code review
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* fix: remove commented out debugging println
* refactor: Improve size calculation
* fix: mark `ParquetFileCache::clear` test only
* fix: assert on metric count
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-05-23 17:11:38 +00:00
Dom
5239417925
Merge pull request #4662 from influxdata/dom/meta-remove-row-count
...
refactor: do not embed row count & min/max timestamps in IOxMetadata
2022-05-23 17:00:19 +01:00
Dom
9cd1286051
Merge branch 'main' into dom/meta-remove-row-count
2022-05-23 16:39:38 +01:00
Marco Neumann
2029bd16ba
feat: enable debugging of failed querier->ingester requests ( #4659 )
...
* feat: enable debugging of failed querier->ingester requests
- extend `query-ingester` CLI to allow usage of predicates
- on failed requests: log all information that is required for the CLI
- test the "ingester fails" scenario
* test: explain
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: move b64 pred. serde into a single crate
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-23 15:37:31 +00:00
Dom Dwyer
2e6c49be83
refactor: remove IoxMetadata min & max timestamp
...
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.
These values are redundant as they already exist within the Parquet
column statistics, and precluded streaming serialisation as these
removed min/max values were needed before serialising the file.
2022-05-23 16:27:08 +01:00
Dom Dwyer
a142a9eb57
refactor: remove row_count from IoxMetadata
...
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.
The reasoning is twofold:
* The Parquet file's native metadata already contains a row count
* Needing to know the number of rows up-front precludes streaming
2022-05-23 16:18:35 +01:00
Dom Dwyer
71555ee55c
test: Parquet metadata integration test
...
Adds two integration tests covering validation of the embedded IOx
metadata within the Parquet file metadata, and validation of the derived
ParquetFileParams metadata used to populate the catalog.
2022-05-23 16:17:56 +01:00
kodiakhq[bot]
1fccee841b
Merge pull request #4649 from influxdata/dom/codec-object-store
...
perf: streaming RecordBatch -> parquet encoder
2022-05-23 14:45:35 +00:00
Dom
f0d0f1ba0c
Merge branch 'main' into dom/codec-object-store
2022-05-23 15:39:54 +01:00
Andrew Lamb
a64b2b1d0b
feat: Add `SharedBackend` to cache system ( #4652 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 14:24:24 +00:00
kodiakhq[bot]
d752991a25
Merge pull request #4638 from influxdata/cn/last-available
...
feat: Add ingester CLI and env option to skip to oldest available WB seq num
2022-05-23 13:14:23 +00:00
kodiakhq[bot]
a06746c715
Merge branch 'main' into cn/last-available
2022-05-23 13:08:19 +00:00
Marco Neumann
47347bef9f
test: add query test scenario w/ missing columns in different chunks ( #4656 )
...
* test: do NOT filter out query test scenarios w/ unordered stages in different partitions
It should be possible to have two chunks in different partitions where
both are in the ingester stage or the first one is in the parquet stage
and the 2nd one in the ingester stage.
* test: add query test scenario w/ missing columns in different chunks
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 12:13:41 +00:00
Dom Dwyer
af6d3f4d48
docs: remove clone ref comment
2022-05-23 11:46:06 +01:00
dependabot[bot]
5c033b462e
chore(deps): Bump regex from 1.5.5 to 1.5.6 ( #4655 )
...
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.5 to 1.5.6.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.5...1.5.6)
---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:39:01 +00:00