kodiakhq[bot]
2d08478e1b
Merge branch 'main' into dom/consistent-sort-key
2022-05-31 16:15:08 +00:00
Marco Neumann
988bd38e93
refactor: remove unused code ( #4742 )
2022-05-31 11:36:02 +00:00
Marco Neumann
5a95da7327
refactor: do NOT use ANY file IO for parquet reading ( #4741 )
2022-05-31 11:18:24 +00:00
Marco Neumann
2bf03e57bf
feat: limit tmp parquet file count and size ( #4737 )
...
Fixes #4736 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-31 08:56:15 +00:00
dependabot[bot]
01fc550034
chore(deps): Bump parking_lot from 0.12.0 to 0.12.1 ( #4732 )
...
Bumps [parking_lot](https://github.com/Amanieu/parking_lot ) from 0.12.0 to 0.12.1.
- [Release notes](https://github.com/Amanieu/parking_lot/releases )
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md )
- [Commits](https://github.com/Amanieu/parking_lot/compare/0.12.0...0.12.1 )
---
updated-dependencies:
- dependency-name: parking_lot
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 08:01:58 +00:00
dependabot[bot]
5b5f0efef5
chore(deps): Bump rayon from 1.5.2 to 1.5.3 ( #4731 )
...
Bumps [rayon](https://github.com/rayon-rs/rayon ) from 1.5.2 to 1.5.3.
- [Release notes](https://github.com/rayon-rs/rayon/releases )
- [Changelog](https://github.com/rayon-rs/rayon/blob/master/RELEASES.md )
- [Commits](https://github.com/rayon-rs/rayon/compare/v1.5.2...v1.5.3 )
---
updated-dependencies:
- dependency-name: rayon
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:50:42 +00:00
dependabot[bot]
642aef103d
chore(deps): Bump comfy-table from 5.0.1 to 6.0.0 ( #4730 )
...
Bumps [comfy-table](https://github.com/nukesor/comfy-table ) from 5.0.1 to 6.0.0.
- [Release notes](https://github.com/nukesor/comfy-table/releases )
- [Changelog](https://github.com/Nukesor/comfy-table/blob/main/CHANGELOG.md )
- [Commits](https://github.com/nukesor/comfy-table/compare/v5.0.1...v6.0.0 )
---
updated-dependencies:
- dependency-name: comfy-table
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:10:37 +00:00
Dom Dwyer
70864b9f48
refactor: always use correct chunk sort key
...
Don't use the same sort key for all files - sort keys may grow over
time, and the information is already at hand.
2022-05-30 17:41:41 +01:00
Dom Dwyer
6aa2a6958a
refactor: assert consistent parquet file metadata
...
Assert consistent metadata when evaluating candidate parquet files for
compaction.
Asserts all files have the same:
* Sequencer ID
* Namespace ID
* Table ID
* Partition ID
* Sort key
2022-05-30 17:41:41 +01:00
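The consistency check described in 6aa2a6958a could be sketched roughly as follows. This is only an illustration: `FileMeta`, its field types, and the function names are hypothetical stand-ins, not the actual IOx catalog types.

```rust
/// Hypothetical stand-in for the catalog metadata of one parquet file.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub file_id: i64,
    pub sequencer_id: i16,
    pub namespace_id: i32,
    pub table_id: i32,
    pub partition_id: i64,
    pub sort_key: Option<Vec<String>>,
}

/// The five fields that the commit message says must match across files.
fn identity(f: &FileMeta) -> (i16, i32, i32, i64, &Option<Vec<String>>) {
    (
        f.sequencer_id,
        f.namespace_id,
        f.table_id,
        f.partition_id,
        &f.sort_key,
    )
}

/// Returns true when every file agrees with the first one on those fields.
/// An empty or single-element slice is trivially consistent.
pub fn is_consistent(files: &[FileMeta]) -> bool {
    match files.split_first() {
        None => true,
        Some((first, rest)) => rest.iter().all(|f| identity(f) == identity(first)),
    }
}

/// Panics if the compaction candidates disagree on any of the five fields.
pub fn assert_consistent(files: &[FileMeta]) {
    assert!(
        is_consistent(files),
        "compaction candidates have inconsistent metadata"
    );
}
```

Comparing against the first file keeps the check linear in the number of candidates; `file_id` is deliberately left out of the comparison because it is expected to differ.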
Dom Dwyer
0f16d6cabb
refactor: consistent SortKey source
...
Changes the compaction logic to always reference the same SortKey
instance, rather than repeatedly querying for it.
The Partition metadata is always read from the catalog as part of
compact_partition(), where it previously threw away all metadata except
the sort key, which was passed into compact(). Then compact() would
always re-query the catalog to look up just the sort key again, and mix
up the two instances during use - one passed into the fn, one freshly
queried within the fn.
Now the Partition metadata is resolved in compact_partition() as it was
previously, but the entire Partition reference is passed to compact(),
and this is consistently used to access the sort key. This also removes
a catalog query per compaction call.
2022-05-30 17:41:41 +01:00
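The shape of the change in 0f16d6cabb might be sketched as below, assuming a toy catalog that counts its own queries. `Catalog`, `Partition`, `SortKey`, and the signatures of `compact_partition()` and `compact()` are simplified, hypothetical versions; only the two function names come from the commit message.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, PartialEq)]
pub struct SortKey(pub Vec<String>);

#[derive(Debug, Clone)]
pub struct Partition {
    pub id: i64,
    pub sort_key: Option<SortKey>,
}

/// Toy catalog holding a single partition and counting lookups.
pub struct Catalog {
    partition: Partition,
    queries: Cell<usize>,
}

impl Catalog {
    pub fn new(partition: Partition) -> Self {
        Self { partition, queries: Cell::new(0) }
    }

    /// Simulated catalog query; the id is ignored in this toy version.
    pub fn get_partition(&self, _id: i64) -> Partition {
        self.queries.set(self.queries.get() + 1);
        self.partition.clone()
    }

    pub fn query_count(&self) -> usize {
        self.queries.get()
    }
}

/// Receives the whole partition and reads the sort key from it,
/// so it never has to go back to the catalog.
pub fn compact(partition: &Partition, files: &[&str]) -> String {
    let key = partition
        .sort_key
        .as_ref()
        .map(|k| k.0.join(","))
        .unwrap_or_default();
    format!(
        "partition {}: {} files sorted by [{}]",
        partition.id,
        files.len(),
        key
    )
}

/// Resolves the partition once and hands the same instance to compact().
pub fn compact_partition(catalog: &Catalog, id: i64, files: &[&str]) -> String {
    let partition = catalog.get_partition(id);
    compact(&partition, files)
}
```

Because `compact()` has no catalog handle at all, it cannot accidentally mix a freshly queried sort key with the one it was given.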
Marco Neumann
79c054ffc9
fix: do NOT block in parquet file IO ( #4727 )
...
* fix: do NOT block in parquet file IO
I think for historical reasons we were using blocking IO to read parquet
files. With the current streaming `SendableRecordStream` approach this
is technically NOT required anymore.
Now one might think that the sync-async dance that we did is kinda
harmless, but looking at our production querier I think it is really
bad. The querier seems to be stuck but looking at `strace` and other
health signals it seems it is not entirely dead. Looking at GDB
backtraces it seems that nearly all threads are busy in
`download_and_scan_parquet`. Looking at the tokio docs
(<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>)
for `spawn_blocking` (which is used to start the sync download) this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that there are only a few threads left to do actual work (I've
seen postgres data flowing back and forth for example) but tokio is not
able to use its full potential anymore. This is esp. bad because the
sync code in `download_and_scan_parquet` then uses `futures` `block_on`
functionality to call back into async code, so it waits for tokio
itself.
The change is rather simple: just use async task spawns.
* fix: use async IO to write stream to temp file
* fix: do not block tokio thread during parquet file reading
* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore
There is no REAL way to cancel sync tasks, but at least we can try our
best.
2022-05-30 13:32:20 +00:00
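The starvation effect described in 79c054ffc9 can be illustrated with a toy model. The sketch below does not use tokio and is not the querier code: it is a plain fixed-size worker pool built on the standard library, with hypothetical names (`run_on_fixed_pool`, `sleeping_job`), showing that once every worker is occupied by a blocking job, even a trivial job has to wait.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

pub type Job = Box<dyn FnOnce() + Send + 'static>;

/// A job that blocks its worker thread for `millis` milliseconds.
pub fn sleeping_job(millis: u64) -> Job {
    Box::new(move || thread::sleep(Duration::from_millis(millis)))
}

/// Runs `jobs` in FIFO order on exactly `workers` threads and returns,
/// per job, how long after submission it finished.
pub fn run_on_fixed_pool(workers: usize, jobs: Vec<Job>) -> Vec<Duration> {
    assert!(workers > 0, "need at least one worker");
    let start = Instant::now();
    let job_count = jobs.len();

    // Queue all jobs up front, then close the queue so workers can exit.
    let (job_tx, job_rx) = mpsc::channel::<(usize, Job)>();
    for (index, job) in jobs.into_iter().enumerate() {
        job_tx.send((index, job)).unwrap();
    }
    drop(job_tx);
    let job_rx = Arc::new(Mutex::new(job_rx));

    let (done_tx, done_rx) = mpsc::channel::<(usize, Duration)>();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking the next job.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok((index, job)) => {
                        job();
                        done_tx.send((index, start.elapsed())).unwrap();
                    }
                    Err(_) => break, // queue drained
                }
            })
        })
        .collect();
    drop(done_tx);
    for handle in handles {
        handle.join().unwrap();
    }

    let mut finished = vec![Duration::ZERO; job_count];
    for (index, elapsed) in done_rx {
        finished[index] = elapsed;
    }
    finished
}
```

With two workers and the jobs `[sleeping_job(100), sleeping_job(100), sleeping_job(0)]`, the third job cannot finish before roughly 100 ms have passed, because both workers are blocked. The commit avoids this situation by not blocking runtime threads in the first place.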
Andrew Lamb
d0903b11bb
refactor: reduce test duplication in `querier/src/table/mod.rs` ( #4698 )
...
* refactor: reduce test duplication in `querier/src/table/mod.rs`
* fix: Apply suggestions from code review
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: Update querier/src/table/test_util.rs
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: use now_nanos()
* refactor: Add TestQuerierTable
* refactor: rename functions for explicitness
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2022-05-30 12:56:09 +00:00
Paul Dix
6af32b7750
feat: add concurrency limit for ingester queries ( #4703 )
...
I've defaulted it to 20, we can adjust as needed.
Closes #4657
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-30 10:22:17 +00:00
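A concurrency limit like the one added in 6af32b7750 is commonly built on a counting semaphore. The following is a minimal standard-library sketch, not the IOx implementation: `Semaphore`, `Permit`, `peak_concurrency`, and the constant name are hypothetical, and only the default value of 20 is taken from the commit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Default from the commit message; the constant name is made up.
pub const DEFAULT_MAX_CONCURRENT_QUERIES: usize = 20;

/// Minimal counting semaphore.
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Returns its permit to the semaphore when dropped.
pub struct Permit<'a> {
    semaphore: &'a Semaphore,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
        Permit { semaphore: self }
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.semaphore.permits.lock().unwrap() += 1;
        self.semaphore.available.notify_one();
    }
}

/// Runs `queries` simulated queries under `limit` and returns the highest
/// number that were ever in flight at the same time.
pub fn peak_concurrency(limit: usize, queries: usize) -> usize {
    let semaphore = Arc::new(Semaphore::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..queries)
        .map(|_| {
            let semaphore = Arc::clone(&semaphore);
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                let _permit = semaphore.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // simulated query
                in_flight.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

In an async service the same idea would normally use the runtime's own semaphore type so that waiting callers do not block a thread; the blocking version here is only meant to show the bookkeeping.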
dependabot[bot]
73168f7989
chore(deps): Bump flate2 from 1.0.23 to 1.0.24 ( #4726 )
...
Bumps [flate2](https://github.com/rust-lang/flate2-rs ) from 1.0.23 to 1.0.24.
- [Release notes](https://github.com/rust-lang/flate2-rs/releases )
- [Commits](https://github.com/rust-lang/flate2-rs/commits )
---
updated-dependencies:
- dependency-name: flate2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 08:26:12 +00:00
dependabot[bot]
29069be7d4
chore(deps): Bump hyper from 0.14.18 to 0.14.19 ( #4725 )
...
Bumps [hyper](https://github.com/hyperium/hyper ) from 0.14.18 to 0.14.19.
- [Release notes](https://github.com/hyperium/hyper/releases )
- [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md )
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.18...v0.14.19 )
---
updated-dependencies:
- dependency-name: hyper
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 07:58:12 +00:00
dependabot[bot]
7d4670e171
chore(deps): Bump indexmap from 1.8.1 to 1.8.2 ( #4724 )
...
Bumps [indexmap](https://github.com/bluss/indexmap ) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/bluss/indexmap/releases )
- [Changelog](https://github.com/bluss/indexmap/blob/1.8.2/RELEASES.rst )
- [Commits](https://github.com/bluss/indexmap/compare/1.8.1...1.8.2 )
---
updated-dependencies:
- dependency-name: indexmap
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 07:47:05 +00:00
Andrew Lamb
cddd6d9b6d
chore: Update datafusion ( #4723 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-28 19:00:54 +00:00
Carol (Nichols || Goulding)
b52a3586a7
fix: Turn cargo doc warnings into errors ( #4710 )
...
* fix: Correct intra-doc links
* fix: Turn cargo doc warnings into errors
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-28 11:24:22 +00:00
Andrew Lamb
9f21512296
chore: reduce `debug!` log spew in `parquet_file` ( #4718 )
...
* chore: reduce log spew
* chore: trace another overly verbose message
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-27 20:57:10 +00:00
kodiakhq[bot]
0a84727c72
Merge pull request #4709 from influxdata/cn/fetch-from-parquet-file
...
feat: Make a QuerierRBChunk wrapper that implements QueryChunk and QueryChunkMeta
2022-05-27 17:14:05 +00:00
kodiakhq[bot]
842ef8e308
Merge branch 'main' into cn/fetch-from-parquet-file
2022-05-27 17:08:28 +00:00
Carol (Nichols || Goulding)
55cd8d15be
fix: Update method name to specify the kind of chunk it makes
2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding)
f0b4d71f47
docs: Update comment to reflect new implementation
2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding)
5232594aab
docs: Fix grammar in a comment
...
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-27 13:04:13 -04:00
Nga Tran
16e7a6d596
test: test that hits panic because of no column meta data ( #4719 )
...
* test: test that hits panic because of no column meta data
* chore: Apply suggestions from code review
* chore: run format after applying changes
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: run clippy
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-27 15:27:03 +00:00
Andrew Lamb
dde3c3922c
refactor: use consistent spelling of serialize ( #4717 )
2022-05-27 14:42:59 +00:00
Nga Tran
ea81152fac
refactor: add partition ID into debug info and panic earlier to identify the bug easier ( #4716 )
...
* chore: point tests to the new ticket
* chore: cleanup
* refactor: add partition ID into debug info and panic earlier to identify the bug easier
2022-05-27 12:20:36 +00:00
Nga Tran
09b55a209d
chore: point tests to the new ticket ( #4715 )
...
* chore: point tests to the new ticket
* chore: cleanup
2022-05-27 11:12:55 +00:00
Nga Tran
372b262f37
test: parquet meta decoded tests and more debug info ( #4713 )
...
* test: reproducer for 4695
* chore: some debug info
* test: test with many columns and rows
* chore: cleanup and add debug info
* chore: cleanup
* chore: cleanup
* chore: more debug info
2022-05-27 09:53:07 +00:00
Andrew Lamb
700a1de8f3
fix: fix at least one intermittent failure ( #4711 )
2022-05-26 21:24:37 +00:00
Carol (Nichols || Goulding)
2cb351cd0d
feat: Make a QuerierRBChunk wrapper to handle traits and extra data
...
This brings back a bunch of code from OG from read buffer backed
DbChunks.
2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding)
b2905650aa
refactor: Extract extract_range to be a method on TableSummary
...
So that other kinds of chunks can use this code too.
2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding)
5fd3ffc17f
refactor: Rename ParquetChunkAdapter to only ChunkAdapter
...
It might be creating chunks of different kinds other than ParquetChunks.
2022-05-26 16:52:14 -04:00
Andrew Lamb
633117e595
feat: avoid catalog access on each query ( #4650 )
...
* feat: cache catalog access on query
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-05-26 20:44:22 +00:00
Nga Tran
05151d5c69
test: reproducer for 4695 ( #4706 )
...
* test: reproducer for 4695
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-26 15:32:30 +00:00
kodiakhq[bot]
f645ec8a42
Merge pull request #4704 from influxdata/cn/welcome-back-read-buffer
...
feat: Start of a read buffer chunk cache
2022-05-26 13:53:29 +00:00
kodiakhq[bot]
1043c98e17
Merge branch 'main' into cn/welcome-back-read-buffer
2022-05-26 13:47:27 +00:00
Andrew Lamb
2d5a327bf4
fix: expire empty parquet_files cache and empty tombstones cache ( #4701 )
...
* fix: expire empty parquet_files cache
* fix: expire empty tombstones cache
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-26 11:08:15 +00:00
Carol (Nichols || Goulding)
cddcca1e05
feat: Implement a method to get a read buffer chunk from a stream of record batches
2022-05-25 17:24:35 -04:00
Carol (Nichols || Goulding)
f7bc551d9a
feat: Sketch out skeleton methods for RBChunk cache
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
04531e77dd
feat: Implement get on ReadBufferCache
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
25b8260b72
feat: Implement ReadBufferCache::new
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
ab9010d9a6
refactor: Rename QuerierParquetChunk::new_parquet to new
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
df10452e2e
refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
4a90d0af32
refactor: Remove ChunkStorage enum; inline into QuerierParquetChunk instead
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
b2c62c6808
refactor: Rename QuerierChunk to QuerierParquetChunk
2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)
66823522f3
docs: Fix comment wrapping while reading through
2022-05-25 17:19:10 -04:00
Nga Tran
6cc767efcc
feat: teach compactor to compact smaller number of files ( #4671 )
...
* refactor: split compact_partition into two functions to handle concurrency better
* feat: limit number of files to compact
* test: add test for limit num files
* chore: fix clippy
* feat: split group if over max size
* fix: split the overlapped group to limit size or file num
* chore: reduce config values
* test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files
* chore: more comments
* chore: cleanup
2022-05-25 19:54:34 +00:00
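The "limit number of files" and "split group if over max size" steps in 6cc767efcc could be approximated by a greedy split like the one below. This is a simplified sketch under stated assumptions: files are represented only by their sizes in bytes, `split_group` and its parameters are hypothetical, and the real compactor also has to account for overlapping time ranges, which is ignored here.

```rust
/// Splits one group of files into consecutive sub-groups so that each
/// sub-group has at most `max_files` files and at most `max_bytes` bytes.
/// A single file larger than `max_bytes` ends up in a group of its own.
pub fn split_group(file_sizes: &[u64], max_files: usize, max_bytes: u64) -> Vec<Vec<u64>> {
    assert!(max_files > 0, "max_files must be at least 1");

    let mut groups = Vec::new();
    let mut current: Vec<u64> = Vec::new();
    let mut current_bytes = 0u64;

    for &size in file_sizes {
        let too_many = current.len() + 1 > max_files;
        let too_big = current_bytes + size > max_bytes;
        // Close the current group before it would exceed either limit.
        if !current.is_empty() && (too_many || too_big) {
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(size);
        current_bytes += size;
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

For example, `split_group(&[40, 40, 40, 10, 200, 5], 2, 100)` yields `[[40, 40], [40, 10], [200], [5]]`: the first two splits are caused by the file-count limit, the last one by the size limit.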
Marco Neumann
31d1b37d73
refactor: de-duplicate low-level arrow code ( #4697 )
...
It seems that during prototyping NG we've copied low level code (w/o
tests!) and never cleaned up. Let's not have this functionality twice.
2022-05-25 16:24:28 +00:00
Marko Mikulicic
9ddb0a816e
fix: Return panic message in internal error ( #4693 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 15:11:17 +00:00