Commit Graph

8085 Commits (2d08478e1be4520ef648b68875974756a21c3dca)

Author SHA1 Message Date
kodiakhq[bot] 2d08478e1b
Merge branch 'main' into dom/consistent-sort-key 2022-05-31 16:15:08 +00:00
Marco Neumann 988bd38e93
refactor: remove unused code (#4742) 2022-05-31 11:36:02 +00:00
Marco Neumann 5a95da7327
refactor: do NOT use ANY file IO for parquet reading (#4741) 2022-05-31 11:18:24 +00:00
Marco Neumann 2bf03e57bf
feat: limit tmp parquet file count and size (#4737)
Fixes #4736.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-31 08:56:15 +00:00
dependabot[bot] 01fc550034
chore(deps): Bump parking_lot from 0.12.0 to 0.12.1 (#4732)
Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.0 to 0.12.1.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Amanieu/parking_lot/compare/0.12.0...0.12.1)

---
updated-dependencies:
- dependency-name: parking_lot
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 08:01:58 +00:00
dependabot[bot] 5b5f0efef5
chore(deps): Bump rayon from 1.5.2 to 1.5.3 (#4731)
Bumps [rayon](https://github.com/rayon-rs/rayon) from 1.5.2 to 1.5.3.
- [Release notes](https://github.com/rayon-rs/rayon/releases)
- [Changelog](https://github.com/rayon-rs/rayon/blob/master/RELEASES.md)
- [Commits](https://github.com/rayon-rs/rayon/compare/v1.5.2...v1.5.3)

---
updated-dependencies:
- dependency-name: rayon
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:50:42 +00:00
dependabot[bot] 642aef103d
chore(deps): Bump comfy-table from 5.0.1 to 6.0.0 (#4730)
Bumps [comfy-table](https://github.com/nukesor/comfy-table) from 5.0.1 to 6.0.0.
- [Release notes](https://github.com/nukesor/comfy-table/releases)
- [Changelog](https://github.com/Nukesor/comfy-table/blob/main/CHANGELOG.md)
- [Commits](https://github.com/nukesor/comfy-table/compare/v5.0.1...v6.0.0)

---
updated-dependencies:
- dependency-name: comfy-table
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:10:37 +00:00
Dom Dwyer 70864b9f48
refactor: always use correct chunk sort key
Don't use the same sort key for all files - sort keys may grow over
time, and the information is already at hand.
2022-05-30 17:41:41 +01:00
Dom Dwyer 6aa2a6958a
refactor: assert consistent parquet file metadata
Assert consistent metadata when evaluating candidate parquet files for
compaction.

Asserts all files have the same:
    * Sequencer ID
    * Namespace ID
    * Table ID
    * Partition ID
    * Sort key
2022-05-30 17:41:41 +01:00
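The invariant listed above can be sketched as a single pass that compares every candidate file against the first one. `ParquetFileMeta`, its field types and `assert_consistent_metadata` are hypothetical stand-ins, not the actual IOx catalog types; the sketch only shows the shape of the check.

```rust
/// Hypothetical, simplified view of the catalog metadata for one parquet file.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFileMeta {
    pub sequencer_id: i16,
    pub namespace_id: i32,
    pub table_id: i32,
    pub partition_id: i64,
    pub sort_key: Vec<String>,
}

/// Panics if any candidate file disagrees with the first one on sequencer,
/// namespace, table, partition or sort key. An empty slice is accepted.
pub fn assert_consistent_metadata(files: &[ParquetFileMeta]) {
    let first = match files.first() {
        Some(f) => f,
        None => return, // nothing to compare
    };
    for f in &files[1..] {
        assert_eq!(f.sequencer_id, first.sequencer_id, "sequencer ID mismatch");
        assert_eq!(f.namespace_id, first.namespace_id, "namespace ID mismatch");
        assert_eq!(f.table_id, first.table_id, "table ID mismatch");
        assert_eq!(f.partition_id, first.partition_id, "partition ID mismatch");
        assert_eq!(f.sort_key, first.sort_key, "sort key mismatch");
    }
}
```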
Dom Dwyer 0f16d6cabb
refactor: consistent SortKey source
Changes the compaction logic to always reference the same SortKey
instance, rather than repeatedly querying for it.

The Partition metadata is always read from the catalog as part of
compact_partition(), where it previously threw away all metadata except
the sort key, which was passed into compact(). Then compact() would
always re-query the catalog to look up just the sort key again, and mix
up the two instances during use - one passed into the fn, one freshly
queried within the fn.

Now the Partition metadata is resolved in compact_partition() as it was
previously, but the entire Partition reference is passed to compact(),
and this is consistently used to access the sort key. This also removes
a catalog query per compaction call.
2022-05-30 17:41:41 +01:00
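The resulting call shape can be sketched as follows: the partition is fetched once in `compact_partition()` and the whole record is handed to `compact()`, which reads the sort key from it instead of querying again. `Partition`, `Catalog` and the function signatures here are simplified, hypothetical stand-ins; the query counter only makes the "one catalog query per compaction" property visible.

```rust
use std::cell::Cell;

/// Hypothetical, simplified partition record as read from the catalog.
#[derive(Debug, Clone)]
pub struct Partition {
    pub id: i64,
    pub sort_key: Vec<String>,
}

/// Hypothetical catalog that counts how often it is queried.
pub struct Catalog {
    pub queries: Cell<usize>,
    pub partition: Partition,
}

impl Catalog {
    pub fn get_partition(&self, _id: i64) -> Partition {
        self.queries.set(self.queries.get() + 1);
        self.partition.clone()
    }
}

/// Resolves the partition once and passes the whole record to `compact`.
pub fn compact_partition(catalog: &Catalog, partition_id: i64) -> Vec<String> {
    let partition = catalog.get_partition(partition_id);
    compact(&partition)
}

/// Uses the sort key of the partition it was given; never re-queries the catalog.
fn compact(partition: &Partition) -> Vec<String> {
    // ... the actual compaction work would happen here ...
    partition.sort_key.clone()
}
```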
Marco Neumann 79c054ffc9
fix: do NOT block in parquet file IO (#4727)
* fix: do NOT block in parquet file IO

I think for historical reasons we were using blocking IO to read parquet
files. With the current streaming `SendableRecordBatchStream` approach this
is technically NOT required anymore.

Now one might think that the sync-async dance that we did is kinda
harmless, but looking at our production querier I think it is really
bad. The querier seems to be stuck, but judging by `strace` and other
health signals it is not entirely dead. GDB backtraces show that nearly
all threads are busy in `download_and_scan_parquet`. Looking at the tokio
docs (<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>)
for `spawn_blocking` (which is used to start the sync download), this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However, we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that there are only a few threads left to do actual work (I've
seen postgres data flowing back and forth, for example), but tokio is not
able to use its full potential anymore. This is especially bad because the
sync code in `download_and_scan_parquet` then uses the `futures` `block_on`
functionality to call back into async code, so it waits for tokio
itself.

The change is rather simple: just use async task spawns.

* fix: use async IO to write stream to temp file

* fix: do not block tokio thread during parquet file reading

* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore

There is no REAL way to cancel sync tasks, but at least we can try our
best.
2022-05-30 13:32:20 +00:00
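Since a blocking task cannot be stopped from the outside, "trying our best" usually means cooperative cancellation: the consumer holds a guard that sets a shared flag when dropped, and the blocking loop checks that flag between units of work. This is a std-only sketch of that pattern, not the actual IOx code (which runs on tokio); `CancelOnDrop` and `read_chunks` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Sets the shared flag when dropped, signalling the worker to stop.
pub struct CancelOnDrop(pub Arc<AtomicBool>);

impl Drop for CancelOnDrop {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical blocking reader: processes `chunks` one at a time and stops
/// early once `cancelled` is set. Returns how many chunks were processed.
pub fn read_chunks(chunks: usize, cancelled: &AtomicBool) -> usize {
    let mut done = 0;
    for _ in 0..chunks {
        if cancelled.load(Ordering::SeqCst) {
            break; // best effort: we can only stop between chunks
        }
        done += 1; // stand-in for reading and decoding one chunk
    }
    done
}
```

In practice the worker would run on its own thread with a clone of the `Arc`, and the guard would live in whatever consumes the worker's output, so dropping the consumer asks the worker to stop at its next check.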
Andrew Lamb d0903b11bb
refactor: reduce test duplication in `querier/src/table/mod.rs` (#4698)
* refactor: reduce test duplication in `querier/src/table/mod.rs`

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: Update querier/src/table/test_util.rs

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: use now_nanos()

* refactor: Add TestQuerierTable

* refactor: rename functions for explicitness

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2022-05-30 12:56:09 +00:00
Paul Dix 6af32b7750
feat: add concurrency limit for ingester queries (#4703)
I've defaulted it to 20; we can adjust it as needed.

Closes #4657

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-30 10:22:17 +00:00
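A concurrency limit like this is commonly a counting semaphore: each query takes a permit before contacting the ingester and returns it when done. The actual change may use an async semaphore; the blocking `Mutex` + `Condvar` version below is a hypothetical std-only sketch of the same idea, with `Semaphore` and `Permit` as illustrative names.

```rust
use std::sync::{Condvar, Mutex};

/// Minimal counting semaphore: at most `limit` permits may be held at once.
pub struct Semaphore {
    available: Mutex<usize>,
    cond: Condvar,
}

/// Returns its permit to the semaphore when dropped.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Semaphore {
    pub fn new(limit: usize) -> Self {
        Semaphore { available: Mutex::new(limit), cond: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.cond.wait(n).unwrap();
        }
        *n -= 1;
        Permit { sem: self }
    }

    /// Takes a permit only if one is free right now.
    pub fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut n = self.available.lock().unwrap();
        if *n == 0 {
            return None;
        }
        *n -= 1;
        Some(Permit { sem: self })
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.available.lock().unwrap() += 1;
        self.sem.cond.notify_one(); // wake one waiter, if any
    }
}
```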
dependabot[bot] 73168f7989
chore(deps): Bump flate2 from 1.0.23 to 1.0.24 (#4726)
Bumps [flate2](https://github.com/rust-lang/flate2-rs) from 1.0.23 to 1.0.24.
- [Release notes](https://github.com/rust-lang/flate2-rs/releases)
- [Commits](https://github.com/rust-lang/flate2-rs/commits)

---
updated-dependencies:
- dependency-name: flate2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 08:26:12 +00:00
dependabot[bot] 29069be7d4
chore(deps): Bump hyper from 0.14.18 to 0.14.19 (#4725)
Bumps [hyper](https://github.com/hyperium/hyper) from 0.14.18 to 0.14.19.
- [Release notes](https://github.com/hyperium/hyper/releases)
- [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.18...v0.14.19)

---
updated-dependencies:
- dependency-name: hyper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 07:58:12 +00:00
dependabot[bot] 7d4670e171
chore(deps): Bump indexmap from 1.8.1 to 1.8.2 (#4724)
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Changelog](https://github.com/bluss/indexmap/blob/1.8.2/RELEASES.rst)
- [Commits](https://github.com/bluss/indexmap/compare/1.8.1...1.8.2)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 07:47:05 +00:00
Andrew Lamb cddd6d9b6d
chore: Update datafusion (#4723)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-28 19:00:54 +00:00
Carol (Nichols || Goulding) b52a3586a7
fix: Turn cargo doc warnings into errors (#4710)
* fix: Correct intra-doc links

* fix: Turn cargo doc warnings into errors

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-28 11:24:22 +00:00
Andrew Lamb 9f21512296
chore: reduce `debug!` log spew in `parquet_file` (#4718)
* chore: reduce log spew

* chore: trace another overly verbose message

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-27 20:57:10 +00:00
kodiakhq[bot] 0a84727c72
Merge pull request #4709 from influxdata/cn/fetch-from-parquet-file
feat: Make a QuerierRBChunk wrapper that implements QueryChunk and QueryChunkMeta
2022-05-27 17:14:05 +00:00
kodiakhq[bot] 842ef8e308
Merge branch 'main' into cn/fetch-from-parquet-file 2022-05-27 17:08:28 +00:00
Carol (Nichols || Goulding) 55cd8d15be
fix: Update method name to specify the kind of chunk it makes 2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding) f0b4d71f47
docs: Update comment to reflect new implementation 2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding) 5232594aab
docs: Fix grammar in a comment
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-27 13:04:13 -04:00
Nga Tran 16e7a6d596
test: test that hits panic because of no column meta data (#4719)
* test: test that hits panic because of no column meta data

* chore: Apply suggestions from code review

* chore: run format after applying changes

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: run clippy

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-27 15:27:03 +00:00
Andrew Lamb dde3c3922c
refactor: use consistent spelling of serialize (#4717) 2022-05-27 14:42:59 +00:00
Nga Tran ea81152fac
refactor: add partition ID into debug info and panic earlier to identify the bug more easily (#4716)
* chore: point tests to the new ticket

* chore: cleanup

* refactor: add partition ID into debug info and panic earlier to identify the bug more easily
2022-05-27 12:20:36 +00:00
Nga Tran 09b55a209d
chore: point tests to the new ticket (#4715)
* chore: point tests to the new ticket

* chore: cleanup
2022-05-27 11:12:55 +00:00
Nga Tran 372b262f37
test: parquet meta decoded tests and more debug info (#4713)
* test: reproducer for 4695

* chore: some debug info

* test: test with many columns and rows

* chore: cleanup and add debug info

* chore: cleanup

* chore: cleanup

* chore: more debug info
2022-05-27 09:53:07 +00:00
Andrew Lamb 700a1de8f3
fix: fix at least one intermittent failure (#4711) 2022-05-26 21:24:37 +00:00
Carol (Nichols || Goulding) 2cb351cd0d
feat: Make a QuerierRBChunk wrapper to handle traits and extra data
This brings back a bunch of code from the read-buffer-backed
DbChunks in OG.
2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding) b2905650aa
refactor: Extract extract_range to be a method on TableSummary
This lets other kinds of chunks use this code too.
2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding) 5fd3ffc17f
refactor: Rename ParquetChunkAdapter to just ChunkAdapter
It might create kinds of chunks other than ParquetChunks.
2022-05-26 16:52:14 -04:00
Andrew Lamb 633117e595
feat: avoid catalog access on each query (#4650)
* feat: cache catalog access on query

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-05-26 20:44:22 +00:00
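Avoiding a catalog round trip per query boils down to memoizing lookups: the first request for a key calls the loader (the catalog), later requests are answered from memory. `CatalogCache` and the loader closure are hypothetical; the real querier cache also needs an expiry or refresh policy, which this sketch leaves out.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Memoizes the results of an expensive lookup, such as a catalog query.
pub struct CatalogCache<K, V, F> {
    entries: HashMap<K, V>,
    loader: F,
}

impl<K: Eq + Hash + Clone, V: Clone, F: FnMut(&K) -> V> CatalogCache<K, V, F> {
    pub fn new(loader: F) -> Self {
        CatalogCache { entries: HashMap::new(), loader }
    }

    /// Returns the cached value, calling the loader only on a miss.
    pub fn get(&mut self, key: &K) -> V {
        if let Some(v) = self.entries.get(key) {
            return v.clone();
        }
        let v = (self.loader)(key);
        self.entries.insert(key.clone(), v.clone());
        v
    }
}
```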
Nga Tran 05151d5c69
test: reproducer for 4695 (#4706)
* test: reproducer for 4695

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-26 15:32:30 +00:00
kodiakhq[bot] f645ec8a42
Merge pull request #4704 from influxdata/cn/welcome-back-read-buffer
feat: Start of a read buffer chunk cache
2022-05-26 13:53:29 +00:00
kodiakhq[bot] 1043c98e17
Merge branch 'main' into cn/welcome-back-read-buffer 2022-05-26 13:47:27 +00:00
Andrew Lamb 2d5a327bf4
fix: expire empty parquet_files cache and empty tombstones cache (#4701)
* fix: expire empty parquet_files cache

* fix: expire empty tombstones cache

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-26 11:08:15 +00:00
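The reasoning behind expiring empty entries is that "no parquet files" or "no tombstones" for a table can become wrong as soon as new data arrives, so an empty answer should not be served from cache indefinitely. One simple way to express that is to never cache empty results, as in this hypothetical sketch; the actual change may instead give empty entries a short expiry.

```rust
use std::collections::HashMap;

/// Caches a list per ID (e.g. parquet files per table) but never stores an
/// empty list, so empty answers are always re-fetched on the next lookup.
pub struct ListCache<F> {
    entries: HashMap<i64, Vec<String>>,
    loader: F,
}

impl<F: FnMut(i64) -> Vec<String>> ListCache<F> {
    pub fn new(loader: F) -> Self {
        ListCache { entries: HashMap::new(), loader }
    }

    pub fn get(&mut self, id: i64) -> Vec<String> {
        if let Some(v) = self.entries.get(&id) {
            return v.clone(); // only non-empty lists are ever stored
        }
        let v = (self.loader)(id);
        if !v.is_empty() {
            self.entries.insert(id, v.clone());
        }
        v
    }
}
```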
Carol (Nichols || Goulding) cddcca1e05
feat: Implement a method to get a read buffer chunk from a stream of record batches 2022-05-25 17:24:35 -04:00
Carol (Nichols || Goulding) f7bc551d9a
feat: Sketch out skeleton methods for RBChunk cache 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 04531e77dd
feat: Implement get on ReadBufferCache 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 25b8260b72
feat: Implement ReadBufferCache::new 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) ab9010d9a6
refactor: Rename QuerierParquetChunk::new_parquet to new 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) df10452e2e
refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 4a90d0af32
refactor: Remove ChunkStorage enum; inline into QuerierParquetChunk instead 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) b2c62c6808
refactor: Rename QuerierChunk to QuerierParquetChunk 2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding) 66823522f3
docs: Fix comment wrapping while reading through 2022-05-25 17:19:10 -04:00
Nga Tran 6cc767efcc
feat: teach compactor to compact smaller number of files (#4671)
* refactor: split compact_partition into two functions to handle concurrency better

* feat: limit number of files to compact

* test: add test for limit num files

* chore: fix clippy

* feat: split group if over max size

* fix: split the overlapped group to limit size or file num

* chore: reduce config values

* test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files

* chore: more comments

* chore: cleanup
2022-05-25 19:54:34 +00:00
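Splitting a group that is over the limits can be sketched as a greedy pass: walk the files in order and start a new sub-group whenever adding the next file would exceed either the maximum file count or the maximum total size. `FileInfo` and `split_group` are hypothetical; the real `split_overlapped_groups` may order or split files differently.

```rust
/// Hypothetical minimal description of a compaction candidate.
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub id: u64,
    pub size_bytes: u64,
}

/// Greedily splits `group` into consecutive sub-groups holding at most
/// `max_files` files and at most `max_bytes` in total. A single file larger
/// than `max_bytes` still gets a sub-group of its own.
pub fn split_group(group: &[FileInfo], max_files: usize, max_bytes: u64) -> Vec<Vec<FileInfo>> {
    let mut out = Vec::new();
    let mut current: Vec<FileInfo> = Vec::new();
    let mut current_bytes = 0u64;
    for f in group {
        let would_overflow =
            current.len() >= max_files || current_bytes + f.size_bytes > max_bytes;
        if !current.is_empty() && would_overflow {
            out.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += f.size_bytes;
        current.push(f.clone());
    }
    if !current.is_empty() {
        out.push(current);
    }
    out
}
```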
Marco Neumann 31d1b37d73
refactor: de-duplicate low-level arrow code (#4697)
It seems that while prototyping NG we copied low-level code (without
tests!) and never cleaned it up. Let's not have this functionality twice.
2022-05-25 16:24:28 +00:00
Marko Mikulicic 9ddb0a816e
fix: Return panic message in internal error (#4693)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-25 15:11:17 +00:00
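Recovering a readable message from a caught panic usually means downcasting the payload, which is a `&'static str` or a `String` depending on how the panic was raised. This std-only sketch uses `std::panic::catch_unwind`; `panic_message` and `run_catching` are hypothetical helpers, and how IOx wraps the message into its internal error type is not shown.

```rust
use std::any::Any;
use std::panic;

/// Best-effort conversion of a panic payload into a message string.
pub fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// Runs `f`, turning a panic into an `Err` that carries the panic message.
pub fn run_catching<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + panic::UnwindSafe,
{
    panic::catch_unwind(f)
        .map_err(|payload| format!("internal error: {}", panic_message(&*payload)))
}
```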