Warn when downloading files to an in-memory object store.
The "remote partition pull" command downloads parquet files from an
object store via a router and saves them locally. It is unlikely that
the user intends to download those files into the memory of the CLI
process, which then exits when the pull is complete and throws the
downloaded files away, yet this is the default.
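A minimal sketch of such a warning, assuming the CLI can inspect a string-valued object store type; the helper and the "memory" value are illustrative, not the actual IOx config types:

```rust
// Hypothetical sketch: warn before "remote partition pull" writes into a
// memory-backed object store. `object_store_type` and this helper are
// assumptions, not the real CLI types.
fn warn_if_in_memory(object_store_type: &str) {
    if object_store_type == "memory" {
        eprintln!(
            "warning: files will be downloaded into the memory of this CLI \
             process and discarded when it exits; configure a file- or \
             cloud-backed object store to keep them"
        );
    }
}
```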
Use a constructor to initialise a ParquetFileWithTombstone struct,
rather than making the fields pub.
This allows IDEs to "go to" the places where the type is constructed
when browsing the code, and it keeps the type closed for modification of
its internals (SOLID).
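A minimal sketch of the constructor-over-pub-fields shape, with placeholder types standing in for the real catalog structs:

```rust
// Placeholders for the real IOx catalog types.
pub struct ParquetFile;
pub struct Tombstone;

pub struct ParquetFileWithTombstone {
    // Fields stay private so internals can change without touching call sites.
    data: ParquetFile,
    tombstones: Vec<Tombstone>,
}

impl ParquetFileWithTombstone {
    /// Single construction entry point that IDEs can "go to".
    pub fn new(data: ParquetFile, tombstones: Vec<Tombstone>) -> Self {
        Self { data, tombstones }
    }
}
```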
This is a rather quick fix for prod. In the mid-term we probably want to
rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd together with IOx in the same pod.
* test: "optimize" ingesterrecord batches in query tests
It seems that I had the right idea in #4656 but wasn't able to trigger
https://github.com/influxdata/conductor/issues/955 because the query
tests do not "optimize" the record batches in the same way the actual
gRPC implementation does. If we apply the same transformation we indeed
end up with the same error.
* fix: all batches within the ingester flight response must have the same schema (see the sketch below)
* refactor: simplify and reuse code
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
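A hedged sketch of that schema-consistency check on a flight response; the helper name is made up, and this does not reproduce the actual "optimize" transformation, which the real code applies to each batch first:

```rust
use std::sync::Arc;

use arrow::{datatypes::Schema, error::ArrowError, record_batch::RecordBatch};

/// Verify that every batch in a flight response carries the same schema.
fn ensure_same_schema(batches: &[RecordBatch]) -> Result<Arc<Schema>, ArrowError> {
    let mut iter = batches.iter();
    let schema = iter
        .next()
        .map(|b| b.schema())
        .ok_or_else(|| ArrowError::InvalidArgumentError("no batches".into()))?;

    for batch in iter {
        if batch.schema() != schema {
            return Err(ArrowError::SchemaError(format!(
                "batch schema {:?} differs from first batch schema {:?}",
                batch.schema(),
                schema,
            )));
        }
    }

    Ok(schema)
}
```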
* fix: let us not compact no-data
* fix: split time must be greater than min_time, too
* fix: resolve merge conflict
* chore: increase size of a compactor job and level of concurrency
Co-authored-by: Dom <dom@itsallbroken.com>
Assert consistent metadata when evaluating candidate parquet files for
compaction.
Asserts that all files have the same (see the sketch after this list):
* Sequencer ID
* Namespace ID
* Table ID
* Partition ID
* Sort key
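A minimal sketch of that consistency check; the struct and its field names are simplified stand-ins for the real parquet file metadata:

```rust
// Simplified stand-in for the catalog's parquet file metadata.
struct CandidateFile {
    sequencer_id: i64,
    namespace_id: i64,
    table_id: i64,
    partition_id: i64,
    sort_key: Vec<String>,
}

/// Panic if the candidate files disagree on any of the shared metadata.
fn assert_consistent_metadata(files: &[CandidateFile]) {
    let first = match files.first() {
        Some(f) => f,
        None => return,
    };
    for f in files {
        assert_eq!(f.sequencer_id, first.sequencer_id, "sequencer ID mismatch");
        assert_eq!(f.namespace_id, first.namespace_id, "namespace ID mismatch");
        assert_eq!(f.table_id, first.table_id, "table ID mismatch");
        assert_eq!(f.partition_id, first.partition_id, "partition ID mismatch");
        assert_eq!(f.sort_key, first.sort_key, "sort key mismatch");
    }
}
```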
Changes the compaction logic to always reference the same SortKey
instance, rather than repeatedly querying for it.
The Partition metadata is always read from the catalog as part of
compact_partition(), where it previously threw away all metadata except
the sort key, which was passed into compact(). Then compact() would
always re-query the catalog to look up just the sort key again, and mix
up the two instances during use - one passed into the fn, one freshly
queried within the fn.
Now the Partition metadata is resolved in compact_partition() as it was
previously, but the entire Partition reference is passed to compact(),
and this is consistently used to access the sort key. This also removes
a catalog query per compaction call.
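Roughly the shape of the change, with made-up signatures; the real compactor code has more parameters and error handling:

```rust
// Stand-in for the catalog's Partition record (sort key plus other metadata).
struct Partition {
    sort_key: Vec<String>,
}

async fn compact_partition(partition: Partition) {
    // Previously only the sort key was forwarded and compact() re-queried the
    // catalog for it; now the one resolved Partition is handed over whole.
    compact(&partition).await;
}

async fn compact(partition: &Partition) {
    // Always the same instance, no second catalog lookup.
    let _sort_key = &partition.sort_key;
}
```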
* fix: do NOT block in parquet file IO
I think for historical reasons we were using blocking IO to read parquet
files. With the current streaming `SendableRecordStream` approach this
is technically NOT required anymore.
Now one might think that the sync-async dance that we did is kinda
harmless, but looking at our production querier I think it is really
bad. The querier seems to be stuck, but judging by `strace` and other
health signals it is not entirely dead. Looking at GDB
backtraces it seems that nearly all threads are busy in
`download_and_scan_parquet`. Looking at the tokio docs
(<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>)
for `spawn_blocking` (which is used to start the sync download) this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that there are only a few threads left to do actual work (I've
seen postgres data flowing back and forth for example) but tokio is not
able to use its full potential anymore. This is especially bad because the
sync code in `download_and_scan_parquet` then uses `futures` `block_on`
functionality to call back into async code, so it waits for tokio
itself.
The change is rather simple: just use async task spawns.
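A sketch of that change, with a placeholder download function: the blocking spawn plus `block_on` is replaced by a plain async task, so the runtime worker threads stay free:

```rust
use tokio::task::JoinHandle;

async fn download_and_scan_parquet() {
    // placeholder for the real streaming parquet read
}

fn start_download() -> JoinHandle<()> {
    // before (simplified): a blocking thread that calls back into async code
    // tokio::task::spawn_blocking(|| {
    //     futures::executor::block_on(download_and_scan_parquet())
    // });

    // after: async all the way down
    tokio::spawn(download_and_scan_parquet())
}
```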
* fix: use async IO to write stream to temp file
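A hedged sketch of the async temp-file write, assuming the data arrives as a stream of `bytes::Bytes` chunks; the real code differs in types and error handling:

```rust
use futures::{Stream, StreamExt};
use tokio::io::AsyncWriteExt;

async fn write_stream_to_temp_file(
    mut stream: impl Stream<Item = Result<bytes::Bytes, std::io::Error>> + Unpin,
) -> std::io::Result<tempfile::NamedTempFile> {
    let tmp = tempfile::NamedTempFile::new()?;
    // Reopen as a tokio file so the writes do not block the runtime thread.
    let mut file = tokio::fs::File::from_std(tmp.reopen()?);
    while let Some(chunk) = stream.next().await {
        file.write_all(&chunk?).await?;
    }
    file.flush().await?;
    Ok(tmp)
}
```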
* fix: do not block tokio thread during parquet file reading
* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore
There is no REAL way to cancel sync tasks, but at least we can try our
best.
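One way to get that best-effort cancellation is an abort-on-drop wrapper around the task handle; a minimal sketch, not the actual IOx implementation:

```rust
use tokio::task::JoinHandle;

/// Abort the wrapped task when the consumer drops the handle.
struct CancelOnDrop<T>(JoinHandle<T>);

impl<T> Drop for CancelOnDrop<T> {
    fn drop(&mut self) {
        // abort() stops the task at its next .await point; purely synchronous
        // sections cannot be interrupted, hence "best effort".
        self.0.abort();
    }
}

fn spawn_cancellable() -> CancelOnDrop<()> {
    CancelOnDrop(tokio::spawn(async {
        // parquet IO work would go here
    }))
}
```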
* test: test that hits panic because of no column metadata
* chore: Apply suggestions from code review
* chore: run format after applying changes
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: run clippy
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: reproducer for 4695
* chore: some debug info
* test: test with many columns and rows
* chore: cleanup and add debug info
* chore: cleanup
* chore: cleanup
* chore: more debug info