influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols \|\| Goulding)	9328ba8c45	feat: Use new extra loading info to load read buffer chunks into cache	2022-06-02 09:22:44 -04:00
Carol (Nichols \|\| Goulding)	054c25de50	refactor: Add more methods to DecodedParquetFile I'm tired of trying to remember which info is on which metadata.	2022-06-02 09:22:44 -04:00
Marco Neumann	9e30a3eb29	refactor: rework querier concurrency limiting (#4760 ) * refactor: rework querier concurrency limiting With #4752 we introduced a concurrency limit into the querier. It works by drawing permits from a central semaphore whenever we create a `QuerierNamespace`. This however only limits concurrency during query planning and not query execution, because the objects contained within the plan (chunks and some metadata) neither reference the permit nor the `QuerierNamespace`. Now one approach to fix that would be to wire up the permit all the down into all the query-related data structures. This however is very fiddly and potentially will get lost at some point, because as soon as we transform these data structures -- e.g. into streams -- the permit might get lost again. This will be potentially query-dependent and very hard to debug. So instead we reverse the approach and track the permits at the upper layer of the stack: the gRPC service entry points. There we also need to be careful -- e.g. when we return streams to tonic -- but it's way easier to review that then the deeply nested object hierarchy that is involved with queries. Also the separation of concerns is a bit clearer, because why would a "chunk" care about the "query concurrency" as a whole. * refactor: improve gRPC permit keeping and prepare tests	2022-06-02 09:49:58 +00:00
Andrew Lamb	1472ec272f	refactor: consolidate duplicate testing logic (#4708 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 20:02:13 +00:00
Andrew Lamb	a37c553545	refactor: Split up rpc_predicate module a bit (#4763 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 19:56:11 +00:00
Andrew Lamb	7328cc6a9a	docs: Update readme (#4765 ) * docs: Update readme * fix: Update README.md Co-authored-by: Nga Tran <nga-tran@live.com> Co-authored-by: Nga Tran <nga-tran@live.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 19:50:06 +00:00
kodiakhq[bot]	b714269b13	Merge pull request #4754 from influxdata/cn/extra-cache-system feat: Add an Extra type to Cacher Loader to specify extra information…	2022-06-01 18:11:46 +00:00
kodiakhq[bot]	7ad3a50dd4	Merge branch 'main' into cn/extra-cache-system	2022-06-01 18:06:09 +00:00
kodiakhq[bot]	f3fb040294	Merge pull request #4756 from influxdata/dom/fix-e2e-db-fire test(e2e): do not mangle prod database	2022-06-01 16:38:34 +00:00
kodiakhq[bot]	51114a0a56	Merge branch 'main' into dom/fix-e2e-db-fire	2022-06-01 16:32:41 +00:00
Dom Dwyer	1caeb04869	test(e2e): do not mangle prod database Unset the all env vars for the following CLI e2e tests: * default_mode_is_run_all_in_one * default_run_mode_is_all_in_one This prevents them from executing against the "prod" catalog, running migrations and inserting values to the prod database specified in the prod DSN env (INFLUXDB_IOX_CATALOG_DSN).	2022-06-01 17:12:12 +01:00
kodiakhq[bot]	5a52954d0a	Merge pull request #4759 from influxdata/dom/ignored-metadata-test refactor: always panic for empty parquet files	2022-06-01 16:11:29 +00:00
kodiakhq[bot]	69da424a41	Merge branch 'main' into dom/ignored-metadata-test	2022-06-01 16:05:43 +00:00
Andrew Lamb	257aaa7e7b	fix: Support `_field != <name>` predicates (#4721 ) * fix: Support `_field != <name>` predicates * fix: update test * fix: add negative test * fix: improve comments * refactor: make `add_include` and `add_exclude` infallible * chore: add type annotations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 16:04:53 +00:00
Dom Dwyer	f8b83c5085	test: assert panic behaviour Modifies the existing test added as part of #4695 to ensure a panic is emitted when serialising an empty parquet file.	2022-06-01 16:55:53 +01:00
Dom Dwyer	0bfc11f4a1	refactor: always panic for empty parquet files Moves the panic into the child call to_parquet() so all code paths are covered (i.e. not serialising into memory via to_parquet_bytes()).	2022-06-01 16:54:36 +01:00
kodiakhq[bot]	507e153c5a	Merge pull request #4699 from influxdata/dom/silly-config refactor: warn for silly object store configs	2022-06-01 15:48:10 +00:00
Dom Dwyer	60de97ac26	test(e2e): ensure "partition pull" writes files Adds a test case covering the "remote partition pull" command configured with file-based object storage.	2022-06-01 16:41:57 +01:00
Dom Dwyer	6d647fb7a9	refactor: warn for silly object store configs Warn when downloading files to an in-memory object store. The "remote partition pull" command downloads parquet files from an object store via a router, and saves them locally. It's pretty unlikely the user intends to download those files to memory of the CLI process which then exits when the pull is complete, throwing away the downloaded files, but this is the default.	2022-06-01 16:41:57 +01:00
kodiakhq[bot]	1ca58b2b70	Merge pull request #4757 from influxdata/dom/use-constructors-plz refactor: constructor for ParquetFileWithTombstone	2022-06-01 15:34:02 +00:00
Dom Dwyer	9ae58c89b6	refactor: constructor for ParquetFileWithTombstone Use a constructor to initialise a ParquetFileWithTombstone struct, rather than making the fields pub. This allows IDEs to "go to" places where this is constructed when browsing the code, but also keeps the type closed for modification of internals (SOLID).	2022-06-01 15:58:06 +01:00
Nga Tran	f0e477fcee	chore: let aggressively increase compactor job size and concurrency level (#4747 ) * chore: let aggressively increase compactor job size and concurrency level * chore: Apply suggestions from code review Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 14:32:36 +00:00
Carol (Nichols \|\| Goulding)	39fc19e946	test: Exercise the Extra type in the cache system tests	2022-06-01 09:19:52 -04:00
Carol (Nichols \|\| Goulding)	37347f2389	feat: Add an Extra type to Cacher Loader to specify extra information for loading entries	2022-06-01 08:58:19 -04:00
Andrew Lamb	2886149afc	chore: naming / comment cleanups from namespace semaphore (#4753 )	2022-06-01 12:46:38 +00:00
Marco Neumann	446d94487d	feat: add tooling to instrument async semaphores (#4751 ) * feat: add tooling to instrument async semaphores Ref #4739. * test: improve `test_permits_acquired_and_holders_acquired` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 12:19:23 +00:00
Marco Neumann	ebeccf037c	feat: limit querier concurrency by limiting number of active namespaces (#4752 ) This is a rather quick fix for prod. On the mid-term we probably wanna rethink our deployment strategy, e.g. by using "one query per pod" and by deploying queryd w/ IOx into the same pod.	2022-06-01 11:59:35 +00:00
dependabot[bot]	e638385782	chore(deps): Bump pbjson-types from 0.3.1 to 0.3.2 (#4750 ) Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2. - [Release notes](https://github.com/influxdata/pbjson/releases) - [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2) --- updated-dependencies: - dependency-name: pbjson-types dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 09:02:37 +00:00
dependabot[bot]	21ec05a6ee	chore(deps): Bump pbjson from 0.3.1 to 0.3.2 (#4749 ) Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2. - [Release notes](https://github.com/influxdata/pbjson/releases) - [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2) --- updated-dependencies: - dependency-name: pbjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 08:12:46 +00:00
dependabot[bot]	043dee43c8	chore(deps): Bump pbjson-build from 0.3.1 to 0.3.2 (#4748 ) Bumps [pbjson-build](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2. - [Release notes](https://github.com/influxdata/pbjson/releases) - [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2) --- updated-dependencies: - dependency-name: pbjson-build dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 07:54:13 +00:00
Marco Neumann	c91dbe062e	test: "optimize" ingesterrecord batches in query tests (#4700 ) * test: "optimize" ingesterrecord batches in query tests It seems that I had the right idea in #4656 but wasn't able to trigger https://github.com/influxdata/conductor/issues/955 because the query tests do not "optimize" the record batches in the same way the actual gRPC implementation does. If we apply the same transformation we indeed end up with the same error. * fix: all batches within the ingester flight response must have same schema * refactor: simplify and reuse code Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-01 07:37:11 +00:00
Nga Tran	79220720be	chore: increase size of a compactor job and level of concurrency (#4746 ) * fix: let us not compact no-data * fix: split time must be greater min_time, too * fix: resolve merge conflict * chore: increase size of a compactor job and level of concurrency Co-authored-by: Dom <dom@itsallbroken.com>	2022-05-31 19:57:06 +00:00
Nga Tran	dfd35c05a1	fix: let us not compact no-data (#4744 ) * fix: let us not compact no-data * fix: split time must be greater min_time, too * fix: resolve merge conflict Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-31 17:02:14 +00:00
kodiakhq[bot]	51fd20c769	Merge pull request #4745 from influxdata/dom/metadata-test-with-schema test: fix test_metadata_from_parquet_metadata	2022-05-31 16:42:46 +00:00
Dom Dwyer	5d74ae2ac1	test: fix test_metadata_from_parquet_metadata Changes the test_metadata_from_parquet_metadata test to embed the IOx metadata before asserting it can be read back.	2022-05-31 17:34:04 +01:00
kodiakhq[bot]	f7c198ebb5	Merge pull request #4729 from influxdata/dom/consistent-sort-key refactor(compactor): consistent sort key	2022-05-31 16:24:45 +00:00
kodiakhq[bot]	2d08478e1b	Merge branch 'main' into dom/consistent-sort-key	2022-05-31 16:15:08 +00:00
Marco Neumann	988bd38e93	refactor: remove unused code (#4742 )	2022-05-31 11:36:02 +00:00
Marco Neumann	5a95da7327	refactor: do NOT use ANY file IO for parquet reading (#4741 )	2022-05-31 11:18:24 +00:00
Marco Neumann	2bf03e57bf	feat: limit tmp parquet file count and size (#4737 ) Fixes #4736. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-31 08:56:15 +00:00
dependabot[bot]	01fc550034	chore(deps): Bump parking_lot from 0.12.0 to 0.12.1 (#4732 ) Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.0 to 0.12.1. - [Release notes](https://github.com/Amanieu/parking_lot/releases) - [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md) - [Commits](https://github.com/Amanieu/parking_lot/compare/0.12.0...0.12.1) --- updated-dependencies: - dependency-name: parking_lot dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-31 08:01:58 +00:00
dependabot[bot]	5b5f0efef5	chore(deps): Bump rayon from 1.5.2 to 1.5.3 (#4731 ) Bumps [rayon](https://github.com/rayon-rs/rayon) from 1.5.2 to 1.5.3. - [Release notes](https://github.com/rayon-rs/rayon/releases) - [Changelog](https://github.com/rayon-rs/rayon/blob/master/RELEASES.md) - [Commits](https://github.com/rayon-rs/rayon/compare/v1.5.2...v1.5.3) --- updated-dependencies: - dependency-name: rayon dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-31 07:50:42 +00:00
dependabot[bot]	642aef103d	chore(deps): Bump comfy-table from 5.0.1 to 6.0.0 (#4730 ) Bumps [comfy-table](https://github.com/nukesor/comfy-table) from 5.0.1 to 6.0.0. - [Release notes](https://github.com/nukesor/comfy-table/releases) - [Changelog](https://github.com/Nukesor/comfy-table/blob/main/CHANGELOG.md) - [Commits](https://github.com/nukesor/comfy-table/compare/v5.0.1...v6.0.0) --- updated-dependencies: - dependency-name: comfy-table dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-31 07:10:37 +00:00
Dom Dwyer	70864b9f48	refactor: always use correct chunk sort key Don't use the same sort key for all files - sort keys may grow over time, and the information is already at hand.	2022-05-30 17:41:41 +01:00
Dom Dwyer	6aa2a6958a	refactor: assert consistent parquet file metadata Assert consistent metadata when evaluating candidate parquet files for compaction. Asserts all files have the same: * Sequencer ID * Namespace ID * Table ID * Partition ID * Sort key	2022-05-30 17:41:41 +01:00
Dom Dwyer	0f16d6cabb	refactor: consistent SortKey source Changes the compaction logic to always reference the same SortKey instance, rather than repeatedly querying for it. The Partition metadata is always read from the catalog as part of compact_partition(), where it previously threw away all metadata except the sort key, which was passed into compact(). Then compact() would always re-query the catalog to look up just the sort key again, and mix up the two instances during use - one passed into the fn, one freshly queried within the fn. Now the Partition metadata is resolved in compact_partition() as it was previously, but the entire Partition reference is passed to compact(), and this is consistently used do access the sort key. This also removes a catalog query per compaction call.	2022-05-30 17:41:41 +01:00
Marco Neumann	79c054ffc9	fix: do NOT block in parquet file IO (#4727 ) * fix: do NOT block in parquet file IO I think for historical reason we were using blocking IO to read parquet files. With the current streaming `SendableRecordStream` approach this is technically NOT required anymore. Now one might think that the sync-async dance that we did is kinda harmless, but looking at our producition querier I think it is really bad. The querier seems to be stuck but looking at `strace` and other health signal it seems it is not entirely dead. Looking at GDB backtraces it seems that nearly all threads are busy in `download_and_scan_parquet`. Looking at the tokio docs (<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>) for `spawn_blocking` (which is used to start the sync download) this makes sense: tokio only starts replacement threads for the current runtime thread (which calls `spawn_blocking`) if this does NOT exceed the runtime thread limit. However we set the runtime thread limit to the number of CPU cores available to IOx, so this is a limiting factor. This means that there are only a few threads left to do actual work (I've seen postgres data flowing back and forth for example) but tokio is not able to use its full potential anymore. This is esp. bad because the sync code in `download_and_scan_parquet` then uses `futures` `block_on` functionality to call back into async code, so it waits for tokio itself. The change is rather simple: just use async task spawns. * fix: use async IO to write stream to temp file * fix: do not block tokio thread during parquet file reading * refactor: ensure parquet IO tasks are cancelled if they are not needed anymore There is no REAL way to cancel sync tasks, but at least we can try our best.	2022-05-30 13:32:20 +00:00
Andrew Lamb	d0903b11bb	refactor: reduce test duplication in `querier/src/table/mod.rs` (#4698 ) * refactor: reduce test duplication in `querier/src/table/mod.rs` * fix: Apply suggestions from code review Co-authored-by: Jake Goulding <jake.goulding@integer32.com> * fix: Update querier/src/table/test_util.rs Co-authored-by: Jake Goulding <jake.goulding@integer32.com> * fix: use now_nanos() * refactor: Add TestQuerierTable * refactor: rename functions for explicitness Co-authored-by: Jake Goulding <jake.goulding@integer32.com>	2022-05-30 12:56:09 +00:00
Paul Dix	6af32b7750	feat: add concurrency limit for ingester queries (#4703 ) I've defaulted it to 20, we can adjust as needed. Closes #4657 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-30 10:22:17 +00:00
dependabot[bot]	73168f7989	chore(deps): Bump flate2 from 1.0.23 to 1.0.24 (#4726 ) Bumps [flate2](https://github.com/rust-lang/flate2-rs) from 1.0.23 to 1.0.24. - [Release notes](https://github.com/rust-lang/flate2-rs/releases) - [Commits](https://github.com/rust-lang/flate2-rs/commits) --- updated-dependencies: - dependency-name: flate2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-30 08:26:12 +00:00

1 2 3 4 5 ...

8121 Commits (9328ba8c4508b8a25529b24b438f7d76a78288df) All Branches Search

8121 Commits (9328ba8c4508b8a25529b24b438f7d76a78288df)

All Branches