* refactor: simplify `QueryChunk` data access
We have only two types for chunks (now that the RUB is gone):
1. In-memory RecordBatches
2. Parquet files
A lot of logic is duplicated across the different `read_filter`
implementations. Also, `read_filter` hides a fair amount of logic from
DataFusion, which would prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return their data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy lifting.
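Roughly, the chunk interface could then expose something like the sketch below. `QueryChunkData` matches the spirit of the change, while `ParquetFileMeta` is an illustrative stand-in for whatever parquet metadata handle the real chunks return:

```rust
use std::sync::Arc;

use arrow::record_batch::RecordBatch;

/// Illustrative stand-in for the metadata a chunk would hand out for a
/// parquet file (object store path, file size, schema, ...).
pub struct ParquetFileMeta {
    // path, size, schema, ...
}

/// Sketch: instead of every chunk implementing its own `read_filter`,
/// a chunk returns its raw data and `iox_query` builds the DataFusion
/// plan (e.g. a memory scan vs. a parquet scan) on top of it.
pub enum QueryChunkData {
    /// In-memory data, scanned via DataFusion's memory-based execution.
    RecordBatches(Vec<RecordBatch>),
    /// Parquet file in object store, scanned via a `ParquetExec`.
    Parquet(Arc<ParquetFileMeta>),
}
```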
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.
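As a sketch (trait and type names simplified from the real chunk metadata trait), the accessor changes shape like this:

```rust
use std::sync::Arc;

/// Stand-in for IOx's table summary (per-column min/max/null statistics).
pub struct TableSummary {
    // per-column statistics
}

pub trait QueryChunk {
    // Before #5963 the summary was optional, so call sites were littered
    // with `.expect("chunk has summary")`:
    //
    //     fn summary(&self) -> Option<Arc<TableSummary>>;

    /// After: a summary is always available, even if it does not cover
    /// every column.
    fn summary(&self) -> Arc<TableSummary>;
}
```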
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses into `iox_query` and lets
the compactor and ingester use the same mechanism.
While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.
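A sketch of what this lifting looks like, converting an IOx-style summary into the statistics type DataFusion prunes with. The `TableSummary` shape here is illustrative, and DataFusion's `Statistics` struct has changed across versions; this uses the `Option`-based form:

```rust
use datafusion::physical_plan::{ColumnStatistics, Statistics};
use datafusion::scalar::ScalarValue;

/// Illustrative per-column summary as IOx might track it.
pub struct ColumnSummary {
    pub null_count: Option<u64>,
    pub min: Option<ScalarValue>,
    pub max: Option<ScalarValue>,
}

/// Illustrative table summary.
pub struct TableSummary {
    pub row_count: Option<usize>,
    pub columns: Vec<ColumnSummary>,
}

/// Lift the IOx table summary into the statistics type that DataFusion
/// understands, so a single pruning mechanism serves both worlds.
pub fn df_statistics(summary: &TableSummary) -> Statistics {
    Statistics {
        num_rows: summary.row_count,
        total_byte_size: None,
        column_statistics: Some(
            summary
                .columns
                .iter()
                .map(|c| ColumnStatistics {
                    null_count: c.null_count.map(|n| n as usize),
                    min_value: c.min.clone(),
                    max_value: c.max.clone(),
                    distinct_count: None,
                })
                .collect(),
        ),
        // Summaries may not cover all columns/rows, so mark as inexact.
        is_exact: false,
    }
}
```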
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so let's encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".
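Encoded in the type system, this looks roughly like the closed enum below (close in spirit to IOx's `schema` crate, simplified here):

```rust
/// The three known column categories, as a closed enum: anything else is
/// now a type error rather than undefined behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InfluxColumnType {
    /// The `time` column: the primary-key timestamp.
    Timestamp,
    /// A dictionary-encoded tag column (always a string).
    Tag,
    /// A field column with a concrete value type.
    Field(InfluxFieldType),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InfluxFieldType {
    Float,
    Integer,
    UInteger,
    String,
    Boolean,
}
```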
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897.
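A minimal sketch of the registration. The exact `register_object_store` signature differs between DataFusion versions; this uses the URL-based form, with an in-memory store and an `iox://` scheme standing in for the real ones:

```rust
use std::sync::Arc;

use datafusion::execution::context::SessionContext;
use object_store::memory::InMemory;
use url::Url;

/// Register an object store with the top-level DataFusion context so that
/// `ParquetExec` nodes can resolve `iox://...` paths.
fn register(ctx: &SessionContext) {
    let store = Arc::new(InMemory::new());
    let url = Url::parse("iox://iox/").unwrap();
    ctx.runtime_env().register_object_store(&url, store);
}
```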
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: rework cache refresh logic
Instead of issuing a single refresh when a GET request for a cached key
comes in, start a background job (using some efficient logic to not
overload tokio) per key that refreshes the key using some exponential
backoff. The timer is reset whenever a new GET request comes in. This has the
following advantages:
- our backoff logic decorrelates the requests
- the longer a key was not used, the less often it will be updated
All tests (esp. integration tests) are adjusted accordingly, mostly to
account for the fact that no extra GET is required to start the refresh
timer.
Closes #5720.
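A sketch of the per-key background job (names and backoff parameters are illustrative; the real implementation also adds jitter to decorrelate the requests):

```rust
use std::{sync::Arc, time::Duration};

use tokio::sync::Notify;

/// Per-key background refresh: refresh with exponential backoff, and reset
/// the backoff whenever a GET touches the key, so hot keys stay fresh and
/// idle keys are refreshed less and less often.
async fn refresh_loop(key: String, touched: Arc<Notify>) {
    let base = Duration::from_secs(1);
    let max = Duration::from_secs(300);
    let mut wait = base;

    loop {
        tokio::select! {
            // A GET for this key resets the timer.
            _ = touched.notified() => {
                wait = base;
            }
            _ = tokio::time::sleep(wait) => {
                refresh_key(&key).await;
                wait = (wait * 2).min(max);
            }
        }
    }
}

async fn refresh_key(_key: &str) {
    // stand-in for the actual loader call
}
```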
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: simplify rng overwrite
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat: query only from parquet
* Revert "feat: query only from parquet"
This reverts commit 5ce3c3449c0b9c90154c8c6ece4a40a9c083b7ba.
* Revert "revert: disable read buffer usage in querier (#5579) (#5603)"
This reverts commit df5ef875b4.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.
This also removes the tombstone support from the Ingester -> Querier RPC
response message.
This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
* feat: initial step to identify where the projection should be provided
* feat: start getting columns of all expressions
* chore: format
* test: test for the table_chunk_stream
* fix: fix a compile error. Thanks @alamb
* test: full tests for table_chunk_stream
* chore: cleanup
* fix: do not cut any columns in case all fields are needed
* test: add one more test case of reading all columns
* refactor: move code that identifies columns to push down into a function. Add the use of `field_columns`
* chore: cleanup
* refactor: make `stream_from_batch` support empty batches
* chore: cleanup
* chore: fix clippy after auto merge
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
semantics
Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.
Fixes #5799.
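A sketch of the error mapping on the gRPC boundary (assuming a DataFusion version with the `ResourcesExhausted` variant; the full conversion preserves more variants than shown here):

```rust
use datafusion::error::DataFusionError;
use tonic::Status;

/// Map DataFusion errors to sensible gRPC status codes, so that e.g. OOM
/// protection surfaces as "resource exhausted" instead of a generic
/// internal error.
fn to_status(e: DataFusionError) -> Status {
    match e {
        DataFusionError::ResourcesExhausted(msg) => Status::resource_exhausted(msg),
        other => Status::internal(other.to_string()),
    }
}
```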
* feat: send only needed projection columns from querier to ingester in case of normal SQL queries
* refactor: push column indices down until we need to convert them to strings
* fix: make the test deterministic
* test: test for the projection pushdown
* test: add asserts for the proj pushdown test
* test: implement projection pushdown for partitions of MockIngesterConnection
* chore: cleanup
* chore: address review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: address review comments
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Upgrade to Rust 1.64
* fix: Use iter find instead of a for loop, thanks clippy
* fix: Remove some needless borrows, thanks clippy
* fix: Use then_some rather than then with a closure, thanks clippy
* fix: Use iter retain rather than filter collect, thanks clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: do not run de-dup in ingester for querier requests
This removes the entire de-dup logic from the ingester for querier
requests. Furthermore, it even removes the entire DataFusion execution
from the ingester and just dumps the in-memory record batches as quickly
as possible. No filters are applied. Note that even prior to this PR,
we've never applied projections (tracked by #5624).
**Pros:**
- speed up query planning within the querier (since we need the ingester
response for state reconciling)
- lowered ingester CPU load
**Cons:**
- more querier<>ingester network traffic
Closes #5602.
* test: extend query test case
* fix: ingester tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: arc the cached table
* refactor: use cheaper hash keys for projected schemas
Instead of using the column names to address projected schemas, let's
use the column IDs.
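As a sketch, the cache key changes from owned strings to fixed-size IDs (`ColumnId` here is an illustrative stand-in for the catalog type):

```rust
use std::{collections::HashMap, sync::Arc};

use arrow::datatypes::Schema;

/// Illustrative stand-in for the catalog column ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ColumnId(i64);

/// Key the projected-schema cache by cheap, fixed-size column IDs rather
/// than by owned column-name strings.
pub type ProjectedSchemas = HashMap<Vec<ColumnId>, Arc<Schema>>;
```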
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: retry ingester requests faster
The retries introduced in #5695 are too slow and block the entire
querier for minutes (until the very long gRPC timeout kicks in).
* fix: add error details on why the query planning failed
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: retry querier->ingester requests
Esp. for InfluxRPC requests that scan multiple tables, it may be that
a single ingester request fails. We shall retry that request instead of
failing the entire query.
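A sketch of the per-request retry (types, retry budget, and backoff are illustrative, not the real querier code):

```rust
use std::time::Duration;

/// Illustrative placeholders for the real ingester response/error types.
pub struct IngesterResponse;
#[derive(Debug)]
pub struct IngesterError;

async fn query_ingester(_table: &str) -> Result<IngesterResponse, IngesterError> {
    // stand-in for the real per-table ingester RPC
    Ok(IngesterResponse)
}

/// Retry a single per-table ingester request with a short backoff instead
/// of failing the whole query.
async fn query_with_retry(
    table: &str,
    max_attempts: u32,
) -> Result<IngesterResponse, IngesterError> {
    let mut attempt = 0;
    loop {
        match query_ingester(table).await {
            Ok(resp) => return Ok(resp),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e);
                }
                tokio::time::sleep(Duration::from_millis(100 * 2u64.pow(attempt))).await;
            }
        }
    }
}
```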
* refactor: improve docs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: less foo
* docs: remove outdated TODO
* test: assert that panic happened
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: improve consistent access under "remove if"
With all the concurrency introduced in #5668, we should be a bit more
careful with our "remove if" handling, esp. if a removal is triggered
while a load is running concurrently. This change introduces a
`remove_if_and_get` helper that ensures this and switches the querier
over to use it. The parquet file and tombstone caches required a somewhat
larger change because there the invalidation and the actual GET were
kinda separate. We had this separation for the other caches as well at
some point and decided that it easily leads to API misuse, so I took this
opportunity to "fix" the parquet file and tombstone caches as well.
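A toy, synchronous sketch of the `remove_if_and_get` idea (the real cache is async and deduplicates loads, but the point is the same: invalidation and the follow-up GET form a single operation):

```rust
use std::{collections::HashMap, hash::Hash, sync::Mutex};

/// Toy cache showing the idea: removal (if the predicate matches) and the
/// follow-up GET happen as one operation, so a concurrent load cannot
/// slip a stale value in between the two steps.
pub struct Cache<K, V> {
    entries: Mutex<HashMap<K, V>>,
}

impl<K: Eq + Hash + Clone, V: Clone> Cache<K, V> {
    pub fn remove_if_and_get(
        &self,
        key: &K,
        predicate: impl FnOnce(&V) -> bool,
        load: impl FnOnce() -> V,
    ) -> V {
        let mut entries = self.entries.lock().unwrap();
        if let Some(v) = entries.get(key) {
            if !predicate(v) {
                // Entry is still valid: plain GET.
                return v.clone();
            }
            // Invalidate, then fall through to the (re)load below.
            entries.remove(key);
        }
        let v = load();
        entries.insert(key.clone(), v.clone());
        v
    }
}
```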
* docs: improve
* feat: split "pruned" metric into "early" and "late"
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: explain `PruningMetrics`
* test: try to test pruning
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
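A sketch of the split metric (field names are illustrative; "early" meaning pruned from cached metadata before the chunk is even created, "late" meaning pruned after chunk creation):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters for chunks pruned "early" vs. "late".
#[derive(Debug, Default)]
pub struct PruningMetrics {
    pruned_early: AtomicU64,
    pruned_late: AtomicU64,
}

impl PruningMetrics {
    pub fn prune_early(&self, n: u64) {
        self.pruned_early.fetch_add(n, Ordering::Relaxed);
    }

    pub fn prune_late(&self, n: u64) {
        self.pruned_late.fetch_add(n, Ordering::Relaxed);
    }
}
```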
Create chunks in querier concurrently after we've pre-filtered them.
Chunk creation may still require a bit of cached information (e.g. the
partition sort key), and we can easily fetch it concurrently instead
of in order.
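A sketch of the concurrent creation (`buffered` keeps a bounded number of chunk constructions in flight; the types and the concurrency limit are illustrative):

```rust
use futures::stream::{self, StreamExt};

/// Illustrative placeholder types.
pub struct ChunkCandidate;
pub struct QuerierChunk;

async fn new_chunk(_candidate: ChunkCandidate) -> QuerierChunk {
    // stand-in for the real chunk construction, incl. cache lookups
    // (e.g. the partition sort key)
    QuerierChunk
}

/// Build the remaining chunks with a bounded number of constructions in
/// flight instead of one after the other.
async fn create_chunks(candidates: Vec<ChunkCandidate>) -> Vec<QuerierChunk> {
    stream::iter(candidates)
        .map(new_chunk)
        .buffered(10) // illustrative concurrency limit
        .collect()
        .await
}
```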
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This should lower catalog load and eliminate a few costly cache misses.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
(mostly surprising) conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(`iox_query` when dealing with ingester chunks).
[^1]: This is even true when the chunk belongs to a parquet file that is not
yet added to the catalog, contrary to what a comment in the ingester
stated. The catalog and data model used by the querier are two totally
different things.
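As a sketch, the accessor loses its `Option` (names illustrative):

```rust
/// Illustrative stand-in for the catalog partition ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionId(i64);

pub trait QueryChunk {
    // Before: optional, forcing conditional behavior at every call site.
    //
    //     fn partition_id(&self) -> Option<PartitionId>;

    /// After: every chunk belongs to exactly one partition.
    fn partition_id(&self) -> PartitionId;
}
```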
* refactor: read querier parquet files from cache
* refactor: only use parquet files in querier (no RB)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>