influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols || Goulding)	a4f51d99f6	feat: Use the read buffer chunk cache in the querier	2022-06-03 09:16:04 -04:00
Andrew Lamb	3592aa52d8	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743 ) * chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` * chore: Update APIs * chore: Run cargo hakari tasks * feat: normalize parquet file metadata * chore: update size tests * chore: add docs on metadata stripping * chore: TEMP UPDATE TO DF BRANCH * chore: Update for new API * fix: Update to latest DF * fix: cargo hakari Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>	2022-06-03 10:32:26 +00:00
dependabot[bot]	9a21292db8	chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-03 09:10:40 +00:00
Carol (Nichols || Goulding)	9d9c5d3692	fix: Take backoff config as an argument to be consistent with the other caches	2022-06-02 09:50:48 -04:00
Carol (Nichols || Goulding)	76b40ac6a1	refactor: Make the type alias into a struct	2022-06-02 09:26:11 -04:00
Carol (Nichols || Goulding)	715c65dfef	docs: Clarify a comment about what is an Arc	2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding)	879dd7cec4	test: LRU behavior of the read buffer chunk cache	2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding)	9328ba8c45	feat: Use new extra loading info to load read buffer chunks into cache	2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding)	054c25de50	refactor: Add more methods to DecodedParquetFile I'm tired of trying to remember which info is on which metadata.	2022-06-02 09:22:44 -04:00
Marco Neumann	9e30a3eb29	refactor: rework querier concurrency limiting (#4760 ) * refactor: rework querier concurrency limiting With #4752 we introduced a concurrency limit into the querier. It works by drawing permits from a central semaphore whenever we create a `QuerierNamespace`. This however only limits concurrency during query planning and not query execution, because the objects contained within the plan (chunks and some metadata) neither reference the permit nor the `QuerierNamespace`. Now one approach to fix that would be to wire up the permit all the down into all the query-related data structures. This however is very fiddly and potentially will get lost at some point, because as soon as we transform these data structures -- e.g. into streams -- the permit might get lost again. This will be potentially query-dependent and very hard to debug. So instead we reverse the approach and track the permits at the upper layer of the stack: the gRPC service entry points. There we also need to be careful -- e.g. when we return streams to tonic -- but it's way easier to review that then the deeply nested object hierarchy that is involved with queries. Also the separation of concerns is a bit clearer, because why would a "chunk" care about the "query concurrency" as a whole. * refactor: improve gRPC permit keeping and prepare tests	2022-06-02 09:49:58 +00:00
Carol (Nichols || Goulding)	37347f2389	feat: Add an Extra type to Cacher Loader to specify extra information for loading entries	2022-06-01 08:58:19 -04:00
Andrew Lamb	2886149afc	chore: naming / comment cleanups from namespace semaphore (#4753 )	2022-06-01 12:46:38 +00:00
Marco Neumann	ebeccf037c	feat: limit querier concurrency by limiting number of active namespaces (#4752 ) This is a rather quick fix for prod. On the mid-term we probably wanna rethink our deployment strategy, e.g. by using "one query per pod" and by deploying queryd w/ IOx into the same pod.	2022-06-01 11:59:35 +00:00
Andrew Lamb	d0903b11bb	refactor: reduce test duplication in `querier/src/table/mod.rs` (#4698 ) * refactor: reduce test duplication in `querier/src/table/mod.rs` * fix: Apply suggestions from code review Co-authored-by: Jake Goulding <jake.goulding@integer32.com> * fix: Update querier/src/table/test_util.rs Co-authored-by: Jake Goulding <jake.goulding@integer32.com> * fix: use now_nanos() * refactor: Add TestQuerierTable * refactor: rename functions for explicitness Co-authored-by: Jake Goulding <jake.goulding@integer32.com>	2022-05-30 12:56:09 +00:00
Carol (Nichols || Goulding)	55cd8d15be	fix: Update method name to specify the kind of chunk it makes	2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding)	f0b4d71f47	docs: Update comment to reflect new implementation	2022-05-27 13:04:24 -04:00
Carol (Nichols || Goulding)	5232594aab	docs: Fix grammar in a comment Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-27 13:04:13 -04:00
Carol (Nichols || Goulding)	2cb351cd0d	feat: Make a QuerierRBChunk wrapper to handle traits and extra data This brings back a bunch of code from OG from read buffer backed DbChunks.	2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding)	5fd3ffc17f	refactor: Rename ParquetChunkAdapter to only ChunkAdapter It might be creating chunks of different kinds other than ParquetChunks.	2022-05-26 16:52:14 -04:00
Andrew Lamb	633117e595	feat: avoid catalog access on each query (#4650 ) * feat: cache catalog access on query * fix: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2022-05-26 20:44:22 +00:00
kodiakhq[bot]	1043c98e17	Merge branch 'main' into cn/welcome-back-read-buffer	2022-05-26 13:47:27 +00:00
Andrew Lamb	2d5a327bf4	fix: expire empty parquet_files cache and empty tombstones cache (#4701 ) * fix: expire empty parquet_files cache * fix: expire empty tombstones cache Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-26 11:08:15 +00:00
Carol (Nichols || Goulding)	cddcca1e05	feat: Implement a method to get a read buffer chunk from a stream of record batches	2022-05-25 17:24:35 -04:00
Carol (Nichols || Goulding)	f7bc551d9a	feat: Sketch out skeleton methods for RBChunk cache	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	04531e77dd	feat: Implement get on ReadBufferCache	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	25b8260b72	feat: Implement ReadBufferCache::new	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	ab9010d9a6	refactor: Rename QuerierParquetChunk::new_parquet to new	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	df10452e2e	refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	4a90d0af32	refactor: Remove ChunkStorage enum; inline into QuerierParquetChunk instead	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	b2c62c6808	refactor: Rename QuerierChunk to QuerierParquetChunk	2022-05-25 17:19:10 -04:00
Carol (Nichols || Goulding)	66823522f3	docs: Fix comment wrapping while reading through	2022-05-25 17:19:10 -04:00
Marco Neumann	a08a91c5ba	fix: ensure querier cache is refreshed for partition sort key (#4660 ) * test: call `maybe_start_logging` in auto-generated cases * fix: ensure querier cache is refreshed for partition sort key Fixes #4631. * docs: explain querier sort key handling and test * test: test another version of issue 4631 * fix: correctly invalidate partition sort keys * fix: fix `table_not_found_on_ingester`	2022-05-25 10:44:42 +00:00
Andrew Lamb	935743b525	refactor: Implement `new_querier_chunk` and `new_querier_chunk_from_file_with_metadata` (#4685 )	2022-05-24 21:58:27 +00:00
Andrew Lamb	4d8ece5524	feat: Add `Tombstone` to querier cache (#4663 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-24 13:21:23 +00:00
Marco Neumann	a3dab68f3f	fix: actually log error (#4672 ) While logging all the helpful information to replicate failing querier->ingester requests via CLI, I totally forgot to log the error message itself. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-24 08:44:35 +00:00
Andrew Lamb	e877a64462	feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661 ) * feat: Add `ParquetFiles` cache * fix: Apply suggestions from code review Co-authored-by: Marko Mikulicic <mkm@influxdata.com> * fix: remove commented out debugging println * refactor: Improve size calculation * fix: mark `ParquetFileCache::clear` test only * fix: assert on metric count Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-05-23 17:11:38 +00:00
Marco Neumann	2029bd16ba	feat: enable debugging of failed querier->ingester requests (#4659 ) * feat: enable debugging of failed querier->ingester requests - extend `query-ingester` CLI to allow usage of predicates - on failed requests: log all information that required for the CLI - test the "ingester fails" scenario * test: explain Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: move b64 pred. serde into a single crate Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-23 15:37:31 +00:00
Dom Dwyer	7df7c4844c	refactor: remove redundant ParquetChunk errors Eliminates unused / refactors away unnecessary errors for the parquet::chunk module.	2022-05-20 15:17:40 +01:00
Andrew Lamb	a18a49736d	refactor: Encapsulate reconciliation logic more (#4644 ) * refactor: extract code from state_reconciler * refactor: Encapsulate reconcilation logic more * fix: docs	2022-05-19 19:25:36 +00:00
Dom Dwyer	baa86d846f	refactor: use ParquetStore instead of ObjectStore Changes the code paths that interact with Parquet files in the object store to reference the ParquetStorage directly (DRY refactor). This change takes us from a dependency graph of: ┌─────────────────┐ │ │ ▼ │ Parquet Consumer │ │ ┌──────────────┐ ├────────▶│ParquetStorage│ ▼ └──────────────┘ ┌──────────────┐ │ ObjectStore │ └──────────────┘ │ ┌────┴────┐ ▼ ▼ File s3 System (etc) to: Parquet Consumer │ ▼ ┌──────────────┐ │ParquetStorage│ └──────────────┘ │ ▼ ┌──────────────┐ │ ObjectStore │ └──────────────┘ │ ┌────┴────┐ ▼ ▼ File s3 System (etc) With the ParquetStorage being solely responsible for managing interactions with the object store when dealing with Parquet files.	2022-05-19 13:52:51 +01:00
Dom Dwyer	e20b02b914	refactor: tidy ParquetChunk constructor Removes two unused constructors for a ParquetChunk, and moves the bare fn constructor that is actually used to be an associated method (a conventional constructor).	2022-05-19 13:51:07 +01:00
Andrew Lamb	ed41622593	chore: Remove dead code from QueryDatabase (#4637 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-19 10:29:54 +00:00
Marco Neumann	6577887440	feat: instrument querier cache loaders w/ metrics (#4635 ) * feat: `MetricsLoader` Add ability to instrument cache loaders w/ metrics. * feat: instrument querier cache loaders w/ metrics * fix: fix metric descriptions and names	2022-05-19 08:30:34 +00:00
Marco Neumann	770293a973	feat: add LRU cache metrics (#4632 ) * refactor: require `Resource`s to be convertible to `u64` * refactor: require `Resource`s to have a unit name * refactor: make LRU cache IDs static * feat: add LRU cache metrics * docs: improve type names in LRU doctest * docs: epxlain `MeasuredT` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: explain `test_metrics` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-19 08:05:17 +00:00
Marco Neumann	4bd899369e	feat: check for overlapping ingester partititions in querier (#4633 ) Right now this would clearly indicate a bug and before I am trying to understand some prod issues, I wanna rule that one out.	2022-05-18 13:16:27 +00:00
Marco Neumann	23b37a1991	refactor: remove unused `TableCache::id`	2022-05-18 11:39:30 +02:00
Marco Neumann	7c20acb2e6	refactor: remove unused `NamespaceCache::name`	2022-05-18 11:39:30 +02:00
Marco Neumann	52346642a0	ci: fix cargo deny (#4629 ) * ci: fix cargo deny * chore: downgrade `socket2`, version 0.4.5 was yanked * chore: rename `query` to `iox_query` `query` is already taken on crates.io and yanked and I am getting tired of working around that.	2022-05-18 09:38:35 +00:00
Andrew Lamb	3a33e806c7	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619 ) * chore: Update datafusion deps * chore: update arrow/parquet/arrow flight deps * chore: Run cargo hakari tasks * chore: Update location of utils * chore: Update some more APIs Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-05-17 14:13:03 +00:00
Marco Neumann	779f0e9cdf	feat: querier RAM pool (#4593 ) * feat: `SortKey::size` * feat: `FunctionEstimator` * feat: querier RAM pool Let's put all the caches into a single RAM pool, so we can at least somewhat control RAM usage. Note that this does NOT limit the peak memory during query execution though, but should at least stop unlimited cache growth. A follow-up PR will add metrics. * refactor: improve some size calculations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-17 13:11:20 +00:00


185 Commits (a4f51d99f6a906396938cdbb767e1e378fec6f7a)