These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
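Roughly, the check looks like this (a sketch only, assuming the `toml` and `anyhow` crates as helpers; in practice it was done by hand per crate):

```rust
use std::process::Command;

/// Parse a crate's Cargo.toml, then grep its `src/` for every dependency
/// name (with `-` normalized to `_`). Anything that never shows up gets
/// flagged as a removal candidate.
fn possibly_unused_deps(crate_dir: &std::path::Path) -> anyhow::Result<Vec<String>> {
    let manifest = std::fs::read_to_string(crate_dir.join("Cargo.toml"))?;
    let manifest: toml::Value = manifest.parse()?;
    let mut flagged = vec![];

    if let Some(deps) = manifest.get("dependencies").and_then(|d| d.as_table()) {
        for name in deps.keys() {
            // Crate names use `-`, identifiers in source code use `_`.
            let ident = name.replace('-', "_");
            let found = Command::new("grep")
                .args(["-r", "-q"])
                .arg(&ident)
                .arg(crate_dir.join("src"))
                .status()?
                .success();
            if !found {
                flagged.push(name.clone());
            }
        }
    }
    Ok(flagged)
}
```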
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`, `serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear to be used, but it's behind enough feature
flags that I don't care to reason about them, and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- only convert dictionary types that we really want to convert (instead
of blindly converting all types)
- handle missing / NULL columns
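For context, the kind of check this implies, sketched out (the concrete key/value types are assumptions for illustration, not necessarily the ones used here):

```rust
use arrow::datatypes::DataType;

/// Only dictionary columns we actually want to convert (here: `Int32`-keyed
/// string dictionaries) are converted; everything else is passed through.
fn should_convert_dictionary(data_type: &DataType) -> bool {
    matches!(
        data_type,
        DataType::Dictionary(key, value)
            if **key == DataType::Int32 && **value == DataType::Utf8
    )
}
```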
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: use stored sort key to deduplicate data
* refactor: verify if one is a super sort key of the other
* test: unit tests for scan and deduplication plans
* fix: typo
* refactor: refactor and add comments
* feat: cache partition sort key to read during planning as needed
* test: tests for query plans with different overlap groups
* chore: cleanup
* chore: resolve merge conflicts
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: document and improve `MockIngesterConnection`
* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries
* fix: mark "IngsterPartition" chunks as unsorted
* fix: "group by" queries may require sorted comparison
* refactor: re-export a few more types from querier
* fix: ensure that test parquet files are de-duped
* test: chunks in ingester stage
* docs: explain test code
* feat: fuse ingester and catalog states in querier
This now correctly combines the data we get from the ingester w/ the
data we get from the catalog. Right now it bails out if, during the
very small time window between asking the ingester and querying the
catalog, the compactor combines the newest files w/ "too new" files
(see tests).
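A sketch of that bail-out check (names and types are assumptions, not the actual querier structures):

```rust
/// If the catalog already contains a parquet file whose data extends beyond
/// what the ingester reported as persisted, the two snapshots raced and we
/// bail out instead of fusing them.
struct ParquetFileInfo {
    max_sequence_number: i64,
}

fn snapshots_are_consistent(
    catalog_files: &[ParquetFileInfo],
    ingester_max_persisted: Option<i64>,
) -> bool {
    match ingester_max_persisted {
        // Ingester claims nothing was persisted yet, so the catalog must not
        // contain any files for this partition either.
        None => catalog_files.is_empty(),
        Some(max_persisted) => catalog_files
            .iter()
            .all(|f| f.max_sequence_number <= max_persisted),
    }
}
```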
* fix: improve error wording
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: improve doc comment
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: explain tests
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: improve tests, method naming and docs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: impl `Debug` for `TestCatalog`
* feat: add sequencer ID and correct partition key to `IngesterPartition`
- simplifies debugging (parquet chunks and ingester chunks now use the
same partition key naming)
- the sequencer ID is required to correctly reason about tombstones (to
be implemented in a later PR)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This now correctly processes record batches for the different
partitions. The actual code change is rather small but it requires some
substantial test infrastructure.
* fix: return "not found" gRPC error instead of "internal" when ingester does not know table
* fix: properly handle "namespace not found" in ingester queries
* fix: make `initialize_db` work with async code
* test: add custom step for NG tests
* fix: handle "unknown table/namespace" resp. in querier
* docs: explain test setup
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
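The gRPC error fixes above boil down to returning the right status code for an unknown table or namespace; a minimal sketch with tonic (the message text is illustrative):

```rust
use tonic::Status;

/// An unknown table or namespace becomes `NotFound` instead of `Internal`,
/// so the querier can detect the case and handle it gracefully.
fn table_not_found(table_name: &str) -> Status {
    Status::not_found(format!("table not found: {table_name}"))
}
```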
* refactor: querier<>ingester flight protocol adjustments
This makes a few adjustments to the querier<>ingester flight protocol.
Query Scope
===========
The querier will request data for ALL sequencer IDs for now. There is
no reason to have a request per sequencer ID. We can add a range/set
filter later if we want, but it is not required yet.
Partition-level
===============
The only time the querier cares about sequencer IDs (i.e. sharding) at
all is when it selects which ingesters to ask for unpersisted data
(this is currently not implemented; it just asks all ingesters).
Afterwards the querier only cares about partitions (which are bound to
specific sequencers anyway), because this is the level where parquet
file persistence, compaction, and deduplication happen. So we make
partitions a first-class citizen in the ingester response.
Metadata VS RecordBatches
=========================
The global app-metadata will list all partitions and their max
persisted parquet files and tombstones (theoretically tombstones are at
table level, but the ingester could in the future break them down to
the partition level). The querier then receives a stream of record
batches. Each record batch is tagged (via key-value metadata in its
schema) so it can be assigned to a partition. At the moment the
ingester returns 0 or 1 batches per unpersisted partition (0 in case
we've filtered out all the data via the predicate), but in the future
it is free to return multiple batches. This setup gives the ingester
more freedom over memory management and (potentially parallel) query
processing, while at the same time keeping the set of duplicated
information minimal and allowing easy extensions (since the global
metadata is a full-blown protobuf message).
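As a sketch, tagging a batch with its partition via schema metadata could look like this (the metadata key and helper name are illustrative, not the actual protocol constants):

```rust
use std::sync::Arc;

use arrow::{datatypes::Schema, record_batch::RecordBatch};

/// Attach the partition ID to a record batch via key-value metadata in its
/// schema, so the receiver can assign the batch to a partition.
fn tag_with_partition(batch: RecordBatch, partition_id: i64) -> RecordBatch {
    let schema = batch.schema();
    let mut metadata = schema.metadata().clone();
    // Illustrative key name.
    metadata.insert("iox:partition_id".to_string(), partition_id.to_string());

    let tagged_schema = Arc::new(Schema::new_with_metadata(
        schema.fields().clone(),
        metadata,
    ));
    RecordBatch::try_new(tagged_schema, batch.columns().to_vec())
        .expect("columns unchanged, only schema metadata added")
}
```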
Querier
=======
At the moment the querier ignores all the metadata. Follow-up PRs will
change that.
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: make code clearer
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
"end-user -> querier" and "querier -> ingester" should use a single
Flight client implementation. The difference is just the request and
response metadata.
This changes our default Flight client to use protobuf instead of JSON
for the ticket format.
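A sketch of the protobuf ticket encoding, with a stand-in message type (the real message comes from the generated protobuf types and may look different):

```rust
use arrow_flight::Ticket;
use prost::Message;

/// Stand-in for the prost-generated ticket message; field names here are
/// illustrative only.
#[derive(Clone, PartialEq, ::prost::Message)]
struct ReadInfo {
    #[prost(string, tag = "1")]
    namespace_name: String,
    #[prost(string, tag = "2")]
    sql_query: String,
}

fn make_ticket(namespace_name: &str, sql_query: &str) -> Ticket {
    let msg = ReadInfo {
        namespace_name: namespace_name.to_string(),
        sql_query: sql_query.to_string(),
    };
    // Protobuf-encoded payload instead of a JSON string.
    Ticket {
        ticket: msg.encode_to_vec().into(),
    }
}
```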
* feat: Add basic Querier <--> Ingester "Service Configuration"
* docs: update comments in test
* refactor: cleanup tests a little
* refactor: make trait more consistent
* docs: improve comments in IngesterPartition
Abstract
========
We need to be careful w/ tombstones that fall exactly within the sequence number range of a parquet file.
Current Bug
===========
Imagine the following order of events:
1. Router creates write at sequence number 1:
- `table,selector=1 payload=1 1`
- `table,selector=2 payload=2 2`
2. Ingester pulls write, waits a bit and persists it to parquet file 1:
- `table,selector=1 payload=1 1`
- `table,selector=2 payload=2 2`
3. Router creates write at sequence number 2:
- `table,selector=1 payload=3 3`
- `table,selector=2 payload=4 4`
4. Ingester pulls write
5. Router creates delete at sequence number 3: full time range, `selector=1`
6. Ingester pulls delete and creates tombstone 1
7. Router creates write at sequence number 4:
- `table,selector=1 payload=5 5`
- `table,selector=2 payload=6 6`
8. Ingester pulls write
9. Ingester persists parquet file 2:
- `table,selector=2 payload=4 4`
- `table,selector=1 payload=5 5`
- `table,selector=2 payload=6 6`
When reading parquet file 2, the tombstone MUST NOT be applied. Otherwise `table,selector=1 payload=5 5` will be
deleted.
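The rule this implies, as a sketch (illustrative names, not the actual IOx types): a tombstone is only applied to a parquet file if its sequence number is strictly greater than the file's maximum sequence number.

```rust
/// Tombstones at or below the file's max sequence number were either already
/// materialized when the ingester persisted the file, or only target data
/// older than anything in the file; either way they must not be applied when
/// reading it.
fn tombstone_applies(tombstone_sequence_number: i64, file_max_sequence_number: i64) -> bool {
    tombstone_sequence_number > file_max_sequence_number
}
```

In the example: tombstone 1 (sequence number 3) applies to parquet file 1 (max sequence number 1) but not to parquet file 2 (max sequence number 4).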
Notes
=====
Technically this issue also applies to files created by the compactor; however, the compactor marks tombstones
that fall into the sequence number range as processed. It even does that in a single transaction:
fc4635a334/compactor/src/compact.rs (L821-L861)
Alternative
===========
An alternative solution would be for the ingester to mark tombstones that it materialized during persistence as
"processed" (tombstone 1 for parquet file 2 in the example above). However, "processed" markers are currently a mere
optimization and don't affect correctness, which is nice both for caching on the querier side and for reasoning
about the system.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Support `SHOW NAMESPACES` in sql repl
* feat: add basic support to clients
* fix: add get_namespaces service test
* fix: proper error handling
* test: end to end test for namespace client
* refactor: Use QuerierDatabase rather than Catalog
* refactor: remove unused function