Commit Graph

121 Commits (1bf519c2f62f3f9e417159c0d2da4f43d803e59a)

Author SHA1 Message Date
Carol (Nichols || Goulding) 10413a6e9a
docs: Explain that the querier gets the write info from the ingesters
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-12 09:48:46 -04:00
Carol (Nichols || Goulding) 48b84b3bdf
feat: Querier can get write status from ingesters
Connects to influxdata/influxdb-iox-client-go#27.
2022-05-11 14:12:10 -04:00
Carol (Nichols || Goulding) 77205d9a8e
fix: Remove some unused error variants 2022-05-11 14:07:48 -04:00
Andrew Lamb b8cb4c3f2b
feat: Interrogate schema from querier (as well as router) (#4557)
* refactor: move SchemaService into `service_grpc_schema`

* feat: implement schema gRPC for querier

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-10 20:55:58 +00:00
Raphael Taylor-Davies 8b379c83cc
refactor: simplify object_store path handling (#4534)
* refactor: simplify object_store path handling

* fix: aws integration tests

* chore: lint

* fix: update gcs tests

* refactor: move errors into submodules

* chore: lint

* chore: review feedback

* refactor: replace provider with Display

* fix: failing tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-09 18:43:22 +00:00
Carol (Nichols || Goulding) dfced5b21c
fix: Move top-level allow dead code in querier to specific items 2022-05-06 16:58:03 -04:00
Jake Goulding e07bcd40c2 refactor: Remove unused dependencies
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.

I left a few dependencies that this process flagged:

* generated_types
  - `pbjson`,`serde`. Apparently used by the generated code.

* grpc-router-test-gen
  - `prost`. Apparently used by the generated code.

* influxdb_iox
  - `heappy`. Doesn't appear used, but is behind enough feature
    flags that I don't care to reason about and it's already optional.
  - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
    indirect dependency.

* iox_gitops_adapter
  - `k8s_openapi`. Appears to be setting a feature flag of an indirect
    dependency.
2022-05-06 15:57:58 -04:00
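The dependency check described in commit e07bcd40c2 above (list each dependency of a Cargo.toml, then search that crate's sources for the name) might be sketched roughly as follows. This is a minimal illustration, not the script the author used: the function names are hypothetical, the manifest is parsed line by line on the assumption of one dependency per line, and a plain substring search stands in for `grep`. As the commit notes, a dependency flagged this way may still be needed by generated code or for feature flags.

```rust
use std::collections::BTreeSet;

/// Collect dependency names from the `*dependencies` tables of a Cargo.toml
/// given as text. Naive line-based parsing (assumption: one dependency per
/// line, no `[dependencies.foo]` sub-tables).
fn dependency_names(cargo_toml: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    let mut in_deps = false;
    for line in cargo_toml.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Covers [dependencies], [dev-dependencies], [build-dependencies].
            in_deps = line.trim_end_matches(']').ends_with("dependencies");
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((name, _)) = line.split_once('=') {
            names.insert(name.trim().to_string());
        }
    }
    names
}

/// Return dependencies whose name never appears in the crate's source text.
/// Crate names containing `-` are written with `_` in Rust code, so both
/// spellings are checked.
fn unused_candidates(cargo_toml: &str, sources: &[&str]) -> Vec<String> {
    dependency_names(cargo_toml)
        .into_iter()
        .filter(|name| {
            let ident = name.replace('-', "_");
            !sources
                .iter()
                .any(|src| src.contains(name.as_str()) || src.contains(ident.as_str()))
        })
        .collect()
}
```

The result is only a list of candidates; each one still has to be removed and the build re-run to confirm it is truly unused.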
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 0541c6e40f
fix: Remove data_types crate where it's no longer used 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 2ef44f2024
fix: Move timestamp types to data_types2 2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding) d2671355c3
fix: Move partition metadata types to data_types2 2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding) 485d6edb8f
refactor: Move IngesterQueryRequest to generated_types 2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding) ea46830954
fix: Remove iox_object_store crate; move ParquetFilePath to parquet_file 2022-05-06 14:45:36 -04:00
Andrew Lamb 37c7ce793c
chore: Update datafusion (again) (#4518)
* chore: Update datafusion (again)

* refactor: Update ExecutionPlan::execute to not be async
2022-05-05 15:43:41 +00:00
Andrew Lamb 02893e598c
chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516)
* chore: Tool for automating arrow version update

* chore: Update datafusion and arrow/parquet/arrow-flight

* fix: update for changes in Arrow API

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-05 00:21:02 +00:00
Nga Tran 4813cc8332
test: Added explain tests for querier. Found and fixed #4468 (#4469)
* test: Added explain tests for querier. Found and fixed #4468

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-29 14:15:30 +00:00
Marco Neumann 6eed09a926
test: use "real" ingester in query tests (#4455)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-28 14:39:31 +00:00
dependabot[bot] 420c306caa
chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-28 08:21:17 +00:00
Marco Neumann bd0bae13ce
fix: extend + harden querier `ensure_schema` (#4429)
- only convert dictionary types that we really want to convert (instead
  of blindly converting all types)
- handle missing / NULL columns

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-27 12:49:59 +00:00
Nga Tran fa2c1febf4
feat: use stored partition sort key to deduplicate data (#4360)
* feat: use stored sort key to deduplicate data

* refactor: verify if one is a super sort key of the other

* test: unit tests for scan and deduplication plans

* fix: typo

* refactor: refactor and add comments

* feat: cache partition sort key to read during planning as needed

* test: tests for query plans with different overlap groups

* chore: cleanup

* chore: resolve merge conflicts

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 20:36:32 +00:00
Marco Neumann 2337935660
test: chunks in ingester stage (#4415)
* refactor: document and improve `MockIngesterConnection`

* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries

* fix: mark "IngesterPartition" chunks as unsorted

* fix: "group by" queries may require sorted comparison

* refactor: re-export a few more types from querier

* fix: ensure that test parquet files are de-duped

* test: chunks in ingester stage

* docs: explain test code
2022-04-26 07:55:19 +00:00
二手掉包工程师 4b47d723b1
refactor: Rename time to iox_time (#4416)
Signed-off-by: hi-rustin <rustin.liu@gmail.com>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Marco Neumann 86e8f05ed1
fix: make all catalog IDs 64bit (#4418)
Closes #4365.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 16:49:34 +00:00
Marco Neumann f5f80e879e
test: add benchmarks for addressable heap (#4201) 2022-04-25 14:37:29 +00:00
Nga Tran d963110842
feat: group chunk overlaps based on time range only (#4389)
* feat: overlap for NG querier

* chore: cleanup

* refactor: address review comments

* fix: typo

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 13:32:07 +00:00
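Grouping chunks by time range only, as named in commit d963110842 above, can be illustrated with a standard interval-grouping pass. The sketch below is an assumption about the general technique, not the querier's code: `TimeRange` and `group_overlaps` are hypothetical names, and ranges are treated as inclusive on both ends.

```rust
/// Inclusive time range of a chunk (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TimeRange {
    min: i64,
    max: i64,
}

/// Group ranges so that members of a group are connected, directly or
/// transitively, by overlapping time ranges.
fn group_overlaps(mut ranges: Vec<TimeRange>) -> Vec<Vec<TimeRange>> {
    // After sorting by start time, a range can only overlap the current group.
    ranges.sort_by_key(|r| r.min);
    let mut groups: Vec<Vec<TimeRange>> = Vec::new();
    // Largest end time seen in the current (last) group.
    let mut group_max = i64::MIN;
    for r in ranges {
        if let Some(group) = groups.last_mut() {
            if r.min <= group_max {
                group.push(r);
                group_max = group_max.max(r.max);
                continue;
            }
        }
        // No overlap with the current group: start a new one.
        groups.push(vec![r]);
        group_max = r.max;
    }
    groups
}
```

Chunks that end up alone in a group do not overlap anything in time, which is the case where a query plan could plausibly skip deduplication.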
Marco Neumann 99f6fb5f59 feat: calculate summaries for `IngesterPartition` 2022-04-22 10:21:14 +02:00
Andrew Lamb e67cc9dbce
chore: Update datafusion again (#4385)
* chore: Update datafusion

* fix: Update imports

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-21 21:05:16 +00:00
Marco Neumann c084785bc3
feat: fuse ingester and catalog states in querier (#4355)
* feat: fuse ingester and catalog states in querier

This now correctly combines the data we get from the ingester w/ the
data we get from the catalog. Right now it bails out if, during the very
small time window between asking the ingester and querying the catalog,
the compactor combines the newest files w/ "too new" files (see tests).

* fix: improve error wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: improve doc comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: explain tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: improve tests, method naming and docs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 14:37:19 +00:00
Andrew Lamb 73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357)
* chore: Update datafusion

* chore: Update arrow/arrow-flight/parquet to 12

* chore: update datafusion correctly

* chore: Update prost, tonic, and dependents

* fix: Fixup some api changes

* fix: Update test output in db

* fix: Update test output in parquet_file

* fix: remove old pbjson types

* fix: Add "--experimental_allow_proto3_optional" flag

* chore: Run cargo hakari tasks

* fix: compile error

* chore: Update heappy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Andrew Lamb 383c0b328d
feat: Issue queries from querier to ingester in parallel (#4359)
* feat: Issue queries from querier to ingester in parallel

* refactor: complete Arc-ification

* refactor: use a named struct to pass the state
2022-04-20 10:55:14 +00:00
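The idea behind commit 383c0b328d above, issuing one request per ingester concurrently rather than one after another, can be sketched with scoped threads from the standard library. This is a stand-in technique chosen to keep the example dependency-free; the commit does not show how the querier does it, and `query_all` and `query_one` are hypothetical names.

```rust
use std::thread;

/// Query every ingester concurrently and return the responses in the same
/// order as `addresses`. `query_one` stands in for the real network call.
fn query_all<F>(addresses: &[String], query_one: F) -> Vec<Result<String, String>>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    thread::scope(|scope| {
        // Spawn all requests first so they run in parallel.
        let handles: Vec<_> = addresses
            .iter()
            .map(|addr| {
                let query_one = &query_one;
                scope.spawn(move || query_one(addr.as_str()))
            })
            .collect();
        // Then wait for each one; a panicking worker becomes an error entry.
        handles
            .into_iter()
            .map(|h| {
                h.join()
                    .unwrap_or_else(|_| Err("ingester query panicked".to_string()))
            })
            .collect()
    })
}
```

Collecting the handles before joining matters: joining inside the same iterator chain would wait for each request before starting the next one.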
Marco Neumann d711816548
feat: add sequencer ID and correct partition key to `IngesterPartition` (#4348)
* feat: impl `Debug` for `TestCatalog`

* feat: add sequencer ID and correct partition key to `IngesterPartition`

- simplifies debugging (parquet chunks and ingester chunks now use the
  same partition key naming)
- the sequencer ID is required to correctly reason about tombstones (to
  be implemented in a later PR)

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-19 15:59:49 +00:00
Marco Neumann 3588a06647
feat: correctly dismantle ingester gRPC response in querier (#4323)
This now correctly processes record batches for the different
partitions. The actual code change is rather small but it requires some
substantial test infrastructure.
2022-04-19 11:09:40 +00:00
Marco Neumann de1241db85 test: mock gRPC ingester response for querier
Add infrastructure to test how the querier processes ingester gRPC
responses w/o performing a full query or end2end test.
2022-04-14 15:00:35 +02:00
Marco Neumann 351b0d0c15
fix: unknown namespace/table in querier<>ingester flight protocol (#4307)
* fix: return "not found" gRPC error instead of "internal" when ingester does not know table

* fix: properly handle "namespace not found" in ingester queries

* fix: make `initialize_db` work with async code

* test: add custom step for NG tests

* fix: handle "unknown table/namespace" resp. in querier

* docs: explain test setup

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-04-14 12:36:15 +00:00
Carol (Nichols || Goulding) 8c9b7b501b
fix: Update a signature to use ParquetFileWithMetadata 2022-04-13 11:09:06 -04:00
Carol (Nichols || Goulding) 94dcde4996
fix: Do fewer queries for metadata
By adding another _with_metadata catalog function. Also introduce a new
type rather than passing around tuples everywhere.
2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding) 02fee3b84f
feat: Request parquet metadata from the catalog when needed only 2022-04-13 10:43:19 -04:00
Marco Neumann f75d3b1f5d fix: proper executor shutdown in querier
This is not a huge issue but might drain resources like file descriptors
during tests. The dedicated executor also logs a warning.
2022-04-13 15:52:44 +02:00
Marco Neumann 83f77712b1
refactor: querier<>ingester flight protocol adjustments (#4286)
* refactor: querier<>ingester flight protocol adjustments

This makes a few adjustments to the querier<>ingester flight protocol.

Query Scope
===========
The querier will request data for ALL sequencer IDs for now. There is
no reason to have a request per sequencer ID. We can add a range/set
filter later if we want, but this is not required for now.

Partition-level
===============
The only time when the querier cares about sequencer IDs (i.e. sharding)
at all is when it selects which ingesters to ask for unpersisted data
(this is currently not implemented, it just asks all ingesters).
Afterwards the querier only cares about partitions (which are bound to
specific sequencers anyways) because this is the level where parquet
file persistence and compaction as well as deduplication happen. So we
make partitions a first-class citizen in the ingester response.

Metadata VS RecordBatches
=========================
The global app-metadata will list all partitions and their max
persisted parquet files and tombstones (theoretically tombstones are at
table-level, but the ingester could in the future break them down to the
partition-level). The querier then receives a stream of record batches. Each
record batch is tagged (via key-value metadata in its schema) so it can
be assigned to a partition. At the moment the ingester returns 0 or 1
batches per unpersisted partition (0 in case we've filtered out all the
data via the predicate), but in the future it is free to return multiple
batches. This setup gives the ingester more freedom over memory
management and (potentially parallel) query processing, while at the
same time keeps the set of duplicated information minimal and allows
easy extensions (since the global metadata is a full-blown protobuf
message).

Querier
=======
At the moment the querier ignores all the metadata. Follow-up PRs will
change that.

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: make code clearer

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-04-12 16:48:40 +00:00
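The "Metadata VS RecordBatches" part of commit 83f77712b1 above says each record batch is tagged through key-value metadata in its schema so it can be assigned to a partition. A rough sketch of the receiving side follows. It is a simplified model under stated assumptions: `TaggedBatch` replaces a real Arrow record batch, and the metadata key `partition_id` is hypothetical because the commit does not name the actual key.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in for an Arrow record batch: schema key-value metadata plus a row
/// count (illustrative only).
struct TaggedBatch {
    schema_metadata: HashMap<String, String>,
    num_rows: usize,
}

/// Hypothetical metadata key; the commit does not name the real one.
const PARTITION_ID_KEY: &str = "partition_id";

/// Assign each batch in the stream to its partition using the tag stored in
/// its schema metadata. A partition may receive zero, one, or many batches.
fn group_by_partition(
    batches: Vec<TaggedBatch>,
) -> Result<BTreeMap<i64, Vec<TaggedBatch>>, String> {
    let mut by_partition: BTreeMap<i64, Vec<TaggedBatch>> = BTreeMap::new();
    for batch in batches {
        let raw = batch
            .schema_metadata
            .get(PARTITION_ID_KEY)
            .ok_or_else(|| "batch has no partition tag".to_string())?;
        let id: i64 = raw
            .parse()
            .map_err(|e| format!("invalid partition tag {raw:?}: {e}"))?;
        by_partition.entry(id).or_default().push(batch);
    }
    Ok(by_partition)
}
```

Because the grouping never assumes one batch per partition, it keeps working if the ingester later returns multiple batches for the same partition, which the commit message explicitly allows.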
Andrew Lamb d8de38cdb9
feat: MVP include un-persisted results from the ingester in query results (#4255)
* feat: Return not-yet-persisted data in query results

* fix: comments from code review

* fix: update for logical merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-12 11:03:19 +00:00
Marco Neumann 380cd9bbff
refactor: use a single flight client implementation (#4273)
"end-user -> querier" and "querier -> ingester" should use a single
Flight client implementation. The difference is just the request and
response metadata.

This changes our default Flight client to use protobuf instead of JSON
for the ticket format.
2022-04-12 09:08:25 +00:00
Andrew Lamb 3f5eab7648
feat: allow the querier to talk with multiple ingesters (#4271)
* refactor: Move querier config to clap_blocks

* refactor: Add tests

* refactor: allow multiple addresses

* refactor: Update to use multiple addresses

* fix: bow to clippy

* fix: docstring

* fix: error if address is repeated multiple times

* chore: Add error enum, plumb through

* fix: clippy

* refactor: improve Rust API

* fix: fix test
2022-04-11 18:49:49 +00:00
Andrew Lamb 941dcc8e80
fix: return error rather than panic in querier namespace access (#4270) 2022-04-11 14:01:15 +00:00
Andrew Lamb f6e6821276
feat: Add basic Querier <--> Ingester "Service Configuration" (#4259)
* feat: Add basic Querier <--> Ingester "Service Configuration"

* docs: update comments in test

* refactor: cleanup tests a little

* refactor: make trait more consistent

* docs: improve comments in IngesterPartition
2022-04-11 11:50:22 +00:00
Andrew Lamb bbbdcc75a8
feat: `QuerierDatabase::chunks` returns `Result` (#4260)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 18:54:17 +00:00
Andrew Lamb eb7d41f7a1
test: Add schema validation to end to end querier test (#4258)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 18:11:00 +00:00
Andrew Lamb 34e65c23fa
fix: Update for signature change (#4252)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 11:21:07 +00:00
Marco Neumann 5bebc73e3f
fix: consider "in-between" tombstones as processed (#4187)
Abstract
========
We need to be careful w/ tombstones that fall exactly within the sequence number range of a parquet file.

Current Bug
===========
Imagine the following order of events:

1. Router creates write at sequence number 1:
   - `table,selector=1 payload=1 1`
   - `table,selector=2 payload=2 2`
2. Ingester pulls write, waits a bit and persists it to parquet file 1:
   - `table,selector=1 payload=1 1`
   - `table,selector=2 payload=2 2`
3. Router creates write at sequence number 2:
   - `table,selector=1 payload=3 3`
   - `table,selector=2 payload=4 4`
4. Ingester pulls write
5. Router creates delete at sequence number 3: full time range, `selector=1`
6. Ingester pulls delete and creates tombstone 1
7. Router creates write at sequence number 4:
   - `table,selector=1 payload=5 5`
   - `table,selector=2 payload=6 6`
8. Ingester pulls write
9. Ingester persists parquet file 2:
   - `table,selector=2 payload=4 4`
   - `table,selector=1 payload=5 5`
   - `table,selector=2 payload=6 6`

When reading parquet file 2, the tombstone MUST NOT be applied. Otherwise `table,selector=1 payload=5 5` will be
deleted.

Notes
=====
Technically this issue also applies to files created by the compactor, however the compactor marks tombstones as
processed that fall into the sequence number range. It even does that in a single transaction:

fc4635a334/compactor/src/compact.rs (L821-L861)

Alternative
===========
An alternative solution would be for the ingester to mark tombstones that it materialized during persistence as
"processed" (tombstone 1 for parquet file 2 in the example above). However, "processed" markers are currently a mere
optimization and don't affect correctness, which is nice for caching on the querier side as well as for reasoning.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-31 15:09:58 +00:00
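The rule described in commit 5bebc73e3f above can be modelled as a comparison between a tombstone's sequence number and the sequence number range of a parquet file. The sketch below is an interpretation of the commit message, not the querier's code: all type and function names are hypothetical, and it assumes that deletes whose sequence number falls inside a file's range were already materialized when that file was persisted.

```rust
/// Sequence number range covered by a persisted parquet file (illustrative).
struct ParquetFileRange {
    min_sequence_number: i64,
    max_sequence_number: i64,
}

/// A delete marker with the sequence number at which it was created.
struct Tombstone {
    sequence_number: i64,
}

/// Where a tombstone sits relative to a file's sequence number range.
#[derive(Debug, PartialEq, Eq)]
enum TombstonePosition {
    /// Created before any write in the file.
    Older,
    /// Falls inside the file's range: treated as already processed.
    InBetween,
    /// Created after every write in the file.
    Newer,
}

fn classify(file: &ParquetFileRange, tombstone: &Tombstone) -> TombstonePosition {
    if tombstone.sequence_number < file.min_sequence_number {
        TombstonePosition::Older
    } else if tombstone.sequence_number <= file.max_sequence_number {
        TombstonePosition::InBetween
    } else {
        TombstonePosition::Newer
    }
}

/// Only tombstones newer than everything in the file are applied on read.
fn tombstone_applies(file: &ParquetFileRange, tombstone: &Tombstone) -> bool {
    classify(file, tombstone) == TombstonePosition::Newer
}
```

In the commit's example, tombstone 1 has sequence number 3. Parquet file 1 holds only the write at sequence number 1, so the tombstone is newer and applies. Parquet file 2 spans sequence numbers 2 to 4, so the tombstone is in between and is skipped, which keeps `table,selector=1 payload=5 5` intact.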
Andrew Lamb de6505c801
fix: retry catalog list operation (#4188) 2022-03-31 13:07:00 +00:00
Andrew Lamb a1df864283
feat: Support 'SHOW NAMESPACES' in sql repl (#4164)
* feat: Support `SHOW NAMESPACES` in sql repl

* feat: add basic support to clients

* fix: add get_namespaces service test

* fix: proper error handling

* test: end to end test for namespace client

* refactor: Use QuerierDatabase rather than Catalog

* refactor: remove unused function
2022-03-31 12:57:33 +00:00