Commit Graph

66 Commits (7e5d71902722caaee257455921d4f372e58dc536)

Author SHA1 Message Date
Marco Neumann 83f77712b1
refactor: querier<>ingester flight protocol adjustments (#4286)
* refactor: querier<>ingester flight protocol adjustments

This makes a few adjustments to the querier<>ingester flight protocol.

Query Scope
===========
The querier will request data for ALL sequencer IDs for now. There is
no reason to have a request per sequencer ID. We can add a range/set
filter later if we want, but this is not required for now.

Partition-level
===============
The only time when the querier cares about sequencer IDs (i.e. sharding)
at all is when it selects which ingesters to ask for unpersisted data
(this is currently not implemented, it just asks all ingesters).
Afterwards the querier only cares about partitions (which are bound to
specific sequencers anyways) because this is the level where parquet
file persistence and compaction as well as deduplication happen. So we
make partitions a first-class citizen in the ingester response.

Metadata VS RecordBatches
=========================
The global app-metadata will list all partitions and their max
persisted parquet files and tombstones (theoretically tombstones are at
table-level, but the ingester could in the future break them down to the
partition-level). Then the querier receives a stream of record batches. Each
record batch is tagged (via key-value metadata in its schema) so it can
be assigned to a partition. At the moment the ingester returns 0 or 1
batches per unpersisted partition (0 in case we've filtered out all the
data via the predicate), but in the future it is free to return multiple
batches. This setup gives the ingester more freedom over memory
management and (potentially parallel) query processing, while at the
same time keeping the set of duplicated information minimal and allowing
easy extensions (since the global metadata is a full-blown protobuf
message).

Querier
=======
At the moment the querier ignores all the metadata. Follow-up PRs will
change that.

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: make code clearer

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-04-12 16:48:40 +00:00
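
The partition tagging described in #4286 above can be sketched on the querier side as follows. This is only an illustration under stated assumptions: `TaggedBatch` is a hypothetical stand-in for an Arrow record batch, and the metadata key `partition_id` is invented here because the commit message does not name the real key.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical metadata key; the commit message does not give the real name.
const PARTITION_ID_KEY: &str = "partition_id";

/// Stand-in for an Arrow record batch. Only the schema-level key-value
/// metadata and a row count are modelled.
#[derive(Debug, Clone)]
struct TaggedBatch {
    schema_metadata: HashMap<String, String>,
    num_rows: usize,
}

/// Builds a batch whose schema metadata carries the given partition ID.
fn tagged_batch(partition_id: i64, num_rows: usize) -> TaggedBatch {
    let mut schema_metadata = HashMap::new();
    schema_metadata.insert(PARTITION_ID_KEY.to_string(), partition_id.to_string());
    TaggedBatch { schema_metadata, num_rows }
}

/// Assigns each batch to a partition based on its schema metadata.
/// A partition may receive zero, one, or several batches.
fn group_by_partition(
    batches: Vec<TaggedBatch>,
) -> Result<BTreeMap<i64, Vec<TaggedBatch>>, String> {
    let mut grouped: BTreeMap<i64, Vec<TaggedBatch>> = BTreeMap::new();
    for batch in batches {
        let raw = batch
            .schema_metadata
            .get(PARTITION_ID_KEY)
            .ok_or_else(|| format!("batch is missing the `{}` tag", PARTITION_ID_KEY))?;
        let partition_id: i64 = raw
            .parse()
            .map_err(|e| format!("invalid partition ID `{}`: {}", raw, e))?;
        grouped.entry(partition_id).or_default().push(batch);
    }
    Ok(grouped)
}

fn main() {
    let batches = vec![tagged_batch(2, 10), tagged_batch(1, 5), tagged_batch(2, 7)];
    let grouped = group_by_partition(batches).expect("all batches are tagged");
    for (partition_id, batches) in &grouped {
        let rows: usize = batches.iter().map(|b| b.num_rows).sum();
        println!("partition {}: {} batch(es), {} row(s)", partition_id, batches.len(), rows);
    }
}
```

Grouping by a per-batch tag means the stream itself needs no framing per partition, which fits the stated goal of letting the ingester return multiple batches per partition later.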
Marco Neumann 380cd9bbff
refactor: use a single flight client implementation (#4273)
"end-user -> querier" and "querier -> ingester" should use a single
Flight client implementation. The difference is just the request and
response metadata.

This changes our default Flight client to use protobuf instead of JSON
for the ticket format.
2022-04-12 09:08:25 +00:00
Andrew Lamb a30a85e62c
feat: Add get_write_info service (#4227)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-07 19:24:58 +00:00
Andrew Lamb 833c10c083
feat: return write_token from HTTP writes to router2 (#4202)
* feat: return write_token from HTTP writes to router2

* fix: Update router2/src/dml_handlers/instrumentation.rs

Co-authored-by: Dom <dom@itsallbroken.com>

* refactor: Use WriteSummary::default more vigorously

* fix: fix typo and add links to follow on issues

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-02 10:34:51 +00:00
Andrew Lamb a1df864283
feat: Support 'SHOW NAMESPACES' in sql repl (#4164)
* feat: Support `SHOW NAMESPACES` in sql repl

* feat: add basic support to clients

* fix: add get_namespaces service test

* fix: proper error handling

* test: end to end test for namespace client

* refactor: Use QuerierDatabase rather than Catalog

* refactor: remove unused function
2022-03-31 12:57:33 +00:00
Luke Bond b098828c97
feat: schema grpc server & proto in router2 (#4081)
* feat: schema grpc server & proto in router2

* chore: comments in schema proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 11:27:20 +00:00
Carol (Nichols || Goulding) 73828323ac
feat: Ingester Flight gRPC API (#3623)
* feat: Add a way to run ingester with an in-memory catalog from the CLI

If you set the --catalog-dsn string to "mem", rather than using that as
a Postgres connection URL, create an in-memory catalog.

Planning on using this in tests, so not documenting.

* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC

Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.

* feat: Add a flight API (handshake only) to ingester

* fix: Create partitions if using file-based write buffer

* fix: Change the server fixture to handle ingester server type

For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.

* feat: Start implementing ingester do_get, namely decoding the query

Skip serialization of the predicate for the moment.

* refactor: Rename ingest protos to ingester to match crate name

* refactor: Rename QueryResults to QueryData

* feat: Move ingester flight client to new querier crate

* fix: Off by one error, different starting indexes in sequencers

* fix: Create new CLI argument to pick the catalog type

* fix: Create a CLI option to set the number of topics to auto-create in the write buffer

* fix: Check the arrow flight service's health to tell that the ingester gRPC is up

* fix: Set postgres as the default catalog type

* fix: Return an error rather than panicking if CLI args aren't right
2022-02-09 19:07:44 +00:00
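
The first change in #3623 above treats the `--catalog-dsn` value "mem" as a request for an in-memory catalog instead of a Postgres connection URL. A minimal sketch of that selection, assuming a hypothetical `CatalogBackend` enum (the real ingester types are not shown in this log), might look like this. Note that a later change in the same commit adds a separate CLI argument to pick the catalog type, so this reflects only the initial behavior.

```rust
/// Hypothetical type for illustration; not the ingester's real catalog type.
#[derive(Debug, PartialEq)]
enum CatalogBackend {
    /// In-memory catalog, intended for tests.
    Memory,
    /// Postgres-backed catalog reached through the given connection URL.
    Postgres { dsn: String },
}

/// Maps the `--catalog-dsn` string to a backend: the literal "mem" selects
/// the in-memory catalog, anything else is kept as a Postgres URL.
fn catalog_from_dsn(dsn: &str) -> CatalogBackend {
    if dsn == "mem" {
        CatalogBackend::Memory
    } else {
        CatalogBackend::Postgres { dsn: dsn.to_string() }
    }
}

fn main() {
    println!("{:?}", catalog_from_dsn("mem"));
    println!("{:?}", catalog_from_dsn("postgres://localhost/iox"));
}
```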
Carol (Nichols || Goulding) dd9620da0c
feat: Create a new proto definition for the new design's IoxMetadata 2022-01-31 10:36:32 -05:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Andrew Lamb 1156a81567
feat: Add pbjson serialization for storage rpc (#3324)
* feat: Add pbjson serialization for storage rpc

* chore: update pbjson-build to 0.1.1

Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2021-12-07 16:39:16 +00:00
Nga Tran 5f3706e0ee feat: grpc call for compact object store chunks 2021-12-03 18:01:28 -05:00
Raphael Taylor-Davies 1e515a1dec
feat: load RUB from object store (#3224) (#3250) 2021-11-30 14:39:52 +00:00
kodiakhq[bot] 23dffefcf8
Merge branch 'main' into crepererum/remove_routing_from_database_mode_7 2021-11-29 11:38:12 +00:00
Raphael Taylor-Davies 197634ed50
feat: reload chunk back into read buffer (#3209) (#3216)
* feat: reload chunk back into read buffer (#3209)

* chore: fix logical conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-29 11:34:55 +00:00
Marco Neumann 4e043ecb55 refactor: remove old routing / sharding config
This is superseded by the new router subsystem.
2021-11-29 12:33:48 +01:00
Edd Robinson 1dedfc6697 refactor: remove storage API from pbjson 2021-11-22 22:30:31 +00:00
Edd Robinson f328f1bae5 chore: move read source 2021-11-22 21:40:42 +00:00
Edd Robinson ddf96efc8a chore: update storage RPC definitions 2021-11-22 21:40:41 +00:00
Nga Tran a5c04e5fe4 feat: framework for compact os chunks 2021-11-17 18:12:51 -05:00
Raphael Taylor-Davies 6320ce6f55
refactor: move delete predicate proto to own package (#2731) (#3065)
* refactor: move delete predicate proto to own package (#2731)

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-08 13:42:26 +00:00
Raphael Taylor-Davies 7d070f8582
feat: remove influxdata.iox.write.v1.WriteService (#3044)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-05 17:51:01 +00:00
Marco Neumann 511621d3ca refactor: use stripped-down shard config for router
This only implements table-based sharding. We can add value-based
sharding later.
2021-11-02 11:12:20 +01:00
Marco Neumann 0d0c0cb42b refactor: move write buffer configs to new home
Write buffer configs will partially be shared by database and router
nodes, so let's move them into a shared home.
2021-11-02 10:17:01 +01:00
Marco Neumann 4c9570b519 refactor: move `catalog` protobuf to `preserved_catalog`
This makes it clearer what's going on since the contained messages are
only for the preserved part, not the in-mem catalog and its management.
2021-11-01 18:07:25 +01:00
Marco Neumann 011af2d6ba feat: add `DeploymentService`
Ref #2980.
2021-11-01 17:55:40 +01:00
kodiakhq[bot] 8c3446ac87
Merge branch 'main' into crepererum/issue2980b 2021-11-01 16:39:32 +00:00
Marco Neumann bcd66c555a feat: add `RemoteService`
Ref #2980.
2021-11-01 11:38:46 +01:00
Marco Neumann 83e4514a43 feat: add `DeleteService`
Ref #2980.
2021-10-29 11:57:31 +02:00
Marco Neumann 7db09316f1 refactor: move router into its own service 2021-10-21 10:35:09 +02:00
Marco Neumann c27f096377 feat: rework router config 2021-10-21 10:09:29 +02:00
Marco Neumann 4f78981958 refactor: factor out partition cfg into its own proto file 2021-10-21 10:09:29 +02:00
Marco Neumann 1fc00095f5 refactor: factor out write buffer cfg into its own proto file 2021-10-21 10:09:29 +02:00
Marco Neumann f75bd0771a refactor: sort file list in build script 2021-10-21 10:09:29 +02:00
Carol (Nichols || Goulding) afd6e826e5 feat: Write out server config files listing database name and locations 2021-10-15 09:46:20 -04:00
Raphael Taylor-Davies 074ae40382
feat: migrate entry to use bytes::Bytes (#2842)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-14 10:55:51 +00:00
Marco Neumann 63d74be490 refactor: make `ChunkId` a UUID 2021-10-07 10:23:27 +02:00
Marco Neumann b8aa4c33ce refactor: use protobuf bytes for transaction UUIDs 2021-10-05 12:27:48 +02:00
Raphael Taylor-Davies 86cee568d5
feat: use upstream pbjson (#2650)
* feat: use upstream pbjson

* chore: fmt
2021-09-28 16:29:26 +00:00
Marco Neumann 7e804db0a3 fix: use btree map for some protobuf messages for deterministic outputs 2021-09-20 09:33:18 +02:00
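
The reasoning behind the commit above is that a `BTreeMap` iterates in key order regardless of insertion order, whereas `HashMap` iteration order is unspecified. A small sketch of the effect, using a hypothetical `render` helper rather than any real protobuf encoding:

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders map entries in iteration order.
/// With a BTreeMap this order is always sorted by key.
fn render(fields: &BTreeMap<String, String>) -> String {
    fields
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let mut first = BTreeMap::new();
    first.insert("b".to_string(), "2".to_string());
    first.insert("a".to_string(), "1".to_string());

    let mut second = BTreeMap::new();
    second.insert("a".to_string(), "1".to_string());
    second.insert("b".to_string(), "2".to_string());

    // Both maps render identically although they were filled in a different order.
    println!("{}", render(&first));
    println!("{}", render(&second));
}
```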
Raphael Taylor-Davies 1d55d9a1b5
feat: add pbjson support (#2468)
* feat: add pbjson support

* chore: fix test
2021-09-16 07:33:27 +00:00
Marco Neumann 44eb3b994d feat: `Predicate` serialization
Closes #2493.
2021-09-15 16:37:25 +02:00
Marco Neumann 27248850e5 refactor: use `bytes::Bytes` for metadata in protobuf messages
That simplifies printing a bit since `Vec<u8>` prints quite badly.
2021-09-01 11:26:05 +02:00
Jacob Marble 98d4c9fca1
feat: switch protobuf write service to canonical definition (#2182)
* feat: switch protobuf write service to canonical definition

The protobuf definition used for the proto write endpoint was a WIP. Now
that a canonical definition exists at
https://github.com/influxdata/influxdb-pb-data-protocol/ we can switch
to that.

* chore: lint etc

* chore: fix rustdoc nit in proto definition comment
2021-08-04 00:16:49 +00:00
Raphael Taylor-Davies f1a100c6ae
refactor: remove now unused chunk sort order (#1854)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:39:45 +00:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Marko Mikulicic 0d6d94dc00
feat: Enable gRPC reflection 2021-05-05 22:04:47 +02:00
Marco Neumann ab142efdd3 feat: add protobuf types for preserved catalog
This only includes a minimal type structure to get the transaction layer
going and tested.

See issue #1253.
2021-05-03 09:51:00 +02:00
Marko Mikulicic 569099fc6e feat: Derive serde for protos
Rationale
---------

Our CLI needs to be able to accept configuration as JSON and render configuration as JSON.

Protobufs technically have an official JSON encoding rule called `jsonpb` but prost doesn't
offer native support for it.

`prost` allows us to specify arbitrary derive metadata to be added to generated
code. We emit the `serde` derive directives in the two packages that generate prost code
(`generated_types` and `google_types`).

We use the `serde(rename_all = "camelCase")` to approximate `jsonpb`.

We instruct `prost` to use `bytes::Bytes` for some types, hence we must turn on the `serde` feature
on the `bytes` dependency.

We also use JSON to serialize the output of the `database get` command, to showcase the feature
and get rid of a TODO. In a subsequent PR I'll teach `database create` (and the yet to be done `database update`) to accept an optional JSON configuration body so we can configure partitioning, lifecycle, sharding rules, etc.

Caveats
-------

This is not technically `jsonpb`. Main issues:
1. default values not omitted
2. no special rendering of special types like `google.protobuf.Any`

Future work
-----------

Figure out if we can get fully compliant `jsonpb`, or at least a decent approximation.

Effect
------

```console
$ cargo run -- database get foobar_weather
{
  "name": "foobar_weather",
  "partitionTemplate": {
    "parts": [
      {
        "part": {
          "time": "%Y-%m-%d %H:00:00"
        }
      }
    ]
  },
  "lifecycleRules": {
    "mutableLingerSeconds": 0,
    "mutableMinimumAgeSeconds": 0,
    "mutableSizeThreshold": 0,
    "bufferSizeSoft": 0,
    "bufferSizeHard": 0,
    "sortOrder": {
      "order": 2,
      "sort": {
        "createdAtTime": {}
      }
    },
    "dropNonPersisted": false,
    "immutable": false
  },
  "walBufferConfig": null,
  "shardConfig": {
    "specificTargets": null,
    "hashRing": null,
    "ignoreErrors": false
  }
}
```
2021-03-30 15:16:31 +00:00
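
The commit above relies on `serde(rename_all = "camelCase")` to turn Rust field names into `jsonpb`-style keys. The std-only sketch below approximates that renaming for snake_case input. The `snake_to_camel` function is a hypothetical illustration, not serde's implementation, and the snake_case field names are assumed by reversing the JSON keys shown in the commit's output.

```rust
/// Hypothetical helper approximating camelCase renaming of a snake_case
/// field name: underscores are dropped and the following letter is
/// upper-cased.
fn snake_to_camel(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut upper_next = false;
    for ch in field.chars() {
        if ch == '_' {
            upper_next = true;
        } else if upper_next {
            out.push(ch.to_ascii_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    // Field names assumed from the JSON keys in the commit message.
    for field in ["name", "partition_template", "mutable_linger_seconds"] {
        println!("{} -> {}", field, snake_to_camel(field));
    }
}
```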
Marko Mikulicic cf51a1a3f1 feat: Add API for ShardConfig 2021-03-22 23:28:00 +00:00