Commit Graph

484 Commits (20f1ae1c8fb5a81dc806b2f327cc14a5394c992c)

Author SHA1 Message Date
Dom 685a0858dc
Merge branch 'main' into dom/rpc-write 2022-11-15 14:38:15 +00:00
Dom Dwyer dcc7b10bcf
feat: define router -> ingester write protocol
Specify a gRPC service and request/response message formats to push
writes directly from a router to an ingester.
2022-11-15 15:00:38 +01:00
Carol (Nichols || Goulding) 8dbdab8754
fix: Remove db_name field from DeletePayload
Doesn't seem to be used anywhere.
2022-11-14 16:46:04 -05:00
Carol (Nichols || Goulding) 4d4cdebbda
fix: Remove database_name field from DatabaseBatch
It was only being used in one error message.
2022-11-14 16:46:03 -05:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text 2022-11-11 15:46:04 -05:00
Carol (Nichols || Goulding) 43687a86d2
fix: Remove lots of needless borrows that Clippy can now identify
Except for in generated code that we don't control.
2022-11-09 10:54:18 -05:00
Carol (Nichols || Goulding) 07505c8f72
fix: Remove needless borrows, thanks Clippy! 2022-11-09 10:54:18 -05:00
Dom d9c97795fc
feat: use IDs in ingester query API (#6093)
* refactor: NS+table ID (instead of name) in querier<>ingester

* feat(ingester): use IDs for query API

Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.

BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.

* refactor(ingester): add more query logging context

Updates the log messages during query execution to include more context
fields.

* style: remove unused import

Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-11-09 11:25:13 +00:00
kodiakhq[bot] df5ec013d1
Merge branch 'main' into dom/dml-delete-namespace-id 2022-11-07 09:07:38 +00:00
Nga Tran 9356f2a1b9
feat: grpc for updating namespace retention period (#6041)
* refactor: make namespace folder for all namesapce's commands

* feat: WIP for add command to set retention period

* feat: more on updating retention period

* feat: grpc for update namespace retention period

* test: end to end test fpr namespace retention

* fix: lint proto

* chore: cleanup

* chore: kick CI run again

* fix: command hierachy

* chore: fix comments
2022-11-04 20:58:11 +00:00
Dom Dwyer 6fa48731aa feat: NamespaceId in DmlDelete
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.

Like the existing IDs within the DmlWrite, these values are marked
unsafe to use due to avoid the consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
2022-11-03 13:57:40 +01:00
Dom Dwyer ddd6ab0ba4 refactor(write_buffer): pass IDs in wire format
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.

In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
2022-11-02 13:28:56 +01:00
Carol (Nichols || Goulding) ace497d47c
fix: Rename database to namespace in the commands I just added 2022-10-27 10:40:39 -04:00
Carol (Nichols || Goulding) de2ae6f557
feat: MVP of remote store get-table command 2022-10-26 13:50:03 -04:00
Carol (Nichols || Goulding) 44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate 2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Jake Goulding fa7fe2e9cf feat: Add a gRPC endpoint to delete a skipped compaction
Also add a CLI usage of it for convenience
2022-10-21 15:12:20 -04:00
Carol (Nichols || Goulding) b8a9fe4222
docs: Explain the meanings of skipped compaction's fields 2022-10-21 13:40:38 -04:00
Carol (Nichols || Goulding) 0132a33946
fix: Rename SkippedCompactionService to CompactionService
To make a good place for other compactor-related gRPC actions in the
future.
2022-10-21 13:40:37 -04:00
Carol (Nichols || Goulding) ba25300b01
feat: Create compactor service to list skipped compactions 2022-10-21 13:40:31 -04:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
Dom Dwyer c4f542bbe2 refactor(ingester): remove tombstone support
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.

This also removes the tombstone support from the Ingester -> Querier RPC
response message.

This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
2022-10-11 13:10:04 +02:00
Andrew Lamb 04ae0aee80
refactor: Remove protobuf based write service (#5750)
* refactor: Remove grpc WriteService

* fix: update end to end test

* fix: Update generated_types/protos/influxdata/pbdata/v1/influxdb_pb_data_protocol.proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 10:55:03 +00:00
dependabot[bot] 1e4f4135a3
chore(deps): Bump pbjson-build from 0.4.0 to 0.5.0 (#5706)
Bumps [pbjson-build](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson-build
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 09:53:30 +00:00
dependabot[bot] 0cc29300ce
chore(deps): Bump pbjson-types from 0.4.0 to 0.5.0 (#5703)
Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson-types
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 09:44:26 +00:00
dependabot[bot] 09cb62b75b
chore(deps): Bump pbjson from 0.4.0 to 0.5.0 (#5702)
Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-21 09:34:47 +00:00
YIXIAO SHI 52ae60bf2e
chore: fix comment typo (#5551)
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
Carol (Nichols || Goulding) 1b49ad25f7
refactor: Rename KafkaTopicId to TopicId 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) 240946d8f5
fix: Deprecate proto sequencer_id fields; add shard_id fields 2022-08-29 14:06:44 -04:00
Dom Dwyer c6d4109e07 build: generate gRPC bindings for ShardService
Builds the ShardService proto file in the generated_types package.
2022-08-24 11:39:59 +02:00
Dom Dwyer f11af90c46 refactor(proto): simplify RPC messages & types
Removes the input oneof - a shard caller MUST always provide a
table/namespace, and MAY provide an optional payload (which in the
future will enable sharding using column valuess/etc). As there is
currently no payload-based sharding, this simplifies the RPC message.

Changes the returned types to better reflect the types we use internally
- this should avoid type juggling for both server & client.
2022-08-24 11:39:59 +02:00
Dom Dwyer 57bbe6b216 feat: sharder API definition
This commit adds a gRPC endpoint for callers to map (table, namespace)
tuples to Sequencer IDs, using the logic internal to the router.

Reference:
    https://github.com/influxdata/influxdb_iox/pull/5447#pullrequestreview-1080574538
2022-08-23 13:21:59 +02:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Marko Mikulicic 5a0af921c8
chore: Roll forward: Sync ReadWindowAggregate API: TagKeyMetaNames (#5186)
This reverts commit 5d02c755687ef041f5f45dbfc3e633a833284edb.
2022-07-22 10:44:06 +00:00
Marko Mikulicic 07cdb99192
chore: Revert "Sync ReadWindowAggregate API: TagKeyMetaNames" (#5184)
We're noticing a possible regression (OOMs) in our testing cluster that roughly correlates with this.
2022-07-22 09:26:42 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequnce_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work woth new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Marko Mikulicic 21d033eafd fix: Sync ReadWindowAggregate API: TagKeyMetaNames
The storage API has been updated in https://github.com/influxdata/idpe/pull/12868
in January, but since we forked the `.proto` files we never noticed.
2022-07-21 15:07:04 +02:00
dependabot[bot] 278a7f91af
chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:00:08 +00:00
Marco Neumann 1993448abf
refactor: remove `Predicat::partition_key` (#5016)
There is no way a user can filter for partition keys (neither via
InfluxRPC nor via SQL) and the query engine doesn't use this field at
all. So let's remove it.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-01 17:17:29 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Dom Dwyer 75c425f375 refactor(schema-api): column data type enum
Previously the column data type was exposed using an internal i32 value.
This commit changes the Schema API to use a self-descriptive proto enum
for the column data type.
2022-06-27 16:14:49 +01:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Dom Dwyer c1f7154031 feat: propagate partition key through kafka
Changes the kafka message wire format to include the partition key for
serialised DML writes on the wire.

After this commit, the kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.

A follow-up PR will change the ingester to utilise this embedded
partition key.

This has the unfortunate side effect of making the partition key part of
the public gRPC write API:

    https://github.com/influxdata/influxdb_iox/issues/4866
2022-06-20 13:42:51 +01:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
Marco Neumann 7c60edd38c
refactor: prepare new ingester<>querier protocol on the querier side (#4863)
* refactor: prepare new ingester<>querier protocol on the querier side

This changes the querier internals to work with the new protocol. The
wire protocol stays the same (for now). There's a (somewhat hackish)
adapter in place on the querier side that converts the old to the new
protocol on-the-fly. This is an intermediate step before we actually
change the wire protocol (and in a step after that also take advantage
of the new possibilites on the ingester side).

Ref #4849.

* docs: explain adapter
2022-06-15 14:32:24 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protonuf field to avoid incompactible

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Andrew Lamb 2ec7764fdd
refactor: rename builder like predicate methods to be `with_` (#4808)
* refactor: rename builder like predicate methods to be `with_`

* fix: merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 11:26:03 +00:00
Andrew Lamb afc1c12062
refactor: consolidate `PredicateBuilder` into `Predicate` (#4799)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-08 12:21:24 +00:00