Commit Graph

395 Commits (144778430efac0c3ca20919615c9c3074b9fd87f)

Author SHA1 Message Date
Fraser Savage 30939cfe96
refactor(wal): Remove op-level `sequence_number`, use per table map
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic difference.
2023-07-05 14:20:43 +01:00
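A minimal sketch of the per-table map described in this commit, assuming a simplified `TableId` newtype (hypothetical, not the real IOx type). Every table touched by one write op receives the same sequence number, which is why dropping the op-level field changes nothing semantically for now:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the catalog's table ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableId(pub i64);

/// Build the per-table sequence number map for a single write op.
/// All tables (and thus partitions) in the write share one number, so the
/// map carries the same information the removed op-level field did.
pub fn sequence_numbers_for_write(
    tables: &[TableId],
    sequence_number: u64,
) -> HashMap<TableId, u64> {
    tables.iter().map(|&id| (id, sequence_number)).collect()
}
```

Keying by table ID leaves room for assigning a distinct number per table later without another proto change.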
kodiakhq[bot] e7effc62b5
Merge branch 'main' into savage/sequence-per-partition 2023-06-08 14:28:44 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Dom Dwyer ee4f633dba
refactor: remove unused replication proto
This was from an earlier design.
2023-06-08 16:04:49 +02:00
Carol (Nichols || Goulding) bf699a8b60
fix: Remove partition ID from the metadata serialized into Parquet files (#7947)
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.

This will make the transition to deterministic partition IDs a bit
smoother.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:03:21 +00:00
Fraser Savage d1031c5ec6
docs(wal): Explicitly call out transitive relation between table and partition in a write
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 10:17:47 +01:00
Fraser Savage 7de98a6f11
refactor(wal): Associate sequence numbers to table ID in `SequencedWalOp`s
Writes are partitioned before being placed in the buffer tree. This
has the effect of splitting up the persistence of a DmlWrite's contents
and thus the persistence of data referred to by write operations placed
into a single WAL entry for a write op.

This change associates the currently assigned sequence number
with every `TableId` in the write, so that persist events for a single
write can be tracked on a per table/partition level.
Making this partial change enables a transition period where changes
can be rolled back and WAL files can still be processed.

A future change will produce a new sequence number per table
ID.
2023-06-06 17:49:09 +01:00
Nga Tran 566869aa30
refactor: replace namespace with database for flight proto (#7910)
* refactor: replace namespace with database for flight proto

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 16:53:09 +00:00
Carol (Nichols || Goulding) c2e19b3826
docs: Mention tag column creation in the table creation service description
Co-authored-by: Dom <dom@itsallbroken.com>
2023-05-25 14:02:37 -04:00
Carol (Nichols || Goulding) 32195748a3
feat: Add proto definitions for a table create gRPC API 2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding) 6f92bccc99
feat: Use protobuf for PartitionTemplate in CreateNamespace gRPC API
The service implementation doesn't use this field yet.
2023-05-24 10:10:34 -04:00
Marco Neumann 31b8813760
feat: hide `system.queries` table from prod by default (#7810)
Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queriers are deployed then the K8s routing decides which
pod/process the user hits. This leads to an inconsistent view. However,
the log is still useful for debugging.

This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.

Closes #7119.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-22 12:29:24 +00:00
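The header-to-ticket flow above can be sketched as follows. This is a simplified illustration only: the `Ticket` struct and both functions are hypothetical (the real ticket is a protobuf message), and the sketch assumes that the mere presence of the `iox-debug` header counts as "set":

```rust
use std::collections::HashMap;

/// Header name taken from the commit message.
pub const IOX_DEBUG_HEADER: &str = "iox-debug";

/// Hypothetical, simplified ticket; the real one is a protobuf message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ticket {
    pub query: String,
    pub is_debug: bool,
}

/// GetFlightInfo sees the request headers, so it records the flag in the ticket.
/// Assumption: any value counts as "set"; header names are already lowercase.
pub fn get_flight_info(headers: &HashMap<String, String>, query: &str) -> Ticket {
    Ticket {
        query: query.to_string(),
        is_debug: headers.contains_key(IOX_DEBUG_HEADER),
    }
}

/// DoGet only receives the ticket, so visibility of `system.queries`
/// has to be decided from the ticket alone.
pub fn system_queries_visible(ticket: &Ticket) -> bool {
    ticket.is_debug
}
```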
Dom 5fbf2d3d69
Merge branch 'main' into dom/partition-template-rpc 2023-05-17 15:06:17 +01:00
Dom Dwyer 63de1a3bc8
refactor(proto): use "tag" instead of "column"
I was going back and forth on this, but the MVP is tags only. If we
expand it to be the more general "columns" in the future, we can change
the proto to reflect the more generalised implementation and have a more
descriptive field name now!
2023-05-17 14:03:31 +02:00
Martin Hilton c9cd1fdc44
chore: add a go_package option to the authz proto file (#7802)
This is to fix a downstream service that builds a go package from
these definitions.
2023-05-16 16:51:51 +00:00
Dom Dwyer bc33ad1548
feat: PartitionTemplate proto definition
Defines the PartitionTemplate as a re-usable proto type.
2023-05-16 16:54:36 +02:00
Carol (Nichols || Goulding) 1770d0f4d8
fix: Move ingester-querier gRPC communication to its own crate 2023-05-12 13:28:30 -04:00
Carol (Nichols || Goulding) 7e9a449623
fix: Remove write buffer proto definitions 2023-05-08 20:24:35 -04:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00
Carol (Nichols || Goulding) b0959667d5
fix: Move topic and query pool within iox catalog (#7734)
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-04 13:45:56 +00:00
Carol (Nichols || Goulding) 621caab2e9
fix: Remove unused parquet_max_sequence_number metadata 2023-05-03 10:57:27 -04:00
Carol (Nichols || Goulding) 721bb2661e
fix: Remove ShardService that is no longer used 2023-04-26 11:42:32 -04:00
Carol (Nichols || Goulding) 038f8e9ce0
fix: Move shard concepts into only the catalog
This still inserts the shard id into the database, always set to the
TRANSITION_SHARD_ID, but never reads it back out again.
2023-04-26 11:42:32 -04:00
Dom Dwyer 3a8803c43c
docs: remove misleading API comments
These fields are very much in use now!
2023-04-13 16:17:48 +02:00
Carol (Nichols || Goulding) 6387a9576a
fix: Remove the write_summary crate and write info service 2023-04-12 11:31:23 -04:00
Martin Hilton d2585002fe
chore(authz): Change "namespace" to "database" (#7502)
Part of the wider effort to consistently use the term "database"
for the user-facing terminology, update the authorization system.
Whilst this system is technically user-facing, it is unlikely many
users will see it. It is however new enough that the change is
relatively little effort.
2023-04-11 11:04:51 +00:00
Fraser Savage b53b8c7d76
refactor(namespace): Flatten service protection limits in Namespace proto definition
This commit also cleans up the code formatting for the gRPC handler and
simplifies some of the gRPC handler tests for the new update service
limit API.
2023-04-05 14:46:30 +01:00
Fraser Savage 134967cddb
feat(namespace): Enable update of service protection limits over gRPC
This adds a message type to encapsulate service protection limits
for a namespace, an RPC to update any single limit and exposes
the limits on a namespace as part of the pre-existing Namespace message.
2023-03-31 17:14:19 +01:00
Martin Hilton 13657d5bcc
feat(authz): authorization service client and write integration (#7216)
* feat(authz): add authorization client.

Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.

* feat: optional authorization on write path

Support optionally enabling authorization checks on the /api/v2/write
handler. If an authrorizer is configured then the handler will
attempt to retrieve a token from the request's Authorization header.
If no such token exists then a response with a 401 error code is
returned. If the token is not valid, or does not have write permission
for the requested namespace then a response with a 403 error is
returned.

* chore: add unit test for authz in write handler

Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.

* chore(authz): use lazy connection

Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.

* chore: Run cargo hakari tasks

* fix(authz): protolint complaints

* fix: authz tests

* fix: benches and lint

* chore: Update clap_blocks/src/authz.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: Update authz/src/lib.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: Update clap_blocks/src/authz.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: review suggestions

* chore: review suggestions

Apply a number of suggestions from review comments. The main
behavioural change is that if the authz service is configured,
applications will perform a probe request to ensure it can communicate
before continuing startup.

* chore: Update router/src/server/http.rs

Co-authored-by: Dom <dom@itsallbroken.com>

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-03-17 15:20:14 +00:00
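The write-path decision described in this commit (401 when no token is present, 403 when the token is invalid or lacks write permission) can be sketched like this. The in-memory `grants` map is a hypothetical stand-in for the `influxdata.iox.authz.v1` gRPC client, and the `Token <value>` header scheme is an assumption not stated in the commit:

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of the authorization check on the write handler.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthOutcome {
    Allowed,
    Unauthorized, // no token could be retrieved from the Authorization header
    Forbidden,    // token invalid, or no write permission for the namespace
}

impl AuthOutcome {
    /// HTTP status to return when the write is rejected.
    pub fn status_code(&self) -> Option<u16> {
        match self {
            AuthOutcome::Allowed => None,
            AuthOutcome::Unauthorized => Some(401),
            AuthOutcome::Forbidden => Some(403),
        }
    }
}

/// Hypothetical authorizer: `grants` maps a token to the namespaces it may write.
pub fn authorize_write(
    authorization_header: Option<&str>,
    namespace: &str,
    grants: &HashMap<String, HashSet<String>>,
) -> AuthOutcome {
    // Assumption: the header value looks like "Token <token>".
    let token = match authorization_header.and_then(|v| v.strip_prefix("Token ")) {
        Some(t) if !t.is_empty() => t,
        _ => return AuthOutcome::Unauthorized,
    };
    match grants.get(token) {
        Some(namespaces) if namespaces.contains(namespace) => AuthOutcome::Allowed,
        _ => AuthOutcome::Forbidden,
    }
}
```

Treating a malformed header the same as a missing one keeps 403 reserved for tokens that were actually presented and checked.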
Stuart Carnie 2b74f07fe5
feat: Support `GROUP BY` with tags in raw `SELECT` queries (#7109)
* chore: Normalise name of Call expression to lowercase

Simplifies matching functions in planner, as they are guaranteed to be
lowercase.

This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.

* chore: Ensure aggregate functions fail gracefully.

* feat: GROUP BY tag support

* feat: Ensure schema-level metadata is propagated

Requires: https://github.com/apache/arrow-rs/issues/3779

* chore: Add some tests to validate GROUP BY output

* chore: Add clarifying comment

* chore: Declare message in flight.proto

The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.

Added tests to validate the metadata is encoded correctly.

* chore: Placate linters

* chore: Use correct column in test cases

* chore: Add `is_projected` to the TagKeyColumn message

`is_projected` is necessary to inform a client whether the tag key is
used exclusively for the group key (false) or is also projected in the
`SELECT` column list (true).

* refactor: Move constants to `schema` crate per PR feedback

* chore: rustfmt 🙄

* chore: Update docs for InfluxQlMetadata

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-03-07 22:40:23 +00:00
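To illustrate the `is_projected` semantics, here is a minimal sketch that derives the flag for each `GROUP BY` tag. The `TagKeyColumn` struct below is a hypothetical mirror holding only the two fields discussed above; the real message in `flight.proto` may carry more:

```rust
/// Hypothetical mirror of the fields discussed above.
#[derive(Debug, PartialEq, Eq)]
pub struct TagKeyColumn {
    pub tag_key: String,
    pub is_projected: bool,
}

/// For each GROUP BY tag, record whether it also appears in the SELECT list.
pub fn tag_key_columns(group_by_tags: &[&str], select_columns: &[&str]) -> Vec<TagKeyColumn> {
    group_by_tags
        .iter()
        .map(|tag| TagKeyColumn {
            tag_key: tag.to_string(),
            // true: also projected; false: used only for the group key
            is_projected: select_columns.contains(tag),
        })
        .collect()
}
```

For a query such as `SELECT usage, host FROM cpu GROUP BY host, region`, `host` would be marked projected and `region` would not.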
Carol (Nichols || Goulding) 65ba208f88
fix: Remove shard_id from the Parquet File protobuf in the catalog service 2023-02-17 13:53:03 -05:00
Carol (Nichols || Goulding) 20250d883e
fix: Remove shard_id from the catalog service Partition 2023-02-17 12:56:51 -05:00
Dom Dwyer 7ae6dda87c
docs(proto): ingester persist endpoint
This endpoint has some serious usability caveats that should be known by
users of this API!
2023-02-09 14:11:37 +01:00
Dom Dwyer 08cf71e0ac
refactor(proto): move PersistService to new file
Separate the PersistService into its own file.
2023-02-09 14:04:46 +01:00
Dom Dwyer 52ac1b97a9
docs: namespace retention protobuf mappings
Document that the caller can specify 0 or NULL for an infinite retention
period, and that IOx will respond with NULL.

Document that negative retention periods are rejected.
2023-02-01 14:37:21 +01:00
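The documented mapping can be expressed as a small normalisation function. This is a sketch only: `normalise_retention` is a hypothetical name, `None` stands for protobuf NULL, and the unit of the period does not matter here:

```rust
/// Map a caller-supplied retention period to the stored/returned value.
/// `None` in the result means infinite retention (reported as NULL).
pub fn normalise_retention(requested: Option<i64>) -> Result<Option<i64>, String> {
    match requested {
        // 0 or NULL both mean infinite; IOx responds with NULL.
        None | Some(0) => Ok(None),
        // Negative periods are rejected.
        Some(n) if n < 0 => Err(format!("negative retention period {n} rejected")),
        Some(n) => Ok(Some(n)),
    }
}
```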
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for changing cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
Carol (Nichols || Goulding) 4658510102
fix: For Ingester2, persist a particular namespace on demand and share MiniClusters
This should hopefully keep CI from running out of Postgres
connections 😬

The old architecture will still need to be non-shared and persist
everything.
2023-01-25 10:36:56 -05:00
Andrew Lamb c3bc61f10e
refactor: Move `flightsql` code into its own module, add docs and tests (#6640)
* refactor: Move `flightsql`  code into its own module

* fix: get schema from LogicalPlan

* refactor: use arrow_flight::sql::Any instead of prost_types::any

* fix: cleanup docs and avoid as_ref

* fix: Use Bytes

* fix: use Any::pack

* fix: doclink
2023-01-24 18:24:32 +00:00
Carol (Nichols || Goulding) 3a2544a7eb
feat: Define a new gRPC service for ingester persist 2023-01-12 11:03:12 -05:00
Carol (Nichols || Goulding) adc5c2bf06
feat: Add a gRPC API to the catalog service to get Parquet files by namespace
Tests that write line protocol (that may contain writes to multiple
tables) need to be able to see when new Parquet files are saved.
2023-01-11 11:41:09 -05:00
Paul Dix 828992c9c5
feat: Ingest replica skeleton (#6529)
* feat: Update replication.proto

* Remove the PartitionId from the replicate request as a single replicate request can have the data for many partitions.
* Add namespace_id and table_id to persist complete request to make data easier to lookup in buffer.

* feat: Initial ingest_replica skeleton

A bunch of copy pasta here from ingester2, but this takes out a ton of stuff that isn't used in replicas.
Also lays the groundwork for the simpler buffer structure to keep the data and a basic cache for catalog information that will be required.

* feat: update replication.proto GetPartitionBufferResponse

* chore: PR cleanup

* chore: PR cleanup
2023-01-09 16:53:49 +00:00
Dom Dwyer 91680854ce
feat(replication): define replication RPC API
Defines the rough outline of a replication RPC API. More details/docs
to follow.
2023-01-04 17:37:32 +01:00
Luke Bond 3659be59c7
feat: delete namespace api mem impl
chore: tests for delete namespace; use unique ptn names in tests
2022-12-16 10:23:50 +00:00
Paul Dix d9c72bb93f
feat: optimize wal with batching (#6399)
* feat: optimize wal with batching

Simplified the wal writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now.

Also updated the reader to be a simpler blocking reader without the extra tasks and channels as that wasn't really getting us anything that I know of.

* chore: cleanup wal code for PR feedback
2022-12-14 16:07:20 +00:00
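The batching idea (accumulate write ops and fsync at most once per interval, 10ms in this commit) can be sketched synchronously. `WalBatcher` is hypothetical, and the caller passes `now` explicitly so the behaviour is deterministic; the actual writer is not shown in this log:

```rust
use std::time::Duration;

/// Hypothetical sketch: buffer ops and release them as one batch per interval.
pub struct WalBatcher<T> {
    pending: Vec<T>,
    last_sync: Duration,
    interval: Duration,
}

impl<T> WalBatcher<T> {
    pub fn new(interval: Duration) -> Self {
        Self { pending: Vec::new(), last_sync: Duration::ZERO, interval }
    }

    /// Buffer an op. Once at least `interval` has elapsed since the last sync,
    /// hand back the whole batch so the caller can write it and fsync once.
    pub fn push(&mut self, op: T, now: Duration) -> Option<Vec<T>> {
        self.pending.push(op);
        if now.saturating_sub(self.last_sync) >= self.interval {
            self.last_sync = now;
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        }
    }
}
```

This sketch only flushes when a new op arrives; a real writer would also need a timer so a trailing batch is not held indefinitely.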
kodiakhq[bot] 66c610f7b1
Merge branch 'main' into cn/ingester-persisted-file-count 2022-12-14 14:58:31 +00:00
Andrew Lamb 47cd6821e1
feat: Document IOx Flight API and add convenience methods (#6392)
* feat: Document IOx Flight API and add convenience methods

* fix: InfluxQL handling

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:32:37 +00:00
Carol (Nichols || Goulding) 1c7f322a4e
feat: Keep track of and report number of Parquet files persisted
Per partition and starting over each time the ingester restarts.

Fixes #6334.
2022-12-12 11:45:00 -05:00
Carol (Nichols || Goulding) 2fd2d05ef6
feat: Identify each run of an ingester with a Uuid
And send that UUID in the Flight response for queries to that ingester
run.

Fixes #6333.
2022-12-08 17:22:52 -05:00
Carol (Nichols || Goulding) edd606aa3b
feat: Serialize using protobuf instead of json 2022-11-23 17:07:49 -05:00
Stuart Carnie 2306c383f3
feat: Introduce InfluxQL to Flight (#6166)
* feat: Introduce InfluxQL to Flight

All InfluxQL queries will fail with an error

* chore: Temper protobuf lint

* chore: Finalize flight.proto changes; fix tests

* chore: Add tests for InfluxQL planner

* chore: Update docs

* chore: Update docs

* chore: Rename back to original

* chore: Use .into() rather than cast

* chore: Use function rather than field

* chore: Improved InfluxQL planner name

* chore: Restore `impl Into<String>` argument

* chore: Add a comment that Go clients are unable to execute InfluxQL

* chore: Add a test for the `--lang` argument and InfluxQL
2022-11-23 00:33:49 +00:00