Commit Graph

558 Commits (6e13ff8cb8ae5644d65add024b10785ec2dfa1b4)

Author SHA1 Message Date
Dom Dwyer 58269bf463
refactor: re-export prost in generated_types
The generated types emit types that depend on prost (through Message
derives), and therefore all users of generated_types already depend on
prost.

It would be wrong for users of the generated_types crate to use a
different version of prost than what is used in generated_types.

By re-exporting prost, users can just depend on generated_types, and
always use the right prost version. prost prost prost. prost.
2023-08-01 13:22:27 +02:00
Dom Dwyer 081dc03a32
feat(proto): schema gossip message definitions
Define the gossip message types used to disseminate schema changes to
other peers.

Currently there are two types defined: an initial "create" operation,
intended to be followed by "update" operations where appropriate. Both
messages are trivial CRDTs in that they are effectively add-only column
sets (a monotonic type) with other fields required to be immutable (as
they currently are in IOx).
2023-08-01 13:22:26 +02:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
Fraser Savage 30939cfe96
refactor(wal): Remove op-level `sequence_number`, use per table map
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic different
2023-07-05 14:20:43 +01:00
kodiakhq[bot] e7effc62b5
Merge branch 'main' into savage/sequence-per-partition 2023-06-08 14:28:44 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Dom Dwyer ee4f633dba
refactor: remove unused replication proto
This was from an earlier design.
2023-06-08 16:04:49 +02:00
Carol (Nichols || Goulding) bf699a8b60
fix: Remove partition ID from the metadata serialized into Parquet files (#7947)
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.

This will make the transition to deterministic partition IDs a bit
smoother.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:03:21 +00:00
Fraser Savage d1031c5ec6
docs(wal): Explicitly call out transitive relation between table and partition in a write
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 10:17:47 +01:00
Fraser Savage 7de98a6f11
refactor(wal): Associate sequence numbers to table ID in `SequencedWalOp`s
Writes are partitioned before being placed in the buffer tree. This
has the effect of splitting up the persistence of a DmlWrite's contents
and thus the persistence of data referred to by write operations placed
into a single WAL entry for a write op.

This change associates the currently assigned sequence number
with every `TableId` in the write, so that persist events for a single
write can be tracked on a per table/partition level.
Making this partial change enables a transition period where changes
can be rolled back and WAL files can still be processed.

A future change will produce a new sequence number per table
ID.
2023-06-06 17:49:09 +01:00
Nga Tran 566869aa30
refactor: replace namespace with database for flight proto (#7910)
* refactor: replace namespace with database for flight proto

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 16:53:09 +00:00
Carol (Nichols || Goulding) c2e19b3826
docs: Mention tag column creation in the table creation service description
Co-authored-by: Dom <dom@itsallbroken.com>
2023-05-25 14:02:37 -04:00
Carol (Nichols || Goulding) 32195748a3
feat: Add proto definitions for a table create gRPC API 2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding) 6f92bccc99
feat: Use protobuf for PartitionTemplate in CreateNamespace gRPC API
The service implementation doesn't use this field yet.
2023-05-24 10:10:34 -04:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Marco Neumann 31b8813760
feat: hide `system.queries` table from prod by default (#7810)
Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queries are deployed then the K8s routing decides which
pod/process the users hits. This leads to an inconsistent view. However
the log is still useful for debugging.

This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.

Closes #7119.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-22 12:29:24 +00:00
Dom 5fbf2d3d69
Merge branch 'main' into dom/partition-template-rpc 2023-05-17 15:06:17 +01:00
Dom Dwyer 63de1a3bc8
refactor(proto): use "tag" instead of "column"
I was going back and forth on this, but the MVP is tags only. If we
expand it to be the more general "columns" in the future, we can change
the proto to reflect the more generalised implementation and have a more
descriptive field name now!
2023-05-17 14:03:31 +02:00
Martin Hilton c9cd1fdc44
chore: add a go_package option to the authz proto file (#7802)
This is to fix a downstream service that builds a go package from
these definitions.
2023-05-16 16:51:51 +00:00
Dom Dwyer bc33ad1548
feat: PartitionTemplate proto definition
Defines the PartitionTemplate as a re-usable proto type.
2023-05-16 16:54:36 +02:00
Dom Dwyer 1814514c17
refactor: sort proto imports
Sorts the path lines.
2023-05-16 16:31:34 +02:00
Carol (Nichols || Goulding) 14007808bd
fix: Move remaining conversions between data types and proto into data_types
And have data_types depend on generated_types rather than vice versa.
2023-05-12 13:31:04 -04:00
Carol (Nichols || Goulding) 1770d0f4d8
fix: Move ingester-querier gRPC communication to its own crate 2023-05-12 13:28:30 -04:00
Carol (Nichols || Goulding) 4c7f96ead8
fix: Remove unused delete predicate proto conversion code 2023-05-12 11:27:46 -04:00
Carol (Nichols || Goulding) 3d5df5574a
fix: Remove vestiges of shards 2023-05-08 20:24:36 -04:00
Carol (Nichols || Goulding) 7e9a449623
fix: Remove write buffer proto definitions 2023-05-08 20:24:35 -04:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00
Carol (Nichols || Goulding) b0959667d5
fix: Move topic and query pool within iox catalog (#7734)
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-04 13:45:56 +00:00
Carol (Nichols || Goulding) 621caab2e9
fix: Remove unused parquet_max_sequence_number metadata 2023-05-03 10:57:27 -04:00
Carol (Nichols || Goulding) 721bb2661e
fix: Remove ShardService that is no longer used 2023-04-26 11:42:32 -04:00
Carol (Nichols || Goulding) 038f8e9ce0
fix: Move shard concepts into only the catalog
This still inserts the shard id into the database, always set to the
TRANSITION_SHARD_ID, but never reads it back out again.
2023-04-26 11:42:32 -04:00
Dom Dwyer 3a8803c43c
docs: remove misleading API comments
These fields are very much in use now!
2023-04-13 16:17:48 +02:00
kodiakhq[bot] 53ddca45d8
Merge branch 'main' into cn/remove-write-summary 2023-04-12 16:07:35 +00:00
Andrew Lamb 20e9c91866
refactor: Use workspace dependencies for `tonic`, `tonic-build`, etc (#7515)
* refactor: Use workspace dependencies for `tonic`, `tonic-build`, etc

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-12 16:07:19 +00:00
Carol (Nichols || Goulding) 6387a9576a
fix: Remove the write_summary crate and write info service 2023-04-12 11:31:23 -04:00
Martin Hilton d2585002fe
chore(authz): Change "namespace" to "database" (#7502)
Part of the wider effort to consistently use tht term "database"
for the user-facing terminology, update the authorization system.
Whilst this system is technically user-facing, it is unlikely many
users will see it. It is however new enough that the change is
relatively little effort.
2023-04-11 11:04:51 +00:00
Fraser Savage b53b8c7d76
refactor(namespace): Flatten service protection limits in Namespace proto definition
This commit also cleans up the code formatting for the gRPC handler and
simplifies some of the gRPC handler tests for the new update service
limit API.
2023-04-05 14:46:30 +01:00
Fraser Savage 134967cddb
feat(namespace): Enable update of service protection limits over gRPC
This adds a message type to encapsulate service protection limits
for a namespace, an RPC to update any single limit and exposes
the limits on a namespace as part of the pre-existing Namespace message.
2023-03-31 17:14:19 +01:00
Martin Hilton 13657d5bcc
feat(authz): authorization service client and write integration (#7216)
* feat(authz): add authorization client.

Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.

* feat: optional authorization on write path

Support optionally enabling authorization checks on the /api/v2/write
handler. If an authrorizer is configured then the handler will
attempt to retrieve a token from the request's Authorization header.
If no such token exists then a response with a 401 error code is
returned. If the token is not valid, or does not have write permission
for the requested namespace then a response with a 403 error is
returned.

* chore: add unit test for authz in write handler

Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.

* chore(authz): use lazy connection

Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.

* chore: Run cargo hakari tasks

* fix(authz): protolint complaints

* fix: authz tests

* fix: benches and lint

* chore: Update clap_blocks/src/authz.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: Update authz/src/lib.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: Update clap_blocks/src/authz.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: review suggestions

* chore: review suggestions

Apply a number of suggestions from review comments. The main
behavioural change is that if the authz service is configured
applictions will perform a probe request to ensure it can communicate
before continuing startup.

* chore: Update router/src/server/http.rs

Co-authored-by: Dom <dom@itsallbroken.com>

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-03-17 15:20:14 +00:00
Stuart Carnie 2b74f07fe5
feat: Support `GROUP BY` with tags in raw `SELECT` queries (#7109)
* chore: Normalise name of Call expression to lowercase

Simplifies matching functions in planner, as they are guaranteed to be
lowercase.

This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.

* chore: Ensure aggregate functions fail gracefully.

* feat: GROUP BY tag support

* feat: Ensure schema-level metadata is propagated

Requires: https://github.com/apache/arrow-rs/issues/3779

* chore: Add some tests to validate GROUP BY output

* chore: Add clarifying comment

* chore: Declare message in flight.proto

The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.

Added tests to validate the metadata is encoded correctly.

* chore: Placate linters

* chore: Use correct column in test cases

* chore: Add `is_projected` to the TagKeyColumn message

`is_projected` is necessary to inform a client whether it should include
the tag key is used exclusively for the group key (false) or also
projected in the `SELECT` column list.

* refactor: Move constants to `schema` crate per PR feedback

* chore: rustfmt 🙄

* chore: Update docs for InfluxQlMetadata

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-03-07 22:40:23 +00:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Carol (Nichols || Goulding) 65ba208f88
fix: Remove shard_id from the Parquet File protobuf in the catalog service 2023-02-17 13:53:03 -05:00
Carol (Nichols || Goulding) 20250d883e
fix: Remove shard_id from the catalog service Partition 2023-02-17 12:56:51 -05:00
Dom Dwyer 7ae6dda87c
docs(proto): ingester persist endpoint
This endpoint has some serious usability caveats that should be known by
users of this API!
2023-02-09 14:11:37 +01:00
Dom Dwyer 08cf71e0ac
refactor(proto): move PersistService to new file
Separate the PersistService into it's own file.
2023-02-09 14:04:46 +01:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
Dom Dwyer 52ac1b97a9
docs: namespace retention protobuf mappings
Document that the caller can specify 0 or NULL for an infinite retention
period, and that IOx will respond with NULL.

Document that negative retention periods are rejected.
2023-02-01 14:37:21 +01:00
dependabot[bot] d0e6b16450
chore(deps): Bump bytes from 1.3.0 to 1.4.0
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 00:30:56 +00:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for chnaging cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
Carol (Nichols || Goulding) 4658510102
fix: For Ingester2, persist a particular namespace on demand and share MiniClusters
This should hopefully help CI from running out of Postgres
connections 😬

The old architecture will still need to be non-shared and persist
everything.
2023-01-25 10:36:56 -05:00