The generated code emits types that depend on prost (through `Message` derives), and therefore all users of `generated_types` already depend on prost.
It would be wrong for users of the generated_types crate to use a
different version of prost than what is used in generated_types.
By re-exporting prost, users can depend on `generated_types` alone and always get the matching prost version.
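As a minimal sketch (the exact placement inside `generated_types` is an assumption), the re-export is just:

```rust
// In generated_types/src/lib.rs: re-export prost so downstream crates use the
// exact same version that the derived `Message` impls were built against.
pub use prost;

// A downstream crate can then write `use generated_types::prost::Message;`
// instead of declaring its own (possibly mismatched) prost dependency.
```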
Define the gossip message types used to disseminate schema changes to
other peers.
Currently there are two types defined: an initial "create" operation,
intended to be followed by "update" operations where appropriate. Both
messages are trivial CRDTs in that they are effectively add-only column
sets (a monotonic type) with other fields required to be immutable (as
they currently are in IOx).
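To illustrate the CRDT claim, a minimal Rust sketch (type and field names are assumptions, not the actual gossip types): merging is a set union over column names, so applying messages in any order converges.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of the gossiped per-table schema state.
#[derive(Debug, Clone, PartialEq)]
struct TableSchemaState {
    /// Immutable in IOx today; peers must agree on it.
    table_name: String,
    /// Add-only column set: the monotonic part of the CRDT.
    columns: BTreeSet<String>,
}

impl TableSchemaState {
    /// Merging is a set union: commutative, associative and idempotent, so
    /// applying gossiped "create"/"update" messages in any order converges.
    fn merge(&mut self, other: &TableSchemaState) {
        assert_eq!(self.table_name, other.table_name);
        self.columns.extend(other.columns.iter().cloned());
    }
}
```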
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the Parquet file catalog records with the partition records using only the partition hash ID, rather than both IDs as is done now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
    Partition {
        id: 3,
        hash_id: abcdefg,
    }
and a Parquet file that has:
    ParquetFile {
        partition_hash_id: abcdefg,
    }
calling `list_by_partition_not_to_delete(PartitionId(3))` should still return this Parquet file because it is associated with the partition that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
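A sketch of that lookup rule under assumed, simplified types (the real catalog does this in SQL, not in Rust):

```rust
/// Illustrative, simplified catalog types (not the real IOx definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PartitionId(i64);

#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionHashId(String);

struct Partition {
    id: PartitionId,
    hash_id: Option<PartitionHashId>,
}

struct ParquetFile {
    partition_id: Option<PartitionId>,
    partition_hash_id: Option<PartitionHashId>,
}

/// A Parquet file belongs to a partition if *either* identifier matches,
/// regardless of which one is actually stored on the file record.
fn belongs_to(file: &ParquetFile, partition: &Partition) -> bool {
    match (&file.partition_hash_id, &partition.hash_id) {
        (Some(file_hash), Some(partition_hash)) => file_hash == partition_hash,
        _ => file.partition_id == Some(partition.id),
    }
}
```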
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
This commit removes the op-level sequence number from the proto definition, now reading and writing solely to the per-table (and thus per-partition) sequence number map. Tables/partitions within the same write op are still assigned the same number for now, so there should be no semantic difference.
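For illustration, a minimal sketch of the per-table sequence number assignment this describes (names are assumptions, not the actual ingester types):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableId(i64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SequenceNumber(u64);

/// For now every table (and thus partition) in the same write op receives the
/// same sequence number; a later change will assign a fresh number per table.
fn assign_sequence_numbers(
    tables: &[TableId],
    current: SequenceNumber,
) -> HashMap<TableId, SequenceNumber> {
    tables.iter().map(|table| (*table, current)).collect()
}
```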
* feat: Allow passing service protection limits in create db gRPC call
* fix: Move the impl into the catalog namespace trait
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass the partition ID in as a separate argument instead.
This will make the transition to deterministic partition IDs a bit
smoother.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Writes are partitioned before being placed in the buffer tree. This has the effect of splitting up the persistence of a DmlWrite's contents, and thus the persistence of the data referred to by the write operations placed into a single WAL entry.
This change associates the currently assigned sequence number
with every `TableId` in the write, so that persist events for a single
write can be tracked on a per table/partition level.
Making this partial change enables a transition period where changes
can be rolled back and WAL files can still be processed.
A future change will produce a new sequence number per table
ID.
This commit fixes loads of crates (47!) that had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the `unused_crate_dependencies` lint to all crates to help prevent this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false positives when specifying dev-dependencies for test/bench binaries - these are files in /tests or /benches (not normal tests). This commit includes a workaround: importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!
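The lint plus workaround looks roughly like this (the feature name and `criterion` are illustrative):

```rust
// In each crate's lib.rs: enable the lint.
#![warn(unused_crate_dependencies)]

// Workaround for dev-dependencies that are only referenced from /benches or
// /tests binaries, which the lint would otherwise report as unused. Gated by
// a feature so it only applies when building those targets.
#[cfg(feature = "benches")]
use criterion as _;
```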
Introduce a new header called `iox-debug` which, when set, enables certain debug features. The first one will be the `system.queries` table, a process-local, namespace-scoped query log. In most prod setups this is only useful for debugging and would otherwise confuse users a lot, because when multiple queriers are deployed, K8s routing decides which pod/process a user hits, leading to an inconsistent view. However, the log is still useful for debugging.
This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.
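As a rough sketch (the helper name and the accepted header values are assumptions), checking such a header on a tonic request could look like:

```rust
/// Name of the debug header.
const IOX_DEBUG_HEADER: &str = "iox-debug";

/// Returns true if the debug header is present and truthy on a tonic request.
fn debug_enabled<T>(request: &tonic::Request<T>) -> bool {
    request
        .metadata()
        .get(IOX_DEBUG_HEADER)
        .and_then(|value| value.to_str().ok())
        .map(|value| value == "1" || value.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}
```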
Closes #7119.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
I was going back and forth on this, but the MVP is tags only. If we
expand it to be the more general "columns" in the future, we can change
the proto to reflect the more generalised implementation and have a more
descriptive field name now!
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
As part of the wider effort to consistently use the term "database" in user-facing terminology, update the authorization system.
Whilst this system is technically user-facing, it is unlikely many
users will see it. It is however new enough that the change is
relatively little effort.
This commit also cleans up the code formatting for the gRPC handler and
simplifies some of the gRPC handler tests for the new update service
limit API.
This adds a message type to encapsulate service protection limits for a namespace, adds an RPC to update any single limit, and exposes the limits on a namespace as part of the pre-existing Namespace message.
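For illustration only, the shape of such limits and a single-limit update might look like this in Rust (field names are assumptions based on the existing table/column limits, not the actual proto):

```rust
/// Illustrative shape of per-namespace service protection limits.
#[derive(Debug, Default, Clone, Copy)]
struct ServiceProtectionLimits {
    /// Maximum number of tables allowed in the namespace.
    max_tables: Option<i32>,
    /// Maximum number of columns allowed per table.
    max_columns_per_table: Option<i32>,
}

/// An "update a single limit" call only changes the field it carries.
fn update_limit(limits: &mut ServiceProtectionLimits, update: ServiceProtectionLimits) {
    if let Some(max_tables) = update.max_tables {
        limits.max_tables = Some(max_tables);
    }
    if let Some(max_columns) = update.max_columns_per_table {
        limits.max_columns_per_table = Some(max_columns);
    }
}
```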
* feat(authz): add authorization client.
Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.
* feat: optional authorization on write path
Support optionally enabling authorization checks on the /api/v2/write handler. If an authorizer is configured then the handler will attempt to retrieve a token from the request's Authorization header. If no such token exists then a response with a 401 error code is returned. If the token is not valid, or does not have write permission for the requested namespace, then a response with a 403 error is returned.
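A simplified sketch of that flow (types and the token-prefix handling are assumptions, not the actual router code):

```rust
/// Hypothetical error type mapping to the HTTP status codes described above.
#[derive(Debug, PartialEq)]
enum AuthzError {
    Unauthenticated, // no token provided              -> HTTP 401
    Forbidden,       // invalid token / no permission  -> HTTP 403
}

/// Extract a token from the Authorization header, then ask the authorizer
/// whether it grants write permission on the namespace.
fn check_write_authz(
    authorization_header: Option<&str>,
    namespace: &str,
    has_write_permission: impl Fn(&str, &str) -> bool,
) -> Result<(), AuthzError> {
    let token = authorization_header
        .and_then(|header| {
            header
                .strip_prefix("Token ")
                .or_else(|| header.strip_prefix("Bearer "))
        })
        .ok_or(AuthzError::Unauthenticated)?;

    if has_write_permission(token, namespace) {
        Ok(())
    } else {
        Err(AuthzError::Forbidden)
    }
}
```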
* chore: add unit test for authz in write handler
Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.
* chore(authz): use lazy connection
Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.
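With a recent tonic this is roughly the following (helper name assumed; `connect_lazy` returns a `Channel` directly, so nothing needs to be awaited at construction time):

```rust
use tonic::transport::{Channel, Endpoint};

/// Build the channel without connecting; construction stays synchronous and
/// the connection is only established when the first request is sent.
fn lazy_authz_channel(addr: String) -> Result<Channel, tonic::transport::Error> {
    Ok(Endpoint::from_shared(addr)?.connect_lazy())
}
```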
* chore: Run cargo hakari tasks
* fix(authz): protolint complaints
* fix: authz tests
* fix: benches and lint
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update authz/src/lib.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: review suggestions
* chore: review suggestions
Apply a number of suggestions from review comments. The main behavioural change is that, if the authz service is configured, applications will perform a probe request to ensure they can communicate with it before continuing startup.
* chore: Update router/src/server/http.rs
Co-authored-by: Dom <dom@itsallbroken.com>
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
* chore: Normalise name of Call expression to lowercase
Simplifies matching functions in planner, as they are guaranteed to be
lowercase.
This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.
* chore: Ensure aggregate functions fail gracefully.
* feat: GROUP BY tag support
* feat: Ensure schema-level metadata is propagated
Requires: https://github.com/apache/arrow-rs/issues/3779
* chore: Add some tests to validate GROUP BY output
* chore: Add clarifying comment
* chore: Declare message in flight.proto
The metadata is public API, so best practice is to encode it in a way that is most compatible with clients in other languages, and that will also document the history of schema changes.
Added tests to validate the metadata is encoded correctly.
* chore: Placate linters
* chore: Use correct column in test cases
* chore: Add `is_projected` to the TagKeyColumn message
`is_projected` is necessary to inform a client whether the tag key is used exclusively for the group key (false) or is also projected in the `SELECT` column list (true).
* refactor: Move constants to `schema` crate per PR feedback
* chore: rustfmt 🙄
* chore: Update docs for InfluxQlMetadata
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Document that the caller can specify 0 or NULL for an infinite retention
period, and that IOx will respond with NULL.
Document that negative retention periods are rejected.
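For illustration, the documented semantics in a small (hypothetical) helper:

```rust
/// Illustrative helper (not the actual IOx code) for the documented semantics:
/// 0 or NULL mean an infinite retention period, which IOx reports back as
/// NULL; negative periods are rejected.
fn normalize_retention_period_ns(period: Option<i64>) -> Result<Option<i64>, String> {
    match period {
        None | Some(0) => Ok(None), // infinite, reported as NULL
        Some(n) if n < 0 => Err("negative retention periods are rejected".to_string()),
        Some(n) => Ok(Some(n)),
    }
}
```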
* feat: introduce a new way of handling max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>