580 Commits (be9064c75fc912c93859f180a17afa260c6ac242)

Author SHA1 Message Date
Dom c4a9a46b3e
Merge branch 'main' into dom/gossip-compaction-crate 2023-09-04 14:52:10 +01:00
Dom Dwyer 96d099a295
refactor(proto): SyncMessage -> ConsistencyProbe
Rename "SyncMessage" to "ConsistencyProbe" - it's a clearer name for the
use - the message does no syncing!
2023-09-04 15:12:19 +02:00
Dom Dwyer 044d5bfdcf
refactor(gossip): add Topic::CompactionEvents
This adds a separate compaction event gossip topic.
2023-09-04 13:54:05 +02:00
Dom Dwyer 77bfd57579
docs: typo entire
Fix typo.
2023-09-01 17:20:28 +02:00
Dom Dwyer 855e8bb34c
feat(gossip): define CompactionEvent protobuf
This commit adds the CompactionEvent protobuf type that'll be gossiped
between peers to provide a (compact) description of completed
compactions.
2023-09-01 15:16:34 +02:00
Dom Dwyer 3a2e70157e
refactor(proto): unsigned port numbers
I was trying to remember what the notation for integer types was in
proto - -1 is not a port number...
2023-08-31 15:57:41 +02:00
Fraser Savage 683b513e40
feat(proto): Define GetTables RPC for TableService 2023-08-31 12:35:34 +01:00
kodiakhq[bot] c46b754529
Merge branch 'main' into savage/cli-should-display-partition-template 2023-08-30 15:29:07 +00:00
Dom Dwyer c2d50c0a95
feat(gossip): schema cache convergence topic
Adds a new topic to be used for schema cache convergence messages.
2023-08-30 16:58:44 +02:00
Fraser Savage 1c80c853b4
feat(table): Return PartitionTemplate in table create proto response
If tables can be created with a custom partition template (or the namespace
has a custom partition template) then this should be exposed to the user
interacting with it through the CLI too!
2023-08-30 15:53:54 +01:00
Fraser Savage e602f067ad
feat(namespace): Return namespace custom partition templates to API
This exposes the custom partitioning scheme of a namespace (if any)
in the response to namespace create and list calls.
2023-08-30 15:53:43 +01:00
Dom Dwyer 8751720f1f
feat(gossip): sync message proto
Defines the message one peer will send to another to solicit a
consistency check.
2023-08-30 16:44:26 +02:00
Dom b68d108baf
Merge branch 'main' into dom/gossip-parquet-proto 2023-08-30 10:55:46 +01:00
Dom Dwyer ae3f73f65e
refactor(proto): optional ParquetFile::to_delete
This field is nullable, so let's model it as nullable.
2023-08-29 12:35:44 +02:00
Dom Dwyer bd4a3fbbb8
refactor: impl Hash for NamespaceSchema
Allow the NamespaceSchema to be hashed (including the underlying proto
types it contains).
2023-08-29 12:19:41 +02:00
Dom Dwyer 04051aea4c
refactor(gossip): define NewParquetFiles topic
Separate the parquet file gossiping onto a new topic to allow specific
interest subscriptions and minimise cluster traffic.
2023-08-24 11:23:17 +02:00
Dom Dwyer a1211b0d03
feat(proto): define parquet file gossip type
Specify the gossip message used to notify peers of new parquet files.

This reuses the existing ParquetFile type in the "catalog" proto
package.

This will probably expand in the future to differentiate between new
files (via ingest) and compacted files (which make other files
obsolete).
2023-08-23 13:10:08 +02:00
Dom Dwyer d35bd48f65
refactor(gossip): rename GossipMessage
Now there's a Topic, there's no need for a giant "all message types"
enum.

As part of this shift, the gossip_message::GossipMessage used for schema
gossiping is sounding overly generic. This commit changes the name to
schema_message::SchemaMessage and updates the code.

This is a backwards-compatible change (and if anything goes wrong, the
"old" routers simply log a warning if a message is unreadable).
2023-08-22 12:06:49 +02:00
Nga Tran 3e98f7ea5c
feat: fill sort_key_ids when partition is inserted and updated (#8517)
* feat: read null sort_key_ids

* chore: clearer explanation about test strategy

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* test: tests that add partition with NULL sort_key_ids

* feat: set sort_key_ids to empty array {} during partition insertion

* feat: initial step to update sort_key_ids

* chore: address review comments

* chore: remove unnecessary comments and tests

* fix: typos

* chore: remove unnecessary tests

* feat: continue the work of updating sort_key_ids

* fix: check duplicates for SortedColumnSet

* test: tests for sort key ids

* test: fix a test

* chore: remove unused comments

* chore: address first half of review comments and removing tests of tests

* chore: address review comments for fetching columns in ingester

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-21 14:26:57 +00:00
Dom Dwyer 736f9987eb
feat(gossip): topic enum
Defines a Topic enum in the common generated_types package (alongside
the proto definitions of the actual gossip payloads).

This Topic enum will be used to identify message types across the gossip
transport.
2023-08-17 14:53:39 +02:00
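As a rough illustration of the Topic enum described in 736f9987eb, a topic identifier could be modelled as a small enum that converts to and from an integer carried by the transport. This is only a sketch: the variant names are taken loosely from topics mentioned elsewhere in this log, and the numeric values, the `u64` representation, and the conversion impls are assumptions rather than the definitions in generated_types.

```rust
use std::convert::TryFrom;

/// Hypothetical gossip topic identifiers. Variant names and numeric values
/// are illustrative assumptions, not the values used by the real crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u64)]
pub enum Topic {
    SchemaChanges = 1,
    NewParquetFiles = 2,
    CompactionEvents = 3,
}

impl From<Topic> for u64 {
    /// Encode the topic as the integer sent alongside a payload.
    fn from(t: Topic) -> u64 {
        t as u64
    }
}

impl TryFrom<u64> for Topic {
    /// The unrecognised value is returned so the caller can log it.
    type Error = u64;

    fn try_from(v: u64) -> Result<Self, Self::Error> {
        match v {
            1 => Ok(Topic::SchemaChanges),
            2 => Ok(Topic::NewParquetFiles),
            3 => Ok(Topic::CompactionEvents),
            other => Err(other),
        }
    }
}
```

Returning the unknown value as the error lets a receiver skip messages for topics it does not understand instead of failing outright.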
NGA-TRAN 9bf1c8c11c chore: revert fill sort_key_ids 2023-08-11 11:36:27 -04:00
Nga Tran da92a5c9e1
feat: fill catalog `sort_key_ids` for partitions with coming data (#8462)
* feat: fill catalog sort_key_ids for partition with coming data

* test: sort_key_ids has empty array for newly created partition

* test: name of non-existing column

* chore: add comments to ask Andrew about the code

* chore: make comments clearer

* chore: fix a comment to avoid failure in doc

* chore: add comment for the panic if column name of sort key not found

* fix: during file import, the partition has to be created with an empty sort key first. Then, after its files are created, the partition will be updated with the sort key

* chore: remove no longer needed comments after the bug in build_catalog test is fixed

* chore: address review comments

* refactor: Use ColumnSet type

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: fix a clippy

---------

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-08-10 18:12:40 +00:00
Dom Dwyer 58269bf463
refactor: re-export prost in generated_types
The generated types emit types that depend on prost (through Message
derives), and therefore all users of generated_types already depend on
prost.

It would be wrong for users of the generated_types crate to use a
different version of prost than what is used in generated_types.

By re-exporting prost, users can just depend on generated_types, and
always use the right prost version. prost prost prost. prost.
2023-08-01 13:22:27 +02:00
Dom Dwyer 081dc03a32
feat(proto): schema gossip message definitions
Define the gossip message types used to disseminate schema changes to
other peers.

Currently there are two types defined: an initial "create" operation,
intended to be followed by "update" operations where appropriate. Both
messages are trivial CRDTs in that they are effectively add-only column
sets (a monotonic type) with other fields required to be immutable (as
they currently are in IOx).
2023-08-01 13:22:26 +02:00
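The "add-only column set" property mentioned in 081dc03a32 can be sketched as a grow-only set whose merge is a set union. The `ColumnSet` type and its methods below are hypothetical names for illustration (not the proto messages or IOx types), and the sketch assumes columns are identified by name only.

```rust
use std::collections::BTreeSet;

/// Hypothetical add-only column set (a grow-only set CRDT).
/// Not the real schema gossip message type.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ColumnSet {
    names: BTreeSet<String>,
}

impl ColumnSet {
    /// Record a column. There is deliberately no way to remove one,
    /// which keeps the set monotonic.
    pub fn add(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    /// Merge a peer's view by set union. Union is commutative,
    /// associative and idempotent, so peers converge regardless of
    /// message order or duplication.
    pub fn merge(&mut self, other: &ColumnSet) {
        self.names.extend(other.names.iter().cloned());
    }

    /// Number of distinct columns known so far.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}
```

Because merging is a union, applying the same "update" twice, or applying updates from two peers in either order, yields the same set.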
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
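The "look up by either kind of partition ID" behaviour described in 4a9e76b8b7 might look roughly like the in-memory sketch below. The struct fields, their types, and the `files_for_partition` helper are assumptions made for illustration; the real catalog performs this lookup in its storage backend.

```rust
/// Hypothetical, simplified catalog records; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Partition {
    pub id: i64,
    pub hash_id: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub name: String,
    /// Older records carry the numeric partition ID.
    pub partition_id: Option<i64>,
    /// Newer records may carry only the partition hash ID.
    pub partition_hash_id: Option<String>,
}

/// Return the files belonging to `partition`, matching on whichever
/// identifier the file record happens to carry.
pub fn files_for_partition<'a>(
    partition: &Partition,
    files: &'a [ParquetFile],
) -> Vec<&'a ParquetFile> {
    files
        .iter()
        .filter(|f| {
            let by_id = f.partition_id == Some(partition.id);
            // Only compare hash IDs when the partition actually has one,
            // so two missing hash IDs never count as a match.
            let by_hash = partition.hash_id.is_some()
                && f.partition_hash_id == partition.hash_id;
            by_id || by_hash
        })
        .collect()
}
```

With the example from the commit message, a partition with `id: 3` and `hash_id: abcdefg` matches a file that records only `partition_hash_id: abcdefg`, so callers that deal only in numeric partition IDs still see that file.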
Fraser Savage 30939cfe96
refactor(wal): Remove op-level `sequence_number`, use per table map
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic difference.
2023-07-05 14:20:43 +01:00
kodiakhq[bot] e7effc62b5
Merge branch 'main' into savage/sequence-per-partition 2023-06-08 14:28:44 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Dom Dwyer ee4f633dba
refactor: remove unused replication proto
This was from an earlier design.
2023-06-08 16:04:49 +02:00
Carol (Nichols || Goulding) bf699a8b60
fix: Remove partition ID from the metadata serialized into Parquet files (#7947)
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.

This will make the transition to deterministic partition IDs a bit
smoother.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:03:21 +00:00
Fraser Savage d1031c5ec6
docs(wal): Explicitly call out transitive relation between table and partition in a write
Co-authored-by: Dom <dom@itsallbroken.com>
2023-06-08 10:17:47 +01:00
Fraser Savage 7de98a6f11
refactor(wal): Associate sequence numbers to table ID in `SequencedWalOp`s
Writes are partitioned before being placed in the buffer tree. This
has the effect of splitting up the persistence of a DmlWrite's contents
and thus the persistence of data referred to by write operations placed
into a single WAL entry for a write op.

This change associates the currently assigned sequence number
with every `TableId` in the write, so that persist events for a single
write can be tracked on a per table/partition level.
Making this partial change enables a transition period where changes
can be rolled back and WAL files can still be processed.

A future change will produce a new sequence number per table
ID.
2023-06-06 17:49:09 +01:00
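A minimal sketch of the per-table association described in 7de98a6f11 follows, assuming a plain map from table ID to sequence number. The `TableId` newtype, the field names, and the `payload` placeholder are illustrative stand-ins, not the actual WAL types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real table ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableId(pub i64);

/// Hypothetical WAL entry carrying one sequence number per table.
#[derive(Debug, Clone, PartialEq)]
pub struct SequencedWalOp {
    /// Sequence number recorded for each table touched by the write.
    pub table_write_sequence_numbers: HashMap<TableId, u64>,
    /// Opaque placeholder; the real op carries the write itself.
    pub payload: Vec<u8>,
}

impl SequencedWalOp {
    /// Transitional behaviour: every table in the write is assigned the
    /// same sequence number.
    pub fn new(tables: &[TableId], sequence_number: u64, payload: Vec<u8>) -> Self {
        let table_write_sequence_numbers =
            tables.iter().map(|t| (*t, sequence_number)).collect();
        Self { table_write_sequence_numbers, payload }
    }

    /// Look up the sequence number to track persistence for one table.
    pub fn sequence_number_for(&self, table: TableId) -> Option<u64> {
        self.table_write_sequence_numbers.get(&table).copied()
    }
}
```

Keeping the map keyed by table means a later change can hand out a distinct number per table without altering the shape of the entry.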
Nga Tran 566869aa30
refactor: replace namespace with database for flight proto (#7910)
* refactor: replace namespace with database for flight proto

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 16:53:09 +00:00
Carol (Nichols || Goulding) c2e19b3826
docs: Mention tag column creation in the table creation service description
Co-authored-by: Dom <dom@itsallbroken.com>
2023-05-25 14:02:37 -04:00
Carol (Nichols || Goulding) 32195748a3
feat: Add proto definitions for a table create gRPC API 2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding) 6f92bccc99
feat: Use protobuf for PartitionTemplate in CreateNamespace gRPC API
The service implementation doesn't use this field yet.
2023-05-24 10:10:34 -04:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) that had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Marco Neumann 31b8813760
feat: hide `system.queries` table from prod by default (#7810)
Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queriers are deployed then the K8s routing decides which
pod/process the user hits. This leads to an inconsistent view. However
the log is still useful for debugging.

This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.

Closes #7119.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-22 12:29:24 +00:00
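To illustrate why 31b8813760 encodes the debug flag in the ticket, the sketch below builds a ticket while headers are visible and recovers the flag later from the ticket bytes alone. The `Ticket` struct, its one-byte-flag encoding, and treating mere presence of the `iox-debug` header as "set" are all assumptions for illustration; the real Flight ticket format is not shown in this log.

```rust
/// Hypothetical ticket contents; not the real IOx Flight ticket format.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ticket {
    pub query: String,
    pub is_debug: bool,
}

impl Ticket {
    /// Build a ticket while handling the request that can see headers
    /// (`GetFlightInfo` in the commit message). Presence of the
    /// `iox-debug` header is treated as "set" (an assumption).
    pub fn from_request(query: &str, headers: &[(&str, &str)]) -> Self {
        let is_debug = headers
            .iter()
            .any(|(k, _)| k.eq_ignore_ascii_case("iox-debug"));
        Self { query: query.to_string(), is_debug }
    }

    /// Illustrative encoding: one flag byte followed by the UTF-8 query.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = vec![self.is_debug as u8];
        out.extend_from_slice(self.query.as_bytes());
        out
    }

    /// Recover the ticket where headers are not available (`DoGet`).
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        let (flag, rest) = bytes.split_first()?;
        let query = String::from_utf8(rest.to_vec()).ok()?;
        Some(Self { query, is_debug: *flag == 1 })
    }
}
```

Since the second call only receives the ticket, anything it needs to know about the original request, including whether debug features were requested, has to round-trip through the ticket bytes.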
Dom 5fbf2d3d69
Merge branch 'main' into dom/partition-template-rpc 2023-05-17 15:06:17 +01:00
Dom Dwyer 63de1a3bc8
refactor(proto): use "tag" instead of "column"
I was going back and forth on this, but the MVP is tags only. If we
expand it to be the more general "columns" in the future, we can change
the proto to reflect the more generalised implementation and have a more
descriptive field name now!
2023-05-17 14:03:31 +02:00
Martin Hilton c9cd1fdc44
chore: add a go_package option to the authz proto file (#7802)
This is to fix a downstream service that builds a go package from
these definitions.
2023-05-16 16:51:51 +00:00
Dom Dwyer bc33ad1548
feat: PartitionTemplate proto definition
Defines the PartitionTemplate as a re-usable proto type.
2023-05-16 16:54:36 +02:00
Dom Dwyer 1814514c17
refactor: sort proto imports
Sorts the path lines.
2023-05-16 16:31:34 +02:00
Carol (Nichols || Goulding) 14007808bd
fix: Move remaining conversions between data types and proto into data_types
And have data_types depend on generated_types rather than vice versa.
2023-05-12 13:31:04 -04:00
Carol (Nichols || Goulding) 1770d0f4d8
fix: Move ingester-querier gRPC communication to its own crate 2023-05-12 13:28:30 -04:00
Carol (Nichols || Goulding) 4c7f96ead8
fix: Remove unused delete predicate proto conversion code 2023-05-12 11:27:46 -04:00
Carol (Nichols || Goulding) 3d5df5574a
fix: Remove vestiges of shards 2023-05-08 20:24:36 -04:00
Carol (Nichols || Goulding) 7e9a449623
fix: Remove write buffer proto definitions 2023-05-08 20:24:35 -04:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00
Carol (Nichols || Goulding) b0959667d5
fix: Move topic and query pool within iox catalog (#7734)
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-04 13:45:56 +00:00