Changes the partitioning logic to be fallible. This prevents an invalid
partition template from causing a panic, previously possible through two
known code paths:
* TagValue formatter referencing a non-tag column
* Time formatter using an invalid strftime format string
If either occurs, the write attempt is now aborted and an error is
returned to the user with an HTTP 500 status code.
Additionally, unexpected partitioner errors now map to a catch-all error
instead of panicking.
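A minimal sketch of the fallible shape described above, not the actual
implementation. The Column, TemplatePart and PartitionError types and the
partition_key / http_status functions are hypothetical names chosen for
illustration; only the "non-tag column" failure and the 500 status come
from this commit message.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// A column value in a single row (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Tag(String),
    Field(f64),
}

/// One part of a partition template (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub enum TemplatePart {
    /// Use the value of the named tag column.
    TagValue(String),
    /// Copy a literal string into the key.
    Literal(String),
}

/// Errors surfaced to the caller instead of panicking.
#[derive(Debug, Clone, PartialEq)]
pub enum PartitionError {
    /// The template referenced a column that is not a tag.
    NotATag(String),
}

impl fmt::Display for PartitionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotATag(name) => write!(f, "column {:?} is not a tag", name),
        }
    }
}

impl std::error::Error for PartitionError {}

/// Build a partition key, returning an error for an invalid template.
pub fn partition_key(
    parts: &[TemplatePart],
    row: &BTreeMap<String, Column>,
) -> Result<String, PartitionError> {
    let mut out = Vec::with_capacity(parts.len());
    for part in parts {
        match part {
            TemplatePart::TagValue(name) => match row.get(name) {
                Some(Column::Tag(v)) => out.push(v.clone()),
                // Assumption: for simplicity an absent column is treated
                // the same way as a non-tag column.
                _ => return Err(PartitionError::NotATag(name.clone())),
            },
            TemplatePart::Literal(s) => out.push(s.clone()),
        }
    }
    Ok(out.join("-"))
}

/// Map any partitioner error to the HTTP status returned to the user.
pub fn http_status(_err: &PartitionError) -> u16 {
    500
}
```

The caller propagates the Err with `?` and aborts the write, rather than
unwrapping inside the partitioner.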
This commit fixes loads of crates (47!) that had unused dependencies or
mis-configured dependencies (test deps as normal deps).
I added the "unused_crate_dependencies" lint to all crates to help
prevent this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false positives when specifying
dev-dependencies for test/bench binaries - these are files in /tests or
/benches (not normal unit tests). This commit includes a workaround:
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
So that the different kinds aren't mixed up. Also extracts the logic
having to do with which template takes precedence onto the
PartitionTemplate type itself.
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).
This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
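As a rough illustration of this first half of the rollout, assuming a
simplified message type: WireWrite, produce and consume are hypothetical
names, and modelling the new IDs as Option fields is an assumption made
for this sketch, not the actual wire encoding.

```rust
/// Simplified stand-in for a serialised write (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct WireWrite {
    pub namespace: String,
    pub table: String,
    /// New in this change; absent in messages from older producers.
    pub namespace_id: Option<i64>,
    /// New in this change; absent in messages from older producers.
    pub table_id: Option<i64>,
}

/// Updated producer: always sends the IDs alongside the names.
pub fn produce(namespace: &str, namespace_id: i64, table: &str, table_id: i64) -> WireWrite {
    WireWrite {
        namespace: namespace.to_string(),
        table: table.to_string(),
        namespace_id: Some(namespace_id),
        table_id: Some(table_id),
    }
}

/// Consumer at this stage: resolves the target by name only and never
/// reads the ID fields, so old and new messages are handled identically.
pub fn consume(msg: &WireWrite) -> (String, String) {
    (msg.namespace.clone(), msg.table.clone())
}
```

Because consume never touches the ID fields, a message replayed from
before the producer update is processed the same way as a new one.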
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.
It is now possible to encode this invariant in the type system, which is
what this change does.
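A small before/after sketch of moving the invariant into the type
system. WriteBefore and WriteAfter are hypothetical stand-ins for the
DmlWrite type, and this PartitionKey is a simplified newtype, not the
real definition.

```rust
/// Simplified partition key newtype (assumption for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionKey(String);

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}

/// Before: the key is optional, so the invariant is checked at runtime.
pub struct WriteBefore {
    partition_key: Option<PartitionKey>,
}

impl WriteBefore {
    pub fn new(partition_key: Option<PartitionKey>) -> Self {
        Self { partition_key }
    }

    /// Panics if the caller forgot to set a key.
    pub fn key_for_enqueue(&self) -> &PartitionKey {
        self.partition_key
            .as_ref()
            .expect("write must have a partition key")
    }
}

/// After: a write cannot be constructed without a key.
pub struct WriteAfter {
    partition_key: PartitionKey,
}

impl WriteAfter {
    pub fn new(partition_key: PartitionKey) -> Self {
        Self { partition_key }
    }

    /// Infallible: the compiler guarantees the key is present.
    pub fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}
```

With the second shape, a missing key becomes a compile error at the
call site instead of a panic when enqueueing.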
Changes the Kafka message wire format to include the partition key for
serialised DML writes.
After this commit, the Kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.
A follow-up PR will change the ingester to utilise this embedded
partition key.
This has the unfortunate side effect of making the partition key part of
the public gRPC write API:
https://github.com/influxdata/influxdb_iox/issues/4866
This commit changes the protobuf record batch encoding to skip entirely
NULL columns when serialising. This prevents the deserialisation from
erroring due to a column type inference failure.
Prior to this commit, the system could be presented with a record batch
such as this:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-01-01 | 1 | NULL |
| 1970-07-05 | NULL | 1 |
This would be partitioned by YMD into two separate partitions:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-01-01 | 1 | NULL |
and:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-07-05 | NULL | 1 |
Both partitions would contain an entirely NULL column.
Both of these partitioned record batches would be successfully encoded,
but decoding either partition fails because a column type cannot be
inferred from a serialised column that contains no values. On the wire,
such a column looks like:
    Column {
        column_name: "B",
        semantic_type: Field,
        values: Some(
            Values {
                i64_values: [],
                f64_values: [],
                u64_values: [],
                string_values: [],
                bool_values: [],
                bytes_values: [],
                packed_string_values: None,
                interned_string_values: None,
            },
        ),
        null_mask: [
            1,
        ],
    },
In a column that is not entirely NULL, one of the "Values" fields would
be non-empty, and the decoder would use this to infer the type of the
column.
Because we have chosen to not differentiate between "NULL" and "empty"
in our proto encoding, the decoder cannot infer which field within the
"Values" struct the column belongs to - all are valid, but empty.
This commit prevents this type inference failure by skipping any columns
that are entirely NULL during serialisation, preventing the deserialiser
from having to process columns with ambiguous types.
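The skip can be sketched as a filter applied before encoding. This is an
illustration under simplifying assumptions, not the real encoder: the
Column type here is hypothetical and holds only optional i64 values, and
columns_to_encode is an invented helper name.

```rust
/// Simplified in-memory column (hypothetical; the real batch is typed).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    /// One entry per row; None represents NULL.
    pub values: Vec<Option<i64>>,
}

/// Return only the columns that contain at least one non-NULL value.
///
/// Entirely NULL columns are dropped so the decoder never sees a column
/// whose type it cannot infer. Note that a column with zero rows is
/// dropped as well, since it has no non-NULL value either.
pub fn columns_to_encode(columns: &[Column]) -> Vec<&Column> {
    columns
        .iter()
        .filter(|c| c.values.iter().any(|v| v.is_some()))
        .collect()
}
```

Applied to the first partition in the example above, column B is omitted
and only the time and A columns are serialised.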
* refactor: remove InfluxColumnType::IOx
Remove unused column variant - see #3554 for context.
* refactor: reserve SEMANTIC_TYPE_IOX name in proto
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles
* fix: Alphabetize dependencies
* fix: Add the data_types_conversions feature to get tests passing
* fix: Remove dev dependencies already listed under normal dependencies
* fix: Make sure the workspace is using the new resolver