Commit Graph

39 Commits (803122e3b46c3d7206bd7853516289f0cda102ae)

Dom Dwyer 47214ec9a0
fix: prevent panics in partitioning logic
Changes the partitioning logic to be fallible. This prevents an invalid
partition template from causing a panic, previously possible through two
known code paths:

    * TagValue formatter referencing a non-tag column
    * Time formatter using an invalid strftime format string

If either occurs, the write attempt is now aborted and an error is returned
to the user with an HTTP 500 status code.

Additionally, unexpected partitioner errors now map to a catch-all error
instead of panicking.
2023-06-01 17:44:44 +02:00
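
A minimal Rust sketch of the fallible partitioning described in the commit
above; the error variants and function are hypothetical stand-ins, not the
actual IOx types:

    // Hypothetical error type covering the two known panic paths plus a catch-all.
    #[derive(Debug)]
    enum PartitionError {
        /// A TagValue formatter referenced a column that is not a tag.
        NotATagColumn(String),
        /// The time formatter was given an invalid strftime format string.
        InvalidStrftime(String),
        /// Catch-all for unexpected partitioner failures.
        Unexpected(String),
    }

    /// Returning Result instead of panicking lets the write path abort and
    /// surface the error to the user (as an HTTP 500) on a bad template.
    fn tag_value_part(
        column: &str,
        tag_columns: &std::collections::HashMap<String, String>,
    ) -> Result<String, PartitionError> {
        tag_columns
            .get(column)
            .cloned()
            .ok_or_else(|| PartitionError::NotATagColumn(column.to_string()))
    }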
Carol (Nichols || Goulding) 9c0faa66f0
feat: Set a table partition template explicitly or from the namespace
And use the table partition template when partitioning writes to that
table.
2023-05-24 10:34:30 -04:00
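
A sketch of the fallback described above, with hypothetical names: the table's
explicit template is used when set, otherwise the namespace's, otherwise a
default.

    // Hypothetical helper; templates shown as plain strings for brevity.
    fn effective_template(table: Option<&str>, namespace: Option<&str>) -> String {
        table
            .or(namespace)
            .unwrap_or("%Y-%m-%d") // assumed default time-based partition key format
            .to_string()
    }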
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) that had unused dependencies or
mis-configured dependencies (test deps listed as normal deps).

I added the "unused_crate_dependencies" lint to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false positives when specifying
dev-dependencies for test/bench binaries - these are files in /tests or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
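
A sketch of the lint plus the dev-dependency workaround mentioned above; the
feature name and the criterion import are illustrative, not the exact crates
or flags used:

    // lib.rs of a workspace crate.

    // Reject dependencies declared in Cargo.toml but never imported.
    #![deny(unused_crate_dependencies)]

    // Workaround: dev-dependencies used only by files under tests/ or benches/
    // are not seen by the lint, so reference them here behind a feature flag.
    #[cfg(feature = "benches")]
    use criterion as _;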
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` (#7822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Carol (Nichols || Goulding) 23c0110b32
feat: Create newtypes for different partition templates
So that the different kinds aren't mixed up. Also extracts the logic
having to do with which template takes precedence onto the
PartitionTemplate type itself.
2023-05-09 14:55:02 +02:00
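
A sketch of the newtype idea (names illustrative): wrapping the shared
template in distinct types stops a namespace template being passed where a
table template is expected, and the fallback logic lives on the type itself.

    #[derive(Clone)]
    struct PartitionTemplate(String);

    #[derive(Clone)]
    struct NamespacePartitionTemplate(PartitionTemplate);

    struct TablePartitionTemplate(PartitionTemplate);

    impl TablePartitionTemplate {
        /// Use the table's own template if one was set, otherwise fall back
        /// to the namespace's.
        fn new(explicit: Option<PartitionTemplate>, namespace: &NamespacePartitionTemplate) -> Self {
            Self(explicit.unwrap_or_else(|| namespace.0.clone()))
        }
    }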
Carol (Nichols || Goulding) c1a8408572
fix: Consolidate the default partition template; remove --partition-key-pattern CLI option 2023-05-09 14:54:57 +02:00
Carol (Nichols || Goulding) 2aa8713d1d
fix: Remove partition TemplatePart::Table; partitioning is already per-table 2023-05-09 14:54:57 +02:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Andrew Lamb f93baf7693
chore: Update DataFusion and `arrow` / `arrow-flight` / `parquet` to `33.0.0` (#7045)
* chore: Update DataFusion and arrow/arrow-flight/parquet to 33.0.0

* fix: Update test output

* fix: update more test output

* fix: Update querier test output

* chore: Run cargo hakari tasks

* test: fix formatting

Fix formatting of batch pretty printing.

* test: fix formatting

Fix formatting of batch pretty printing.

* test: fix formatting for selector tests

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: Christopher Wolff <chris.wolff@influxdata.com>
2023-02-22 21:24:20 +00:00
Carol (Nichols || Goulding) f27c33d73e
fix: Simplify hashmap insert now that we have Copy keys 2022-11-18 10:40:40 -05:00
Carol (Nichols || Goulding) 02c3083192
fix: Remove table names from Dml operations 2022-11-18 10:40:38 -05:00
Carol (Nichols || Goulding) f4b8fe7751
fix: Remove now-unused db_name argument to encode_write 2022-11-14 16:46:04 -05:00
Carol (Nichols || Goulding) 4d4cdebbda
fix: Remove database_name field from DatabaseBatch
It was only being used in one error message.
2022-11-14 16:46:03 -05:00
Marco Neumann 746032af0f fix: compatibility after hashbrown upgrade
- Some methods need explicit types
- `hashbrown::HashMap` now takes 32 bytes, not 64
2022-11-11 13:25:39 -05:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Dom Dwyer 8ebea0df37 feat: table/namespace IDs in write protocol
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).

This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
2022-11-08 16:57:53 +01:00
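
A stand-in for the generated wire types, sketching how a consumer can now
read the IDs while the name strings (still sent, per the commit above) are no
longer needed for identification; field names here are assumptions, not the
exact proto:

    struct TableBatch {
        table_name: String, // still on the wire during the transition
        table_id: i64,      // newly authoritative
        // column data elided
    }

    struct DatabaseBatch {
        database_name: String,
        database_id: i64, // namespace ID
        table_batches: Vec<TableBatch>,
    }

    fn ids(batch: &DatabaseBatch) -> (i64, Vec<i64>) {
        let table_ids = batch.table_batches.iter().map(|t| t.table_id).collect();
        (batch.database_id, table_ids)
    }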
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Dom Dwyer ddd6ab0ba4 refactor(write_buffer): pass IDs in wire format
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.

In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
2022-11-02 13:28:56 +01:00
Dom Dwyer 72a358e52f refactor(dml): PartitionKey required for writes
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.

This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (i.e. was not
"None"), panicking at runtime when attempting to enqueue a write without
one.

It is now possible to encode this invariant in the type system, which is
what this change does.
2022-10-28 10:57:30 +02:00
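
In type terms, the change above amounts to the following (names illustrative):
the partition key moves from a runtime invariant to a compile-time
requirement.

    #[derive(Clone)]
    struct PartitionKey(String);

    struct DmlWrite {
        // Before: Option<PartitionKey>, with the write buffer panicking at
        // enqueue time if it was None. Now the field is required, so a write
        // without a key can no longer be constructed.
        partition_key: PartitionKey,
    }

    impl DmlWrite {
        fn new(partition_key: PartitionKey) -> Self {
            Self { partition_key }
        }
    }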
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Dom Dwyer cd4087e00d style: add no todo!() or dbg!() lints
Some crates had them, some not - let's be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
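
The crate-root attributes for this are roughly the following (clippy's todo
and dbg_macro lints); a sketch, not the exact lint table used:

    #![deny(clippy::todo)]
    #![deny(clippy::dbg_macro)]

    fn main() {
        // dbg!("leftover debugging"); // uncommenting this fails the build
        println!("ok");
    }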
Marko Mikulicic 2bbc419f95
fix: Tell which column failed typecheck (#5220) 2022-07-27 08:16:15 +00:00
Dom Dwyer c1f7154031 feat: propagate partition key through kafka
Changes the Kafka message wire format to include the partition key for
serialised DML writes on the wire.

After this commit, the Kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side before making the value's
presence necessary on the consumer side.

A follow-up PR will change the ingester to utilise this embedded
partition key.

This has the unfortunate side effect of making the partition key part of
the public gRPC write API:

    https://github.com/influxdata/influxdb_iox/issues/4866
2022-06-20 13:42:51 +01:00
Dom Dwyer 61182f506b refactor: emit PartitionKey from partitioner
Changes the partitioning code to emit a PartitionKey, instead of a bare
String.
2022-06-15 15:38:02 +01:00
Andrew Lamb dde3c3922c
refactor: use consistent spelling of serialize (#4717) 2022-05-27 14:42:59 +00:00
Dom Dwyer 43300878bc fix(pb): encoding entirely NULL columns (#4272)
This commit changes the protobuf record batch encoding to skip
entirely-NULL columns when serialising. This prevents the deserialisation
from erroring due to a column type inference failure.

Prior to this commit, when the system was presented with a record batch
such as this:

            | time       | A    | B    |
            | ---------- | ---- | ---- |
            | 1970-01-01 | 1    | NULL |
            | 1970-07-05 | NULL | 1    |

This would be partitioned by YMD into two separate partitions:

            | time       | A    | B    |
            | ---------- | ---- | ---- |
            | 1970-01-01 | 1    | NULL |

and:

            | time       | A    | B    |
            | ---------- | ---- | ---- |
            | 1970-07-05 | NULL | 1    |

Both partitions would contain an entirely NULL column.

Both of these partitioned record batches would be successfully encoded,
but decoding the partition fails because a column type cannot be inferred
from the serialised format, which contains no values and on the wire
looks like:

            Column {
                column_name: "B",
                semantic_type: Field,
                values: Some(
                    Values {
                        i64_values: [],
                        f64_values: [],
                        u64_values: [],
                        string_values: [],
                        bool_values: [],
                        bytes_values: [],
                        packed_string_values: None,
                        interned_string_values: None,
                    },
                ),
                null_mask: [
                    1,
                ],
            },

In a column that is not entirely NULL, one of the "Values" fields would
be non-empty, and the decoder would use this to infer the type of the
column.

Because we have chosen to not differentiate between "NULL" and "empty"
in our proto encoding, the decoder cannot infer which field within the
"Values" struct the column belongs to - all are valid, but empty.

This commit prevents this type inference failure by skipping any columns
that are entirely NULL during serialisation, preventing the deserialiser
from having to process columns with ambiguous types.
2022-05-18 13:33:26 +01:00
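
A sketch of the skip (hypothetical column representation): an entirely-NULL
column carries no typed values, so the decoder cannot infer its type;
filtering such columns out before encoding avoids the failure and loses no
data.

    struct Column {
        name: String,
        /// One entry per row; None represents NULL.
        values: Vec<Option<i64>>,
    }

    /// Keep only columns that contain at least one non-NULL value.
    fn encodable_columns(columns: &[Column]) -> Vec<&Column> {
        columns
            .iter()
            .filter(|c| c.values.iter().any(|v| v.is_some()))
            .collect()
    }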
Dom 32d7c4cbfe
refactor: remove InfluxColumnType::IOx (#3565)
* refactor: remove InfluxColumnType::IOx

Remove unused column variant - see #3554 for context.

* refactor: reserve SEMANTIC_TYPE_IOX name in proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:15:36 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Marco Neumann f3f6f335a9
chore: upgrade to snafu 0.7 (#3440) 2022-01-11 19:22:36 +00:00
Marco Neumann 405b2029d7 fix: accept PB batches w/ trimmed repeated tail values
Fixes #3128.
2021-12-01 17:58:55 +01:00
Carol (Nichols || Goulding) 9fd4a560f5
feat: Results of running cargo hakari manage-deps 2021-11-19 09:21:57 -05:00
Carol (Nichols || Goulding) a2454b542d
fix: Small cleanups in Cargo.tomls (#3160)
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles

* fix: Alphabetize dependencies

* fix: Add the data_types_conversions feature to get tests passing

* fix: Remove dev dependencies already listed under normal dependencies

* fix: Make sure the workspace is using the new resolver
2021-11-18 22:26:33 +00:00
Raphael Taylor-Davies 6f268f8260
refactor: extract DML types (#2731) (#3084)
* refactor: extract DML types (#2731)

* chore: fmt
2021-11-11 12:34:07 +00:00
Raphael Taylor-Davies 08fcd87337
feat: use protobuf encoding in write buffer (#2724) (#3018) 2021-11-03 15:19:05 +00:00
Raphael Taylor-Davies e444fa4cb2
feat: pbdata encode (#2724) (#3009) 2021-11-02 18:31:53 +00:00
Raphael Taylor-Davies 39a7f3069f
feat: add alternative string encodings to pbdata (#2724) (#3006)
* feat: add alternative string encodings to pbdata (#2724)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-02 18:14:34 +00:00
Marco Neumann bc7244c48e chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00
Raphael Taylor-Davies d2b41c5a13
feat: write pbdata to MutableBatch (#2724) (#2927)
* feat: write pbdata to MutableBatch (#2724)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-21 14:32:35 +00:00