Commit Graph

13273 Commits (3a2d41df478817ae05e386ad10212317ece47405)

Author SHA1 Message Date
Chunchun Ye 3a2d41df47
chore(doc): add doc for `namespace create` and `table create` with partition template examples (#8385)
chore(doc): fix typos and format

chore(doc): fix format

chore(doc): rename file

chore(doc): add new file to doc README

chore(doc): format

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-02 14:36:41 +00:00
Marco Neumann 3fb173ec56
refactor: use errors instead of panic (#8389)
* refactor: use errors instead of panic

* fix: typo
2023-08-02 13:58:55 +00:00
kodiakhq[bot] 4a14e6041e
Merge pull request #8392 from influxdata/savage/benchmark-router-namespace-schema-cache
perf(router): Add benchmark for additions to namespace schema cache
2023-08-02 12:50:46 +00:00
kodiakhq[bot] 101e1eee52
Merge branch 'main' into savage/benchmark-router-namespace-schema-cache
2023-08-02 12:45:55 +00:00
Fraser Savage ff207ec158
fix(router): Use BatchSize::NumIterations(1) for namespace schema cache benchmark
Batches share the same set-up step between iterations, so using a batch
size of more than 1 per setup provides inaccurate readings.
2023-08-02 13:35:55 +01:00
Dom d40ce5c2e7
Merge pull request #8391 from influxdata/dom/template-proto
refactor: expose partition template protos
2023-08-02 13:06:48 +01:00
Dom 1e5247a6c8
Merge branch 'main' into dom/template-proto
2023-08-02 13:01:48 +01:00
Marco Neumann 9e4e205ffd
refactor: migration checksum type (#8388)
* refactor: use `Box<[...]>` instead of `Vec<...>`

We are not planning to modify the vector, so storing a capacity and a
length is somewhat pointless.

* feat: add printout test for PG migrations

* refactor: use dedicated checksum type

* feat: checksum string roundtrips
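The Box-versus-Vec argument and the dedicated, string-roundtripping checksum type can be illustrated with a minimal sketch. The `Checksum` name, its lowercase-hex text form, and `ParseChecksumError` are assumptions for illustration only; the type actually added in #8388 may look different.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical checksum newtype. `Box<[u8]>` stores only a pointer and a
/// length, whereas `Vec<u8>` also stores a capacity we never use.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Checksum(Box<[u8]>);

impl From<Vec<u8>> for Checksum {
    fn from(bytes: Vec<u8>) -> Self {
        // `into_boxed_slice` discards any spare capacity.
        Self(bytes.into_boxed_slice())
    }
}

impl fmt::Display for Checksum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lowercase hex, two characters per byte.
        for b in self.0.iter() {
            write!(f, "{b:02x}")?;
        }
        Ok(())
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct ParseChecksumError;

impl FromStr for Checksum {
    type Err = ParseChecksumError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Only accept an even number of ASCII hex digits, so the byte
        // slicing below always lands on character boundaries.
        if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(ParseChecksumError);
        }
        // Decode each pair of hex digits into one byte.
        let bytes = (0..s.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|_| ParseChecksumError))
            .collect::<Result<Vec<u8>, _>>()?;
        Ok(Self::from(bytes))
    }
}
```

A roundtrip property (`parse(display(x)) == x`) is the natural test for such a type.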

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-02 11:57:18 +00:00
Fraser Savage 33e4098cf8
perf(router): Add benchmark for additions to namespace schema cache
This benchmark covers two axes of performance for calls to the
namespace cache's `put_schema()` stack: the cost of adding varying
numbers of new columns to an existing table in the namespace, and the
cost of adding new tables with their own set of columns to an existing
namespace.
2023-08-02 12:45:30 +01:00
Dom Dwyer adad6bb631
refactor: must_use annotation for gossip::Builder
Not all use cases will involve sending messages (some will only want to
subscribe to messages), which might result in someone dropping a handle
they're not expecting to use.
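A minimal sketch of how a `#[must_use]` annotation turns a silently dropped handle into a compiler warning. `Builder`, `GossipHandle`, `peer_count`, and the warning text are hypothetical stand-ins; the assumption that dropping the handle has side effects is inferred from the commit message, not from the gossip crate's code.

```rust
/// Hypothetical handle type standing in for whatever the gossip builder returns.
pub struct GossipHandle {
    peers: usize,
}

impl GossipHandle {
    /// Illustrative accessor so the handle has something to do.
    pub fn peer_count(&self) -> usize {
        self.peers
    }
}

/// Hypothetical builder; only the `#[must_use]` placement is the point here.
pub struct Builder {
    peers: usize,
}

impl Builder {
    pub fn new(peers: usize) -> Self {
        Self { peers }
    }

    /// Writing `builder.build();` as a bare statement now triggers an
    /// `unused_must_use` warning instead of silently dropping the handle.
    #[must_use = "dropping the handle may stop gossip; hold it even if you only subscribe"]
    pub fn build(self) -> GossipHandle {
        GossipHandle { peers: self.peers }
    }
}
```

A subscribe-only caller would bind the result to a named variable such as `let _handle = builder.build();`, which keeps it alive until the end of scope; `let _ = builder.build();` drops it immediately.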
2023-08-02 13:36:36 +02:00
Dom Dwyer 117d70d807
docs: GossipHandle::broadcast() blocking semantics
Document what happens when the gossip message queue is full.
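The message does not state the chosen behaviour, so the following is only a synchronous analogy, assuming a bounded queue: with `std::sync::mpsc::sync_channel`, `send` blocks while the queue is full, whereas `try_send` returns `TrySendError::Full` immediately. The `queue` and `try_broadcast` helpers are hypothetical and are not the `GossipHandle` API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical bounded message queue. `SyncSender::send` blocks the caller
/// while `capacity` messages are already buffered.
pub fn queue(capacity: usize) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(capacity)
}

/// Non-blocking variant: if the queue is full (or the receiver is gone),
/// the message is handed back so the caller can retry or drop it.
pub fn try_broadcast(tx: &SyncSender<Vec<u8>>, msg: Vec<u8>) -> Result<(), Vec<u8>> {
    match tx.try_send(msg) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(msg)) | Err(TrySendError::Disconnected(msg)) => Err(msg),
    }
}
```

Blocking applies backpressure to the producer; failing fast keeps the producer responsive at the cost of lost messages. Which trade-off `GossipHandle::broadcast()` documents is not shown here.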
2023-08-02 13:36:36 +02:00
Dom Dwyer 6ea8c99c01
refactor: accessor for table partition proto
Allow the Table partition template protobuf to be accessed (if
specified).
2023-08-02 13:36:35 +02:00
Dom Dwyer e3ec091881
refactor: accessor for namespace partition proto
Allow the Namespace partition template protobuf to be accessed (if
specified).
2023-08-02 13:36:34 +02:00
Dom Dwyer 2ebd2e2236
feat: ColumnSchema instantiation from gossip
Implement converting a Column received via gossip into a ColumnSchema.
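A sketch of such a conversion as a fallible `TryFrom`, so that an unrecognised column type from a peer becomes an error value rather than a panic. `GossipColumn`, the integer-to-`ColumnType` mapping, and `ConversionError` are hypothetical stand-ins for the generated protobuf and schema types.

```rust
use std::convert::TryFrom;

/// Hypothetical wire form of a column received via gossip. Protobuf enums
/// arrive as raw integers, so the type field may hold unknown values.
#[derive(Debug, Clone)]
pub struct GossipColumn {
    pub column_id: i64,
    pub column_type: i32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    I64,
    F64,
    Tag,
    Time,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnSchema {
    pub id: i64,
    pub column_type: ColumnType,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConversionError {
    UnknownColumnType(i32),
}

impl TryFrom<&GossipColumn> for ColumnSchema {
    type Error = ConversionError;

    fn try_from(c: &GossipColumn) -> Result<Self, Self::Error> {
        // Illustrative mapping only; real enum discriminants are not shown here.
        let column_type = match c.column_type {
            1 => ColumnType::I64,
            2 => ColumnType::F64,
            3 => ColumnType::Tag,
            4 => ColumnType::Time,
            other => return Err(ConversionError::UnknownColumnType(other)),
        };
        Ok(ColumnSchema { id: c.column_id, column_type })
    }
}
```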
2023-08-02 13:36:24 +02:00
kodiakhq[bot] 76c766330a
Merge pull request #8382 from influxdata/dom/router-schema-gossip-skeleton
feat(router): schema gossip skeleton
2023-08-02 11:04:53 +00:00
Dom 855a74d9e4
Merge branch 'main' into dom/router-schema-gossip-skeleton
2023-08-02 11:59:15 +01:00
Marco Neumann 65846e45a8
docs: explain migration transaction handling (#8387)
Forgot to update the docs in #8373.

This will be updated again after I've finished #7897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-02 10:04:15 +00:00
dependabot[bot] 5004f3e460
chore(deps): Bump cc from 1.0.79 to 1.0.80 (#8386)
Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.0.79 to 1.0.80.
- [Release notes](https://github.com/rust-lang/cc-rs/releases)
- [Commits](https://github.com/rust-lang/cc-rs/compare/1.0.79...1.0.80)

---
updated-dependencies:
- dependency-name: cc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-02 08:29:07 +00:00
Nga Tran dac0db2196
feat: add sort_key_ids into sqlite catalog (#8384)
2023-08-01 20:15:27 +00:00
Dom Dwyer 41c9604e46
feat(router): schema gossip skeleton
Adds the supporting types required to integrate the generic gossip crate
into a schema-specific broadcast primitive.

This commit implements the two "halves":

    * GossipMessageDispatcher: async processing of incoming gossip msgs
    * Handle: the send-side handle for async sending of gossip msgs

These types are responsible for converting into/from the serialised
bytes sent over the gossip primitive into application-level / protobuf
types.
2023-08-01 17:11:09 +02:00
kodiakhq[bot] 4a18df53c6
Merge pull request #8379 from influxdata/dom/schema-gossip-proto
feat(gossip): schema diff message proto
2023-08-01 14:42:40 +00:00
Dom 7be84a8e36
Merge branch 'main' into dom/schema-gossip-proto
2023-08-01 15:37:43 +01:00
Chunchun Ye c8242c7469
chore(cli): add `--partition-template` to `namespace create` (#8365)
* chore(cli): add `--partition-template` to namespace create

* chore: fix typo in doc for `PartitionTemplateConfig`

chore: add max limit 8 for partition template in doc

* chore: add e2e tests

* chore: fmt

* chore: add more e2e tests for namespace create with partition template

* chore: show doc comments in cli help interface

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 14:37:00 +00:00
Dom ea4950a7f2
Merge branch 'main' into dom/schema-gossip-proto
2023-08-01 15:36:56 +01:00
Nga Tran 73f38077b6
feat: add sort_key_ids as array of bigints into catalog partition (#8375)
* feat: add sort_key_ids as array of bigints into catalog partition

* chore: add comments

* chore: remove comments to avoid changing them in the future due to checksum requirement

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 14:28:30 +00:00
Dom 2ab91e31fa
Merge branch 'main' into dom/schema-gossip-proto
2023-08-01 15:10:23 +01:00
kodiakhq[bot] 4994157910
Merge pull request #8380 from influxdata/savage/configure-router-health-probe-numbers
feat(router): Expose `num_probes` request count used to health-check ingesters as config option
2023-08-01 13:57:41 +00:00
Fraser Savage df2c1850fb
refactor(router): Try to fix rustfmt having a nap
2023-08-01 14:51:20 +01:00
Fraser Savage a05fecd8dd
docs(router): Clearer documentation of probe request behaviour
Co-authored-by: Dom <dom@itsallbroken.com>
2023-08-01 14:48:18 +01:00
Fraser Savage e643014900
docs(router): Fix typo in circuit breaker document comment
2023-08-01 14:46:17 +01:00
Fraser Savage e4a5d2efaa
feat(router): Expose `num_probes` request count used to health-check ingesters as config option
This allows routers to be configured with the number of probe requests
which can/must be collected to transition the health checker's circuit
state for a downstream to healthy/unhealthy.
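A minimal sketch of a probe counter whose transition threshold is configurable. The rule used here is an assumption, not the router's actual circuit-breaker logic: a downstream becomes healthy only after `num_probes` consecutive successful probes, and any failure resets the count. `ProbeTracker` and `Health` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Unhealthy,
}

/// Hypothetical probe-driven health state with a configurable threshold.
pub struct ProbeTracker {
    num_probes: u64,
    consecutive_ok: u64,
    state: Health,
}

impl ProbeTracker {
    pub fn new(num_probes: u64) -> Self {
        Self { num_probes, consecutive_ok: 0, state: Health::Unhealthy }
    }

    /// Record one probe result and return the (possibly new) state.
    pub fn record(&mut self, ok: bool) -> Health {
        if ok {
            self.consecutive_ok = self.consecutive_ok.saturating_add(1);
            if self.consecutive_ok >= self.num_probes {
                self.state = Health::Healthy;
            }
        } else {
            // Assumed policy: a single failed probe resets progress.
            self.consecutive_ok = 0;
            self.state = Health::Unhealthy;
        }
        self.state
    }
}
```

Raising `num_probes` makes recovery more conservative at the cost of keeping a recovered ingester out of rotation for longer.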
2023-08-01 14:21:56 +01:00
Dom Dwyer 58269bf463
refactor: re-export prost in generated_types
The generated types emit types that depend on prost (through Message
derives), and therefore all users of generated_types already depend on
prost.

It would be wrong for users of the generated_types crate to use a
different version of prost than what is used in generated_types.

By re-exporting prost, users can just depend on generated_types, and
always use the right prost version. prost prost prost. prost.
2023-08-01 13:22:27 +02:00
Dom Dwyer 081dc03a32
feat(proto): schema gossip message definitions
Define the gossip message types used to disseminate schema changes to
other peers.

Currently there are two types defined: an initial "create" operation,
intended to be followed by "update" operations where appropriate. Both
messages are trivial CRDTs in that they are effectively add-only column
sets (a monotonic type) with other fields required to be immutable (as
they currently are in IOx).
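Because each message is effectively an add-only set of columns whose other fields are immutable, merging two views is a set union, which is commutative and idempotent. A minimal grow-only-set (G-Set) sketch, assuming a column's type is one of its immutable fields; `TableColumns`, `ColumnType`, and `TypeConflict` are hypothetical names, not the proto definitions.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Tag,
    Field,
    Time,
}

/// Hypothetical add-only column set for one table.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct TableColumns(BTreeMap<String, ColumnType>);

/// Returned when a peer reports a different type for an existing column.
#[derive(Debug, PartialEq, Eq)]
pub struct TypeConflict(pub String);

impl TableColumns {
    /// Add a column; re-adding with the same type is a no-op.
    pub fn add(&mut self, name: &str, ty: ColumnType) -> Result<(), TypeConflict> {
        if let Some(existing) = self.0.get(name) {
            return if *existing == ty { Ok(()) } else { Err(TypeConflict(name.to_string())) };
        }
        self.0.insert(name.to_string(), ty);
        Ok(())
    }

    /// Merge is set union. Conflicts are checked first, so a failed merge
    /// leaves `self` unchanged.
    pub fn merge(&mut self, other: &TableColumns) -> Result<(), TypeConflict> {
        let conflict = other
            .0
            .iter()
            .find(|(name, ty)| self.0.get(*name).map_or(false, |existing| existing != *ty));
        if let Some((name, _)) = conflict {
            return Err(TypeConflict(name.clone()));
        }
        for (name, ty) in &other.0 {
            self.0.entry(name.clone()).or_insert(*ty);
        }
        Ok(())
    }
}
```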
2023-08-01 13:22:26 +02:00
dependabot[bot] 72feefc3cc
chore(deps): Bump serde from 1.0.179 to 1.0.180 (#8376)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.179 to 1.0.180.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.179...v1.0.180)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 08:24:30 +00:00
Marco Neumann 743a59aa64
feat: use single per-migration txn when possible (#8373)
* test: improve `test_step_sql_statement_no_transaction`

* feat: also print number of steps in "applying migration step"

* feat: use single per-migration txn when possible

If all steps can (and want to) run in a transaction block, then wrap the
migration bookkeeping and the migration script into a single
transaction. This way we avoid the dirty state altogether because it's
now an "all or nothing" migration.

Note that we still guarantee that there is only a single migration
running at the same time due to the locking mechanism. Otherwise we
would potentially run into nasty transaction failures during schema
modifications.

This is related to #7897 but only fixes / self-heals the "dirty" state
for migrations that can run in transactions. For concurrent index
migrations (which we need in prod) we need to be a bit smarter and this
will be done in a follow-up. However I feel that not leaving half-done
migrations for the cases where it's technically possible (e.g. adding
columns) is already a huge step forward.

* test: make `test_migrator_uses_single_transaction_when_possible` harder

* test: explain test
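The decision described above can be sketched as a planning function. `Step`, `Plan`, and `render` are hypothetical; the SQL strings stand in for real migration scripts, and per-step bookkeeping is omitted. In PostgreSQL, `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, which is why a single such step forces the per-step path.

```rust
/// Hypothetical description of one migration step.
pub struct Step {
    pub sql: &'static str,
    /// Whether this step can (and wants to) run inside a transaction block.
    pub in_transaction: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    /// Bookkeeping and every step run inside one BEGIN ... COMMIT block.
    SingleTransaction,
    /// At least one step must run outside a transaction block.
    PerStep,
}

pub fn plan(steps: &[Step]) -> Plan {
    if steps.iter().all(|s| s.in_transaction) {
        Plan::SingleTransaction
    } else {
        Plan::PerStep
    }
}

/// Render the statement order the plan implies.
pub fn render(steps: &[Step]) -> Vec<String> {
    let mut out = Vec::new();
    match plan(steps) {
        Plan::SingleTransaction => {
            out.push("BEGIN".to_string());
            out.extend(steps.iter().map(|s| s.sql.to_string()));
            // Bookkeeping commits atomically with the script: all or nothing.
            out.push("-- record migration as applied".to_string());
            out.push("COMMIT".to_string());
        }
        Plan::PerStep => {
            for s in steps {
                if s.in_transaction {
                    out.push("BEGIN".to_string());
                    out.push(s.sql.to_string());
                    out.push("COMMIT".to_string());
                } else {
                    out.push(s.sql.to_string());
                }
            }
        }
    }
    out
}
```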

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 08:18:39 +00:00
Martin Hilton 25c3ce805d
refactor(influxql): make MOVING_AVERAGE a user-defined window function (#8377)
Update the implementation of the MOVING_AVERAGE function to be a
user-defined window function allowing the values to be calculated
for the entire window in one go.
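Computing the values "for the entire window in one go" can be done in a single pass with a running sum. This is a minimal sketch, not the DataFusion user-defined window function from #8377; it assumes rows before the first full window yield `None`, since a window function emits one value per input row.

```rust
/// Moving average over a whole partition in one pass. `n` is the window
/// length; rows before the first full window produce `None` (assumption).
pub fn moving_average(values: &[f64], n: usize) -> Vec<Option<f64>> {
    if n == 0 {
        return vec![None; values.len()];
    }
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0_f64;
    for (i, v) in values.iter().enumerate() {
        sum += *v;
        if i >= n {
            // Remove the value that just slid out of the window.
            sum -= values[i - n];
        }
        out.push(if i + 1 >= n { Some(sum / n as f64) } else { None });
    }
    out
}
```

The running sum makes the whole partition O(rows) rather than O(rows × n); for very long series it can accumulate floating-point error, which a production version might need to address.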

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 06:30:06 +00:00
Andrew Lamb de79619e71
chore: Update datafusion (#8355)
* chore: Update datafusion pin

* fix: Update for change in API

* chore: Update plan

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-31 15:41:00 +00:00
wiedld b58c26368f
Merge pull request #8367 from influxdata/idpe-17789/provide-job-on-commit
feat(idpe-17789): provide job from compactor --> scheduler, on commit
2023-07-31 08:35:49 -07:00
wiedld cc70a2c38b
Merge branch 'main' into idpe-17789/provide-job-on-commit
2023-07-31 08:20:45 -07:00
Dom 878f217631
Merge pull request #8372 from influxdata/dom/enable-gossip
feat: optional gossip clustering for router/ingester
2023-07-31 15:49:32 +01:00
Dom e98188b181
Merge branch 'main' into dom/enable-gossip
2023-07-31 15:27:44 +01:00
Dom 336a50017e
Merge pull request #8362 from influxdata/dom/persist-list-stat-cache
perf(ingester): persisting list & cached statistics
2023-07-31 14:34:57 +01:00
Dom Dwyer 5d4ce7eacc
docs: fix up comments
Fixes some outdated comments.
2023-07-31 15:19:31 +02:00
Dom Dwyer e3550f78a3
perf(ingester): persisting list & cached statistics
This commit breaks the ordered list of persisting buffers into its own
type (PersistingList) for clarity, and implements a cache within it of
the merged set of schemas across all persisting buffer FSMs, and
row/timestamp summaries.

This cleans up the code and prevents N persisting schemas from being
merged at query time (for every query!). Instead, schemas and statistics
are incrementally maintained, pushing the computation to persist time
rather than query time.
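A minimal sketch of the caching idea, assuming buffers leave the list oldest-first once persisted. `PersistingBuffer`, `Summary`, and the method names are hypothetical, not the ingester's types. Adding a buffer folds it into the cache; removing one recomputes the cache, because a set union and a min/max cannot be "subtracted".

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical persisting buffer: its column names and row timestamps.
pub struct PersistingBuffer {
    pub columns: BTreeSet<String>,
    pub timestamps: Vec<i64>,
}

/// Cached merged schema plus row/timestamp statistics.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Summary {
    pub columns: BTreeSet<String>,
    pub rows: usize,
    pub ts_min: Option<i64>,
    pub ts_max: Option<i64>,
}

impl Summary {
    fn absorb(&mut self, b: &PersistingBuffer) {
        self.columns.extend(b.columns.iter().cloned());
        self.rows += b.timestamps.len();
        for &t in &b.timestamps {
            self.ts_min = Some(self.ts_min.map_or(t, |m| m.min(t)));
            self.ts_max = Some(self.ts_max.map_or(t, |m| m.max(t)));
        }
    }
}

#[derive(Default)]
pub struct PersistingList {
    buffers: VecDeque<PersistingBuffer>,
    cached: Summary,
}

impl PersistingList {
    /// Persist-time cost: fold the new buffer into the cache.
    pub fn push(&mut self, b: PersistingBuffer) {
        self.cached.absorb(&b);
        self.buffers.push_back(b);
    }

    /// Drop the oldest buffer once persisted and rebuild the cache.
    pub fn pop_oldest(&mut self) -> Option<PersistingBuffer> {
        let b = self.buffers.pop_front()?;
        let mut s = Summary::default();
        for rest in &self.buffers {
            s.absorb(rest);
        }
        self.cached = s;
        Some(b)
    }

    /// Query-time cost: read the cached value, no merging.
    pub fn summary(&self) -> &Summary {
        &self.cached
    }
}
```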
2023-07-31 15:19:30 +02:00
Joe-Blount 44e266d000
fix: compaction looping fixes (#8363)
* fix: selectively merge L1 to L2 when L0s still exist

* fix: avoid grouping files that undo previous splits

* chore: add test case for new fixes

* chore: insta test churn

* chore: lint cleanup
2023-07-31 13:15:49 +00:00
Marco Neumann aa7a38be55
fix: re-design LRU cache to be deadlock-free (#8345)
* fix: re-design LRU cache to be deadlock-free

Fixes #8334.

* test: explain test

* test: add regression test

* docs: extend "overdelete" section
2023-07-31 13:04:34 +00:00
kodiakhq[bot] 8d0caae186
Merge pull request #8374 from influxdata/savage/notify-watchers-of-disk-usage-changes
refactor(tracker): Return disk usage watcher from `DiskUsageMetrics`
2023-07-31 12:49:00 +00:00
kodiakhq[bot] 8197dd10a7
Merge branch 'main' into savage/notify-watchers-of-disk-usage-changes
2023-07-31 12:44:05 +00:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be
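The lookup rule in the example above (match on the numeric ID, or on the partition's hash ID when the file only records that) can be sketched over in-memory records. `Partition`, `ParquetFile`, and this free-function signature are hypothetical simplifications, not the catalog's actual repository API.

```rust
/// Hypothetical, simplified catalog records.
pub struct Partition {
    pub id: i64,
    pub hash_id: Option<String>,
}

pub struct ParquetFile {
    pub name: &'static str,
    pub partition_id: Option<i64>,
    pub partition_hash_id: Option<String>,
    pub to_delete: bool,
}

/// Files of partition `id` not marked for deletion, whichever kind of
/// partition ID the file record happens to store.
pub fn list_by_partition_not_to_delete<'a>(
    partitions: &[Partition],
    files: &'a [ParquetFile],
    id: i64,
) -> Vec<&'a ParquetFile> {
    // Resolve the requested partition's hash ID, if it has one.
    let hash_id = partitions
        .iter()
        .find(|p| p.id == id)
        .and_then(|p| p.hash_id.as_deref());
    files
        .iter()
        .filter(|f| !f.to_delete)
        .filter(|f| {
            // The `is_some` guard stops two absent hash IDs from matching.
            f.partition_id == Some(id)
                || (hash_id.is_some() && f.partition_hash_id.as_deref() == hash_id)
        })
        .collect()
}
```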

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
Fraser Savage 8e0cee8e73
refactor(tracker): Return disk usage watcher from `DiskUsageMetrics`
This allows the creator to pass around a handle to the latest observed
disk usage statistics, allowing other threads to act upon changes.
2023-07-31 12:14:13 +01:00