Commit Graph

8045 Commits (04531e77dd1f57188b1a4045fb8cd713b265496a)

Author SHA1 Message Date
dependabot[bot] 292f71759e
chore(deps): Bump http-body from 0.4.4 to 0.4.5 (#4654)
Bumps [http-body](https://github.com/hyperium/http-body) from 0.4.4 to 0.4.5.
- [Release notes](https://github.com/hyperium/http-body/releases)
- [Changelog](https://github.com/hyperium/http-body/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/http-body/compare/v0.4.4...v0.4.5)

---
updated-dependencies:
- dependency-name: http-body
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:30:49 +00:00
dependabot[bot] 1bc02b1487
chore(deps): Bump regex-syntax from 0.6.25 to 0.6.26 (#4653)
Bumps [regex-syntax](https://github.com/rust-lang/regex) from 0.6.25 to 0.6.26.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/commits)

---
updated-dependencies:
- dependency-name: regex-syntax
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:21:39 +00:00
Carol (Nichols || Goulding) 05bd9de4d3
test: Add a test for the sequence number skipping metric
Ok, so... this needed lots of... channels. Channels everywhere.

The stream method on TestWriteBufferStreamHandler previously assumed it
would only be called once. In a test where reset_to_earliest is called,
stream might be called again to get the reset stream.

We want to be able to control which of the streams gets which
operations, so that's why the macro now takes a vec of vec of
operations: one vec of operations per expected call to stream, and the
stream will send all the operations in its vec.

The test thread needs to wait for the handler stream to consume the last
item from the last receiver stream, so when the
TestWriteBufferStreamHandler has set up the last expected call to
stream, pass back the last transmitter and have it wait until it's at
full expected capacity (which means all operations have been consumed by
the receiver).
2022-05-20 20:50:02 -04:00
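A minimal std-only sketch of the "vec of vec of operations" idea, assuming a hypothetical `TestStreamHandler` that stands in for `TestWriteBufferStreamHandler`. Each call to `stream()` consumes the next inner vec, so a test controls which operations the initial stream sees and which the stream returned after `reset_to_earliest` sees. The real handler is async and also passes back the last transmitter so the test can wait for full capacity; that part is omitted here.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, Receiver};

/// Hypothetical stand-in for `TestWriteBufferStreamHandler`: each call to
/// `stream()` hands out the next batch of operations, in order.
struct TestStreamHandler<T> {
    per_call_ops: VecDeque<Vec<T>>,
}

impl<T> TestStreamHandler<T> {
    /// One inner vec per expected call to `stream()`.
    fn new(per_call_ops: Vec<Vec<T>>) -> Self {
        Self { per_call_ops: per_call_ops.into() }
    }

    /// Returns a receiver that yields exactly this call's operations.
    /// Panics if called more times than the test expected.
    fn stream(&mut self) -> Receiver<T> {
        let ops = self
            .per_call_ops
            .pop_front()
            .expect("stream() called more times than expected");
        // Bounded channel sized to hold every operation for this call, so
        // all sends complete without blocking.
        let (tx, rx) = sync_channel(ops.len().max(1));
        for op in ops {
            tx.send(op).expect("receiver is still alive");
        }
        // `tx` is dropped here, so the receiver ends once it is drained.
        rx
    }
}
```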
Carol (Nichols || Goulding) bda231051a
feat: Record metrics when resetting the write buffer and skipping sequence numbers 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) e5e08e5b16
test: Add a test of reset_to_earliest for all write buffer implementations
This is the basic test case; I've filed #4651 for the more complex test
needing deletion of records from the write buffer.
2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) bcbf7b4f46
refactor: Move error handling logic to be all together 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) 549dd497ea
refactor: Extract an ingester verification function 2022-05-20 20:48:16 -04:00
kodiakhq[bot] b79db5f609
Merge pull request #4645 from influxdata/cn/update-rustc
chore: Update to Rust 1.61
2022-05-21 00:47:20 +00:00
kodiakhq[bot] f6b3296136
Merge branch 'main' into cn/update-rustc 2022-05-21 00:41:42 +00:00
Carol (Nichols || Goulding) 2aa76622c3
refactor: Extract a test setup function 2022-05-20 11:51:57 -04:00
Carol (Nichols || Goulding) ab72c93a5e
docs: Updating wrapping, content, and grammar of comments 2022-05-20 10:51:07 -04:00
Carol (Nichols || Goulding) c811bebdb7
feat: Add ingester CLI option to skip to oldest available WB seq num
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.

Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.

Fixes #4624.
2022-05-20 10:51:07 -04:00
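The start-position decision described above, sketched as a hypothetical helper. The function name, the integer types, and the `String` error are assumptions; per the commit message, the real ingester panics in the default case rather than returning an error.

```rust
/// Decides where the ingester starts reading from the write buffer.
///
/// Hypothetical signature: `min_unpersisted` comes from the catalog and
/// `oldest_available` is the earliest sequence number the write buffer
/// still retains.
fn start_sequence_number(
    min_unpersisted: u64,
    oldest_available: u64,
    skip_to_oldest_available: bool,
) -> Result<u64, String> {
    if min_unpersisted >= oldest_available {
        // Still retained: resume exactly where persistence left off.
        Ok(min_unpersisted)
    } else if skip_to_oldest_available {
        // Evicted by retention, but `--skip-to-oldest-available` was given.
        Ok(oldest_available)
    } else {
        // Default behavior: refuse to continue (the ingester panics here).
        Err(format!(
            "sequence number {min_unpersisted} is no longer available; \
             oldest available is {oldest_available}"
        ))
    }
}
```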
Carol (Nichols || Goulding) b3f97bdb9d
test: Capture existing behavior for unknown sequence number 2022-05-20 10:51:06 -04:00
Jake Goulding 359046f3f2 ci: give the doc builder more memory 2022-05-20 10:44:06 -04:00
Dom Dwyer 00dc95829d style: enable more lints
Enable more lints on the parquet_file crate to keep it a little cleaner
- adds the following:

    clippy::clone_on_ref_ptr,
    unreachable_pub,
    missing_docs,
    clippy::todo,
    clippy::dbg_macro

This commit includes fixes for any new lint failures.
2022-05-20 15:17:40 +01:00
Dom Dwyer 7df7c4844c refactor: remove redundant ParquetChunk errors
Removes unused errors and refactors away unnecessary ones in the
parquet::chunk module.
2022-05-20 15:17:40 +01:00
Dom Dwyer 661f8599a6 refactor: internalise Parquet path generation
Derive the ParquetFilePath from the IoxMetadata within the
ParquetStorage::read_filter() call.

This prevents the "put/get RecordBatches" abstraction from leaking out
the object store path generation concern - an implementation detail of
the ParquetStorage layer.
2022-05-20 15:17:40 +01:00
Dom Dwyer cdb341d45a test: ParquetStorage upload() and read_filter()
Adds tests for the previously untested (directly at least) Parquet
(de)serialisation & persistence layer, provided by the ParquetStorage
type.
2022-05-20 15:17:40 +01:00
Dom Dwyer 302301659e refactor: derive ParquetFilePath from IoxMetadata
Allow directly converting an IoxMetadata to a ParquetFilePath.
2022-05-20 15:17:40 +01:00
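A minimal sketch of the conversion, assuming trimmed-down hypothetical versions of `IoxMetadata` and `ParquetFilePath`. The field set and the rendered path layout are illustrative and not taken from the commit. Implementing `From<&IoxMetadata>` lets the storage layer derive the path itself, so callers never build object store paths.

```rust
/// Hypothetical, trimmed-down metadata; field names are assumptions.
struct IoxMetadata {
    namespace_id: i64,
    table_id: i64,
    partition_id: i64,
    object_store_id: String,
}

/// Hypothetical path type carrying just what is needed to build a key.
#[derive(Debug, PartialEq)]
struct ParquetFilePath {
    namespace_id: i64,
    table_id: i64,
    partition_id: i64,
    object_store_id: String,
}

impl From<&IoxMetadata> for ParquetFilePath {
    fn from(m: &IoxMetadata) -> Self {
        Self {
            namespace_id: m.namespace_id,
            table_id: m.table_id,
            partition_id: m.partition_id,
            object_store_id: m.object_store_id.clone(),
        }
    }
}

impl ParquetFilePath {
    /// Renders the object store key (this layout is an assumption).
    fn object_store_path(&self) -> String {
        format!(
            "{}/{}/{}/{}.parquet",
            self.namespace_id, self.table_id, self.partition_id, self.object_store_id
        )
    }
}
```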
Dom Dwyer b9a745d42d feat: RecordBatch stream to Parquet file upload
Implements an upload() method on the ParquetStorage type, consuming a
stream of RecordBatch, serialising the Parquet file, and uploading the
result to object storage. Returns the IOx-specific file metadata.

Currently, while the upload() method accepts a stream of RecordBatch, the
actual resulting Parquet file is buffered in memory before uploading to
object store, due to lack of streaming upload functionality in the
ObjectStore abstraction - this isn't the end of the world, as the files
tend to be relatively small with our current usage.

This impl should be easily modified to be fully streaming once streaming
object store puts are implemented:

    https://github.com/influxdata/object_store_rs/issues/9
2022-05-20 15:17:40 +01:00
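The buffering trade-off described above, sketched with hypothetical types. An `ObjectStore` trait offers only a whole-object `put`, alongside an in-memory store. An `upload` function consumes batches one at a time (a plain iterator stands in for the async RecordBatch stream) but still has to accumulate the encoded file in a `Vec<u8>` before the single upload. Byte concatenation stands in for real Parquet encoding.

```rust
use std::collections::HashMap;

/// Hypothetical minimal object store: one whole-object `put`, no
/// streaming or multipart upload (the limitation the commit describes).
trait ObjectStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>);
}

#[derive(Default)]
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemoryStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.objects.insert(key.to_string(), bytes);
    }
}

/// Stand-in for the IOx-specific metadata returned by `upload()`.
#[derive(Debug, PartialEq)]
struct FileMeta {
    file_size_bytes: usize,
}

/// Consumes batches incrementally, but buffers the *encoded* file in memory
/// and uploads it with a single `put` once encoding has finished.
fn upload<I>(store: &mut dyn ObjectStore, key: &str, batches: I) -> FileMeta
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut buf = Vec::new();
    for batch in batches {
        // Placeholder "encoding": append raw bytes (real code writes Parquet).
        buf.extend_from_slice(&batch);
    }
    let file_size_bytes = buf.len();
    store.put(key, buf);
    FileMeta { file_size_bytes }
}
```

Once the store gains a streaming put, the loop body could forward each encoded chunk directly instead of extending `buf`.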
Dom Dwyer 76e08d14a3 perf: IoxParquetMetaData direct from file metadata
Construct an IoxParquetMetaData instance directly from the FileMetaData
instance returned by the ArrowWriter.

This change will allow us to avoid the inefficient impl currently in
use:

    * Serialise batches into memory
    * Wrap buffer in arrow cursor
    * Read parquet metadata with arrow file reader
    * Serialise schema with thrift
    * Serialise each row group's metadata with thrift
    * Construct our own FileMetaData instance
    * Serialise FileMetaData with thrift
    * zstd encode resulting thrift bytes
    * Wrap in IoxParquetMetaData

Now we "only":

    * Stream batches into opaque Write impl
    * Serialise FileMetaData with thrift
    * zstd encode resulting thrift bytes
    * Wrap in IoxParquetMetaData

Then accessing any data within the IoxParquetMetaData (as before this
change) requires deserialising it first.

There are still a number of easy performance improvements to be had
w.r.t. the metadata handling.
2022-05-20 15:17:40 +01:00
Dom Dwyer 70856a645f feat: streaming RecordBatch -> parquet encoding
Implements a streaming RecordBatch to Parquet file serialiser.

This impl automatically discovers the schema of the RecordBatch stream,
and accepts &mut destination types (internalising the handle
cloning/etc) to simplify caller usage.

This encoder returns the resulting FileMetaData to allow callers to
inspect the resulting metadata without reading back the file.

Currently unused / not yet plumbed in.
2022-05-20 15:09:26 +01:00
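A std-only sketch of the schema-discovery part. It assumes a hypothetical `Batch` type in place of Arrow's `RecordBatch` and a plain-text row format in place of real Parquet encoding (the actual implementation builds on the parquet crate's Arrow writer). The schema is taken from the first batch by peeking, the sink is any `&mut W` where `W: Write`, and a summary standing in for `FileMetaData` is returned so the caller does not have to read the file back.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an Arrow `RecordBatch`: column names plus rows.
struct Batch {
    schema: Vec<String>,
    rows: Vec<Vec<i64>>,
}

/// Hypothetical stand-in for the Parquet `FileMetaData` handed back.
#[derive(Debug, PartialEq)]
struct FileSummary {
    schema: Vec<String>,
    num_rows: usize,
}

/// Encodes a stream of batches into `sink`, discovering the schema from the
/// first batch. Returns `Ok(None)` for an empty stream.
fn to_file<I, W>(batches: I, sink: &mut W) -> io::Result<Option<FileSummary>>
where
    I: IntoIterator<Item = Batch>,
    W: Write,
{
    let mut batches = batches.into_iter().peekable();
    // Schema discovery: inspect (without consuming) the first batch.
    let schema = match batches.peek() {
        Some(first) => first.schema.clone(),
        None => return Ok(None),
    };
    writeln!(sink, "{}", schema.join(","))?;

    let mut num_rows = 0;
    for batch in batches {
        if batch.schema != schema {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "schema changed mid-stream",
            ));
        }
        for row in &batch.rows {
            let cells: Vec<String> = row.iter().map(|v| v.to_string()).collect();
            writeln!(sink, "{}", cells.join(","))?;
            num_rows += 1;
        }
    }
    sink.flush()?;
    Ok(Some(FileSummary { schema, num_rows }))
}
```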
dependabot[bot] 7010af30b7
chore(deps): Bump prometheus from 0.13.0 to 0.13.1 (#4648)
Bumps [prometheus](https://github.com/tikv/rust-prometheus) from 0.13.0 to 0.13.1.
- [Release notes](https://github.com/tikv/rust-prometheus/releases)
- [Changelog](https://github.com/tikv/rust-prometheus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tikv/rust-prometheus/compare/v0.13.0...v0.13.1)

---
updated-dependencies:
- dependency-name: prometheus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-20 08:12:45 +00:00
dependabot[bot] 40a69c6e29
chore(deps): Bump pprof from 0.9.0 to 0.9.1 (#4647)
Bumps [pprof](https://github.com/tikv/pprof-rs) from 0.9.0 to 0.9.1.
- [Release notes](https://github.com/tikv/pprof-rs/releases)
- [Changelog](https://github.com/tikv/pprof-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tikv/pprof-rs/commits)

---
updated-dependencies:
- dependency-name: pprof
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-20 08:04:46 +00:00
dependabot[bot] 6bc0c74c7d
chore(deps): Bump once_cell from 1.10.0 to 1.11.0 (#4646)
* chore(deps): Bump once_cell from 1.10.0 to 1.11.0

Bumps [once_cell](https://github.com/matklad/once_cell) from 1.10.0 to 1.11.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-20 07:40:38 +00:00
Marco Neumann addc45327e
fix: ensure that query tokio background tasks are canceled (#4643)
* fix: ensure that query tokio background tasks are canceled

While I am not entirely sure if this explains some of the memory leaks I
am seeing in prod, not canceling the tasks correctly certainly makes
debugging way harder and also renders certain forms of throttling (e.g.
max. concurrent queries) somewhat ineffective.

Note that parquet file downloads are currently NOT canceled because
tokio's `spawn_blocking` cannot be canceled.

* refactor: `Vec` -> `Option`

* refactor: `spawn_blocking` creates a join handle, even though it is useless

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-20 07:18:52 +00:00
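Dropping a tokio `JoinHandle` detaches the task rather than cancelling it, so background work can outlive the query that started it. Cancelling requires an explicit `JoinHandle::abort()`, commonly wrapped in a guard that aborts on drop. The sketch below shows the same drop-guard idea using only std threads, so cancellation here is cooperative (a shared flag), which also illustrates why blocking work such as `spawn_blocking` cannot simply be aborted. `CancelOnDrop` is a hypothetical name, not the type used in the commit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle that requests cancellation of its background task
/// when dropped, instead of silently detaching it.
struct CancelOnDrop {
    cancelled: Arc<AtomicBool>,
    handle: Option<JoinHandle<u32>>,
}

impl CancelOnDrop {
    fn spawn() -> Self {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancelled);
        let handle = thread::spawn(move || {
            let mut iterations = 0;
            // Cooperative cancellation: the worker checks the flag between
            // units of work; std threads cannot be killed from outside.
            while !flag.load(Ordering::Acquire) {
                iterations += 1;
                thread::sleep(Duration::from_millis(1));
            }
            iterations
        });
        Self { cancelled, handle: Some(handle) }
    }
}

impl Drop for CancelOnDrop {
    fn drop(&mut self) {
        self.cancelled.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            // Wait for the worker so no work outlives the handle.
            let _ = handle.join();
        }
    }
}
```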
kodiakhq[bot] 7a645a9083
Merge pull request #4642 from influxdata/crepererum/debug_querier
feat: add `measurement_fields` support to `influxdb_iox storage`
2022-05-20 07:12:04 +00:00
kodiakhq[bot] a3e1a494e7
Merge branch 'main' into crepererum/debug_querier 2022-05-19 20:37:15 +00:00
Carol (Nichols || Goulding) 4bad553dc6
fix: Use a method instead of holding a lock across an await point 2022-05-19 16:09:47 -04:00
Carol (Nichols || Goulding) 15ee3f326f
fix: Rearrange lock usage to not hold lock across an await point 2022-05-19 15:40:33 -04:00
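Both fixes come down to making sure a `std::sync::Mutex` guard is dropped before the next `.await`, which is the pattern Clippy's `await_holding_lock` lint flags. A minimal sketch, assuming a hypothetical `Tracker` type and `persist` function: the lock is only ever taken inside a synchronous method, so the guard cannot survive into the async code. The `block_on` helper is a toy executor included only so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical async work performed after reading shared state.
async fn persist(value: u64) -> u64 {
    value * 2
}

struct Tracker {
    values: Mutex<Vec<u64>>,
}

impl Tracker {
    /// Synchronous accessor: the guard lives only for this call, so it can
    /// never be held across an `.await` in the caller.
    fn latest(&self) -> u64 {
        *self.values.lock().unwrap().last().unwrap_or(&0)
    }

    async fn persist_latest(&self) -> u64 {
        // The lock is taken and released inside `latest()` before awaiting.
        let latest = self.latest();
        persist(latest).await
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for futures that never actually need to wait (sketch only;
/// a real program would use its async runtime).
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```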
Carol (Nichols || Goulding) b2279fae39
fix: Don't lint on holding lock across await points in tracker tests
These have explicit drops that Clippy isn't recognizing; none of these
are actually being held across await points.

See <https://github.com/rust-lang/rust-clippy/issues/6446>.
2022-05-19 15:26:56 -04:00
Andrew Lamb a18a49736d
refactor: Encapsulate reconciliation logic more (#4644)
* refactor: extract code from state_reconciler

* refactor: Encapsulate reconciliation logic more

* fix: docs
2022-05-19 19:25:36 +00:00
Carol (Nichols || Goulding) 53a94c4c7b
fix: Don't use clippy::use_self in influxdb2_client due to false positive
See https://github.com/rust-lang/rust-clippy/issues/6902

It's an interaction between clippy and serde; the lint produces
confusing and incorrect warnings.
2022-05-19 15:20:11 -04:00
Carol (Nichols || Goulding) 792c394cf2
fix: Update expected value to new debug formatting
Debug formatting is always considered unstable. This changed in Rust
1.61.

References:

- https://github.com/rust-lang/rust/issues/95732
- https://github.com/rust-lang/rust/pull/95345
2022-05-19 14:50:02 -04:00
Carol (Nichols || Goulding) 5fcf18cc02
fix: Add missing assert call around contains tests
`contains` is now must_use. Thanks Rust!
2022-05-19 14:39:51 -04:00
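Why the missing `assert!` mattered, sketched with a hypothetical `render_tags` helper. A bare `s.contains(..);` statement computes a `bool` and discards it, so the test passes no matter what. Per the commit, `contains` is now `#[must_use]`, which turns that statement into an `unused_must_use` warning; wrapping it in `assert!` makes the check real.

```rust
/// Hypothetical helper under test: renders `key=value` pairs.
fn render_tags(tags: &[(&str, &str)]) -> String {
    tags.iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}

/// Before the fix, the checks were bare statements such as
/// `rendered.contains("host=a");`, whose result was thrown away.
/// After the fix, each check is wrapped in `assert!`, so a missing tag
/// fails the test.
fn check_rendered(rendered: &str) {
    assert!(rendered.contains("host=a"));
    assert!(rendered.contains("region=west"));
}
```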
Carol (Nichols || Goulding) 8eece6135b
chore: Update to Rust 1.61 2022-05-19 14:39:05 -04:00
Marco Neumann 20fa70d54b feat: add `measurement_fields` support to `influxdb_iox storage` 2022-05-19 16:50:46 +02:00
kodiakhq[bot] 0c21693826
Merge pull request #4641 from influxdata/dom/parquet-store
refactor: parquet store
2022-05-19 12:58:44 +00:00
Dom Dwyer baa86d846f refactor: use ParquetStore instead of ObjectStore
Changes the code paths that interact with Parquet files in the object
store to reference the ParquetStorage directly (DRY refactor).

This change takes us from a dependency graph of:

                    ┌─────────────────┐
                    │                 │
                    ▼                 │
            Parquet Consumer          │
                    │         ┌──────────────┐
                    ├────────▶│ParquetStorage│
                    ▼         └──────────────┘
            ┌──────────────┐
            │ ObjectStore  │
            └──────────────┘
                    │
               ┌────┴────┐
               ▼         ▼
             File       s3
            System    (etc)

to:

                Parquet Consumer
                        │
                        ▼
                ┌──────────────┐
                │ParquetStorage│
                └──────────────┘
                        │
                        ▼
                ┌──────────────┐
                │ ObjectStore  │
                └──────────────┘
                        │
                   ┌────┴────┐
                   ▼         ▼
                 File       s3
                System    (etc)

With the ParquetStorage being solely responsible for managing
interactions with the object store when dealing with Parquet files.
2022-05-19 13:52:51 +01:00
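The resulting layering, sketched with a hypothetical minimal `ObjectStore` trait and in-memory backend (not the real object store crate API). Consumers hold only a `ParquetStorage`, which owns the `Arc<dyn ObjectStore>` and keeps Parquet path generation private to itself.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical minimal object store abstraction (File System, S3, ...).
trait ObjectStore: Send + Sync {
    fn put(&self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

#[derive(Default)]
struct InMemory(Mutex<HashMap<String, Vec<u8>>>);

impl ObjectStore for InMemory {
    fn put(&self, key: &str, bytes: Vec<u8>) {
        self.0.lock().unwrap().insert(key.to_string(), bytes);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }
}

/// The only type Parquet consumers talk to; it owns the object store handle
/// and the knowledge of where Parquet files live in it.
#[derive(Clone)]
struct ParquetStorage {
    object_store: Arc<dyn ObjectStore>,
}

impl ParquetStorage {
    fn new(object_store: Arc<dyn ObjectStore>) -> Self {
        Self { object_store }
    }

    /// Path generation stays private to this layer (hypothetical layout).
    fn path(file_id: u64) -> String {
        format!("parquet/{file_id}.parquet")
    }

    fn upload(&self, file_id: u64, bytes: Vec<u8>) {
        self.object_store.put(&Self::path(file_id), bytes);
    }

    fn read(&self, file_id: u64) -> Option<Vec<u8>> {
        self.object_store.get(&Self::path(file_id))
    }
}
```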
Dom Dwyer d3548653d5 refactor: rename Storage -> ParquetStorage
Renames the Storage type so the context is clear in usage (i.e. fn
args), rather than having to rely on knowing the fully-qualified import
path to know what the type stores.
2022-05-19 13:51:07 +01:00
Dom Dwyer e20b02b914 refactor: tidy ParquetChunk constructor
Removes two unused constructors for a ParquetChunk, and moves the bare
fn constructor that is actually used to be an associated method (a
conventional constructor).
2022-05-19 13:51:07 +01:00
Andrew Lamb ed41622593
chore: Remove dead code from QueryDatabase (#4637)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-19 10:29:54 +00:00
Marco Neumann 7d16f57c85
ci: simplify cargo deny (#4640)
Taken from https://github.com/influxdata/object_store_rs/pull/5

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-19 09:51:15 +00:00
Marco Neumann 6577887440
feat: instrument querier cache loaders w/ metrics (#4635)
* feat: `MetricsLoader`

Add ability to instrument cache loaders w/ metrics.

* feat: instrument querier cache loaders w/ metrics

* fix: fix metric descriptions and names
2022-05-19 08:30:34 +00:00
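A synchronous sketch of the decorator idea behind `MetricsLoader`, assuming a hypothetical `Loader` trait (the querier's real loaders are async and report through the metric registry rather than a bare atomic). The wrapper implements the same trait as the loader it wraps, records the call, then delegates, so it can be slotted in without changing the cache.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical loader trait: computes the value for a cache miss.
trait Loader {
    type K;
    type V;
    fn load(&self, k: Self::K) -> Self::V;
}

/// Decorator that counts loader calls before delegating to the inner loader.
struct MetricsLoader<L> {
    inner: L,
    calls: AtomicU64,
}

impl<L: Loader> MetricsLoader<L> {
    fn new(inner: L) -> Self {
        Self { inner, calls: AtomicU64::new(0) }
    }

    /// Number of times the wrapped loader has been invoked.
    fn calls(&self) -> u64 {
        self.calls.load(Ordering::Relaxed)
    }
}

impl<L: Loader> Loader for MetricsLoader<L> {
    type K = L::K;
    type V = L::V;

    fn load(&self, k: Self::K) -> Self::V {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.inner.load(k)
    }
}

/// Example inner loader (hypothetical): "loads" the square of the key.
struct SquareLoader;

impl Loader for SquareLoader {
    type K = u64;
    type V = u64;

    fn load(&self, k: u64) -> u64 {
        k * k
    }
}
```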
dependabot[bot] 409ae0ee0d
chore(deps): Bump handlebars from 4.2.2 to 4.3.0 (#4639)
Bumps [handlebars](https://github.com/sunng87/handlebars-rust) from 4.2.2 to 4.3.0.
- [Release notes](https://github.com/sunng87/handlebars-rust/releases)
- [Changelog](https://github.com/sunng87/handlebars-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sunng87/handlebars-rust/compare/v4.2.2...v4.3.0)

---
updated-dependencies:
- dependency-name: handlebars
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-19 08:15:51 +00:00
Marco Neumann 770293a973
feat: add LRU cache metrics (#4632)
* refactor: require `Resource`s to be convertible to `u64`

* refactor: require `Resource`s to have a unit name

* refactor: make LRU cache IDs static

* feat: add LRU cache metrics

* docs: improve type names in LRU doctest

* docs: explain `MeasuredT`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: explain `test_metrics`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-19 08:05:17 +00:00
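One way the two `Resource` requirements could look, sketched with hypothetical names (`Resource`, `RamSize`, `usage_metric`) and a made-up metric naming scheme. The `Into<u64>` bound lets any resource be reported as a plain integer, `unit()` supplies the unit name, and the `&'static str` pool ID mirrors the "make LRU cache IDs static" change.

```rust
/// Hypothetical shape of the bound: a resource reportable as a `u64`
/// together with the unit it is measured in.
trait Resource: Copy + Into<u64> {
    fn unit() -> &'static str;
}

/// Example resource (hypothetical): memory usage in bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RamSize(u64);

impl From<RamSize> for u64 {
    fn from(size: RamSize) -> u64 {
        size.0
    }
}

impl Resource for RamSize {
    fn unit() -> &'static str {
        "bytes"
    }
}

/// Builds a metric name and value for a pool's current usage.
fn usage_metric<R: Resource>(pool_id: &'static str, used: R) -> (String, u64) {
    (format!("cache_{}_usage_{}", pool_id, R::unit()), used.into())
}
```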
kodiakhq[bot] 39de0f5712
Merge pull request #4636 from influxdata/dom/remove-unused
refactor: remove unused max_row_group_size
2022-05-18 15:58:07 +00:00
Dom Dwyer 7a8e6d1a38 refactor: remove unused max_row_group_size
The Parquet writer references an unused max_row_group_size property in
the parquet file metadata.
2022-05-18 16:45:15 +01:00
Marco Neumann 4bd899369e
feat: check for overlapping ingester partitions in querier (#4633)
Right now this would clearly indicate a bug, and before I try to
understand some prod issues, I want to rule that one out.
2022-05-18 13:16:27 +00:00
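A sketch of such a check, assuming a hypothetical `PartitionId` type and that each ingester response is a list of partition IDs. Any ID seen twice across responses means two ingesters claim the same partition.

```rust
use std::collections::HashSet;

/// Hypothetical partition identifier as reported by an ingester.
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
struct PartitionId(i64);

/// Returns the first partition reported more than once across all ingester
/// responses; a duplicate indicates overlapping partitions, i.e. a bug.
fn find_overlap(responses: &[Vec<PartitionId>]) -> Option<PartitionId> {
    let mut seen = HashSet::new();
    responses
        .iter()
        .flatten()
        .copied()
        // `insert` returns false if the ID was already present.
        .find(|id| !seen.insert(*id))
}
```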
kodiakhq[bot] c7944e8ba8
Merge pull request #4620 from influxdata/dom/partition-null-column-serialisation
fix: partition null column serialisation
2022-05-18 12:39:09 +00:00