Implements an upload() method on the ParquetStorage type, consuming a
stream of RecordBatch, serialising the Parquet file, and uploading the
result to object storage. Returns the IOx-specific file metadata.
Currently, while the upload() method accepts a stream of RecordBatch, the
resulting Parquet file is buffered in memory before being uploaded to the
object store, due to the lack of streaming upload functionality in the
ObjectStore abstraction. This isn't the end of the world, as the files
tend to be relatively small with our current usage.
This impl should be easily modified to be fully streaming once streaming
object store puts are implemented:
https://github.com/influxdata/object_store_rs/issues/9
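A rough sketch of the current buffered flow, using stand-in types (`Batch`,
`Store`, `encode_parquet`, and the struct layouts below are illustrative, not
the real IOx/arrow APIs):

```rust
struct Batch;                       // stand-in for arrow's RecordBatch
struct FileMeta;                    // stand-in for parquet's FileMetaData
struct IoxParquetMetaData(FileMeta);

trait Store {
    fn put(&self, path: &str, bytes: Vec<u8>) -> Result<(), String>;
}

// Stand-in for an ArrowWriter-style encoder that writes into a Vec<u8>.
fn encode_parquet(_batches: &[Batch]) -> (Vec<u8>, FileMeta) {
    (Vec::new(), FileMeta)
}

struct ParquetStorage<S: Store> {
    store: S,
}

impl<S: Store> ParquetStorage<S> {
    fn upload(&self, batches: Vec<Batch>, path: &str) -> Result<IoxParquetMetaData, String> {
        // The whole Parquet file is buffered in memory here; a streaming put
        // (object_store_rs#9) would let row groups be uploaded as they are encoded.
        let (bytes, meta) = encode_parquet(&batches);
        self.store.put(path, bytes)?;
        Ok(IoxParquetMetaData(meta))
    }
}
```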
Construct an IoxParquetMetaData instance directly from the FileMetaData
instance returned by the ArrowWriter.
This change will allow us to avoid the inefficient impl currently in
use:
* Serialise batches into memory
* Wrap buffer in arrow cursor
* Read parquet metadata with arrow file reader
* Serialise schema with thrift
* Serialise each row group's metadata with thrift
* Construct our own FileMetaData instance
* Serialise FileMetaData with thrift
* zstd encode resulting thrift bytes
* Wrap in IoxParquetMetaData
Now we "only":
* Stream batches into opaque Write impl
* Serialise FileMetaData with thrift
* zstd encode resulting thrift bytes
* Wrap in IoxParquetMetaData
As before this change, accessing any data within the IoxParquetMetaData
still requires deserialising it first.
There are still a number of easy performance improvements to be had
w.r.t. the metadata handling.
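For illustration, a loose sketch of the new construction path, assuming a
stand-in `thrift_encode` helper, a simplified `IoxParquetMetaData` layout, and
the `zstd` crate for compression (the real IOx types differ):

```rust
struct FileMeta;                     // stand-in for parquet's FileMetaData

struct IoxParquetMetaData {
    /// zstd-compressed, thrift-encoded FileMetaData.
    data: Vec<u8>,
}

// Stand-in for serialising FileMetaData with the thrift compact protocol.
fn thrift_encode(_meta: &FileMeta) -> Vec<u8> {
    Vec::new()
}

impl IoxParquetMetaData {
    /// Build directly from the writer's FileMetaData: no re-reading of the
    /// serialised file and no per-row-group reconstruction is required.
    fn from_file_metadata(meta: &FileMeta) -> std::io::Result<Self> {
        let thrift_bytes = thrift_encode(meta);
        // zstd-encode the thrift payload before wrapping it.
        let data = zstd::encode_all(&thrift_bytes[..], 1)?;
        Ok(Self { data })
    }
}
```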
Implements a streaming RecordBatch to Parquet file serialiser.
This impl automatically discovers the schema of the RecordBatch stream,
and accepts &mut destination types (internalising the handle
cloning/etc) to simplify caller usage.
This encoder returns the resulting FileMetaData to allow callers to
inspect the resulting metadata without reading back the file.
Currently unused / not yet plumbed in.
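A minimal sketch of the serialiser's shape under stated assumptions: `Batch`,
`Schema`, `FileMeta`, and `Encoder` are stand-ins for the arrow/parquet types,
and returning `Option` for an empty stream is an illustrative choice, not
necessarily the real behaviour:

```rust
use std::io::Write;

#[derive(Clone)]
struct Schema;
struct Batch { schema: Schema }
struct FileMeta;

// Stand-in for an ArrowWriter-like encoder over a borrowed destination.
struct Encoder<'a, W: Write> { sink: &'a mut W }

impl<'a, W: Write> Encoder<'a, W> {
    fn new(sink: &'a mut W, _schema: Schema) -> Self { Self { sink } }
    fn write(&mut self, _batch: &Batch) -> std::io::Result<()> { Ok(()) }
    fn close(self) -> std::io::Result<FileMeta> { Ok(FileMeta) }
}

/// Serialise a stream of batches into `sink`, returning the file metadata so
/// callers can inspect it without re-reading the file.
fn to_parquet<W, I>(sink: &mut W, batches: I) -> std::io::Result<Option<FileMeta>>
where
    W: Write,
    I: IntoIterator<Item = Batch>,
{
    let mut batches = batches.into_iter().peekable();
    // Discover the schema from the first batch rather than requiring it up front.
    let schema = match batches.peek() {
        Some(batch) => batch.schema.clone(),
        None => return Ok(None), // empty stream: nothing to write
    };
    let mut encoder = Encoder::new(sink, schema);
    for batch in batches {
        encoder.write(&batch)?;
    }
    encoder.close().map(Some)
}
```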
* fix: ensure that query tokio background tasks are canceled
While I am not entirely sure whether this explains some of the memory leaks
I am seeing in prod, not canceling the tasks correctly certainly makes
debugging way harder and also renders certain forms of throttling (e.g.
max. concurrent queries) somewhat ineffective.
Note that parquet file downloads are currently NOT canceled because
tokio's `spawn_blocking` cannot be canceled.
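The underlying tokio pattern (a generic sketch, not the exact IOx helper) is to
abort the `JoinHandle` when the query-side handle is dropped:

```rust
use tokio::task::JoinHandle;

/// Aborts the wrapped background task when dropped, so abandoned queries stop
/// consuming resources and no longer count against concurrency limits.
struct AbortOnDrop<T>(JoinHandle<T>);

impl<T> Drop for AbortOnDrop<T> {
    fn drop(&mut self) {
        // Cancels ordinary async tasks. Work running inside
        // `tokio::task::spawn_blocking` (e.g. parquet file downloads) cannot be
        // interrupted this way and will run to completion.
        self.0.abort();
    }
}

#[tokio::main]
async fn main() {
    let handle = AbortOnDrop(tokio::spawn(async {
        // ... long-running query background work ...
    }));
    drop(handle); // the task is aborted here instead of leaking
}
```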
* refactor: `Vec` -> `Option`
* refactor: `spawn_blocking` creates a join handle, even though it is useless
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the code paths that interact with Parquet files in the object
store to reference the ParquetStorage directly (DRY refactor).
This change takes us from a dependency graph of:
        ┌─────────────────┐
        │                 │
        ▼                 │
Parquet Consumer          │
        │         ┌──────────────┐
        ├────────▶│ParquetStorage│
        ▼         └──────────────┘
 ┌──────────────┐
 │  ObjectStore │
 └──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
  File       s3
 System     (etc)
to:
Parquet Consumer
        │
        ▼
 ┌──────────────┐
 │ParquetStorage│
 └──────────────┘
        │
        ▼
 ┌──────────────┐
 │  ObjectStore │
 └──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
  File       s3
 System     (etc)
With the ParquetStorage being solely responsible for managing
interactions with the object store when dealing with Parquet files.
Renames the Storage type so its purpose is clear at the point of use (i.e.
in fn args), rather than relying on the fully-qualified import path to
convey what the type stores.
Removes two unused constructors for a ParquetChunk, and moves the bare
fn constructor that is actually used to be an associated method (a
conventional constructor).
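An approximate shape of the result, with illustrative names and fields rather
than the exact IOx definitions:

```rust
use std::sync::Arc;

// Stand-in trait; the real object store abstraction lives in object_store_rs.
trait ObjectStore: Send + Sync {}

/// Sole owner of parquet-file interactions with the object store.
#[derive(Clone)]
struct ParquetStorage {
    store: Arc<dyn ObjectStore>,
}

struct ParquetChunk {
    storage: ParquetStorage,
    // ... schema, decoded metadata, etc.
}

impl ParquetChunk {
    /// Conventional associated constructor replacing the previously bare fn.
    fn new(storage: ParquetStorage) -> Self {
        Self { storage }
    }
}
```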
* refactor: require `Resource`s to be convertible to `u64`
* refactor: require `Resource`s to have a unit name
* refactor: make LRU cache IDs static
* feat: add LRU cache metrics
* docs: improve type names in LRU doctest
* docs: explain `MeasuredT`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: explain `test_metrics`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
This commit changes the protobuf record batch encoding to skip entirely-NULL
columns when serialising. This prevents deserialisation from erroring
due to a column type-inference failure.
Prior to this commit, when the system was presented with a record batch
such as this:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-01-01 | 1 | NULL |
| 1970-07-05 | NULL | 1 |
This batch would be partitioned by YMD into two separate partitions:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-01-01 | 1 | NULL |
and:
| time | A | B |
| ---------- | ---- | ---- |
| 1970-07-05 | NULL | 1 |
Both partitions would contain an entirely NULL column.
Both of these partitioned record batches would be successfully encoded,
but decoding a partition fails due to the inability to infer a column
type from a serialised column that contains no values, which on the
wire looks like:
Column {
    column_name: "B",
    semantic_type: Field,
    values: Some(
        Values {
            i64_values: [],
            f64_values: [],
            u64_values: [],
            string_values: [],
            bool_values: [],
            bytes_values: [],
            packed_string_values: None,
            interned_string_values: None,
        },
    ),
    null_mask: [
        1,
    ],
},
In a column that is not entirely NULL, one of the "Values" fields would
be non-empty, and the decoder would use this to infer the type of the
column.
Because we have chosen not to differentiate between "NULL" and "empty"
in our proto encoding, the decoder cannot infer which field within the
"Values" struct the column belongs to: all are valid, but empty.
This commit prevents this type inference failure by skipping any columns
that are entirely NULL during serialisation, preventing the deserialiser
from having to process columns with ambiguous types.
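A minimal sketch of that skip, using stand-in types rather than the real IOx
protobuf structs:

```rust
struct Column {
    column_name: String,
    null_count: usize, // stand-in for counting the set bits in `null_mask`
}

fn encode_batch(columns: Vec<Column>, num_rows: usize) -> Vec<Column> {
    columns
        .into_iter()
        // Skip columns that are entirely NULL: they carry no values, so the
        // decoder could not infer which `Values` field they belong to.
        .filter(|c| c.null_count < num_rows)
        .collect()
}

fn main() {
    let cols = vec![
        Column { column_name: "A".into(), null_count: 0 },
        Column { column_name: "B".into(), null_count: 1 }, // all rows NULL
    ];
    let encoded = encode_batch(cols, 1);
    assert_eq!(encoded.len(), 1);
    assert_eq!(encoded[0].column_name, "A");
}
```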
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
`query` is already taken (and yanked) on crates.io, and I am getting tired
of working around that.
Prior to this commit, it was possible to read/write the allocated but
unused storage bits outside of the "length" of the BitSet.
Bit access is now bounds-checked.
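A sketch of what the bounds checking amounts to (the real BitSet layout and
error handling may differ):

```rust
struct BitSet {
    buffer: Vec<u8>,
    len: usize, // number of valid bits, <= buffer.len() * 8
}

impl BitSet {
    fn get(&self, idx: usize) -> bool {
        // Reject access to allocated-but-unused bits beyond `len`.
        assert!(idx < self.len, "bit index {idx} out of bounds (len {})", self.len);
        self.buffer[idx / 8] & (1 << (idx % 8)) != 0
    }

    fn set(&mut self, idx: usize) {
        assert!(idx < self.len, "bit index {idx} out of bounds (len {})", self.len);
        self.buffer[idx / 8] |= 1 << (idx % 8);
    }
}
```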
* feat: `SortKey::size`
* feat: `FunctionEstimator`
* feat: querier RAM pool
Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit peak memory
during query execution, but it should at least stop unlimited cache
growth. A follow-up PR will add metrics.
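A loose sketch of the pooling idea (not the actual querier implementation),
where every cache accounts its entries against one shared pool:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared accounting for all querier caches; bounds total cache RAM.
struct RamPool {
    limit: usize,
    used: AtomicUsize,
}

impl RamPool {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    /// Returns true if the entry fits; callers evict from their LRU until it does.
    fn try_reserve(&self, bytes: usize) -> bool {
        let prev = self.used.fetch_add(bytes, Ordering::SeqCst);
        if prev + bytes > self.limit {
            // Over budget: roll back the reservation.
            self.used.fetch_sub(bytes, Ordering::SeqCst);
            return false;
        }
        true
    }

    fn release(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::SeqCst);
    }
}
```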
* refactor: improve some size calculations
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>