* feat: add an option to compact all overlapped files no matter how large they are
* chore: Apply suggestions from code review
* feat: always compact overlapped files no matter how large they are
* chore: cleanup
Use a constructor to initialise a ParquetFileWithTombstone struct,
rather than making the fields pub.
This allows IDEs to "go to" the places where the struct is constructed
when browsing the code, and also keeps the type closed to modification
of its internals (the open/closed principle in SOLID).
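A minimal sketch of the pattern (field names are assumptions for
illustration; ParquetFile and Tombstone stand in for the IOx catalog
types):

    use std::sync::Arc;

    // Sketch only: the real struct lives in the compactor; these
    // fields are illustrative.
    pub struct ParquetFileWithTombstone {
        data: Arc<ParquetFile>,
        tombstones: Vec<Tombstone>,
    }

    impl ParquetFileWithTombstone {
        /// Conventional constructor: callers can no longer poke at the
        /// fields, and "go to definition" on `new` finds every place
        /// the type is built.
        pub fn new(data: Arc<ParquetFile>, tombstones: Vec<Tombstone>) -> Self {
            Self { data, tombstones }
        }
    }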
* fix: do not compact when there is no data
* fix: split time must be greater than min_time, too
* fix: resolve merge conflict
* chore: increase size of a compactor job and level of concurrency
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Assert consistent metadata when evaluating candidate Parquet files for
compaction.
Asserts all files have the same:
* Sequencer ID
* Namespace ID
* Table ID
* Partition ID
* Sort key
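A sketch of the invariant check, assuming the catalog's ParquetFile
record carries these IDs (the sort key lives on the Partition and is
checked analogously):

    // Sketch: panic if the candidate files disagree on any shared metadata.
    fn assert_consistent_metadata(files: &[ParquetFile]) {
        let first = files.first().expect("no candidate files");
        for f in files {
            assert_eq!(f.sequencer_id, first.sequencer_id);
            assert_eq!(f.namespace_id, first.namespace_id);
            assert_eq!(f.table_id, first.table_id);
            assert_eq!(f.partition_id, first.partition_id);
        }
    }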
Changes the compaction logic to always reference the same SortKey
instance, rather than repeatedly querying the catalog for it.
The Partition metadata is always read from the catalog as part of
compact_partition(), which previously threw away all metadata except
the sort key and passed that key into compact(). compact() would then
re-query the catalog to look up the sort key again, and mixed the two
instances during use - one passed into the fn, one freshly queried
within it.
Now the Partition metadata is resolved in compact_partition() as
before, but the entire Partition reference is passed to compact(),
which uses it consistently to access the sort key. This also removes
a catalog query per compaction call.
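Roughly, the signature change looks like this (a sketch; the real fn
takes more arguments):

    // Before (sketch): compact() received only the sort key, then
    // queried the catalog for a second copy of it:
    //   async fn compact(&self, files: Vec<ParquetFileWithTombstone>, sort_key: SortKey)
    //
    // After (sketch): the whole Partition is passed in and is the
    // single source of truth for the sort key.
    async fn compact(
        &self,
        files: Vec<ParquetFileWithTombstone>,
        partition: Arc<Partition>, // resolved once in compact_partition()
    ) -> Result<()> {
        let sort_key = partition.sort_key(); // no extra catalog query
        // ... compaction proper ...
        Ok(())
    }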
* refactor: split compact_partition into two functions to handle concurrency better
* feat: limit number of files to compact
* test: add test for limit num files
* chore: fix clippy
* feat: split group if over max size
* fix: split the overlapped group to limit total size or file count (see the sketch after this list)
* chore: reduce config values
* test: add tests and clearer comments for split_overlapped_groups and test_limit_size_and_num_files
* chore: more comments
* chore: cleanup
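The splitting described above might look roughly like this (a sketch
under assumed names; the real code also has to keep truly overlapping
files together):

    // Sketch: greedily chunk a group of files (assumed sorted by min
    // time) so that no subgroup exceeds either limit.
    fn split_overlapped_groups(
        files: Vec<ParquetFile>,
        max_bytes: i64,
        max_files: usize,
    ) -> Vec<Vec<ParquetFile>> {
        let mut groups = Vec::new();
        let mut current: Vec<ParquetFile> = Vec::new();
        let mut size = 0;
        for f in files {
            // Close the current subgroup once adding this file would
            // exceed the size or the file-count limit.
            if !current.is_empty()
                && (size + f.file_size_bytes > max_bytes || current.len() >= max_files)
            {
                groups.push(std::mem::take(&mut current));
                size = 0;
            }
            size += f.file_size_bytes;
            current.push(f);
        }
        if !current.is_empty() {
            groups.push(current);
        }
        groups
    }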
Changes the compactor to consume both StreamSplitExec output partitions
concurrently.
In practice this means the two Parquet files are generated and uploaded
to object store in parallel.
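A sketch of the concurrent consumption; compact_one_partition is a
hypothetical helper that executes one output partition of the plan and
uploads the result, and Error stands in for the compactor's error type:

    use std::sync::Arc;

    use datafusion::physical_plan::ExecutionPlan;
    use futures::future;

    // Sketch: drive both StreamSplitExec output partitions at once so
    // the two Parquet files are serialised and uploaded in parallel.
    async fn compact_both(plan: Arc<dyn ExecutionPlan>) -> Result<(), Error> {
        let (_first, _second) = future::try_join(
            compact_one_partition(Arc::clone(&plan), 0),
            compact_one_partition(Arc::clone(&plan), 1),
        )
        .await?;
        Ok(())
    }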
This commit changes the Compactor::compact() method to stream the
RecordBatch instances directly to the parquet serialiser, before being
uploaded directly to object storage.
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.
These values are redundant, as they already exist within the Parquet
column statistics. They also precluded streaming serialisation, because
the min/max values had to be known before the file could be serialised.
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.
The reasoning is twofold:
* The Parquet file's native metadata already contains a row count
* Needing to know the number of rows up-front precludes streaming
Implements an upload() method on the ParquetStorage type, consuming a
stream of RecordBatch, serialising the Parquet file, and uploading the
result to object storage. Returns the IOx-specific file metadata.
Currently, while the upload() method accepts a stream of RecordBatch,
the resulting Parquet file is buffered in memory before being uploaded
to the object store, due to the lack of streaming upload functionality
in the ObjectStore abstraction. This isn't the end of the world, as the
files tend to be relatively small with our current usage.
This impl should be easily modified to be fully streaming once streaming
object store puts are implemented:
https://github.com/influxdata/object_store_rs/issues/9
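The shape of the method, sketched with simplified types
(IoxParquetMetaData, UploadError, and the exact bounds are assumptions
here):

    use arrow::{error::ArrowError, record_batch::RecordBatch};
    use futures::Stream;

    impl ParquetStorage {
        /// Serialise `batches` into a Parquet file and PUT it to the
        /// object store, returning the IOx-specific file metadata and
        /// the file size in bytes. (Sketch only.)
        pub async fn upload<S>(
            &self,
            batches: S,
            meta: &IoxMetadata,
        ) -> Result<(IoxParquetMetaData, usize), UploadError>
        where
            S: Stream<Item = Result<RecordBatch, ArrowError>> + Send,
        {
            // 1. Drain the stream into an in-memory Parquet buffer
            //    (streaming puts are blocked on object_store_rs#9).
            // 2. Upload the buffer to the ObjectStore.
            // 3. Return the metadata extracted from the written file.
            unimplemented!("sketch only")
        }
    }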
Changes the code paths that interact with Parquet files in the object
store to reference the ParquetStorage directly (DRY refactor).
This change takes us from a dependency graph of:
        ┌─────────────────┐
        │                 │
        ▼                 │
Parquet Consumer          │
        │         ┌──────────────┐
        ├────────▶│ParquetStorage│
        ▼         └──────────────┘
┌──────────────┐
│ ObjectStore  │
└──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
 File        s3
System      (etc)
to:
Parquet Consumer
        │
        ▼
┌──────────────┐
│ParquetStorage│
└──────────────┘
        │
        ▼
┌──────────────┐
│ ObjectStore  │
└──────────────┘
        │
   ┌────┴────┐
   ▼         ▼
 File        s3
System      (etc)
With this change, ParquetStorage is solely responsible for managing
interactions with the object store when dealing with Parquet files.
Renames the Storage type to ParquetStorage so the context is clear at
the point of use (e.g. in fn args), rather than relying on knowledge of
the fully-qualified import path to know what the type stores.
Removes two unused constructors for ParquetChunk, and converts the
bare fn constructor that is actually used into an associated method (a
conventional constructor).
* refactor: require `Resource`s to be convertible to `u64`
* refactor: require `Resource`s to have a unit name (see the trait sketch after this commit list)
* refactor: make LRU cache IDs static
* feat: add LRU cache metrics
* docs: improve type names in LRU doctest
* docs: explain `MeasuredT`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: explain `test_metrics`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
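A hypothetical shape for the tightened `Resource` trait mentioned in
the refactor bullets above (names assumed for illustration):

    trait Resource: Into<u64> {
        /// Unit name used when reporting metrics, e.g. "bytes".
        fn unit() -> &'static str;
    }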
* chore: rename `query` to `iox_query`
`query` is already taken on crates.io (and yanked), and I am getting
tired of working around that.
* feat: `SortKey::size`
* feat: `FunctionEstimator`
* feat: querier RAM pool
Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit peak memory
during query execution, but it should at least stop unbounded cache
growth; a sketch of the idea follows this commit list. A follow-up PR
will add metrics.
* refactor: improve some size calculations
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
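A minimal sketch of the pooling idea, assuming member caches report
size changes to a shared budget (names are illustrative, not the real
API):

    struct RamPool {
        limit_bytes: u64,
        used_bytes: u64,
    }

    impl RamPool {
        /// Account for a size change in one member cache and evict
        /// when the shared budget is exceeded.
        fn update(&mut self, delta_bytes: i64) {
            self.used_bytes = self.used_bytes.saturating_add_signed(delta_bytes);
            if self.used_bytes > self.limit_bytes {
                // ask member caches to shed entries (elided)
            }
        }
    }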
* chore: move noisy debug to trace and fix some comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: fix format
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
These unused dependencies were found by iterating over the dependencies
of each Cargo.toml, then grepping that crate for each dependency's
name. If it didn't show up, I attempted to remove the dependency.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`, `serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear to be used, but it is behind enough feature
flags that I don't care to reason about them, and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.