Commit Graph

487 Commits (c62c7d32b1fad95dc3b4d11fe8c9a7dcd3dae891)

Author SHA1 Message Date
Nga Tran e56ebb5e31
fix: remove unnecessary warning (#5594)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-09 14:15:54 +00:00
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)

* fix: "must use"
2022-09-06 14:13:28 +00:00
dependabot[bot] 9f0b0328f7
chore(deps): Bump thiserror from 1.0.33 to 1.0.34 (#5556)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.33 to 1.0.34.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.33...1.0.34)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 09:18:41 +00:00
Marco Neumann 064f0e9b29
refactor: use DataFusion to read parquet files (#5531)
Remove our own hand-rolled logic and let DataFusion read the parquet
files.

As a bonus, this now supports predicate pushdown to the deserialization
step, so we can use parquet files as an in-memory buffer.

Note that this currently uses some "nested" DataFusion hack due to the
way the `QueryChunk` interface works. Mid-term I'll change the interface
so that the `ParquetExec` nodes are directly visible to DataFusion
instead of some opaque `SendableRecordBatchStream`.
2022-09-05 09:25:04 +00:00
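
A minimal sketch of what "let DataFusion read the parquet files" looks like against DataFusion's public API, with a predicate that can be pushed down into the scan; the file path and column name are illustrative, not the IOx code path:

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // DataFusion handles the parquet decoding; no hand-rolled reader.
    let df = ctx
        .read_parquet("chunk.parquet", ParquetReadOptions::default())
        .await?;

    // The filter can be pushed down to the deserialization step, so
    // row groups and pages that cannot match are skipped entirely.
    let batches = df
        .filter(col("time").gt(lit(1_000_000_i64)))?
        .collect()
        .await?;

    println!("read {} record batches", batches.len());
    Ok(())
}
```
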
dependabot[bot] 00ed79ff1b
chore(deps): Bump thiserror from 1.0.32 to 1.0.33 (#5524)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.32 to 1.0.33.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.32...1.0.33)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-01 09:11:31 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Dom Dwyer 7698264768 refactor: raise error for no rows in parquet file
Previously when attempting to serialise a stream of one or more
RecordBatch containing no rows (resulting in an empty file), the parquet
serialisation code would panic.

This changes the code path to raise an error instead, to support the
compactor making multiple splits at once, which may overlap a single
chunk:

                  ────────────── Time ────────────▶

                          │                │
                  ┌█████──────────────────────█████┐
                  │█████  │    Chunk 1     │  █████│
                  └█████──────────────────────█████┘
                          │                │

                          │                │

                      Split T1         Split T2

In the example above, the chunk has an unusual distribution of write
timestamps over the time range it covers, with all data having a
timestamp before T1, or after T2. When running a SplitExec to slice
this chunk at T1 and T2, the middle of the resulting 3 subsets will
contain no rows. Because we store only the min/max timestamps in the
chunk statistics, it is unfortunately impossible to prune one of these
split points from the plan ahead of time.
2022-08-30 14:52:31 +02:00
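
A hedged sketch of the change's shape: surfacing the zero-row case as an error rather than a panic, so the compactor can handle an empty middle split. The error and function names are illustrative, not the exact IOx definitions:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum SerialiseError {
    /// The RecordBatch stream yielded no rows, which would produce an
    /// empty parquet file (and previously triggered a panic).
    #[error("parquet serialisation produced no rows")]
    NoRows,
}

/// Guard run after draining the stream: an empty middle split from a
/// SplitExec is legitimate input, so it must map to a handleable
/// error, never a panic.
fn check_rows(total_rows: usize) -> Result<(), SerialiseError> {
    if total_rows == 0 {
        return Err(SerialiseError::NoRows);
    }
    Ok(())
}
```
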
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) 240946d8f5
fix: Deprecate proto sequencer_id fields; add shard_id fields 2022-08-29 14:06:44 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
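
To make the lint concrete: clippy's `derive_partial_eq_without_eq` fires on the first definition below because its member also supports `Eq`; the second form is what this commit moves to. The type is hypothetical:

```rust
// Before: `u64` is `Eq`, so clippy suggests deriving `Eq` here too.
#[derive(Debug, PartialEq)]
struct FileSize(u64);

// After: deriving `Eq` documents that equality is total (no NaN-like
// partial cases), which downstream code can rely on.
#[derive(Debug, PartialEq, Eq)]
struct FileSizeWithEq(u64);
```
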
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Marco Neumann e24cecd926
fix: buffer allocation while reading parquet files (#5312)
Work around https://github.com/apache/arrow-rs/issues/2321 by limiting
reader batch size to number of rows (based on file-level metadata).

Fixes https://github.com/influxdata/conductor/issues/1103 .
2022-08-04 14:21:05 +00:00
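
A sketch of the workaround, assuming the row count is available from the file-level metadata; the constant and function are illustrative names:

```rust
/// Illustrative default read batch size.
const MAX_BATCH_SIZE: usize = 1024 * 1024;

/// Cap the reader batch size at the file's actual row count so the
/// reader does not allocate buffers for rows that do not exist
/// (the over-allocation described in arrow-rs#2321).
fn reader_batch_size(file_row_count: usize) -> usize {
    MAX_BATCH_SIZE.min(file_row_count.max(1))
}
```
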
dependabot[bot] 55e1e2ec2b
chore(deps): Bump thiserror from 1.0.31 to 1.0.32 (#5294)
* chore(deps): Bump thiserror from 1.0.31 to 1.0.32

Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.31 to 1.0.32.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.31...1.0.32)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 16:20:36 +00:00
Nga Tran ee151c8b41
fix: set batch size back to 1024 to see if the OOM in the compactor goes away (#5289)
* fix: set batch size back to 1024 to see if the OOM in the compactor goes away

* fix: address review comments

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: import needed constant

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 15:13:04 +00:00
Nga Tran 8f1b6f2465
chore: reduce log info (#5254)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 16:00:34 +00:00
Marco Neumann 3ae89de324
refactor: increase batch size when reading parquet (#5250)
* refactor: increase batch size when reading parquet

This reduced our overhead when reading parquet files quite a lot.

In an internal benchmark, this reduces the time to perform a single
series aggregation of a rather large series with cold caches from 58s
to 48s. No real difference could be measured for warm caches (~21ms
for both).

This should also help the compactor since the record batches should be
larger.

* refactor: ensure that parquet row group size is in-sync

Ensure that we use the same row group size for reading and writing
parquet files. This is the same value as upstream currently uses as a
default, but let's make sure we don't diverge from that:

3032a521c9/parquet/src/file/properties.rs (L65)

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 10:31:26 +00:00
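
A sketch of keeping the two settings in sync through one shared constant, assuming the upstream default of 1024 * 1024 rows at the properties line linked above; the constant name is illustrative:

```rust
use parquet::file::properties::WriterProperties;

/// Used by both the read and write paths so they cannot drift.
pub const ROW_GROUP_SIZE: usize = 1024 * 1024;

fn writer_props() -> WriterProperties {
    WriterProperties::builder()
        .set_max_row_group_size(ROW_GROUP_SIZE)
        .build()
}
```
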
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Nga Tran d05f383a98
refactor: reduce compaction input size and compacted file size to prevent the compactor from waiting forever while reading a large file (#5206)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-25 20:08:11 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequence_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing an error if the compactor compacts files persisted by the ingester after the ingester sends the max seq_num back to the querier

* test: add test_compactor_collision back but modify the input to make it work with the new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
dependabot[bot] 278a7f91af
chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:00:08 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Jake Goulding 635f535e0e refactor: replace level_2 with level_1 2022-07-16 21:49:45 -04:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
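
A hedged sketch of the pattern (variant names are illustrative, not the exact IOx definitions): one enum with explicit discriminants replaces the separate consts, and conversions are validated in a single exhaustiveness-checked match:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
pub enum CompactionLevel {
    /// Freshly persisted, possibly overlapping files.
    Initial = 0,
    /// Files compacted into non-overlapping sets.
    FileNonOverlapped = 1,
}

impl TryFrom<i16> for CompactionLevel {
    type Error = String;

    /// Validation lives in one place; unknown values are rejected
    /// instead of being silently cast from a bare integer.
    fn try_from(value: i16) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Initial),
            1 => Ok(Self::FileNonOverlapped),
            other => Err(format!("invalid compaction level: {other}")),
        }
    }
}
```
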
Marco Neumann b1b2cb5d4a
feat: load read buffer on demand (#5091)
* refactor: extract `select_schema`

* refactor: improve `InternalLostInputField` error message

* test: improve SQL runner output

* feat: load read buffer on demand

Closes #5032.

* refactor: move `[Half]OwnedSelection` to the `schema` crate
2022-07-13 08:51:40 +00:00
Carol (Nichols || Goulding) 80b6c5c82f
fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077) 2022-07-08 16:31:52 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Marco Neumann 16bd3e67c0
refactor: unify `apply_predicate_to_metadata` (#5030)
Instead of using some hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use logic introduced in #5017.

This requires slightly improved table summaries within the querier that
at least have min/max for the timestamp column. For that, the former
`IngesterChunk`-specific `calculate_summary` method was extended to
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.

Note: `QuerierRBChunk` already has detailed metrics that are provided
by the read buffer implementation.

Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`) then we _only_ need to add extra data to the table
summaries.

Closes #4976.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:51:59 +00:00
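
The unified pruning logic boils down to an overlap test against per-chunk min/max timestamp statistics; a minimal sketch, with illustrative types:

```rust
/// Minimal timestamp statistics as carried by a basic table summary
/// (illustrative type, not the IOx `TableSummary`).
pub struct TimestampMinMax {
    pub min: i64,
    pub max: i64,
}

/// Returns false only when the chunk provably holds no rows in
/// `[range_start, range_end)`, in which case it can be pruned without
/// being read.
pub fn overlaps(stats: &TimestampMinMax, range_start: i64, range_end: i64) -> bool {
    stats.max >= range_start && stats.min < range_end
}
```
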
Marco Neumann 016dd93d9c
feat: filter chunks before requesting read buffers (#4996)
Fixes #4976.
2022-07-01 08:59:07 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Marco Neumann 05189fdb00
fix: ensure panic is propagated through `AdapterStream` (#4980)
Prior to this change background tasks that we feed into `AdapterStream`
can panic but that would just end the stream without any user-visible
error (except for the panic message on stdout/stderr).

This was found while developing #4964. I have proposed another fix in #4966
but found that I actually developed an existing solution a second time:
`watch_task`. But I also see a major issue with the existing API: one
can create `AdapterStream` with ordinary tokio tasks that are not
watched at all, leaving the burden to the implementor to check for that
(and actually we forgot that in `parquet_file`).

So this change takes a slightly different approach:
The `AdapterStream` does NOT accept ordinary join handles any longer but
requires that you pass a "watched task". The newly introduced
`WatchedTask` does the same as we did manually before: wrapping a future
into a tokio task, watch it and wrap the watcher into a task.

It is now way more difficult to do anything stupid (sure you can still
mix up the tasks and the channels, but we need at least some flexibility
here to allow for "split" and potential future fan-in/out constructs).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-30 07:57:58 +00:00
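
A hedged sketch of the "watched task" shape: the background future is wrapped so that a panic surfaces as an error the stream can report, rather than the stream silently ending. Type and method names are illustrative:

```rust
use tokio::task::JoinHandle;

pub struct WatchedTask {
    handle: JoinHandle<()>,
}

impl WatchedTask {
    /// Wrap a future into a tokio task that is always watched.
    pub fn new<F>(fut: F) -> Self
    where
        F: std::future::Future<Output = ()> + Send + 'static,
    {
        Self {
            handle: tokio::spawn(fut),
        }
    }

    /// Await completion, converting a panic into a visible error
    /// instead of an ordinary end-of-stream.
    pub async fn watch(self) -> Result<(), String> {
        match self.handle.await {
            Ok(()) => Ok(()),
            Err(e) if e.is_panic() => Err(format!("background task panicked: {e}")),
            Err(e) => Err(format!("background task cancelled: {e}")),
        }
    }
}
```
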
Raphael Taylor-Davies 835e1c91c7
chore: update object_store to 0.3.0 (#4707)
* chore: update object_store to 0.3.0

* chore: review feedback

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-29 21:44:03 +00:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: rename all the level_1 names to level_2 to completely replace 1 with 2 and make everything consistent

* chore: add corresponding constants for the compaction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Marco Neumann 9b66d02229
fix: parquet reader sort order (#4964) 2022-06-28 15:28:38 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Marco Neumann 1a74f84494
refactor: remove `ParquetFileWithMetadata` usage outside the catalog (#4948)
* refactor: remove `DecodedParquetFile` from `iox_tests`

* refactor: remove `DecodedParquetFile` from querier

Also pull out all the chunk schema and sort key handling into a function
so that RB chunks and parquet chunks mostly use the same code path.

* refactor: remove `DecodedParquetFile`

* refactor: remove `ParquetFileWithMetadata` usage

* fix: test data consistency
2022-06-27 15:19:29 +00:00
Nga Tran 3c0fb6e8ef
fix: avoid using min_time, which can be negative, for ChunkId; use the object store ID, which is a UUID, instead (#4942)
* fix: avoid using min_time, which can be negative, for ChunkId; use the object store ID, which is a UUID, instead

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: run fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 19:00:13 +00:00
Marco Neumann bd6c4659af
refactor: slim down parquet chunk (remove Metadata) (#4934)
* feat: conversion from `ParquetFile` to `ParquetFilePath`

* refactor: slim down parquet chunk

- ensure it works without binary parquet metadata
- timestamp range is no longer optional (ensured by the NG type system)
- remove table summary: this is only needed for SOME API users. The
  compactor can work perfectly without statistics since it has the timestamp
  range which is sufficient for the current overlap check (we don't use
  any other primary key stats at the moment). The querier currently does
  NOT use parquet chunks (was replaced by read buffer) but if it will
  again in some future it will likely need to find a way to fetch and
  cache the statistics.
- the schema is now provided by the API user since it can be
  reconstructed using the NG catalog only (and "wrong" column orders are
  tolerated as of #4921)

Ref #4124

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 10:55:16 +00:00
Marco Neumann c899c3a0f4
fix: column handling when reading parquet files (#4921)
* fix: column handling when reading parquet files

This improves/fixes/tests a few aspects when reading parquet files:

- fix usage of `Selection::Some(...)`. This was broken since #4912 but
  apparently no test caught that.
- ensure that the order of `Selection::Some(...)` is preserved
- ensure that schema metadata is attached to output batches
- ignore parquet columns that we don't care about (i.e. do not select)
- allow the parquet file to have a different column order than our internal
  bookkeeping; this makes it much simpler to read parquet files w/o
  scanning the metadata first
- extend the test coverage

Ref #4124.

* test: even more tests for parquet reader
2022-06-22 13:51:30 +00:00
Marco Neumann 59accfe862
refactor: assorted fixes and prep work for #4124 (#4912)
* refactor: `TestPartition::update_sort_key` should return an `Arc`

The whole test framework is built around `Arc`s, so let's fix this
consistency issue.

* fix: actually calculate correct column set in test framework

* feat: check expected parquet file schema

While working on the querier I made some mistakes regarding schemas and
such a check would have greatly improved the debugging experience.

* feat: namespace cache expiration

* fix: improve parquet schema check

* fix: remove clone
2022-06-21 16:08:28 +00:00
Marco Neumann 0f63be26c3
refactor: pass path instead of metadata around to load parquet files (#4909) 2022-06-21 10:57:10 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
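
A sketch of the "proper `ColumnSet` type" bullet, assuming the set is stored as column IDs resolved against the table-wide schema; the definition is illustrative:

```rust
/// Column IDs present in a single parquet file, stored in the catalog
/// so the file can be read without peeking its own metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnSet(Vec<i64>);

impl ColumnSet {
    pub fn new(ids: impl IntoIterator<Item = i64>) -> Self {
        let mut ids: Vec<_> = ids.into_iter().collect();
        ids.sort_unstable();
        ids.dedup();
        Self(ids)
    }

    pub fn contains(&self, id: i64) -> bool {
        self.0.binary_search(&id).is_ok()
    }
}
```
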
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0` (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type-check & verify usage of the partition
key. By reference counting the underlying string, we also reduce memory
usage for some use cases.
2022-06-14 14:47:56 +01:00
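
A hedged sketch of the wrapper described above: a reference-counted newtype that is cheap to clone and cannot be confused with an arbitrary `String`. Not the exact IOx definition:

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey(Arc<str>);

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        // One allocation here; clones afterwards only bump a refcount.
        Self(Arc::from(s))
    }
}

impl std::fmt::Display for PartitionKey {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}
```
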
dependabot[bot] e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
Ryan Russell d279deddad
docs(various): Improve Readability (#4768)
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-02 18:01:06 +00:00