* feat: conversion from `ParquetFile` to `ParquetFilePath`
* refactor: slim down parquet chunk
- ensure it works without binary parquet metadata
- timestamp range is no longer optional (ensured by the NG type system)
- remove table summary: this is only needed for SOME API users. The
compactor can work perfectly well without statistics since it has the
timestamp range, which is sufficient for the current overlap check (we
don't use any other primary key stats at the moment; a minimal sketch of
such a range-overlap check follows after this commit's notes). The
querier currently does NOT use parquet chunks (they were replaced by the
read buffer), but if it uses them again at some point in the future it
will likely need to find a way to fetch and cache the statistics.
- the schema is now provided by the API user since it can be
reconstructed using the NG catalog only (and "wrong" column orders are
tolerated as of #4921)
Ref #4124
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
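To illustrate the range-based overlap check referenced in the notes above,
here is a minimal sketch; `TsRange` and its inclusive-bounds semantics are
illustrative assumptions, not the actual IOx types.

```rust
/// Illustrative stand-in for an inclusive timestamp range; NOT the actual
/// IOx type (whose bounds semantics may differ).
#[derive(Clone, Copy)]
struct TsRange {
    min: i64, // nanoseconds since the epoch
    max: i64,
}

impl TsRange {
    /// Two inclusive ranges overlap iff each starts before the other ends.
    fn overlaps(&self, other: &Self) -> bool {
        self.min <= other.max && other.min <= self.max
    }
}

fn main() {
    let a = TsRange { min: 0, max: 100 };
    let b = TsRange { min: 50, max: 150 };
    let c = TsRange { min: 200, max: 300 };
    assert!(a.overlaps(&b));
    assert!(!a.overlaps(&c));
}
```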
* chore: TEMP Update DataFusion to pre-release
* chore: update arrow et al to 16.0.0
* chore: Run cargo hakari tasks
* fix: update reader read_dictionary API
* chore: Update to real Datafusion release
* fix: Update parquet API
* fix: update test
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* fix: do NOT block in parquet file IO
I think for historical reasons we were using blocking IO to read parquet
files. With the current streaming `SendableRecordStream` approach this
is technically NOT required anymore.
Now one might think that the sync-async dance we did is kinda harmless,
but looking at our production querier I think it is really bad. The
querier seems to be stuck, yet `strace` and other health signals show it
is not entirely dead. GDB backtraces show that nearly all threads are
busy in `download_and_scan_parquet`. Looking at the tokio docs
(<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>)
for `spawn_blocking` (which is used to start the sync download) this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However, we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that only a few threads are left to do actual work (I've seen
postgres data flowing back and forth, for example), but tokio can no
longer use its full potential. This is especially bad because the sync
code in `download_and_scan_parquet` then uses the `futures` `block_on`
functionality to call back into async code, so it ends up waiting on
tokio itself.
The change is rather simple: just use async task spawns.
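A hedged before/after sketch of that change; the function names and
signatures below are placeholders, not the real IOx code.

```rust
use tokio::task::JoinHandle;

// Placeholder for the real download/scan routine; only the shape matters here.
async fn download_and_scan_parquet(_path: String) -> Result<(), std::io::Error> {
    Ok(())
}

// Problematic pattern: `spawn_blocking` occupies one of the (CPU-sized pool of)
// runtime threads, and `block_on` then parks it waiting on async work that is
// scheduled on that very runtime:
//
// let handle = tokio::task::spawn_blocking(move || {
//     futures::executor::block_on(download_and_scan_parquet(path))
// });

// Plain async task spawn instead: every `.await` yields the worker thread back
// to the scheduler rather than parking it.
fn spawn_download(path: String) -> JoinHandle<Result<(), std::io::Error>> {
    tokio::spawn(async move { download_and_scan_parquet(path).await })
}
```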
* fix: use async IO to write stream to temp file
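As a hedged sketch of that async temp-file write (the stream item type, file
name, and helper name are assumptions for illustration, not the actual IOx
code):

```rust
use futures::StreamExt;
use tokio::io::AsyncWriteExt;

/// Sketch only: stream bytes into a temp file using tokio's async file IO
/// instead of std::fs plus blocking writes.
async fn write_stream_to_temp_file(
    mut data: impl futures::Stream<Item = Result<bytes::Bytes, std::io::Error>> + Unpin,
) -> std::io::Result<std::path::PathBuf> {
    let path = std::env::temp_dir().join("download.parquet"); // illustrative name
    let mut file = tokio::fs::File::create(&path).await?;
    while let Some(chunk) = data.next().await {
        // async write; never blocks the runtime worker thread
        file.write_all(&chunk?).await?;
    }
    file.flush().await?;
    Ok(path)
}
```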
* fix: do not block tokio thread during parquet file reading
* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore
There is no REAL way to cancel sync tasks, but at least we can try our
best.
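As a hedged illustration of that best-effort approach (assuming the IO task is
spawned as an async tokio task; `AbortOnDrop` and `spawn_cancellable` are
hypothetical names, not the IOx code):

```rust
use tokio::task::JoinHandle;

/// Hypothetical wrapper: abort the spawned IO task once nobody needs its
/// result anymore. Truly blocking sections cannot be interrupted, but any
/// pending `.await` points will be.
struct AbortOnDrop<T>(JoinHandle<T>);

impl<T> Drop for AbortOnDrop<T> {
    fn drop(&mut self) {
        self.0.abort();
    }
}

fn spawn_cancellable<F, T>(fut: F) -> AbortOnDrop<T>
where
    F: std::future::Future<Output = T> + Send + 'static,
    T: Send + 'static,
{
    AbortOnDrop(tokio::spawn(fut))
}
```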
Implements a streaming RecordBatch to Parquet file serialiser.
This impl automatically discovers the schema of the RecordBatch stream,
and accepts &mut destination types (internalising the handle
cloning/etc) to simplify caller usage.
This encoder returns the resulting FileMetaData so that callers can
inspect the metadata without reading the file back.
Currently unused / not yet plumbed in.
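For illustration only, here is a hedged sketch of such a streaming serialiser
built on the `parquet` crate's `ArrowWriter`. The function name, the `Vec<u8>`
sink, and the exact writer bounds and `close()` return type are assumptions
(they have changed across parquet releases), not the implementation described
above.

```rust
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use futures::StreamExt;
use parquet::arrow::ArrowWriter;

/// Sketch: discover the schema from the first batch, write every batch, and
/// hand the parquet footer metadata back to the caller.
async fn serialise_to_parquet<S>(
    mut batches: S,
    sink: &mut Vec<u8>, // any `Write` destination; a Vec keeps the sketch simple
) -> Result<Option<parquet::format::FileMetaData>, parquet::errors::ParquetError>
where
    S: futures::Stream<Item = RecordBatch> + Unpin,
{
    // Peek the first batch to discover the schema.
    let first = match batches.next().await {
        Some(batch) => batch,
        None => return Ok(None), // nothing to write
    };
    let schema = first.schema();

    let mut writer = ArrowWriter::try_new(sink, Arc::clone(&schema), None)?;
    writer.write(&first)?;
    while let Some(batch) = batches.next().await {
        writer.write(&batch)?;
    }

    // `close()` finalises the footer and returns the file metadata without
    // re-reading the file.
    Ok(Some(writer.close()?))
}
```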
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`,`serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear used, but is behind enough feature
flags that I don't care to reason about it, and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We changed from the Google timestamp (which uses variable-sized integers)
to our own fixed-size integer timestamps so that the size of the parquet
metadata does not depend on the timestamp value. However, with the
introduction of compression this is the case anyway (since slightly
different timestamps lead to different compression results), and we now
need deterministic timestamps for tests. So there is no point in using
our own timestamp type. Switching back to the variable-sized type also
shrinks the post-compression results a bit.
Two reasons:
1. I want to decouple `parquet_file` from `query` (nearly done; needs a
small follow-up PR).
2. `predicate` will have more and more features (like serialization),
which justifies a new home.