Andrew Lamb
e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` ( #5079 )
...
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18
* chore: Run cargo hakari tasks
* fix: update cargo pin
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
dependabot[bot]
9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Andrew Lamb
c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` ( #5021 )
...
* fix: correct nullability declaration of system tables
* chore: Update datafusion and arrow/parquet/arrow-flight
* chore: Run cargo hakari tasks
* fix: Update tests
* fix: Update tests
* fix: predicate pruning
* fix: add some tests
* fix: query_functions
* fix: fix read_buffer test
* fix: fix clippy
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Raphael Taylor-Davies
835e1c91c7
chore: update object_store to 0.3.0 ( #4707 )
...
* chore: update object_store to 0.3.0
* chore: review feedback
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-29 21:44:03 +00:00
Marco Neumann
bd6c4659af
refactor: slim down parquet chunk (remove Metadata) ( #4934 )
...
* feat: conversion from `ParquetFile` to `ParquetFilePath`
* refactor: slim down parquet chunk
- ensure it works without binary parquet metadata
- timestamp range is no longer optional (ensured by the NG type system)
- remove table summary: this is only needed for SOME API users. The
compactor can perfectly work without statistics since has the timestamp
range which is sufficient for the current overlap check (we don't use
any other primary key stats at the moment). The querier currently does
NOT use parquet chunks (was replaced by read buffer) but if it will
again in some future it will likely need to find a way to fetch and
cache the statistics.
- the schema is now provided by the API user since it can be
reconstructed using the NG catalog only (and "wrong" column orders are
tolerated as of #4921 )
Ref #4124
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 10:55:16 +00:00
Marco Neumann
0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 ( #4894 )
...
Closes #4889 .
Closes #4890 .
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Andrew Lamb
e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 ( #4851 )
...
* chore: TEMP Update DataFusion to pre-release
* chore: update arrow et al to 16.0.0
* chore: Run cargo hakari tasks
* fix: update reader read_dictionary API
* chore: Update to real Datafusion release
* fix: Update parquet API
* fix: update test
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
dependabot[bot]
e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 ( #4783 )
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
Andrew Lamb
3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` ( #4743 )
...
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`
* chore: Update APIs
* chore: Run cargo hakari tasks
* feat: normalize parquet file metadata
* chore: update size tests
* chore: add docs on metadata stripping
* chore: TEMP UPDATE TO DF BRANCH
* chore: Update for new API
* fix: Update to latest DF
* fix: cargo hakari
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
Marco Neumann
988bd38e93
refactor: remove unused code ( #4742 )
2022-05-31 11:36:02 +00:00
Marco Neumann
5a95da7327
refactor: do NOT use ANY file IO for parquet reading ( #4741 )
2022-05-31 11:18:24 +00:00
Marco Neumann
79c054ffc9
fix: do NOT block in parquet file IO ( #4727 )
...
* fix: do NOT block in parquet file IO
I think for historical reason we were using blocking IO to read parquet
files. With the current streaming `SendableRecordStream` approach this
is technically NOT required anymore.
Now one might think that the sync-async dance that we did is kinda
harmless, but looking at our producition querier I think it is really
bad. The querier seems to be stuck but looking at `strace` and other
health signal it seems it is not entirely dead. Looking at GDB
backtraces it seems that nearly all threads are busy in
`download_and_scan_parquet`. Looking at the tokio docs
(<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html >)
for `spawn_blocking` (which is used to start the sync download) this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that there are only a few threads left to do actual work (I've
seen postgres data flowing back and forth for example) but tokio is not
able to use its full potential anymore. This is esp. bad because the
sync code in `download_and_scan_parquet` then uses `futures` `block_on`
functionality to call back into async code, so it waits for tokio
itself.
The change is rather simple: just use async task spawns.
* fix: use async IO to write stream to temp file
* fix: do not block tokio thread during parquet file reading
* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore
There is no REAL way to cancel sync tasks, but at least we can try our
best.
2022-05-30 13:32:20 +00:00
Dom Dwyer
70856a645f
feat: streaming RecordBatch -> parquet encoding
...
Implements a streaming RecordBatch to Parquet file serialiser.
This impl automatically discovers the schema of the RecordBatch stream,
and accepts &mut destination types (internalising the handle
cloning/etc) to simplify caller usage.
This encoder returns the resulting FileMetaData to allow callers to
inspect the resulting metadata without reading back the file.
Currently unused / not yet plumbed in.
2022-05-20 15:09:26 +01:00
Andrew Lamb
3a33e806c7
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` ( #4619 )
...
* chore: Update datafusion deps
* chore: update arrow/parquet/arrow flight deps
* chore: Run cargo hakari tasks
* chore: Update location of utils
* chore: Update some more APIs
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-17 14:13:03 +00:00
Raphael Taylor-Davies
f2bb0fdf77
feat: update to crates.io object_store version ( #4595 )
...
* feat: update to crates.io object_store version
* chore: Run cargo hakari tasks
* fix: tests
* chore: remove object store integration test plumbing
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-13 16:26:07 +00:00
Jake Goulding
e07bcd40c2
refactor: Remove unused dependencies
...
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`,`serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear used, but is behind enough feature
flags that I don't care to reason about and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
2022-05-06 15:57:58 -04:00
Carol (Nichols || Goulding)
068096e7e1
fix: Rename data_types2 to data_types
2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)
0541c6e40f
fix: Remove data_types crate where it's no longer used
2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)
ea46830954
fix: Remove iox_object_store crate; move ParquetFilePath to parquet_file
2022-05-06 14:45:36 -04:00
Carol (Nichols || Goulding)
ba8191c1eb
fix: Remove persistence_windows
2022-05-06 11:30:35 -04:00
Andrew Lamb
02893e598c
chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 ( #4516 )
...
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-05 00:21:02 +00:00
dependabot[bot]
420c306caa
chore(deps): Bump tokio from 1.17.0 to 1.18.0 ( #4453 )
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.17.0 to 1.18.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-28 08:21:17 +00:00
二手掉包工程师
4b47d723b1
refactor: Rename time to iox_time ( #4416 )
...
Signed-off-by: hi-rustin <rustin.liu@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Andrew Lamb
73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc ( #4357 )
...
* chore: Update datafusion
* chore: Update arrow/arrow-flight/parquet to 12
* chore: update datafusion correctly
* chore: Update prost, tonic, and dependents
* fix: Fixup some api changes
* fix: Update test output in db
* fix: Update test output in parquet_file
* fix: remove old pbjson types
* fix: Add "--experimental_allow_proto3_optional" flag
* chore: Run cargo hakari tasks
* fix: compile error
* chore: Update heappy
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Andrew Lamb
5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 ( #4119 )
...
* chore: update datafusion
* chore(deps): Bump arrow from 10.0.0 to 11.0.0
Bumps [arrow](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0
Bumps [arrow-flight](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow-flight
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update parquet to 11.0.0
* fix: error on create schema, test for same
* fix: upgrade zstd
* chore: Run cargo hakari tasks
* fix: fix logical merge conflict
* fix: hakari
* fix: hakari
* fix: update newly introduced dep
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb
2c3d30ca32
chore: Update datafusion, arrow, flight and parquet ( #4000 )
...
* chore: Update datafusion, arrow, flight and parquet
* fix: api change
* fix: fmt
* fix: update test metadata size
* fix: Update sizes in parquet test
* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Carol (Nichols || Goulding)
8f3e44bf76
refactor: Extract a crate for shared data types in the new design
2022-03-02 12:16:15 -05:00
dependabot[bot]
b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 ( #3828 )
...
* chore(deps): Bump parquet from 9.0.2 to 9.1.0
Bumps [parquet](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update chunk size test
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot]
3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 ( #3826 )
...
Bumps [arrow](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot]
ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 ( #3814 )
...
* chore(deps): Bump tokio from 1.16.1 to 1.17.0
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* build: update workspace-hack
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Andrew Lamb
a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 ( #3733 )
...
* chore: Update datafusion
* chore: Update arrow
* fix: missing updates
* chore: Update cargo.lock
* fix: update for smaller parquet size
* fix: update test for smaller parquet files
* test: ensure parquet_file tests write multiple row groups
* fix: update callsite
* fix: Update for tests
* fix: harkari
* fix: use IoxObjectStore::existing
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Marco Neumann
22778a3a80
chore: upgrade rskafka and parking_lot ( #3592 )
2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding)
bf89162fa5
refactor: Move IoxMetadata to parquet_file
2022-01-31 10:36:33 -05:00
Andrew Lamb
5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 ( #3517 )
...
* chore: Update datafusion
* chore: update to arrow 8
* fix: update to use new DataFusion APIs
* fix: update case for sortedness
* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb
dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc ( #3455 )
...
* chore: update datafusion, arrow, prost, tonic, etc
* fix: update pprof as well
* chore: update hakari
* fix: update pbjson
* chore: update heappy
* fix: hakari
* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Marco Neumann
f3f6f335a9
chore: upgrade to snafu 0.7 ( #3440 )
2022-01-11 19:22:36 +00:00
Carol (Nichols || Goulding)
7499eac067
fix: Disable uuid serde feature; we're not actually serializing any UUIDs
...
Connects to #3117 .
2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding)
02c297e850
fix: Always specify the parking_lot feature of tokio to get potential perf boost
2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding)
0b24b3c227
fix: Use a consistent version specifier when depending on the futures crate
2021-12-06 09:37:12 -05:00
Carol (Nichols || Goulding)
9fd4a560f5
feat: Results of running cargo hakari manage-deps
2021-11-19 09:21:57 -05:00
dependabot[bot]
c540b40f05
chore(deps): bump tokio from 1.12.0 to 1.13.0
...
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-11-01 11:21:59 +00:00
Marco Neumann
bc7244c48e
chore: use Rust edition 2021
2021-10-25 10:58:20 +02:00
Andrew Lamb
a82dc6f5f0
chore: Update datafusion + arrow ( #2903 )
...
* chore: Update datafusion to latest, arrow to 6.0.0
* fix: Update tests
* fix: bubble internal error
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-19 17:14:08 +00:00
Marco Neumann
d8f35d8ee9
chore: remove unused `parquet_file` => `chrono` dep
2021-10-19 14:45:56 +02:00
Raphael Taylor-Davies
b39e01f7ba
feat: migrate PersistenceWindows to TimeProvider ( #2722 ) ( #2798 )
2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies
afe34751e7
refactor: split out schema crate ( #2781 )
...
* refactor: split out schema crate
* chore: fix doc
2021-10-11 09:45:08 +00:00
Raphael Taylor-Davies
86cee568d5
feat: use upstream pbjson ( #2650 )
...
* feat: use upstream pbjson
* chore: fmt
2021-09-28 16:29:26 +00:00
Marco Neumann
d7b697dfe9
chore: remove unused `object_store` => `tracker` dep
2021-09-22 11:13:40 +02:00
Marco Neumann
9c80d32af5
refactor: use normal google timestamps in parquet metadata again
...
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
2021-09-20 09:34:03 +02:00
Marco Neumann
afc507ae14
feat: compress encoded parquet metadata
...
Depending on the number of columns, this should safe between 60% and
75%.
2021-09-20 09:33:18 +02:00