Andrew Lamb
5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 ( #4119 )
...
* chore: update datafusion
* chore(deps): Bump arrow from 10.0.0 to 11.0.0
Bumps [arrow](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0
Bumps [arrow-flight](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow-flight
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update parquet to 11.0.0
* fix: error on create schema, test for same
* fix: upgrade zstd
* chore: Run cargo hakari tasks
* fix: fix logical merge conflict
* fix: hakari
* fix: hakari
* fix: update newly introduced dep
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Marco Neumann
51da6dd7fa
feat: store sort key in NG metadata ( #4110 )
...
The sort key is optional and currently only produced by `iox_tests`.
Writing it within the ingester/compactor is tracked by #3968 . The sort
key is read by the querier (and this will be verified by the query tests
and is required to merge #4103 ).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:24:46 +00:00
Dom Dwyer
1d5066c421
refactor: rename ObjectStore -> ObjectStoreImpl
...
Frees up the name so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding)
ecd06c6ec3
fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL
...
When created in the catalog, parquet files should always have compaction
level 0. Updating the compaction level should always happen in the
compactor.
Only the catalog should need to know about the initial compaction level
value.
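A minimal sketch of the rule described above, assuming a simplified in-memory repository. Only the `INITIAL_COMPACTION_LEVEL` name comes from the commit subject; `MemParquetFileRepo`, the `ParquetFile` fields, and `update_compaction_level` are hypothetical stand-ins, not the actual catalog API.

```rust
/// Level given to every newly created file. Private to the catalog code,
/// so callers cannot choose a different starting level.
const INITIAL_COMPACTION_LEVEL: i16 = 0;

/// Simplified catalog record (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    pub compaction_level: i16,
}

/// Hypothetical in-memory stand-in for the parquet file repository.
#[derive(Debug, Default)]
pub struct MemParquetFileRepo {
    files: Vec<ParquetFile>,
}

impl MemParquetFileRepo {
    /// `create` takes no compaction level argument: the repository
    /// always assigns the initial level itself.
    pub fn create(&mut self) -> ParquetFile {
        let file = ParquetFile {
            id: self.files.len() as i64 + 1,
            compaction_level: INITIAL_COMPACTION_LEVEL,
        };
        self.files.push(file.clone());
        file
    }

    /// Hypothetical update path intended for the compactor only.
    /// Returns the updated record, or `None` if the ID is unknown.
    pub fn update_compaction_level(&mut self, id: i64, level: i16) -> Option<ParquetFile> {
        let file = self.files.iter_mut().find(|f| f.id == id)?;
        file.compaction_level = level;
        Some(file.clone())
    }
}
```

Because the constant is not exposed, code outside the catalog has no way to pass a starting level to `create`, which is the property the commit message asks for.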
2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding)
ff31407dce
refactor: Extract a ParquetFileParams type for create
...
This has the advantages of:
- Not needing to create fake parquet file IDs or fake deleted_at
values that aren't used by create before insertion
- Not needing too many arguments for create
- Naming the arguments so it's easier to see what value is what
argument, especially in tests
- Easier to reuse arguments or parts of arguments by using copies of
params, which makes it easier to see differences, especially in tests
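The advantages listed above can be sketched roughly as follows. This is an illustration under assumptions, not the real type: the field names in `ParquetFileParams`, the shape of `ParquetFile`, and the free function `create` are all hypothetical.

```rust
/// Everything a caller supplies when creating a parquet file record.
/// Field names are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFileParams {
    pub table_id: i64,
    pub min_sequence_number: i64,
    pub max_sequence_number: i64,
    pub row_count: i64,
}

/// The stored record adds the values that only exist after insertion,
/// so callers never need a fake ID or a fake `deleted_at`.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    pub deleted_at: Option<i64>,
    pub params: ParquetFileParams,
}

/// Hypothetical `create`: one named-field argument instead of a long
/// positional argument list.
pub fn create(next_id: i64, params: ParquetFileParams) -> ParquetFile {
    ParquetFile {
        id: next_id,
        deleted_at: None,
        params,
    }
}

/// Test-style reuse: copy the base params and override only what differs.
pub fn example_params() -> (ParquetFileParams, ParquetFileParams) {
    let base = ParquetFileParams {
        table_id: 1,
        min_sequence_number: 10,
        max_sequence_number: 20,
        row_count: 100,
    };
    let later = ParquetFileParams {
        min_sequence_number: 21,
        max_sequence_number: 30,
        ..base.clone()
    };
    (base, later)
}
```

In `example_params`, the struct update syntax (`..base.clone()`) makes the difference between the two parameter sets visible at a glance, which is the readability benefit the message mentions for tests.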
2022-03-10 13:51:18 -05:00
Paul Dix
27999ff72f
feat: add compaction_level and created_at to parquet_file ( #3972 )
2022-03-10 15:56:57 +00:00
Andrew Lamb
2c3d30ca32
chore: Update datafusion, arrow, flight and parquet ( #4000 )
...
* chore: Update datafusion, arrow, flight and parquet
* fix: api change
* fix: fmt
* fix: update test metadata size
* fix: Update sizes in parquet test
* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Nga Tran
c6cab3538f
refactor: move parquet chunk's new and decode to parquet_file crate ( #3987 )
2022-03-08 22:04:32 +00:00
Andrew Lamb
e09f39d6a0
chore: Update datafusion ( #3943 )
...
* chore: Update datafusion
* refactor: update for new datafusion
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-04 19:37:46 +00:00
Andrew Lamb
677a272095
refactor: Clean up some future clippy warnings from nightly ( #3892 )
...
* refactor: clean up new clippy lints
* refactor: complete other cleanups
* fix: ignore overzealous clippy
* fix: re-remove old code
2022-03-03 19:14:27 +00:00
Carol (Nichols || Goulding)
8f3e44bf76
refactor: Extract a crate for shared data types in the new design
2022-03-02 12:16:15 -05:00
Marco Neumann
33851be3a5
chore: upgrade Rust to 1.59 ( #3875 )
...
Mostly a few new clippy lints around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies
2a842fbb1a
feat: correctly sort data and store in catalog metadata ( #3864 )
...
* feat: respect sort order in ChunkTableProvider (#3214 )
feat: persist sort order in catalog (#3845 )
refactor: owned SortKey (#3845 )
* fix: size tests
* refactor: immutable SortKey
* test: test sort order restart (#3845 )
* chore: explicit None for sort key
* chore: test cleanup
* fix: handling of sort keys containing fields
* chore: remove unused selected_sort_key
* chore: more docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
Marco Neumann
f966f4c7a4
feat: create `ParquetChunk` in querier ( #3857 )
...
Adds a small adapter that is able to produce `ParquetChunk`s for NG.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 08:54:16 +00:00
Marco Neumann
49d1be30e7
feat: wire up `ParquetFilePath` for NG ( #3853 )
...
It's a bit of a duck-type hack, but if we want to just use `ParquetFileChunk`
in the new architecture, we somehow need it to accept new-gen paths.
Also, path handling should be somewhat centralized since
ingester/compactor/querier all need to construct them. So having a
`ParquetFilePath` that supports both path styles seems to be a
not-too-bad solution. This should obviously be cleaned up in the
not-too-distant future.
2022-02-24 16:05:38 +00:00
Carol (Nichols || Goulding)
252ced7adf
feat: Add row count to the parquet_file record in the catalog ( #3847 )
...
Fixes #3842 .
2022-02-24 15:20:50 +00:00
Marco Neumann
d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it ( #3852 )
...
* refactor: less parquet data copying
* feat: `PartitionRepo::get_by_id`
* feat: `TableRepo::get_by_id`
* feat: `ParquetFile::file_size_bytes`
* feat: `ParquetFile::parquet_metadata`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
dependabot[bot]
b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 ( #3828 )
...
* chore(deps): Bump parquet from 9.0.2 to 9.1.0
Bumps [parquet](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update chunk size test
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot]
3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 ( #3826 )
...
Bumps [arrow](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot]
ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 ( #3814 )
...
* chore(deps): Bump tokio from 1.16.1 to 1.17.0
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* build: update workspace-hack
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Andrew Lamb
a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 ( #3733 )
...
* chore: Update datafusion
* chore: Update arrow
* fix: missing updates
* chore: Update cargo.lock
* fix: update for smaller parquet size
* fix: update test for smaller parquet files
* test: ensure parquet_file tests write multiple row groups
* fix: update callsite
* fix: Update for tests
* fix: hakari
* fix: use IoxObjectStore::existing
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding)
73828323ac
feat: Ingester Flight gRPC API ( #3623 )
...
* feat: Add a way to run ingester with an in-memory catalog from the CLI
If you set the --catalog-dsn string to "mem", rather than using that as
a Postgres connection URL, create an in-memory catalog.
Planning on using this in tests, so not documenting.
* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC
Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.
* feat: Add a flight API (handshake only) to ingester
* fix: Create partitions if using file-based write buffer
* fix: Change the server fixture to handle ingester server type
For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.
* feat: Start implementing ingester do_get, namely decoding the query
Skip serialization of the predicate for the moment.
* refactor: Rename ingest protos to ingester to match crate name
* refactor: Rename QueryResults to QueryData
* feat: Move ingester flight client to new querier crate
* fix: Off by one error, different starting indexes in sequencers
* fix: Create new CLI argument to pick the catalog type
* fix: Create a CLI option to set the number of topics to auto-create in the write buffer
* fix: Check the arrow flight service's health to tell that the ingester gRPC is up
* fix: Set postgres as the default catalog type
* fix: Return an error rather than panicking if CLI args aren't right
2022-02-09 19:07:44 +00:00
Carol (Nichols || Goulding)
2e30483f1f
refactor: Remove predicate module from predicate crate ( #3648 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Nga Tran
17fbeaaade
feat: insert the persisted info into the catalog in one transaction ( #3636 )
...
* feat: add ProcessedTombstoneRepo
* feat: add function add_parquet_file_with_tombstones
* fix: remove unnecessary use
* feat: handling transaction when adding parquet file and its processed tombstones
* feat: tests update catalog for parquet file and processed tombstones
* fix: make add parquet file & its processed tombstones fully transactional
* chore: cleanup
* test: add integration tests for new catalog update functions
* chore: remove catalog_update.rs
* chore: cleanup
* fix: assert the right values
* fix: create unique namespace
* fix: support non transaction create_many
* test: remove tests that do not work in a transaction
* fix: one more case with unique namespace
* chore: add more verification to better understand why certain tests fail
* fix: compare difference rather than absolute because the DB already has data
* fix: fix the argument provided to SQL
* fix: return non-empty processed tombstones
* fix: insert the right parquet file
* chore: remove unused file
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:44:15 +00:00
Carol (Nichols || Goulding)
62a2ad289b
feat: Implement deserializing IoxMetadata from protobuf ( #3589 )
...
Fixes #3587 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 16:05:21 +00:00
Marco Neumann
22778a3a80
chore: upgrade rskafka and parking_lot ( #3592 )
2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding)
093d5acfd4
fix: Unify temporary multiple definitions of IoxMetadata
2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding)
8f81ce5501
refactor: Share parquet_file::storage code between new and old metadata
2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)
bf89162fa5
refactor: Move IoxMetadata to parquet_file
2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)
0f72a881ef
refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld
2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)
1b298bb5bd
refactor: Alias the old proto definitions to make clearer the new ones coming in
2022-01-31 10:36:33 -05:00
Dom
32d7c4cbfe
refactor: remove InfluxColumnType::IOx ( #3565 )
...
* refactor: remove InfluxColumnType::IOx
Remove unused column variant - see #3554 for context.
* refactor: reserve SEMANTIC_TYPE_IOX name in proto
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:15:36 +00:00
Andrew Lamb
5488c257d1
chore: Update datafusion, upgrade to arrow/parquet/arrow-flight 8.0.0 ( #3517 )
...
* chore: Update datafusion
* chore: update to arrow 8
* fix: update to use new DataFusion APIs
* fix: update case for sortedness
* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb
dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc ( #3455 )
...
* chore: update datafusion, arrow, prost, tonic, etc
* fix: update pprof as well
* chore: update hakari
* fix: update pbjson
* chore: update heappy
* fix: hakari
* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Andrew Lamb
cdf5c21cd4
fix: Fix max timestamp value comparison in chunk metadata ( #3453 )
...
* fix: Fix max timestamp value comparison in chunk metadata
* refactor: rename contains to overlaps
Co-authored-by: Edd Robinson <me@edd.io>
2022-01-13 16:58:30 +00:00
Raphael Taylor-Davies
c5cf03511c
fix: parquet column count statistics ( #2124 ) ( #3444 )
...
* fix: parquet metadata total_count (#2124 )
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-11 21:56:24 +00:00
Marco Neumann
f3f6f335a9
chore: upgrade to snafu 0.7 ( #3440 )
2022-01-11 19:22:36 +00:00
Marco Neumann
37bb7f2120
chore: `cargo update`
...
dependabot currently doesn't work due to
https://github.com/dependabot/dependabot-core/issues/4574
Excluded `quote` due to
https://github.com/dtolnay/quote/issues/204
2022-01-11 14:57:51 +01:00
Nga Tran
ec8644a39a
refactor: return clearer error message
2021-12-07 12:24:28 -05:00
Nga Tran
561c5ed8e7
refactor: check for no data while reading the input stream
2021-12-07 12:03:41 -05:00
Nga Tran
c992c82582
chore: Merge branch 'main' into ntran/compact_os_tests
2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies
5fdaa5b4ab
chore: don't panic with invalid parquet ( #3309 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-06 21:15:35 +00:00
Carol (Nichols || Goulding)
7499eac067
fix: Disable uuid serde feature; we're not actually serializing any UUIDs
...
Connects to #3117 .
2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding)
02c297e850
fix: Always specify the parking_lot feature of tokio to get potential perf boost
2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding)
0b24b3c227
fix: Use a consistent version specifier when depending on the futures crate
2021-12-06 09:37:12 -05:00
Raphael Taylor-Davies
bca561366b
feat: don't copy parquet files out of disk object store ( #3282 ) ( #3293 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-05 16:31:40 +00:00
Raphael Taylor-Davies
11067bfe3f
feat: simplify parquet reader ( #3282 ) ( #3291 )
...
* feat: simplify parquet reader (#3282 )
* chore: add back log line
2021-12-03 23:21:58 +00:00
Nga Tran
86f9fe0bcb
refactor: no longer need to create and test no-row-groups parquet files
2021-12-03 15:14:04 -05:00
Nga Tran
152281e428
fix: Capture the right 'no data' while parquet has no data
2021-12-03 12:19:48 -05:00
kodiakhq[bot]
2857b6a990
Merge branch 'main' into er/feat/load_chunk_cli
2021-12-02 20:20:56 +00:00