dependabot[bot]
b1572c50a6
chore(deps): Bump once_cell from 1.15.0 to 1.16.0 ( #6009 )
...
Bumps [once_cell](https://github.com/matklad/once_cell ) from 1.15.0 to 1.16.0.
- [Release notes](https://github.com/matklad/once_cell/releases )
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md )
- [Commits](https://github.com/matklad/once_cell/compare/v1.15.0...v1.16.0 )
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:23:40 +00:00
Carol (Nichols || Goulding)
3145e2c05b
feat: Use workspace dep inheritance for the arrow crate
2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding)
44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate
2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding)
2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition
2022-10-24 13:04:09 -04:00
Marco Neumann
e0062f2d40
refactor: do NOT use fake DF context for parquet reading ( #5942 )
...
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 08:20:26 +00:00
dependabot[bot]
9fb1433317
chore(deps): Bump futures from 0.3.24 to 0.3.25 ( #5938 )
...
Bumps [futures](https://github.com/rust-lang/futures-rs ) from 0.3.24 to 0.3.25.
- [Release notes](https://github.com/rust-lang/futures-rs/releases )
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.24...0.3.25 )
---
updated-dependencies:
- dependency-name: futures
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-21 11:03:20 +00:00
Marco Neumann
42b89ade03
refactor: use `SendableRecordBatchStream` to write parquets ( #5911 )
...
Use a proper typed stream instead of peeking the first element. This is
more in line with our remaining stack and shall also improve error
handling.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-19 12:59:53 +00:00
Marco Neumann
eb5a661ab3
refactor: prep work for #5897 ( #5907 )
...
* refactor: add ID to `ParquetStorage`
* refactor: remove duplicate code
* refactor: use dedicated `StorageId`
2022-10-19 11:54:42 +00:00
Andrew Lamb
d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 ( #5900 )
...
* chore: Update datafusion and `arrow` / `parquet` / `arrow-flight` 25.0.0
* chore: Update for structure changes
* chore: Update for new projection pushdown
* chore: Run cargo hakari tasks
* fix: fmt
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Carol (Nichols || Goulding)
efb964c390
feat: Enforce table column limits from the schema cache ( #5819 )
...
* fix: Avoid some allocations by collecting instead of inserting into a vec
* refactor: Encode that adding columns is for one table at a time
* test: Add another test of column limits
* test: Add below/above limit tests for create_or_get_many
* fix: Explicitly DO NOT check column limits when inserting many columns
* feat: Cache the max_columns_per_table on the NamespaceSchema
* feat: Add a function to validate column limits in-memory
* fix: Provide more useful information when over column limits
* fix: Swap types to remove intermediate allocation
* docs: Explain the interactions of the cache and the column limits
* test: Actually set up test that showcases column limit race condition
* fix: Allow writing to existing columns even if table is over column limit
Co-authored-by: Dom <dom@itsallbroken.com>
2022-10-14 11:34:17 +00:00
Andrew Lamb
d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 ( #5792 )
...
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0
* fix: Update for coercion, fix explain plans for change in column name display
* chore: Update datafusion lock
* fix: Update for other API changes
* chore: Update to latest datafusion pin
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
dependabot[bot]
933493fab3
chore(deps): Bump object_store from 0.5.0 to 0.5.1
...
Bumps [object_store](https://github.com/apache/arrow-rs ) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md )
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1 )
---
updated-dependencies:
- dependency-name: object_store
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-10-11 01:19:10 +00:00
Jake Goulding
8187b003c3
refactor: improve panic message
2022-09-29 12:33:26 -04:00
Dom Dwyer
cd4087e00d
style: add no todo!() or dbg!() lints
...
Some crates had theme, some not - lets be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
Andrew Lamb
66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 ( #5694 )
...
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0
* chore: Update thrift / remove parquet_format
* fix: Update APIs
* chore: Update lock + Run cargo hakari tasks
* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
dependabot[bot]
0d18943ad2
chore(deps): Bump once_cell from 1.14.0 to 1.15.0 ( #5701 )
...
Bumps [once_cell](https://github.com/matklad/once_cell ) from 1.14.0 to 1.15.0.
- [Release notes](https://github.com/matklad/once_cell/releases )
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md )
- [Commits](https://github.com/matklad/once_cell/compare/v1.14.0...v1.15.0 )
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 05:20:55 +00:00
Carol (Nichols || Goulding)
c0c0349bc5
fix: Use typed Time values rather than ns
2022-09-19 12:59:20 -04:00
Carol (Nichols || Goulding)
a284cebb51
refactor: Store estimated bytes on the CompactorParquetFile
2022-09-15 11:10:14 -04:00
Andrew Lamb
f86d3e31da
chore: Update datafusion + object_store ( #5619 )
...
* chore: Update datafusion pin
* chore: update object_store to 0.5.0
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 12:34:54 +00:00
Andrew Lamb
1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 ( #5591 )
...
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0
* fix: enable dynamic comparison flag
* chore: derive Eq for clippy
* chore: update explain plans
* chore: Update sizes for ReadBuffer encoding
* chore: update more tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Carol (Nichols || Goulding)
85fb0acea6
refactor: Extract read_parquet_file test helper function to iox_tests::utils
2022-09-07 13:21:28 -04:00
YIXIAO SHI
52ae60bf2e
chore: fix comment typo ( #5551 )
...
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
dependabot[bot]
366c4d9965
chore(deps): Bump once_cell from 1.13.1 to 1.14.0 ( #5555 )
...
Bumps [once_cell](https://github.com/matklad/once_cell ) from 1.13.1 to 1.14.0.
- [Release notes](https://github.com/matklad/once_cell/releases )
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md )
- [Commits](https://github.com/matklad/once_cell/compare/v1.13.1...v1.14.0 )
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-06 09:02:28 +00:00
Marco Neumann
f45cbfb88d
refactor: fine-grained file size mocking ( #5541 )
...
* refactor: do not override parquet file size in querier
This is going to be an issue when we actually rely on the size for
reading, see #5531 .
* refactor: use selected file size mocking in compactor
Do not blindly override parquet file sizes for all subsystems.
This is going to be an issue when we actually rely on the size for
reading, see #5531 .
* refactor: remove ability to override file sizes in catalog
Blindly overriding data for all subsystems is dangerous, because some
parts of our stack actually rely on the actual file size. See #5531 .
* docs: explain `size_overrides`
2022-09-05 08:50:04 +00:00
Carol (Nichols || Goulding)
62b8819d49
fix: Carry object store ID through test builder, but pick new every time
2022-08-31 14:58:46 -04:00
Carol (Nichols || Goulding)
a9d664d0bf
feat: Add a way to set the row count on Parquet file catalog entries
...
And only allow setting this when no record batch or line protocol is
specified so that there isn't a way to create a parquet file with data
that has a mismatched row count.
2022-08-31 14:36:42 -04:00
Carol (Nichols || Goulding)
c21ac9050b
refactor: Extract a test util fn that will only create parquet file catalog records
2022-08-31 14:00:00 -04:00
Andrew Lamb
6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` ( #5519 )
...
* chore: Update arrow/arrow-flight/parquet to 21.0.0
* chore: Update datafusion pin
* chore: Fix arrow update script
* chore: Update Cargo.lock
* chore: Update for new API
2022-08-31 13:30:47 +00:00
dependabot[bot]
0137db9adc
chore(deps): Bump futures from 0.3.23 to 0.3.24
...
Bumps [futures](https://github.com/rust-lang/futures-rs ) from 0.3.23 to 0.3.24.
- [Release notes](https://github.com/rust-lang/futures-rs/releases )
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.23...0.3.24 )
---
updated-dependencies:
- dependency-name: futures
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2022-08-30 01:24:21 +00:00
Carol (Nichols || Goulding)
58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate
2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)
74c9529062
fix: Rename KafkaPartition to ShardIndex
2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding)
fe9c474620
fix: rustfmt
2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)
698f1a47ff
refactor: Rename test structures from sequencer to shard where appropriate
2022-08-29 14:06:44 -04:00
Jake Goulding
4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard
2022-08-29 14:06:43 -04:00
Andrew Lamb
7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 ( #5419 )
...
* chore: update object_store
* chore: update hakari config
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
dependabot[bot]
78665d3092
chore(deps): Bump once_cell from 1.13.0 to 1.13.1 ( #5413 )
...
Bumps [once_cell](https://github.com/matklad/once_cell ) from 1.13.0 to 1.13.1.
- [Release notes](https://github.com/matklad/once_cell/releases )
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md )
- [Commits](https://github.com/matklad/once_cell/compare/v1.13.0...v1.13.1 )
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 08:31:46 +00:00
dependabot[bot]
0a44a42999
chore(deps): Bump futures from 0.3.21 to 0.3.23 ( #5394 )
...
Bumps [futures](https://github.com/rust-lang/futures-rs ) from 0.3.21 to 0.3.23.
- [Release notes](https://github.com/rust-lang/futures-rs/releases )
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.21...0.3.23 )
---
updated-dependencies:
- dependency-name: futures
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-15 12:19:17 +00:00
Andrew Lamb
16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem ( #5360 )
...
* chore: Update datafusion and arrow
* chore: Update Cargo.lock
* chore: update to Decimal128
* chore: Update tonic/prost/pbjson/etc
* chore: Run cargo hakari tasks
* fix: doctest in generated types
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Andrew Lamb
9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` ( #5229 )
...
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`
* chore: Run cargo hakari tasks
* fix: Update for API changes
* fix: clippy
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Nga Tran
69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog ( #5151 )
...
* refactor: remove min_sequnce_number
* fix: typos
* fix: remove min_sequencer_number from new files from merging main
* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier
* test: add test_compactor_collision back but modify the input to make it work woth new changes
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
dependabot[bot]
278a7f91af
chore(deps): Bump bytes from 1.1.0 to 1.2.0 ( #5156 )
...
Bumps [bytes](https://github.com/tokio-rs/bytes ) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases )
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md )
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0 )
---
updated-dependencies:
- dependency-name: bytes
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:00:08 +00:00
Nga Tran
c8f4000f04
feat: Select compaction candidates ( #5131 )
...
* feat: initial implementation for selecting compaction candidates
* feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself
* test: tests for the new 2 queries
* feat: more tests and metrics for chooing compaction candidates
* chore: Apply self suggestions from self review
* chore: cleanup
* chore: fix doc comment
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: address review comments
* fix: get the right time provider for the tests
* refactor: remove the left over compaction_
* fix: typos
* fix: make the param name and env name consistent
* refactor: make relevant iSomething to uSomething
* fix: typo
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-07-18 18:05:13 +00:00
Andrew Lamb
e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` ( #5079 )
...
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18
* chore: Run cargo hakari tasks
* fix: update cargo pin
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Carol (Nichols || Goulding)
de74415cbe
feat: Gather parquet files for a partition compaction operation
...
Fixes #5118 .
Given a partition ID, look up the non-deleted Parquet files for that
partition. Separate them into level 0 and level 1, and sort the level 0
files by max sequence number.
This is not called anywhere yet.
2022-07-13 16:53:21 -04:00
Carol (Nichols || Goulding)
61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
...
Bonuses:
- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
Marco Neumann
607831585c
refactor: use less executors and threads during tests ( #5086 )
...
`Executor` is only used as a performance boundary, not as a correctness
or data boundary so let's try to re-use it. This also simplifies
profiling of tests since we don't end up with hundreds (or even
thousands) of threads.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 16:23:22 +00:00
Andrew Lamb
c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` ( #5021 )
...
* fix: correct nullability declaration of system tables
* chore: Update datafusion and arrow/parquet/arrow-flight
* chore: Run cargo hakari tasks
* fix: Update tests
* fix: Update tests
* fix: predicate pruning
* fix: add some tests
* fix: query_functions
* fix: fix read_buffer test
* fix: fix clippy
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Marco Neumann
be53716e4d
refactor: use IDs for `parquet_file.column_set` ( #4965 )
...
* feat: `ColumnRepo::list_by_table_id`
* refactor: use IDs for `parquet_file.column_set`
Closes #4959 .
* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Raphael Taylor-Davies
835e1c91c7
chore: update object_store to 0.3.0 ( #4707 )
...
* chore: update object_store to 0.3.0
* chore: review feedback
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-29 21:44:03 +00:00
Nga Tran
cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes ( #4954 )
...
* refactor: change level 1 to level 2 preparing for next design changes
* fix: make level-2 consistent everywhere
* chore: remove unused comments
* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent
* chore: add correspinding constants for the comapction levels in the comments
Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00