Commit Graph

618 Commits (8fddbc395b1feb8ee2fe9d39d7c4d98d6bcbbeab)

Author SHA1 Message Date
Carol (Nichols || Goulding) eec31b7f00
feat: Abstract over which partition ID type we're using to get a partition from the catalog 2023-07-10 10:43:20 -04:00
Joe-Blount ec6a609f63
Merge branch 'main' into jrb_58_throttle_partition_processing 2023-07-07 08:46:22 -05:00
Joe-Blount 9f522bfd30
Revert "feat(idpe-17789): scheduler job_status() (#8121)" (#8175)
This reverts commit 5d19fa3635.
2023-07-06 18:52:25 +00:00
Joe-Blount 28eb3dcd92 feat: throttle partition evaluation in the compactor 2023-07-06 11:54:18 -05:00
wiedld 5d19fa3635
feat(idpe-17789): scheduler job_status() (#8121)
This block of work moves into the scheduler some of the specific downstream actions affiliated with compaction outcomes. Which responsibilities stay in the compactor, versus moved to the scheduler, roughly followed the heuristic of whether the action (a) had an impact on global catalog state (a.k.a. commits and partition skipping), (b) whether it's logging affiliated with compactor health (e.g. ParitionDoneSink logging outcomes) versus system health (e.g. logging commits), and (c) reporting to the scheduler on any errors encountered during compaction. This boundary is subject to change as we move forward.

Also, a noted caveat (TODO) on this commit. We have a CompactionJob which is used to track work handed off to each compactor. Currently it still uses the partition_id for tracking, but the followup PR will start moving the compactor to have more CompactionJob uuid awareness.
2023-07-06 09:15:59 -07:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00
Marco Neumann dcb4a9bb5c
refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107)
Closes #8095.
2023-06-29 11:02:48 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
wiedld 3a8a8a153e
feat(idpe 17789): provide scheduler interface (#8057)
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.

* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler in created within the compactor hardcoded components.

* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()

* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition

* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.

* chore: make IdOnlyPartitionFilter into only pub(crate)

* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
2023-06-28 15:04:00 -07:00
Joe-Blount ac9cc24315
fix: compactor shouldn't leave small L1s in non-overlap leading edge pattern (#8101)
* fix: compactor shouldn't leave tiny L1s with non-overlapped leading edge pattern

* chore: insta updates for prior commit
2023-06-28 17:02:21 +00:00
Joe-Blount 40865e011c
fix: compactor loop on L1 files (#8082)
* chore: suppress insta run output on some long tests

* fix: prevent L1 compaction looping

* chore: insta updates from prior commit

* chore: addresss comments
2023-06-26 21:21:24 +00:00
Joe-Blount 99d0530a21
fix: compactor stuck looping with unproductive compactions (needs vertical split) (#8056)
* chore: adjust with_max_num_files_per_plan to more common setting

This significantly increases write amplification (see change in `written` at the conclusion of the cases)

* fix: compactor looping with unproductive compactions

* chore: formatting cleanup

* chore: fix typo in comment

* chore: add test case that compacts too many files at once

* fix: enforce max file count for compaction

* chore: insta churn from prior commit

---------

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 09:19:06 +00:00
dependabot[bot] 74a48a8f63
chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060)
* chore(deps): Bump itertools from 0.10.5 to 0.11.0

Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:11:56 +00:00
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
wiedld 62251e2323
refactor(idpe-17789): delineate btwn partitions_source within the scheduler versus compactor (#8028)
This is purely a movement of code, and not any definition of the interface methods yet. At best, it further solidifying the boundary of what partitions_source implementations are within the scheduler -- versus within the compactor.
2023-06-22 11:48:08 -07:00
Carol (Nichols || Goulding) 41420cb920
fix: Borrow transition partition ID when possible 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
2023-06-22 09:01:22 -04:00
Dom cb2968f0ef
Merge branch 'main' into dom/compactor-query-rate-limit 2023-06-22 13:08:24 +01:00
Dom Dwyer cb79429b5f
refactor: wait until next attempt deadline
Minor optimisation for cases where load exceeds the limit, but not by
much - sleep until the next query is allowed, rather than a full query
period.
2023-06-22 14:07:47 +02:00
Joe-Blount e0ccc2e345
Revert "fix: compactor stuck looping with unproductive compactions (needs vertical split) (#8039)" (#8041)
This reverts commit b219b4b003.
2023-06-21 23:14:35 +00:00
Joe-Blount b219b4b003
fix: compactor stuck looping with unproductive compactions (needs vertical split) (#8039)
* chore: adjust with_max_num_files_per_plan to more common setting

This significantly increases write amplification (see change in `written` at the conclusion of the cases)

* fix: compactor looping with unproductive compactions

* chore: formatting cleanup

* chore: fix typo in comment
2023-06-21 20:23:50 +00:00
wiedld f75736891a
refactor(idpe-17789): move mod id_only_partition_filter (#8027)
* chore: delineate scheduler logic boundary in code comments
* refactor: move id_only_partition_filter mod into local scheduler
* chore: add docs for each IdOnlyPartitionFilter implementation
2023-06-21 10:29:38 -07:00
Dom Dwyer d1cbbd27b1
feat(compactor): config partition query rate limit
Allow the partition fetch queries to be (optionally) rate limited via
runtime config.
2023-06-21 15:50:12 +02:00
Dom Dwyer 48e73fdf63
feat(compactor): partition fetch rate limiter
Implements a (very) simple rate limiter that permits at most N requests
per second, smoothed over a full second.
2023-06-21 15:50:11 +02:00
Dom Dwyer d6c4b51ba8
refactor: introduce catalog query indirection
Add indirection between the CatalogPartitionFilesSource (within the
retry-loop) and the underlying catalog.
2023-06-21 15:48:58 +02:00
wiedld e29b453e0d
refactor: move PartitionsSourceConfig into local scheduler (#8026) 2023-06-20 16:05:59 -07:00
wiedld 34b5fadde0
refactor: move scheduler related configs to compactor_scheduler (#8013) 2023-06-20 09:55:35 -07:00
wiedld 8b7ef69f6f
refactor: move partitions_source to scheduler (#8010)
* refactor: make compactor_scheduler crate

* refactor: move PartitionsSource into the compactor_scheduler

    The compactor currently uses PartitionsSource in two ways:
       * for the preparation of PartitionIds prior to the compactor pipeline.
       * for the abstraction which utilize the PartitionIds during the IO pipeline.
    This commit is a refactoring to enable us to delineate between these two utilizations.
    The former (preparation) utilization will now be done in the compactor_scheduler.
    Since the compactor is dependent on the compactor_scheduler, it made sense to move the trait to the scheduler.
2023-06-16 10:02:13 -07:00
wiedld 7a1f54ac64
refactor: remove compactor type (#8011)
* refactor: remove cold compactions
* refactor: remove compaction_type
2023-06-16 09:40:13 -07:00
Joe-Blount a21596f604
chore: add L1/L2 accumulated size test (#8012)
This adds 4 small test cases intending to test how compaction decisions made affect the final size of L1/L2 files.
The assumption is that when a steady stream of small L0 files is arriving, the compactor needs to be rewriting L1s so they grow to a reasonable size instead of getting left small.
2023-06-16 13:28:59 +00:00
Joe-Blount 5d0bb68c5b
chore: add compactor option to disable scratchpad (#7995) 2023-06-15 14:35:55 +00:00
Phil Bracikowski e34ec77e8d
feat(garbage-collector): batch parquet existence checks to catalog (#7964)
* feat(garbage-collector): batch parquet existence checks to catalog

The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was 1 catalog
query per each parquet file in object store. This can be a lot of
requests.

This PR can perform one query of at most 100 parquet file uuids against
the catalog in one query. A hundred seems like a decent starting place.

The batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are batched into 100 for the catalog.

Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
 #7652 for watching for shutdown on the threads.

* fixes #7784

* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single field structs that are
  just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids
* comment clarifications
* adjust loggings to warn from debug for expected rare events

Many thanks to Carol for help implementing this!
2023-06-14 07:59:00 -07:00
Joe-Blount 59fd5cd7b4 chore: retain written files in shadow mode 2023-06-09 13:22:49 -05:00
Joe-Blount 9171e1521f fix: clean compaction output from scratchpad 2023-06-08 09:35:33 -05:00
Carol (Nichols || Goulding) bf699a8b60
fix: Remove partition ID from the metadata serialized into Parquet files (#7947)
Nothing gets the partition ID out of the metadata. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.

This will make the transition to deterministic partition IDs a bit
smoother.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:03:21 +00:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Marco Neumann fa5011197c
refactor: migrate `iox_query` to use DataFusion statistics (#7908)
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-02 09:18:59 +00:00
Andrew Lamb a48f681e56
feat(parquet): reduce and limit buffering when writing parquet files (#7880)
* feat: limit buffering when writing parquet files ("combined solution")

* chore: Run cargo hakari tasks

---------

Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-31 13:27:32 +00:00
Joe-Blount 3b77929007 chore: lint fmt 2023-05-30 16:08:40 -05:00
Joe-Blount c2423d8a5c feat: Order L0s more deterministically 2023-05-30 15:52:03 -05:00
Andrew Lamb 1ff76b7bf2 chore: use workspace dependencies for `object_store` 2023-05-26 07:03:42 -04:00
Carol (Nichols || Goulding) 9c0faa66f0
feat: Set a table partition template explicitly or from the namespace
And use the table partition template when partitioning writes to that
table.
2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) afb3838437
feat: Optionally supply the namespace partition template when creating a namespace 2023-05-24 10:10:34 -04:00
Marco Neumann 103e814f22
refactor: clean up catalog `parquet_files` interface (#7853)
* feat: `ParquetFileRepo::list_all`

* refactor: remove `ParquetFileRepo::list_by_table`

* refactor: simlify `ParquetFileRepo::list_by_table`

* refactor: remove `ParquetFileRepo::count`

* refactor: remove `ParquetFileRepo::update_compaction_level`

* refactor: remove `ParquetFileRepo::exists`

* fix: test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-24 09:15:03 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` (#7822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Jeffrey Smith II a9ee1bae6c chore: Merge remote-tracking branch 'origin/main' into smith/remove-transactions-main 2023-05-09 13:26:02 -05:00
Carol (Nichols || Goulding) e8655af52d
fix: Change ColumnsByName::new to enable taking ownership if caller wants to give it 2023-05-09 14:55:00 +02:00
Carol (Nichols || Goulding) cc41216382
fix: Undo the addition of a TableInfo type; store partition_template on TableSchema 2023-05-09 14:54:59 +02:00
Carol (Nichols || Goulding) 596673d515
refactor: Create a new ColumnsByName type to abstract over TableSchema columns
And allow usage of just the columns when that's all that's needed
without leaking the BTreeMap implementation detail everywhere
2023-05-09 14:54:58 +02:00
Carol (Nichols || Goulding) 1f1dcc947d
fix: Don't change how the compactor gets the table schema 2023-05-09 14:54:58 +02:00
Carol (Nichols || Goulding) 58d9c40ffd
feat: If namespace or table partition templates are specified, use those 2023-05-09 14:54:57 +02:00
Carol (Nichols || Goulding) 9229ce5668
fix: Rename compactor2_test_utils to compactor_test_utils 2023-05-09 11:02:11 +02:00
Carol (Nichols || Goulding) dd9c5d1b13
fix: Rename compactor2 to compactor 2023-05-09 10:58:55 +02:00
Carol (Nichols || Goulding) a5769a461b
fix: Remove old compactor and ioxd_compactor 2023-04-06 16:49:52 -04:00
Marco Neumann 5f43f2a719
refactor: remove old query planning code (#7449)
Closes #7406.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-06 16:05:08 +00:00
dependabot[bot] 66982f988b
chore(deps): Bump object_store from 0.5.5 to 0.5.6 (#7433)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.5 to 0.5.6.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/commits)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 08:43:34 +00:00
dependabot[bot] 4eedb7ea77
chore(deps): Bump async-trait from 0.1.66 to 0.1.68 (#7374)
* chore(deps): Bump async-trait from 0.1.66 to 0.1.68

Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.66 to 0.1.68.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.66...0.1.68)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-03-30 10:14:36 +00:00
dependabot[bot] 9cbcdc7672
chore(deps): Bump tokio from 1.26.0 to 1.27.0 (#7373)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.26.0...tokio-1.27.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-30 09:36:04 +00:00
dependabot[bot] 8f3a9396d0
chore(deps): Bump async-trait from 0.1.64 to 0.1.66 (#7129)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.64 to 0.1.66.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.64...0.1.66)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-06 10:13:29 +00:00
dependabot[bot] 3256fcc72e
chore(deps): Bump object_store from 0.5.4 to 0.5.5
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.4 to 0.5.5.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.4...object_store_0.5.5)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-03 02:00:51 +00:00
dependabot[bot] c538cac4ef
chore(deps): Bump tokio from 1.25.0 to 1.26.0 (#7107)
* chore(deps): Bump tokio from 1.25.0 to 1.26.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.25.0...tokio-1.26.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:50:39 +00:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Marco Neumann 08578cded5
refactor: n_threads and n_target_partitions are non-zero (#7047)
* refactor: n_threads and n_target_partitions are non-zero

Zero values will just panic. Prevent that earlier.

* fix: typo

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

---------

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-02-23 16:57:00 +00:00
Dom Dwyer 2d46a364dc
feat: namespace soft-delete support
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.

Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.

The components treat soft-deleted namespaces differently:

    * router: ignore soft deleted namespaces
    * ingester: accept soft deleted namespaces
    * compactor: accept soft deleted namespaces
    * querier: ignore soft deleted namespaces
    * various gRPC services: ignore soft deleted namespaces

This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.

Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to a the state at which the delete was
issued (rather than loosing the buffered data).

Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seems like a trivial win.
2023-02-13 12:01:35 +01:00
dependabot[bot] 0cbd9f6a82
chore(deps): Bump tokio-util from 0.7.5 to 0.7.7 (#6964)
---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-13 10:10:53 +00:00
Andrew Lamb 779fb93ce7
refactor: move test builders out of compactor2 code (#6953)
* refactor: move test builders out of compactor2 code

* fix: docs
2023-02-10 18:28:09 +00:00
dependabot[bot] c0c9b51b9e
chore(deps): Bump tokio-util from 0.7.4 to 0.7.5 (#6941)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.4 to 0.7.5.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.4...tokio-util-0.7.5)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-02-10 09:42:00 +00:00
dependabot[bot] 0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Raphael Taylor-Davies d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756)
* chore: update DataFusion

* fix: test

* chore: format

* chore: clippy

* chore: update arrow

* chore: arrow upgrade fallout

* chore: Run cargo hakari tasks

* chore: remove failing warm compaction test

* fix: flight error propagation

* chore: update parquet size

* fix: Update error message

* chore: Update parquet metadata test

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
dependabot[bot] d0e6b16450
chore(deps): Bump bytes from 1.3.0 to 1.4.0
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 00:30:56 +00:00
dependabot[bot] 6f032b1d57
chore(deps): Bump async-trait from 0.1.63 to 0.1.64 (#6769)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.63 to 0.1.64.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.63...0.1.64)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-31 10:18:27 +00:00
dependabot[bot] ed7d02a225
chore(deps): Bump tokio from 1.24.2 to 1.25.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 01:57:27 +00:00
Andrew Lamb ead6812210
fix: reduce logging verbosity (#6704)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-27 13:53:42 +00:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for chnaging cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
dependabot[bot] 0114e7ee50
chore(deps): Bump async-trait from 0.1.61 to 0.1.63 (#6660)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.61 to 0.1.63.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.61...0.1.63)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-23 08:41:27 +00:00
Nga Tran e596f5f074
chore: metrics for compaction candidate counts (#6593)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 19:00:43 +00:00
Nga Tran 550cea8bc5
perf: optimize not to update partitions with newly created level 2 files (#6590)
* perf: optimize not to update partitions with newly created level 2 files

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 14:46:58 +00:00
Nga Tran fa0893819c
fix: have warm compaction work with compactor2 (#6571)
* refactor: same function to select partition candidates

* fix: have warm compaction work with compactor2

* fix: format

* chore: cleanup
2023-01-12 02:32:39 +00:00
Nga Tran 1f508b76fc
refactor: same function to select partition candidates (#6569)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-12 01:14:49 +00:00
Nga Tran d3b2203560
fix: bug in count processed partittions (#6572) 2023-01-11 22:53:52 +00:00
Nga Tran 2de0e45b0a
fix: using created_at to order chunks for deduplication (#6556)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 18:18:33 +00:00
NGA-TRAN 2ae018e2f9 chore: merge main to branch 2023-01-10 11:55:07 -05:00
NGA-TRAN 1a93f70a8b fix: use created_at to order L0 during comapction 2023-01-10 11:48:05 -05:00
Nga Tran 62c0f3dbdd
feat: have cold compaction work with Compactor2 (#6542)
* feat: cold

* chore: debug info

* feat: only compact qualified cold partition candidates

* fix: catalog test

* chore: cleanup

* chore: add new config flag for cold partition candidates

* chore: implement display for CompactionType and add tests for max num partitions

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 16:42:57 +00:00
dependabot[bot] b49cc2e35e
chore(deps): Bump tokio from 1.24.0 to 1.24.1 (#6545)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.0 to 1.24.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.24.0...tokio-1.24.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 09:48:44 +00:00
dependabot[bot] e31c84a794
chore(deps): Bump async-trait from 0.1.60 to 0.1.61 (#6533)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.60 to 0.1.61.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.60...0.1.61)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-09 07:44:35 +00:00
Nga Tran 4031ea1c10
feat: integrate new way to get partition candidates for hot compaction (#6525)
* feat: integrate new way to get partition candidates for hot compaction

* chore: rename
2023-01-06 18:51:52 +00:00
Raphael Taylor-Davies e1036a0c63
refactor: cleanup schema boxing (#6511)
* refactor: cleanup Schema boxing

* chore: clippy
2023-01-06 10:57:39 +00:00
Nga Tran cd1a604df0
fix: make estimate memory the a query needs higher due to recent observation (#6476) 2022-12-30 21:15:39 +00:00
Nga Tran 0c944346e5
chore: debug info of estimate file size (#6475) 2022-12-30 20:13:21 +00:00
Nga Tran d27e137c39
chore: add debug info for the investigation (#6472) 2022-12-29 23:49:29 +00:00
Carol (Nichols || Goulding) 39acfc4f0d
fix: Remove needless casts. Thanks clippy! 2022-12-21 14:32:34 -05:00
Dom Dwyer adc6fcfb04
feat(catalog): linearise sort key updates
Updating the sort key is not commutative and MUST be serialised. The
correctness of the current catalog interface relies on the caller
serialising updates globally, something it cannot reasonably assert in a
distributed system.

This change of the catalog interface pushes this responsibility to the
catalog itself where it can be effectively enforced, and allows a caller
to detect parallel updates to the sort key.
2022-12-20 12:31:00 +01:00
kodiakhq[bot] c0f2ba09ee
Merge branch 'main' into cn/compactor2 2022-12-19 14:22:56 +00:00
dependabot[bot] c72734473c
chore(deps): Bump async-trait from 0.1.59 to 0.1.60 (#6433)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.59 to 0.1.60.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.59...0.1.60)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-19 10:09:23 +00:00
Carol (Nichols || Goulding) dfd979477c
fix: Update warm compaction code to optionally take shard ID 2022-12-16 17:41:57 -05:00
Carol (Nichols || Goulding) d7e75d43ea
fix: Make shard ID optional for compactor queries in RPC write mode 2022-12-16 17:28:53 -05:00
Luke Bond f419e2c378
feat: warm compaction (#6192)
* feat: warm compaction

chore: add missing warm compaction config

chore: tests for warm compaction

chore: modify count usage in warm compaction sql

chore: catalog test for warm compaction; sql fixes

feat: settable target level for compact w/ budget

chore: tests for warm compaction

chore: clarifying comments in warm compaction test

chore: fixed erroneous comment in catalog test

chore: improve warm compactor test by checking file exists

chore: tests for warm compaction

chore: warm compactor test tidy-ups

* chore: improve test for warm compaction

* chore: fix erroneous comment in warm compaction code
2022-12-16 15:59:45 +00:00
dependabot[bot] 1d38d400f0
chore(deps): Bump object_store from 0.5.1 to 0.5.2 (#6339)
* chore(deps): Bump object_store from 0.5.1 to 0.5.2

Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.1...object_store_0.5.2)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-06 07:53:54 +00:00
Nga Tran 77cbc880f6
feat: Add cap limit on number of partitions to be compacted in parallel (#6305)
* feat: Add cap limit on number of partitions to be comapcted in parallel

* chore: cleanup

* chore: clearer comments
2022-12-01 21:23:44 +00:00
Andrew Lamb 255a168d07
refactor: Refactor ParquetFileCombining into a builder and plan, and add sort exec test (#6196)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 14:47:47 +00:00
dependabot[bot] 04c00bbb62
chore(deps): Bump bytes from 1.2.1 to 1.3.0 (#6199)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/commits)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 08:23:24 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Luke Bond 7c813c170a
feat: reintroduce compactor first file in partition exception (#6176)
* feat: compactor ignores max file count for first file

chore: typo in comment in compactor

* feat: restore special first file in partition compaction logic; add limit

* fix: calculation in compaction max file count

chore: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 15:58:59 +00:00
Nga Tran 49a9565240
feat: gRPC that creates namespace (#6103)
* feat: create namespace API call in router

Co-authored-by: Nga Tran <nga-tran@live.com>

* chore: treat retention as ns except in CLI

* fix: overflow in nanosecond calc

* fix: retention test after changing it from hours to ns

* chore: comment clarification in cli; better response type for error in ns API

* fix: correct some rebase mistakes

* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const

* fix: ns autocreation test was wrong after rebase

* fix: mem catalog has default 1hr retention, accidently removed in rebase

* chore: remove mem catalogs default 1hr retention; make it settable in sets & router

Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 13:02:12 +00:00
Nga Tran 80e91a644b
chore: Revert "feat: compactor ignores max file count for first file (#6144)" (#6158)
This reverts commit bf1681f4fe.
2022-11-16 19:58:46 +00:00
Luke Bond bf1681f4fe
feat: compactor ignores max file count for first file (#6144)
* feat: compactor ignores max file count for first file

* chore: typo in comment in compactor
2022-11-16 11:21:28 +00:00
Andrew Lamb 448911794c
test: test coverage for sorting and merging in compactor (#6136)
* test: test coverage for sorting and merging in compactor

* fix: Apply suggestions from code review (comments)

Co-authored-by: Marco Neumann <marco@crepererum.net>

* feat: use itertools to cover all permutations

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:39:45 +00:00
Nga Tran 9c4266c503
refactor: first step to remove unused retention_duration (#6113)
* refactor: first step to remove unused retention_duration

* refactor: remove retenion_duration from update catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-11 15:21:06 +00:00
Luke Bond f9316decee
chore: expose compactor's hot compaction hours thresholds as cfg (#6060)
* chore: expose compactor's hot compaction hours thresholds as cfg

* fix: add missing compactor arg envar; fix some comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 15:29:17 +00:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Nga Tran 654ed98d1f
feat: config param to set when partition is cold (#6044)
* feat: config param to set when partition is cold

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: make default 8 hours and avoid using 8 * 60 becasue it is a string, not expression which makes a test fail

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-03 15:03:56 +00:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.except(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Carol (Nichols || Goulding) dad1ad1318
feat: Add the catalog service to ingester, querier, and compactor
So that `remote get` that uses the catalog service can work no matter
what kind of server you contact.
2022-10-28 10:49:26 -04:00
Carol (Nichols || Goulding) 53445af25d
chore: Alphabetize some dependencies
I can't handle not knowing where to look for a dependency or knowing
where to add a new dependency.
2022-10-28 10:34:25 -04:00
Marco Neumann 8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963)
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.

While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding) 44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate 2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Nga Tran 84e5c2a0ee
fix: cardinality of each batch should use row count of the batch (#5946)
* fix: cardinality of each batch should use row count of the batch

* chore: cleanup

* fix: auto-merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 15:36:33 +00:00
Marco Neumann e0062f2d40
refactor: do NOT use fake DF context for parquet reading (#5942)
Use the proper top-level DataFusion context and register the object
store there.

Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.

Helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 08:20:26 +00:00
Jake Goulding fa7fe2e9cf feat: Add a gRPC endpoint to delete a skipped compaction
Also add a CLI usage of it for convenience
2022-10-21 15:12:20 -04:00
Carol (Nichols || Goulding) 0132a33946
fix: Rename SkippedCompactionService to CompactionService
To make a good place for other compactor-related gRPC actions in the
future.
2022-10-21 13:40:37 -04:00
Carol (Nichols || Goulding) 699332fd6b
fix: Actually implement the error conversion, oops 2022-10-21 13:40:31 -04:00
Carol (Nichols || Goulding) ba25300b01
feat: Create compactor service to list skipped compactions 2022-10-21 13:40:31 -04:00
Marco Neumann eb5a661ab3
refactor: prep work for #5897 (#5907)
* refactor: add ID to `ParquetStorage`

* refactor: remove duplicate code

* refactor: use dedicated `StorageId`
2022-10-19 11:54:42 +00:00
dependabot[bot] b5574c07b7
chore(deps): Bump async-trait from 0.1.57 to 0.1.58 (#5904)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.57 to 0.1.58.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.57...0.1.58)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-19 09:40:26 +00:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
kodiakhq[bot] e6fd0bc621
Merge branch 'main' into cn/l2-file-size 2022-10-14 15:26:13 +00:00
Carol (Nichols || Goulding) 475db15a51
fix: Compact one L1 file even if it is over the file size limit 2022-10-14 10:55:34 -04:00
Andrew Lamb 8021b8be0b
fix: Use Display rather than Debug when logging errors (#5859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 14:43:11 +00:00
Carol (Nichols || Goulding) b2df492558
feat: Limit L1 -> L2 compaction based on file size 2022-10-13 16:20:22 -04:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
Carol (Nichols || Goulding) 082d045633
fix: Update test compactor limit values 2022-10-13 14:25:10 -04:00
Carol (Nichols || Goulding) cdd01eb3fc
test: Verify L1 files chosen for compaction are limited by the memory budget 2022-10-13 14:15:39 -04:00
Carol (Nichols || Goulding) 3cdf2556ec
test: Verify L1 files in a group by themselves get upgraded to L2 2022-10-13 14:15:39 -04:00
Nga Tran fab3cd845c
feat: add memory need for output streams into our estimation (#5847)
* feat: add memory need for output streams into our estimation

* test: modify tests to have better coverage

* refactor: use constants isntead of numbers

* chore: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 14:31:19 +00:00
Nga Tran 1400bf99e4
refactor: split memory estimation into bytes to store and bytes to stream (#5845)
* refactor: split memory estimation into bytes to store and bytes to stream

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:59:51 +00:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
Nga Tran f05ca867a5
feat: add file size into estimated memory (#5837)
* feat: add file size into estimataed memory

* chore: cleanup

* chore: fmt

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* chore: run fmt after applying review suggestion

* fix: fix tests towork with the change for review suggestion

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 14:42:53 +00:00
Nga Tran b7153862b0
refactor: due to limit in size uplaoed to S3, we need to split output file of cold compaction, too (#5834)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-11 17:22:19 +00:00
dependabot[bot] 933493fab3
chore(deps): Bump object_store from 0.5.0 to 0.5.1
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-11 01:19:10 +00:00
Marco Neumann c4c83e0840
fix: query error propagation (#5801)
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
  semantics

Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.

Fixes #5799.
2022-10-06 08:54:01 +00:00