Commit Graph

550 Commits (1ddc64d68db906c6490f36d4aecde7ccd5bff945)

Author SHA1 Message Date
Nga Tran a6eb83d47d
feat: compact small contiguous files of the same partition even if they do not overlap (#4197)
* feat: compact small contiguous files of the same partition even if they do not overlap

* test: more tests

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-04-01 15:26:43 +00:00
Nga Tran 9c50a4c9fb
test: replace find_and_compact with compact_partition in tests (#4185)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-31 13:51:22 +00:00
Nga Tran ddc2c8304f
fix: have the compaction level set correctly (#4184)
* fix: have the compaction level set correctly, especially for compacted file from the compactor

* fix: typo
2022-03-30 21:23:40 +00:00
Paul Dix 04d961e70d
feat: wire up compactor scheduler and config (#4139)
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.

Closes #4120

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:45:24 +00:00
Marco Neumann 20bbb88dc5
refactor: remove table name from `TableSummary` (#4170)
This allows us to remove the table name from the low-level chunk
representations (like `ParquetFile`, RUB, ...) since table names are
already tracked by the higher-level data structures (e.g. catalog,
catalog chunk) that manage the low-level chunk representations.

This is similar to #4167.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:24:00 +00:00
Marco Neumann 036626a576
refactor: remove partition key from `ParquetChunk` (#4167)
The parquet chunk is always wrapped into some higher-level data
structure (e.g. a catalog chunk, a partition, ...) that knows exactly
"where" the chunk is located. There is no need for the parquet chunk to
back-reference container-level attributes. In the contrary:
double-bookkeeping makes the code more complex and costs additional
memory.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 09:24:56 +00:00
Nga Tran bfd5568acf
fix: make sure the QueryableParquetChunks are always sorted correctly (#4163)
* fix: make sure the chunks are always sorted correctly

* fix: output

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: make new function for new chunk id

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 21:36:45 +00:00
Carol (Nichols || Goulding) db5cd70c77
fix: logical merge conflict, unused import 2022-03-29 08:29:23 -04:00
Carol (Nichols || Goulding) 4a51c9eda6
feat: Add a garbage collector to be called in a background loop
Fixes #3954.
2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) f3f792fd08
feat: Add namespace_id to the parquet_files table; object store paths need it 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) a373c90415
refactor: Extract the list_all function to object store
I'm about to use this in a third file, so time to extract this.

Make it clear that this is appropriate for tests only.
2022-03-29 08:15:24 -04:00
dependabot[bot] 17af5fcbd1
chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154)
* chore(deps): Bump tokio-util from 0.7.0 to 0.7.1

Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 08:39:02 +00:00
Nga Tran 80b7e9cce1
feat: delete fully processed tombstones & integration tests for find_and_compact (#4116)
* feat: remove fully processed tombstones

* test: first few tests

* fix: delete SQL

* fix: test how IN (...) works in PG

* fix: test how IN (?) works in PG

* fix: test how IN (?) works in PG

* fix: dynamically add  IN (?, ?, ...)

* fix: dynamically add  IN (?, ?, ...) & its dynamic values

* fix: add argument directly in the SQL

* test: more tests for catalog read and update functions

* chore: move a subfunction to make it easier to read)

* test: first test for find_can_compact but disabled due to bug

* test: integration tests and a bug fix for find_and_compact

* chore: cleanup

* refactor: address review comments

* fix: put 2 delete processed  tombstones and tombstones in a transaction
2022-03-28 18:35:54 +00:00
dependabot[bot] 4f9515ffba
chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 08:55:24 +00:00
kodiakhq[bot] 15a9108135
Merge branch 'main' into dom/revert-revert-revert 2022-03-24 16:38:06 +00:00
Dom Dwyer bf782de421 fix: compactor early shutdown
The compactor stub code would wait on nothing when the caller waited on
join()-ing the compactor handler, and this meant any caller who blocked
on join() would immediately return.
2022-03-24 15:58:02 +00:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Carol (Nichols || Goulding) 67e13a7c34
fix: Change to_delete column on parquet_files to be a time (#4117)
Set to_delete to the time the file was marked as deleted rather than
true.

Fixes #4059.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:47:27 +00:00
Marco Neumann 51da6dd7fa
feat: store sort key in NG metadata (#4110)
The sort key is optional and currently only produced by `iox_tests`.
Writing it within the ingester/compactor is tracked by #3968. The sort
key is read by the querier (and this will be verified by the query tests
and is required to merge #4103).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:24:46 +00:00
Carol (Nichols || Goulding) c3a8834970
test: Add a test for add_tombstones_to_groups 2022-03-23 09:56:27 -04:00
Carol (Nichols || Goulding) 080156aa27
fix: Only do one catalog query for tombstones per each group of parquet files
The query will get all tombstones that could be relevant to the group;
then associate subsets of the results with each parquet file.
2022-03-23 09:56:26 -04:00
Carol (Nichols || Goulding) 2749c37d02
fix: Query for tombstones in a time range, not for a particular parquet file
The compactor at this point is still querying for each file; this is an
intermediate step
2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 4d2e71c03e
feat: Wrap parquet files with their relevant tombstones 2022-03-23 09:52:00 -04:00
Nga Tran c3ef56588f
feat: use creation time to check level upgradable (#4094)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:51:18 +00:00
Nga Tran 886f9dc8c1
feat: split compacted data into 2 compacted sets (#4088)
* feat: split compacted data into 2 compacted sets

* chore: clean up

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:28:32 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Carol (Nichols || Goulding) 201ced1d66
test: Mark a parquet file deleted in the update catalog operation 2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) dbca54d917
refactor: Move add parquet file and tombstones within update catalog
This should never be done on its own so doesn't really need to be its
own method. We also don't do anything with the returned data, so no need
to allocate those vectors.
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 2fea10dfd7
feat: Mark old compacted parquet files to be deleted in transaction
Connects to #3952
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 5b294968a5
feat: Add processed tombstone records with compacted parquet file
In a transaction when the parquet file is added to the catalog.

Connects to #3952.
2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) b983b24fcf
fix: Adding processed tombstones to catalog only needs tombstone ID 2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) 8fd3d85634
refactor: Move add_parquet_file_with_tombstones from ingester to compactor 2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) 933dc69ecf
feat: For each compacted data set, persist new parquet file to object store (#4058)
* feat: Rearrange skeleton functions for split/persist/catalog update

* feat: Persist compacted files to object storage

Fixes #3951.

* docs: Add comment about batches' schemas
2022-03-21 14:16:03 +00:00
Marco Neumann d1df95df87 refactor: dyn-dispatch chunks in query subsystem
- this is what DataFusion is doing as well; it's also fast enough
  because the number of chunks in a query is not THAT massive (it's not
  like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
  `DbChunk` and that we would also need for the querier (e.g. depending
  on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
  contains massive amounts of generic code

For #3934.
2022-03-21 12:47:54 +01:00
Marco Neumann 169fa2fb2f refactor: make `QueryChunk` object-safe
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.

This is a follow-up of #4053.

Ref #3934.
2022-03-18 11:40:31 +01:00
Carol (Nichols || Goulding) cd9c483864
feat: Group files by whether they overlap in time (#4048)
Fixes #3949.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-17 13:05:18 +00:00
Dom Dwyer 65273721b6 feat(compactor): enable object store metrics 2022-03-15 16:32:52 +00:00
Dom Dwyer 5585dd3c21 refactor: switch to using DynObjectStore
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
2022-03-15 16:32:52 +00:00
Dom Dwyer 1d5066c421 refactor: rename ObjectStore -> ObjectStoreImpl
Frees up the name for so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding) 1dacf567d9
feat: Add a function to the catalog to fetch level 1 parquet files
Fixes #3946.
2022-03-11 15:40:34 -05:00
Carol (Nichols || Goulding) f184b7023c
feat: Update specified parquet file records to compaction level 1
Fixes #3950.
2022-03-11 15:34:40 -05:00
Carol (Nichols || Goulding) fabd262442
feat: Add a function to the catalog to fetch level 0 parquet files
Connects to #3946.
2022-03-11 15:34:05 -05:00
Nga Tran 5a29d070ea
feat: Implement the compact function for NG Compactor (#4001)
* feat: initial implementation of compact a given list of overlapped parquet files

* feat: Add QueryableParquetChunk and some refactoring

* feat:  build queryable parquet chunks for parquet files with tombstones

* feat: second half the implementation for Compactor's compact. Tests will be next

* fix: comments for trait funnctions fof QueryChunkMeta

* test: add tests for compactor's compact function

* fix: typos

* refactor: address Jake's review comments

* refactor: address Andrew's comments and add one more test for files in different order in the vector

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-11 20:25:19 +00:00
Andrew Lamb b24ae7d23b
refactor: extract out compactor creation from config (#4018)
* refactor: extract out compactor creation from config

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-11 14:46:34 +00:00
Carol (Nichols || Goulding) 944f628e29
fix: Remove data_types as a dependency of ng compactor (#3993)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-09 17:03:02 +00:00
Nga Tran 09fba1d2c0
feat: NG Compactor - main function for finding and compacting parquet files (#3973)
* feat: main function for finding and compacting parquet files

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: rename file and struct

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-08 16:34:43 +00:00
Andrew Lamb b870b9340b
chore: remove uneeded dependencies (#3929)
* chore: remove unused deps in compactor

* chore: remove unused deps in influxdb_ioxd

* chore: remove unused deps in object_store

* chore: remove unused deps in server

* fix: object_store needs observability deps when compiled with aws

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-04 17:39:46 +00:00
Luke Bond 34e06e8689
fix: compactor server stays up; removed unused delegates (#3855)
* fix: compactor server stays up; removed unused delegates

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 16:30:44 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Luke Bond 0f012de70c
feat: adding compactor CLI command and crate
Closes: #3777
2022-02-21 12:24:09 +00:00