influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	a6eb83d47d	feat: compact small contiguous files of the same partition even if they do not overlap (#4197) * feat: compact small contiguous files of the same partition even if they do not overlap * test: more tests * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2022-04-01 15:26:43 +00:00
Nga Tran	9c50a4c9fb	test: replace find_and_compact with compact_partition in tests (#4185) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-31 13:51:22 +00:00
Nga Tran	ddc2c8304f	fix: have the compaction level set correctly (#4184) * fix: have the compaction level set correctly, especially for compacted file from the compactor * fix: typo	2022-03-30 21:23:40 +00:00
Paul Dix	04d961e70d	feat: wire up compactor scheduler and config (#4139) Add configuration options for compactor for the max size of level 0 files and split percentage. Add metrics for compaction to track the number of candidates, compactions, and durations. Add functions to separate identifying partitions to compact from running compaction. Make compaction run in smaller chunks, specifically per partition. Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time. Closes #4120 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 17:45:24 +00:00
Marco Neumann	20bbb88dc5	refactor: remove table name from `TableSummary` (#4170) This allows us to remove the table name from the low-level chunk representations (like `ParquetFile`, RUB, ...) since table names are already tracked by the higher-level data structures (e.g. catalog, catalog chunk) that manage the low-level chunk representations. This is similar to #4167. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 13:24:00 +00:00
Marco Neumann	036626a576	refactor: remove partition key from `ParquetChunk` (#4167) The parquet chunk is always wrapped into some higher-level data structure (e.g. a catalog chunk, a partition, ...) that knows exactly "where" the chunk is located. There is no need for the parquet chunk to back-reference container-level attributes. In the contrary: double-bookkeeping makes the code more complex and costs additional memory. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 09:24:56 +00:00
Nga Tran	bfd5568acf	fix: make sure the QueryableParquetChunks are always sorted correctly (#4163) * fix: make sure the chunks are always sorted correctly * fix: output * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: make new function for new chunk id Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 21:36:45 +00:00
Carol (Nichols || Goulding)	db5cd70c77	fix: logical merge conflict, unused import	2022-03-29 08:29:23 -04:00
Carol (Nichols || Goulding)	4a51c9eda6	feat: Add a garbage collector to be called in a background loop Fixes #3954.	2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding)	f3f792fd08	feat: Add namespace_id to the parquet_files table; object store paths need it	2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding)	a373c90415	refactor: Extract the list_all function to object store I'm about to use this in a third file, so time to extract this. Make it clear that this is appropriate for tests only.	2022-03-29 08:15:24 -04:00
dependabot[bot]	17af5fcbd1	chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154) * chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 08:39:02 +00:00
Nga Tran	80b7e9cce1	feat: delete fully processed tombstones & integration tests for find_and_compact (#4116) * feat: remove fully processed tombstones * test: first few tests * fix: delete SQL * fix: test how IN (...) works in PG * fix: test how IN (?) works in PG * fix: test how IN (?) works in PG * fix: dynamically add IN (?, ?, ...) * fix: dynamically add IN (?, ?, ...) & its dynamic values * fix: add argument directly in the SQL * test: more tests for catalog read and update functions * chore: move a subfunction to make it easier to read) * test: first test for find_can_compact but disabled due to bug * test: integration tests and a bug fix for find_and_compact * chore: cleanup * refactor: address review comments * fix: put 2 delete processed tombstones and tombstones in a transaction	2022-03-28 18:35:54 +00:00
dependabot[bot]	4f9515ffba	chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-28 08:55:24 +00:00
kodiakhq[bot]	15a9108135	Merge branch 'main' into dom/revert-revert-revert	2022-03-24 16:38:06 +00:00
Dom Dwyer	bf782de421	fix: compactor early shutdown The compactor stub code would wait on nothing when the caller waited on join()-ing the compactor handler, and this meant any caller who blocked on join() would immediately return.	2022-03-24 15:58:02 +00:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Carol (Nichols || Goulding)	67e13a7c34	fix: Change to_delete column on parquet_files to be a time (#4117) Set to_delete to the time the file was marked as deleted rather than true. Fixes #4059. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:47:27 +00:00
Marco Neumann	51da6dd7fa	feat: store sort key in NG metadata (#4110) The sort key is optional and currently only produced by `iox_tests`. Writing it within the ingester/compactor is tracked by #3968. The sort key is read by the querier (and this will be verified by the query tests and is required to merge #4103). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:24:46 +00:00
Carol (Nichols || Goulding)	c3a8834970	test: Add a test for add_tombstones_to_groups	2022-03-23 09:56:27 -04:00
Carol (Nichols || Goulding)	080156aa27	fix: Only do one catalog query for tombstones per each group of parquet files The query will get all tombstones that could be relevant to the group; then associate subsets of the results with each parquet file.	2022-03-23 09:56:26 -04:00
Carol (Nichols || Goulding)	2749c37d02	fix: Query for tombstones in a time range, not for a particular parquet file The compactor at this point is still querying for each file; this is an intermediate step	2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding)	4d2e71c03e	feat: Wrap parquet files with their relevant tombstones	2022-03-23 09:52:00 -04:00
Nga Tran	c3ef56588f	feat: use creation time to check level upgradable (#4094) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 13:51:18 +00:00
Nga Tran	886f9dc8c1	feat: split compacted data into 2 compacted sets (#4088) * feat: split compacted data into 2 compacted sets * chore: clean up * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 13:28:32 +00:00
Andrew Lamb	b83b000590	chore: Update datafusion (#4071) * chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc * chore: Update apis to match datafusion changes	2022-03-22 13:17:41 +00:00
Carol (Nichols || Goulding)	201ced1d66	test: Mark a parquet file deleted in the update catalog operation	2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding)	dbca54d917	refactor: Move add parquet file and tombstones within update catalog This should never be done on its own so doesn't really need to be its own method. We also don't do anything with the returned data, so no need to allocate those vectors.	2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding)	2fea10dfd7	feat: Mark old compacted parquet files to be deleted in transaction Connects to #3952	2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding)	5b294968a5	feat: Add processed tombstone records with compacted parquet file In a transaction when the parquet file is added to the catalog. Connects to #3952.	2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding)	b983b24fcf	fix: Adding processed tombstones to catalog only needs tombstone ID	2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding)	8fd3d85634	refactor: Move add_parquet_file_with_tombstones from ingester to compactor	2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding)	933dc69ecf	feat: For each compacted data set, persist new parquet file to object store (#4058) * feat: Rearrange skeleton functions for split/persist/catalog update * feat: Persist compacted files to object storage Fixes #3951. * docs: Add comment about batches' schemas	2022-03-21 14:16:03 +00:00
Marco Neumann	d1df95df87	refactor: dyn-dispatch chunks in query subsystem - this is what DataFusion is doing as well; it's also fast enough because the number of chunks in a query is not THAT massive (it's not like we are doing row-level dyn dispatching) - it simplifies abstracting over different databases - it allows us to drop our enum-based dispatching that we have for `DbChunk` and that we would also need for the querier (e.g. depending on if a chunk is backed by a parquet file or ingester data) - it likely speeds up compile times because the `query` is no longer contains massive amounts of generic code For #3934.	2022-03-21 12:47:54 +01:00
Marco Neumann	169fa2fb2f	refactor: make `QueryChunk` object-safe This makes it way easier to dyn-type database implementations. The only real change is that we make `QueryChunk::Error` opaque. Nobody is going to inspect that anyways, it's just printed to the user. This is a follow-up of #4053. Ref #3934.	2022-03-18 11:40:31 +01:00
Carol (Nichols || Goulding)	cd9c483864	feat: Group files by whether they overlap in time (#4048) Fixes #3949. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-17 13:05:18 +00:00
Dom Dwyer	65273721b6	feat(compactor): enable object store metrics	2022-03-15 16:32:52 +00:00
Dom Dwyer	5585dd3c21	refactor: switch to using DynObjectStore Changes all consumers of the object store to use the dynamically dispatched DynObjectStore type, instead of using a hardcoded concrete implementation type.	2022-03-15 16:32:52 +00:00
Dom Dwyer	1d5066c421	refactor: rename ObjectStore -> ObjectStoreImpl Frees up the name for so we can use `dyn ObjectStore` throughout the code instead of `ObjectStoreApi`.	2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding)	1dacf567d9	feat: Add a function to the catalog to fetch level 1 parquet files Fixes #3946.	2022-03-11 15:40:34 -05:00
Carol (Nichols || Goulding)	f184b7023c	feat: Update specified parquet file records to compaction level 1 Fixes #3950.	2022-03-11 15:34:40 -05:00
Carol (Nichols || Goulding)	fabd262442	feat: Add a function to the catalog to fetch level 0 parquet files Connects to #3946.	2022-03-11 15:34:05 -05:00
Nga Tran	5a29d070ea	feat: Implement the compact function for NG Compactor (#4001) * feat: initial implementation of compact a given list of overlapped parquet files * feat: Add QueryableParquetChunk and some refactoring * feat: build queryable parquet chunks for parquet files with tombstones * feat: second half the implementation for Compactor's compact. Tests will be next * fix: comments for trait funnctions fof QueryChunkMeta * test: add tests for compactor's compact function * fix: typos * refactor: address Jake's review comments * refactor: address Andrew's comments and add one more test for files in different order in the vector Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 20:25:19 +00:00
Andrew Lamb	b24ae7d23b	refactor: extract out compactor creation from config (#4018) * refactor: extract out compactor creation from config * fix: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 14:46:34 +00:00
Carol (Nichols || Goulding)	944f628e29	fix: Remove data_types as a dependency of ng compactor (#3993) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-09 17:03:02 +00:00
Nga Tran	09fba1d2c0	feat: NG Compactor - main function for finding and compacting parquet files (#3973) * feat: main function for finding and compacting parquet files * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: rename file and struct Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-08 16:34:43 +00:00
Andrew Lamb	b870b9340b	chore: remove uneeded dependencies (#3929) * chore: remove unused deps in compactor * chore: remove unused deps in influxdb_ioxd * chore: remove unused deps in object_store * chore: remove unused deps in server * fix: object_store needs observability deps when compiled with aws Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-04 17:39:46 +00:00
Luke Bond	34e06e8689	fix: compactor server stays up; removed unused delegates (#3855) * fix: compactor server stays up; removed unused delegates * chore: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-24 16:30:44 +00:00
dependabot[bot]	ad3868ed7c	chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814 ) * chore(deps): Bump tokio from 1.16.1 to 1.17.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build: update workspace-hack Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-22 16:27:43 +00:00
Luke Bond	0f012de70c	feat: adding compactor CLI command and crate Closes: #3777	2022-02-21 12:24:09 +00:00

550 Commits (1ddc64d68db906c6490f36d4aecde7ccd5bff945)