influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	ea81152fac	refactor: add partition ID into debug info and panic earlier to identify the bug easier (#4716 ) * chore: point tests to the new ticket * chore: cleanup * refactor: add partition ID into debug info and panic earlier to identify the bug easier	2022-05-27 12:20:36 +00:00
Nga Tran	09b55a209d	chore: point tests to the new ticket (#4715 ) * chore: point tests to the new ticket * chore: cleanup	2022-05-27 11:12:55 +00:00
Nga Tran	372b262f37	test: parquet meta decoded tests and more debug info (#4713 ) * test: reproducer for 4695 * chore: some debug info * test: test with many columns and rows * chore: cleanup and add debug info * chore: cleanup * chore: cleanup * chore: more debug info	2022-05-27 09:53:07 +00:00
Carol (Nichols || Goulding)	b2905650aa	refactor: Extract extract_range to be a method on TableSummary So that other kinds of chunks can use this code too.	2022-05-26 16:52:14 -04:00
Nga Tran	05151d5c69	test: reproducer for 4695 (#4706) * test: reproducer for 4695 * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-26 15:32:30 +00:00
Dom Dwyer	6aa626ef84	refactor: retry object store upload Changes the Storage::upload() method to endlessly retry uploading the generated Parquet file.	2022-05-24 11:29:42 +01:00
Andrew Lamb	e877a64462	feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661 ) * feat: Add `ParquetFiles` cache * fix: Apply suggestions from code review Co-authored-by: Marko Mikulicic <mkm@influxdata.com> * fix: remove commented out debugging println * refactor: Improve size calculation * fix: mark `ParquetFileCache::clear` test only * fix: assert on metric count Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-05-23 17:11:38 +00:00
Dom Dwyer	2e6c49be83	refactor: remove IoxMetadata min & max timestamp Removes the min/max timestamp fields from the IoxMetadata proto structure embedded within a Parquet file's metadata. These values are redundant as they already exist within the Parquet column statistics, and precluded streaming serialisation as these removed min/max values were needed before serialising the file.	2022-05-23 16:27:08 +01:00
Dom Dwyer	a142a9eb57	refactor: remove row_count from IoxMetadata Remove the redundant row_count from the IoxMetadata structure that is serialised into the Parquet file. The reasoning is twofold: * The Parquet file's native metadata already contains a row count * Needing to know the number of rows up-front precludes streaming	2022-05-23 16:18:35 +01:00
Dom Dwyer	71555ee55c	test: Parquet metadata integration test Adds two integration tests covering validation of the embedded IOx metadata within the Parquet file metadata, and validation of the derived ParquetFileParams metadata used to populate the catalog.	2022-05-23 16:17:56 +01:00
Dom Dwyer	af6d3f4d48	docs: remove clone ref comment	2022-05-23 11:46:06 +01:00
Dom Dwyer	00dc95829d	style: enable more lints Enable more lints on the parquet_file crate to keep it a little cleaner - adds the following: clippy::clone_on_ref_ptr, unreachable_pub, missing_docs, clippy::todo, clippy::dbg_macro This commit includes fixes for any new lint failures.	2022-05-20 15:17:40 +01:00
Dom Dwyer	7df7c4844c	refactor: remove redundant ParquetChunk errors Eliminates unused / refactors away unnecessary errors for the parquet::chunk module.	2022-05-20 15:17:40 +01:00
Dom Dwyer	661f8599a6	refactor: internalise Parquet path generation Derive the ParquetFilePath from the IoxMetadata within the ParquetStorage::read_filter() call. This prevents the "put/get RecordBatches" abstraction from leaking out the object store path generation concern - an implementation detail of the ParquetStorage layer.	2022-05-20 15:17:40 +01:00
Dom Dwyer	cdb341d45a	test: ParquetStorage upload() and read_filter() Adds tests for the previously untested (directly at least) Parquet (de)serialisation & persistence layer, provided by the ParquetStorage type.	2022-05-20 15:17:40 +01:00
Dom Dwyer	302301659e	refactor: derive ParquetFilePath from IoxMetadata Allow directly converting an IoxMetadata to a ParquetFilePath.	2022-05-20 15:17:40 +01:00
Dom Dwyer	b9a745d42d	feat: RecordBatch stream to Parquet file upload Implements an upload() method on the ParquetStorage type, consuming a stream of RecordBatch, serialising the Parquet file, and uploading the result to object storage. Returns the IOx-specific file metadata. Currently while the upload() method accepts a stream of RecordBatch, the actual resulting Parquet file is buffered in memory before uploading to object store, due to lack of streaming upload functionality in the ObjectStore abstraction - this isn't the end of the world, as the files tend to be relatively small with our current usage. This impl should be easily modified to be fully streaming once streaming object store puts are implemented: https://github.com/influxdata/object_store_rs/issues/9	2022-05-20 15:17:40 +01:00
Dom Dwyer	76e08d14a3	perf: IoxParquetMetaData direct from file metadata Construct an IoxParquetMetaData instance directly from the FileMetaData instance returned by the ArrowWriter. This change will allow us to avoid the inefficient impl currently in use: * Serialise batches into memory * Wrap buffer in arrow cursor * Read parquet metadata with arrow file reader * Serialise schema with thrift * Serialise each row group's metadata with thrift * Construct our own FileMetaData instance * Serialise FileMetaData with thrift * zstd encode resulting thrift bytes * Wrap in IoxParquetMetaData Now we "only": * Stream batches into opaque Write impl * Serialise FileMetaData with thrift * zstd encode resulting thrift bytes * Wrap in IoxParquetMetaData Then accessing any data within the IoxParquetMetaData (as before this change) requires deserialising it first. There are still a number of easy performance improvements to be had w.r.t the metadata handling.	2022-05-20 15:17:40 +01:00
Dom Dwyer	70856a645f	feat: streaming RecordBatch -> parquet encoding Implements a streaming RecordBatch to Parquet file serialiser. This impl automatically discovers the schema of the RecordBatch stream, and accepts &mut destination types (internalising the handle cloning/etc) to simplify caller usage. This encoder returns the resulting FileMetaData to allow callers to inspect the resulting metadata without reading back the file. Currently unused / not yet plumbed in.	2022-05-20 15:09:26 +01:00
Marco Neumann	addc45327e	fix: ensure that query tokio background tasks are canceled (#4643) * fix: ensure that query tokio background tasks are canceled While I am not entirely sure if this explains some of the memory leaks I am seeing in prod, not canceling the tasks correctly certainly makes debugging way harder and also renders certain forms of throttling (e.g. max. concurrent queries) somewhat ineffective. Note that parquet file downloads are currently NOT canceled because tokio's `spawn_blocking` cannot be canceled. * refactor: `Vec` -> `Option` * refactor: `spawn_blocking` creates a join handle, even though it is useless Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-20 07:18:52 +00:00
Dom Dwyer	baa86d846f	refactor: use ParquetStore instead of ObjectStore Changes the code paths that interact with Parquet files in the object store to reference the ParquetStorage directly (DRY refactor). This change takes us from a dependency graph of: Parquet Consumer -> ParquetStorage -> ObjectStore, plus Parquet Consumer -> ObjectStore directly, with ObjectStore -> File System / s3 (etc); to: Parquet Consumer -> ParquetStorage -> ObjectStore -> File System / s3 (etc). With the ParquetStorage being solely responsible for managing interactions with the object store when dealing with Parquet files.	2022-05-19 13:52:51 +01:00
Dom Dwyer	d3548653d5	refactor: rename Storage -> ParquetStorage Renames the Storage type so the context is clear in usage (i.e. fn args), rather than having to rely on knowing the fully-qualified import path to know what the type stores.	2022-05-19 13:51:07 +01:00
Dom Dwyer	e20b02b914	refactor: tidy ParquetChunk constructor Removes two unused constructors for a ParquetChunk, and moves the bare fn constructor that is actually used to be an associated method (a conventional constructor).	2022-05-19 13:51:07 +01:00
Dom Dwyer	7a8e6d1a38	refactor: remove unused max_row_group_size The Parquet writer references an unused max_row_group_size property in the parquet file metadata.	2022-05-18 16:45:15 +01:00
Andrew Lamb	3a33e806c7	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619 ) * chore: Update datafusion deps * chore: update arrow/parquet/arrow flight deps * chore: Run cargo hakari tasks * chore: Update location of utils * chore: Update some more APIs Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-05-17 14:13:03 +00:00
Nga Tran	9530e73925	chore: move noisy debug to trace and fix some comments (#4598 ) * chore: move noisy debug to trace and fix some comments * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: fix format Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-13 19:18:15 +00:00
Raphael Taylor-Davies	f2bb0fdf77	feat: update to crates.io object_store version (#4595 ) * feat: update to crates.io object_store version * chore: Run cargo hakari tasks * fix: tests * chore: remove object store integration test plumbing Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-05-13 16:26:07 +00:00
Carol (Nichols || Goulding)	55313d290a	fix: Update or remove comments that mention NG or OG Connects to #4450.	2022-05-12 16:09:08 -04:00
Raphael Taylor-Davies	8b379c83cc	refactor: simplify object_store path handling (#4534 ) * refactor: simplify object_store path handling * fix: aws integration tests * chore: lint * fix: update gcs tests * refactor: move errors into submodules * chore: lint * chore: review feedback * refactor: replace provider with Display * fix: failing tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-09 18:43:22 +00:00
Jake Goulding	e07bcd40c2	refactor: Remove unused dependencies These were found by iterating over all of the dependencies of each Cargo.toml, then grepping that crate for the dependency's name. If it didn't show up, I attempted to remove it. I left a few dependencies that this process flagged: * generated_types - `pbjson`,`serde`. Apparently used by the generated code. * grpc-router-test-gen - `prost`. Apparently used by the generated code. * influxdb_iox - `heappy`. Doesn't appear used, but is behind enough feature flags that I don't care to reason about and it's already optional. - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an indirect dependency. * iox_gitops_adapter - `k8s_openapi`. Appears to be setting a feature flag of an indirect dependency.	2022-05-06 15:57:58 -04:00
Carol (Nichols || Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)	0541c6e40f	fix: Remove data_types crate where it's no longer used	2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)	d2671355c3	fix: Move partition metadata types to data_types2	2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding)	ea46830954	fix: Remove iox_object_store crate; move ParquetFilePath to parquet_file	2022-05-06 14:45:36 -04:00
Carol (Nichols || Goulding)	b4894c2b46	fix: Remove unused parts of parquet_file	2022-05-06 11:30:36 -04:00
Carol (Nichols || Goulding)	ba8191c1eb	fix: Remove persistence_windows	2022-05-06 11:30:35 -04:00
Andrew Lamb	02893e598c	chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516 ) * chore: Tool for automating arrow version update * chore: Update datafusion and arrow/parquet/arrow-flight * fix: update for changes in Arrow API Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-05 00:21:02 +00:00
dependabot[bot]	420c306caa	chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-04-28 08:21:17 +00:00
二手掉包工程师	4b47d723b1	refactor: Rename time to iox_time (#4416 ) Signed-off-by: hi-rustin <rustin.liu@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 00:19:59 +00:00
Marco Neumann	86e8f05ed1	fix: make all catalog IDs 64bit (#4418 ) Closes #4365. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 16:49:34 +00:00
Nga Tran	d963110842	feat: group chunk overlaps based on time range only (#4389 ) * feat: overlap for NG querier * chore: cleanup * refactor: address review comments * fix: typo Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 13:32:07 +00:00
Andrew Lamb	73bed810da	chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357 ) * chore: Update datafusion * chore: Update arrow/arrow-flight/parquet to 12 * chore: update datafusion correctly * chore: Update prost, tonic, and dependents * fix: Fixup some api changes * fix: Update test output in db * fix: Update test output in parquet_file * fix: remove old pbjson types * fix: Add "--experimental_allow_proto3_optional" flag * chore: Run cargo hakari tasks * fix: compile error * chore: Update heappy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 11:12:17 +00:00
Carol (Nichols || Goulding)	94dcde4996	fix: Do fewer queries for metadata By adding another _with_metadata catalog function. Also introduce a new type rather than passing around tuples everywhere.	2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding)	02fee3b84f	feat: Request parquet metadata from the catalog when needed only	2022-04-13 10:43:19 -04:00
Dom Dwyer	6131381b8d	refactor: extra debug in compactor Continues pushing more debug through the compaction processing loop.	2022-04-08 11:20:19 +01:00
Dom Dwyer	3706ac042d	refactor: add debug in compaction path Adds debug!() and friends through the compaction path.	2022-04-07 17:13:45 +01:00
dependabot[bot]	438e739344	chore(deps): Bump parquet from 11.0.0 to 11.1.0 (#4240 ) * chore(deps): Bump parquet from 11.0.0 to 11.1.0 Bumps [parquet](https://github.com/apache/arrow-rs) from 11.0.0 to 11.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/11.0.0...11.1.0) --- updated-dependencies: - dependency-name: parquet dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * fix: Update tests Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-04-06 14:51:01 +00:00
Nga Tran	ddc2c8304f	fix: have the compaction level set correctly (#4184 ) * fix: have the compaction level set correctly, especially for compacted file from the compactor * fix: typo	2022-03-30 21:23:40 +00:00
Marco Neumann	20bbb88dc5	refactor: remove table name from `TableSummary` (#4170 ) This allows us to remove the table name from the low-level chunk representations (like `ParquetFile`, RUB, ...) since table names are already tracked by the higher-level data structures (e.g. catalog, catalog chunk) that manage the low-level chunk representations. This is similar to #4167. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 13:24:00 +00:00
Marco Neumann	036626a576	refactor: remove partition key from `ParquetChunk` (#4167) The parquet chunk is always wrapped into some higher-level data structure (e.g. a catalog chunk, a partition, ...) that knows exactly "where" the chunk is located. There is no need for the parquet chunk to back-reference container-level attributes. On the contrary: double-bookkeeping makes the code more complex and costs additional memory. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 09:24:56 +00:00
Marco Neumann	2b76c31157	refactor: make statistics null counts optional (#4160 ) Min/max values and distinct counts are already optional, so let's make the null counts optional as well. This will be helpful for NG to deal w/ partial statistics (e.g. we only populate stats for the time column). Note that the total count is still mandatory, but we normally have the chunk/file-level row count at hand.	2022-03-29 17:47:57 +00:00
Carol (Nichols || Goulding)	f3f792fd08	feat: Add namespace_id to the parquet_files table; object store paths need it	2022-03-29 08:15:26 -04:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Marco Neumann	51da6dd7fa	feat: store sort key in NG metadata (#4110 ) The sort key is optional and currently only produced by `iox_tests`. Writing it within the ingester/compactor is tracked by #3968. The sort key is read by the querier (and this will be verified by the query tests and is required to merge #4103). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:24:46 +00:00
Dom Dwyer	1d5066c421	refactor: rename ObjectStore -> ObjectStoreImpl Frees up the name so we can use `dyn ObjectStore` throughout the code instead of `ObjectStoreApi`.	2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding)	ecd06c6ec3	fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL When created in the catalog, parquet files should always have compaction level 0. Updating the compaction level should always happen in the compactor. Only the catalog should need to know about the initial compaction level value.	2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding)	ff31407dce	refactor: Extract a ParquetFileParams type for create This has the advantages of: - Not needing to create fake parquet file IDs or fake deleted_at values that aren't used by create before insertion - Not needing too many arguments for create - Naming the arguments so it's easier to see what value is what argument, especially in tests - Easier to reuse arguments or parts of arguments by using copies of params, which makes it easier to see differences, especially in tests	2022-03-10 13:51:18 -05:00
Paul Dix	27999ff72f	feat: add compaction_level and created_at to parquet_file (#3972 )	2022-03-10 15:56:57 +00:00
Andrew Lamb	2c3d30ca32	chore: Update datafusion, arrow, flight and parquet (#4000 ) * chore: Update datafusion, arrow, flight and parquet * fix: api change * fix: fmt * fix: update test metadata size * fix: Update sizes in parquet test * fix: more metadata size update	2022-03-10 12:24:47 +00:00
Nga Tran	c6cab3538f	refactor: move parquet chunk's new and decode to parquet_file crate (#3987 )	2022-03-08 22:04:32 +00:00
Andrew Lamb	e09f39d6a0	chore: Update datafusion (#3943 ) * chore: Update datafusion * refactor: update for new datafusion * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-03-04 19:37:46 +00:00
Andrew Lamb	677a272095	refactor: Clean up some future clippy warnings from nightly (#3892 ) * refactor: clean up new clippy lints * refactor: complete other cleanups * fix: ignore overzealous clippy * fix: re-remove old code	2022-03-03 19:14:27 +00:00
Carol (Nichols || Goulding)	8f3e44bf76	refactor: Extract a crate for shared data types in the new design	2022-03-02 12:16:15 -05:00
Marco Neumann	33851be3a5	chore: upgrade Rust to 1.59 (#3875) Mostly a few new clippy lints around `flat_map`, `and_then`, and "underscore locks" (!!!): https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies	2a842fbb1a	feat: correctly sort data and store in catalog metadata (#3864 ) * feat: respect sort order in ChunkTableProvider (#3214) feat: persist sort order in catalog (#3845) refactor: owned SortKey (#3845) * fix: size tests * refactor: immutable SortKey * test: test sort order restart (#3845) * chore: explicit None for sort key * chore: test cleanup * fix: handling of sort keys containing fields * chore: remove unused selected_sort_key * chore: more docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-25 17:56:27 +00:00
Marco Neumann	f966f4c7a4	feat: create `ParquetChunk` in querier (#3857 ) Adds a small adapter that is able to produce `ParquetChunk`s for NG. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-25 08:54:16 +00:00
Marco Neumann	49d1be30e7	feat: wire up `ParquetFilePath` for NG (#3853) It's a bit of a duck-type hack, but if we wanna just use `ParquetFileChunk` in the new architecture, we somehow need it to accept new-gen paths. Also path handling should be somewhat centralized since ingester/compactor/querier all need to construct them. So having a `ParquetFilePath` that supports both path styles seems to be a not-too-bad solution. This should obviously be cleaned up in some not-too-distant future.	2022-02-24 16:05:38 +00:00
Carol (Nichols || Goulding)	252ced7adf	feat: Add row count to the parquet_file record in the catalog (#3847) Fixes #3842.	2022-02-24 15:20:50 +00:00
Marco Neumann	d62a052394	feat: extend catalog so we can recover `ParquetChunk`s from it (#3852 ) * refactor: less parquet data copying * feat: `PartitionRepo::get_by_id` * feat: `TableRepo::get_by_id` * feat: `ParquetFile::file_size_bytes` * feat: `ParquetFile::parquet_metadata` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-24 13:16:15 +00:00
dependabot[bot]	b63f920d4c	chore(deps): Bump parquet from 9.0.2 to 9.1.0 (#3828 ) * chore(deps): Bump parquet from 9.0.2 to 9.1.0 Bumps [parquet](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0) --- updated-dependencies: - dependency-name: parquet dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update chunk size test Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-23 11:25:15 +00:00
dependabot[bot]	3b7d31c88a	chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826 ) Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-23 09:25:46 +00:00
dependabot[bot]	ad3868ed7c	chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814 ) * chore(deps): Bump tokio from 1.16.1 to 1.17.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build: update workspace-hack Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-22 16:27:43 +00:00
Andrew Lamb	a30803e692	chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733) * chore: Update datafusion * chore: Update arrow * fix: missing updates * chore: Update cargo.lock * fix: update for smaller parquet size * fix: update test for smaller parquet files * test: ensure parquet_file tests write multiple row groups * fix: update callsite * fix: Update for tests * fix: hakari * fix: use IoxObjectStore::existing Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding)	73828323ac	feat: Ingester Flight gRPC API (#3623) * feat: Add a way to run ingester with an in-memory catalog from the CLI If you set the --catalog-dsn string to "mem", rather than using that as a Postgres connection URL, create an in-memory catalog. Planning on using this in tests, so not documenting. * fix: Set default topic to the same value as SHARED_KAFKA_TOPIC Namely, both should use an underscore. I don't think there's a way to directly share these values between a constant and an annotation. * feat: Add a flight API (handshake only) to ingester * fix: Create partitions if using file-based write buffer * fix: Change the server fixture to handle ingester server type For now, the ingester doesn't implement the deployment API. Not sure if it should or not. * feat: Start implementing ingester do_get, namely decoding the query Skip serialization of the predicate for the moment. * refactor: Rename ingest protos to ingester to match crate name * refactor: Rename QueryResults to QueryData * feat: Move ingester flight client to new querier crate * fix: Off by one error, different starting indexes in sequencers * fix: Create new CLI argument to pick the catalog type * fix: Create a CLI option to set the number of topics to auto-create in the write buffer * fix: Check the arrow flight service's health to tell that the ingester gRPC is up * fix: Set postgres as the default catalog type * fix: Return an error rather than panicking if CLI args aren't right	2022-02-09 19:07:44 +00:00
Carol (Nichols || Goulding)	2e30483f1f	refactor: Remove predicate module from predicate crate (#3648) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 14:54:07 +00:00
Nga Tran	17fbeaaade	feat: insert the persisted info into the catalog in one transaction (#3636) * feat: add ProcessedTombstoneRepo * feat: add function add_parquet_file_with_tombstones * fix: remove unnecessary use * feat: handling transaction when adding parquet file and its processed tombstones * feat: tests update catalog for parquet file and processed tombstones * fix: make add parquet file & its processed tombstones fully transactional * chore: cleanup * test: add integration tests for new catalog update functions * chore: remove catalog_update.rs * chore: cleanup * fix: assert the right values * fix: create unique namespace * fix: support non transaction create_many * test: remove tests that do not work in a transaction * fix: one more case with unique namespace * chore: more verification around for better understanding why certain tests fail * fix: compare difference rather than absolute because the DB already has data * fix: fix the argument provided to SQL * fix: return non-empty processed tombstones * fix: insert the right parquet file * chore: remove unused file Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 14:44:15 +00:00
Carol (Nichols || Goulding)	62a2ad289b	feat: Implement deserializing IoxMetadata from protobuf (#3589) Fixes #3587. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-02 16:05:21 +00:00
Marco Neumann	22778a3a80	chore: upgrade rskafka and parking_lot (#3592 )	2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding)	093d5acfd4	fix: Unify temporary multiple definitions of IoxMetadata	2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding)	8f81ce5501	refactor: Share parquet_file::storage code between new and old metadata	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	bf89162fa5	refactor: Move IoxMetadata to parquet_file	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	0f72a881ef	refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	1b298bb5bd	refactor: Alias the old proto definitions to make clearer the new ones coming in	2022-01-31 10:36:33 -05:00
Dom	32d7c4cbfe	refactor: remove InfluxColumnType::IOx (#3565 ) * refactor: remove InfluxColumnType::IOx Remove unused column variant - see #3554 for context. * refactor: reserve SEMANTIC_TYPE_IOX name in proto Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-27 21:15:36 +00:00
Andrew Lamb	5488c257d1	chore: Update datafusion, upgrade to arrow/parquet/arrow-flight 8.0.0 (#3517) * chore: Update datafusion * chore: update to arrow 8 * fix: update to use new DataFusion APIs * fix: update case for sortedness * fix: cargo hakari	2022-01-27 13:33:27 +00:00
Andrew Lamb	dd23056efd	chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455 ) * chore: update datafusion, arrow, prost, tonic, etc * fix: update pprof as well * chore: update hakari * fix: update pbjson * chore: update heappy * fix: hakari * fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-13 17:07:15 +00:00
Andrew Lamb	cdf5c21cd4	fix: Fix max timestamp value comparison in chunk metadata (#3453 ) * fix: Fix max timestamp value comparison in chunk metadata * refactor: rename contains to overlaps Co-authored-by: Edd Robinson <me@edd.io>	2022-01-13 16:58:30 +00:00
Raphael Taylor-Davies	c5cf03511c	fix: parquet column count statistics (#2124 ) (#3444 ) * fix: parquet metadata total_count (#2124) * chore: review feedback Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-11 21:56:24 +00:00
Marco Neumann	f3f6f335a9	chore: upgrade to snafu 0.7 (#3440 )	2022-01-11 19:22:36 +00:00
Marco Neumann	37bb7f2120	chore: `cargo update` dependabot currently doesn't work due to https://github.com/dependabot/dependabot-core/issues/4574 Excluded `quote` due to https://github.com/dtolnay/quote/issues/204	2022-01-11 14:57:51 +01:00
Nga Tran	ec8644a39a	refactor: return clearer error message	2021-12-07 12:24:28 -05:00
Nga Tran	561c5ed8e7	refactor: make checking no data happen during reading inout stream	2021-12-07 12:03:41 -05:00
Nga Tran	c992c82582	chore: Merge branch 'main' into ntran/compact_os_tests	2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies	5fdaa5b4ab	chore: don't panic with invalid parquet (#3309 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-06 21:15:35 +00:00
Carol (Nichols \|\| Goulding)	7499eac067	fix: Disable uuid serde feature; we're not actually serializing any UUIDs Connects to #3117.	2021-12-06 09:37:31 -05:00
Carol (Nichols \|\| Goulding)	02c297e850	fix: Always specify the parking_lot feature of tokio to get potential perf boost	2021-12-06 09:37:15 -05:00
Carol (Nichols \|\| Goulding)	0b24b3c227	fix: Use a consistent version specifier when depending on the futures crate	2021-12-06 09:37:12 -05:00
Raphael Taylor-Davies	bca561366b	feat: don't copy parquet files out of disk object store (#3282 ) (#3293 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-05 16:31:40 +00:00
Raphael Taylor-Davies	11067bfe3f	feat: simplify parquet reader (#3282 ) (#3291 ) * feat: simplify parquet reader (#3282) * chore: add back log line	2021-12-03 23:21:58 +00:00
Nga Tran	86f9fe0bcb	refactor: no longer need to create and test no-row-groups parquet files	2021-12-03 15:14:04 -05:00
Nga Tran	152281e428	fix: Capture the right 'no data' while parquet has no data	2021-12-03 12:19:48 -05:00
kodiakhq[bot]	2857b6a990	Merge branch 'main' into er/feat/load_chunk_cli	2021-12-02 20:20:56 +00:00
Edd Robinson	b4ea9887ba	refactor: error name	2021-12-02 20:14:02 +00:00
Carol (Nichols \|\| Goulding)	5d0fd1c603	fix: Allow dead code on fields that are now detected as never read	2021-12-02 11:52:01 -05:00
Edd Robinson	88aedc556e	feat: add FromStr implementation	2021-12-02 12:59:52 +00:00
Nga Tran	bf74608dc8	docs: not persist of the input stream is empty	2021-12-01 17:53:19 -05:00
Nga Tran	f085af034e	refactor: not persist empty chunk resulting from deleting & deduplicating (#3274 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-01 20:57:30 +00:00
Nga Tran	f53cdca010	feat: handling empty compacted stream	2021-11-30 18:13:36 -05:00
Raphael Taylor-Davies	197634ed50	feat: reload chunk back into read buffer (#3209 ) (#3216 ) * feat: reload chunk back into read buffer (#3209) * chore: fix logical conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-29 11:34:55 +00:00
kodiakhq[bot]	d16a7759ca	Merge branch 'main' into cn/workspace-hack	2021-11-22 17:05:31 +00:00
Raphael Taylor-Davies	73d60539ad	refactor: use ChunkGenerator in parquet_catalog (#2209 ) (#3167 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-22 10:29:33 +00:00
Carol (Nichols \|\| Goulding)	9fd4a560f5	feat: Results of running cargo hakari manage-deps	2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies	ca4e0ad13b	refactor: add parquet chunk generator (#2209 ) (#3163 ) * refactor: add parquet chunk generator (#2209) * fix: tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-19 12:35:18 +00:00
Carol (Nichols \|\| Goulding)	c8d80e5c28	fix: Change database paths to be under /dbs/ instead of under /[server id]/	2021-11-05 10:14:06 -04:00
Andrew Lamb	1902c4f8a9	chore: Update DataFusion (#3012 ) * chore: Update DataFusion * fix: restore Cargo.log Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-02 18:06:21 +00:00
Marco Neumann	4c9570b519	refactor: move `catalog` protobuf to `preserved_catalog` This makes it clearer what's going since the contained messages are only for the preserved part, not the in-mem catalog and its management.	2021-11-01 18:07:25 +01:00
dependabot[bot]	c540b40f05	chore(deps): bump tokio from 1.12.0 to 1.13.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2021-11-01 11:21:59 +00:00
Carol (Nichols \|\| Goulding)	990f768cda	fix: Assign a UUID when creating a database	2021-10-28 13:20:28 -04:00
Carol (Nichols \|\| Goulding)	8198c1ff2a	refactor: Rename IoxObjectStore constructors to better match what server does with Databases	2021-10-28 13:20:27 -04:00
Marco Neumann	bc7244c48e	chore: use Rust edition 2021	2021-10-25 10:58:20 +02:00
Andrew Lamb	a82dc6f5f0	chore: Update datafusion + arrow (#2903 ) * chore: Update datafusion to latest, arrow to 6.0.0 * fix: Update tests * fix: bubble internal error Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-19 17:14:08 +00:00
Marco Neumann	d8f35d8ee9	chore: remove unused `parquet_file` => `chrono` dep	2021-10-19 14:45:56 +02:00
Marco Neumann	28195b9c0c	chore: new `parquet_catalog` crate	2021-10-14 14:34:59 +02:00
Andrew Lamb	0568452a0c	chore: Update datafusion (#2838 ) * chore: update datafusion version * refactor: Update to use new datafusion apis * fix: do not upgrade other packages	2021-10-13 20:51:19 +00:00
Marco Neumann	1523e0edcd	refactor: clean up preserved catalog interface 1. Remove `new_empty` logic. It's a leftover from the time when the `PreservedCatalog` owned the in-memory catalog. 2. Make `db_name` a part of the `PreservedCatalogConfig`.	2021-10-13 13:58:11 +02:00
Raphael Taylor-Davies	8414e6edbb	feat: migrate preserved catalog to TimeProvider (#2722 ) (#2808 ) * feat: migrate preserved catalog to TimeProvider (#2722) * fix: deterministic catalog prune tests * fix: failing test Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-12 14:43:05 +00:00
Raphael Taylor-Davies	3dfe400e6b	feat: migrate write path to TimeProvider (#2722 ) (#2807 )	2021-10-12 12:09:08 +00:00
Raphael Taylor-Davies	b39e01f7ba	feat: migrate PersistenceWindows to TimeProvider (#2722 ) (#2798 )	2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies	06c2c23322	refactor: create PreservedCatalogConfig struct (#2793 ) * refactor: create PreservedCatalogConfig struct * chore: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-11 15:43:05 +00:00
Carol (Nichols \|\| Goulding)	5da2f7b1b0	Merge branch 'main' into cn/less-database-name	2021-10-11 10:35:42 -04:00
Raphael Taylor-Davies	afe34751e7	refactor: split out schema crate (#2781 ) * refactor: split out schema crate * chore: fix doc	2021-10-11 09:45:08 +00:00
Carol (Nichols \|\| Goulding)	8407735e00	fix: Pass the database name into PreservedCatalog	2021-10-08 15:25:10 -04:00
Carol (Nichols \|\| Goulding)	276aef69c9	refactor: Move PreservedCatalog test helper functions to test helpers and use them more	2021-10-08 15:25:10 -04:00
Carol (Nichols \|\| Goulding)	3aff4fcb07	refactor: Extract test helper functions for common catalog operations This will make the next change easier, and I think it makes the tests easier to read.	2021-10-08 15:25:10 -04:00
kodiakhq[bot]	559a7e0221	Merge branch 'main' into cn/chunk-addr-smaller	2021-10-08 17:26:20 +00:00
Carol (Nichols \|\| Goulding)	fbe76935f4	fix: Remove some calls to iox_object_store.database_name	2021-10-08 09:50:14 -04:00
Marco Neumann	64bda1fc08	feat: improve `Debug`/`Display` for test `ChunkId`s	2021-10-08 13:55:56 +02:00
Marco Neumann	d3de6bb6e4	refactor: `max_persisted_timestamp` => `flush_timestamp` There might be data left before this timestamp that wasn't persisted (e.g. incoming data while the persistence was running).	2021-10-08 12:36:23 +02:00
Marco Neumann	63a932fa37	refactor: "min unpersisted ts" => "max persisted ts" Store the "maximum persisted timestamp" instead of the "minimum unpersisted timestamp". This avoids the need to calculate the next timestamp from the current one (which was done via "max TS + 1ns"). The old calculation was prone to overflow panics. Since the timestamps in this calculation originate from user-provided data (and not the wall clock), this was an easy DoS vector that could be triggered via the following line protocol: `table_1 foo=1 <i64::MAX>` which is `table_1 foo=1 9223372036854775807` Bonus points: the timestamp persisted in the partition checkpoints is now the very same that was used by the split query during persistence. Consistence FTW! Fixes #2225.	2021-10-08 11:52:49 +02:00
kodiakhq[bot]	7d6be3f500	Merge branch 'main' into crepererum/issue2748	2021-10-07 09:04:18 +00:00
Marco Neumann	63d74be490	refactor: make `ChunkId` a UUID	2021-10-07 10:23:27 +02:00
Marco Neumann	2a52fd90d9	fix: transaction pruning logic for "nothing to do"	2021-10-07 10:14:42 +02:00
kodiakhq[bot]	d72a494198	Merge branch 'main' into crepererum/in_mem_expr_part5	2021-10-05 16:20:24 +00:00
Marco Neumann	b8aa4c33ce	refactor: use protobuf bytes for transaction UUIDs	2021-10-05 12:27:48 +02:00
Marco Neumann	bb7a27e5ed	refactor: use proper sets during delete predicate collection We no longer need hacky pointer tricks to de-duplicate delete predicates when collecting them for catalog checkpoints. This was once required when the delete predicates didn't implement `Eq` and `Hash` but now it's all way easier.	2021-10-05 10:37:34 +02:00
Marco Neumann	28ccf2a8c3	refactor: `TransactionHandle::delete_predicate` cannot fail	2021-10-05 09:41:46 +02:00
Marco Neumann	10c1a72402	refactor: remove unused fields from `DeletePredicate`	2021-10-05 09:29:24 +02:00
Marco Neumann	97881079e8	refactor: make `ChunkOrder` non-zero This will make it easier to handle missing values. Helps with #2633.	2021-10-04 17:49:12 +02:00
Marco Neumann	75ac6e8646	refactor: make `DeletePredicate::range` non-optional	2021-10-04 16:36:20 +02:00
Marco Neumann	d1835a3eee	fix: doc links	2021-10-04 16:36:20 +02:00

524 Commits (ef6eda639912290414a21b31708bad9696c2ab8b)