influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	ddc2c8304f	fix: have the compaction level set correctly (#4184 ) * fix: have the compaction level set correctly, especially for compacted file from the compactor * fix: typo	2022-03-30 21:23:40 +00:00
Marco Neumann	20bbb88dc5	refactor: remove table name from `TableSummary` (#4170 ) This allows us to remove the table name from the low-level chunk representations (like `ParquetFile`, RUB, ...) since table names are already tracked by the higher-level data structures (e.g. catalog, catalog chunk) that manage the low-level chunk representations. This is similar to #4167. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 13:24:00 +00:00
Marco Neumann	036626a576	refactor: remove partition key from `ParquetChunk` (#4167 ) The parquet chunk is always wrapped into some higher-level data structure (e.g. a catalog chunk, a partition, ...) that knows exactly "where" the chunk is located. There is no need for the parquet chunk to back-reference container-level attributes. On the contrary: double-bookkeeping makes the code more complex and costs additional memory. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 09:24:56 +00:00
Marco Neumann	2b76c31157	refactor: make statistics null counts optional (#4160 ) Min/max values and distinct counts are already optional, so let's make the null counts optional as well. This will be helpful for NG to deal w/ partial statistics (e.g. we only populate stats for the time column). Note that the total count is still mandatory, but we normally have the chunk/file-level row count at hand.	2022-03-29 17:47:57 +00:00
Carol (Nichols || Goulding)	f3f792fd08	feat: Add namespace_id to the parquet_files table; object store paths need it	2022-03-29 08:15:26 -04:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Marco Neumann	51da6dd7fa	feat: store sort key in NG metadata (#4110 ) The sort key is optional and currently only produced by `iox_tests`. Writing it within the ingester/compactor is tracked by #3968. The sort key is read by the querier (and this will be verified by the query tests and is required to merge #4103). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:24:46 +00:00
Dom Dwyer	1d5066c421	refactor: rename ObjectStore -> ObjectStoreImpl Frees up the name so we can use `dyn ObjectStore` throughout the code instead of `ObjectStoreApi`.	2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding)	ecd06c6ec3	fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL When created in the catalog, parquet files should always have compaction level 0. Updating the compaction level should always happen in the compactor. Only the catalog should need to know about the initial compaction level value.	2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding)	ff31407dce	refactor: Extract a ParquetFileParams type for create This has the advantages of: - Not needing to create fake parquet file IDs or fake deleted_at values that aren't used by create before insertion - Not needing too many arguments for create - Naming the arguments so it's easier to see what value is what argument, especially in tests - Easier to reuse arguments or parts of arguments by using copies of params, which makes it easier to see differences, especially in tests	2022-03-10 13:51:18 -05:00
Paul Dix	27999ff72f	feat: add compaction_level and created_at to parquet_file (#3972 )	2022-03-10 15:56:57 +00:00
Andrew Lamb	2c3d30ca32	chore: Update datafusion, arrow, flight and parquet (#4000 ) * chore: Update datafusion, arrow, flight and parquet * fix: api change * fix: fmt * fix: update test metadata size * fix: Update sizes in parquet test * fix: more metadata size update	2022-03-10 12:24:47 +00:00
Nga Tran	c6cab3538f	refactor: move parquet chunk's new and decode to parquet_file crate (#3987 )	2022-03-08 22:04:32 +00:00
Andrew Lamb	e09f39d6a0	chore: Update datafusion (#3943 ) * chore: Update datafusion * refactor: update for new datafusion * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-03-04 19:37:46 +00:00
Andrew Lamb	677a272095	refactor: Clean up some future clippy warnings from nightly (#3892 ) * refactor: clean up new clippy lints * refactor: complete other cleanups * fix: ignore overzealous clippy * fix: re-remove old code	2022-03-03 19:14:27 +00:00
Carol (Nichols || Goulding)	8f3e44bf76	refactor: Extract a crate for shared data types in the new design	2022-03-02 12:16:15 -05:00
Marco Neumann	33851be3a5	chore: upgrade Rust to 1.59 (#3875 ) Mostly a few new clippy crates around `flat_map`, `and_then`, and "underscore locks" (!!!): https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies	2a842fbb1a	feat: correctly sort data and store in catalog metadata (#3864 ) * feat: respect sort order in ChunkTableProvider (#3214) feat: persist sort order in catalog (#3845) refactor: owned SortKey (#3845) * fix: size tests * refactor: immutable SortKey * test: test sort order restart (#3845) * chore: explicit None for sort key * chore: test cleanup * fix: handling of sort keys containing fields * chore: remove unused selected_sort_key * chore: more docs Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-25 17:56:27 +00:00
Marco Neumann	f966f4c7a4	feat: create `ParquetChunk` in querier (#3857 ) Adds a small adapter that is able to produce `ParquetChunk`s for NG. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-25 08:54:16 +00:00
Marco Neumann	49d1be30e7	feat: wire up `ParquetFilePath` for NG (#3853 ) It's a bit of a duck-type hack, but if we wanna just use `ParquetFileChunk` in the new architecture, we somehow need it to accept new-gen paths. Also path handling should be somewhat centralized since ingester/compactor/querier all need to construct them. So having a `ParquetFilePath` that supports both path styles seems to be a not-too-bad solution. This should obviously be cleaned up in some not-too-distant future.	2022-02-24 16:05:38 +00:00
Carol (Nichols || Goulding)	252ced7adf	feat: Add row count to the parquet_file record in the catalog (#3847 ) Fixes #3842.	2022-02-24 15:20:50 +00:00
Marco Neumann	d62a052394	feat: extend catalog so we can recover `ParquetChunk`s from it (#3852 ) * refactor: less parquet data copying * feat: `PartitionRepo::get_by_id` * feat: `TableRepo::get_by_id` * feat: `ParquetFile::file_size_bytes` * feat: `ParquetFile::parquet_metadata` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-24 13:16:15 +00:00
dependabot[bot]	b63f920d4c	chore(deps): Bump parquet from 9.0.2 to 9.1.0 (#3828 ) * chore(deps): Bump parquet from 9.0.2 to 9.1.0 Bumps [parquet](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0) --- updated-dependencies: - dependency-name: parquet dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update chunk size test Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-23 11:25:15 +00:00
dependabot[bot]	3b7d31c88a	chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826 ) Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-23 09:25:46 +00:00
dependabot[bot]	ad3868ed7c	chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814 ) * chore(deps): Bump tokio from 1.16.1 to 1.17.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build: update workspace-hack Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-22 16:27:43 +00:00
Andrew Lamb	a30803e692	chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733 ) * chore: Update datafusion * chore: Update arrow * fix: missing updates * chore: Update cargo.lock * fix: update for smaller parquet size * fix: update test for smaller parquet files * test: ensure parquet_file tests write multiple row groups * fix: update callsite * fix: Update for tests * fix: hakari * fix: use IoxObjectStore::existing Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding)	73828323ac	feat: Ingester Flight gRPC API (#3623 ) * feat: Add a way to run ingester with an in-memory catalog from the CLI If you set the --catalog-dsn string to "mem", rather than using that as a Postgres connection URL, create an in-memory catalog. Planning on using this in tests, so not documenting. * fix: Set default topic to the same value as SHARED_KAFKA_TOPIC Namely, both should use an underscore. I don't think there's a way to directly share these values between a constant and an annotation. * feat: Add a flight API (handshake only) to ingester * fix: Create partitions if using file-based write buffer * fix: Change the server fixture to handle ingester server type For now, the ingester doesn't implement the deployment API. Not sure if it should or not. * feat: Start implementing ingester do_get, namely decoding the query Skip serialization of the predicate for the moment. * refactor: Rename ingest protos to ingester to match crate name * refactor: Rename QueryResults to QueryData * feat: Move ingester flight client to new querier crate * fix: Off by one error, different starting indexes in sequencers * fix: Create new CLI argument to pick the catalog type * fix: Create a CLI option to set the number of topics to auto-create in the write buffer * fix: Check the arrow flight service's health to tell that the ingester gRPC is up * fix: Set postgres as the default catalog type * fix: Return an error rather than panicking if CLI args aren't right	2022-02-09 19:07:44 +00:00
Carol (Nichols || Goulding)	2e30483f1f	refactor: Remove predicate module from predicate crate (#3648 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 14:54:07 +00:00
Nga Tran	17fbeaaade	feat: insert the persisted info into the catalog in one transaction (#3636 ) * feat: add ProcessedTombstoneRepo * feat: add function add_parquet_file_with_tombstones * fix: remove unnecessary use * feat: handling transaction when adding parquet file and its processed tombstones * feat: tests update catalog for parquet file and processed tombstones * fix: make add parquet file & its processed tombstones fully transactional * chore: cleanup * test: add integration tests for new catalog update functions * chore: remove catalog_update.rs * chore: cleanup * fix: assert the right values * fix: create unique namespace * fix: support non transaction create_many * test: remove tests that do not work in a transaction * fix: one more case with unique namespace * chore: more verification around for better understanding why certain tests fail * fix: compare difference rather than absolute because the DB already has data * fix: fix the argument provided to SQL * fix: return non-empty processed tombstones * fix: insert the right parquet file * chore: remove unused file Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 14:44:15 +00:00
Carol (Nichols || Goulding)	62a2ad289b	feat: Implement deserializing IoxMetadata from protobuf (#3589 ) Fixes #3587. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-02 16:05:21 +00:00
Marco Neumann	22778a3a80	chore: upgrade rskafka and parking_lot (#3592 )	2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding)	093d5acfd4	fix: Unify temporary multiple definitions of IoxMetadata	2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding)	8f81ce5501	refactor: Share parquet_file::storage code between new and old metadata	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	bf89162fa5	refactor: Move IoxMetadata to parquet_file	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	0f72a881ef	refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	1b298bb5bd	refactor: Alias the old proto definitions to make clearer the new ones coming in	2022-01-31 10:36:33 -05:00
Dom	32d7c4cbfe	refactor: remove InfluxColumnType::IOx (#3565 ) * refactor: remove InfluxColumnType::IOx Remove unused column variant - see #3554 for context. * refactor: reserve SEMANTIC_TYPE_IOX name in proto Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-27 21:15:36 +00:00
Andrew Lamb	5488c257d1	chore: Update datafusion, upgrade to arrow/parquet/arrow-flight 8.0.0 (#3517 ) * chore: Update datafusion * chore: update to arrow 8 * fix: update to use new DataFusion APIs * fix: update case for sortedness * fix: cargo hakari	2022-01-27 13:33:27 +00:00
Andrew Lamb	dd23056efd	chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455 ) * chore: update datafusion, arrow, prost, tonic, etc * fix: update pprof as well * chore: update hakari * fix: update pbjson * chore: update heappy * fix: hakari * fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-13 17:07:15 +00:00
Andrew Lamb	cdf5c21cd4	fix: Fix max timestamp value comparison in chunk metadata (#3453 ) * fix: Fix max timestamp value comparison in chunk metadata * refactor: rename contains to overlaps Co-authored-by: Edd Robinson <me@edd.io>	2022-01-13 16:58:30 +00:00
Raphael Taylor-Davies	c5cf03511c	fix: parquet column count statistics (#2124 ) (#3444 ) * fix: parquet metadata total_count (#2124) * chore: review feedback Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-11 21:56:24 +00:00
Marco Neumann	f3f6f335a9	chore: upgrade to snafu 0.7 (#3440 )	2022-01-11 19:22:36 +00:00
Marco Neumann	37bb7f2120	chore: `cargo update` dependabot currently doesn't work due to https://github.com/dependabot/dependabot-core/issues/4574 Excluded `quote` due to https://github.com/dtolnay/quote/issues/204	2022-01-11 14:57:51 +01:00
Nga Tran	ec8644a39a	refactor: return clearer error message	2021-12-07 12:24:28 -05:00
Nga Tran	561c5ed8e7	refactor: make checking no data happen during reading input stream	2021-12-07 12:03:41 -05:00
Nga Tran	c992c82582	chore: Merge branch 'main' into ntran/compact_os_tests	2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies	5fdaa5b4ab	chore: don't panic with invalid parquet (#3309 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-06 21:15:35 +00:00
Carol (Nichols || Goulding)	7499eac067	fix: Disable uuid serde feature; we're not actually serializing any UUIDs Connects to #3117.	2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding)	02c297e850	fix: Always specify the parking_lot feature of tokio to get potential perf boost	2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding)	0b24b3c227	fix: Use a consistent version specifier when depending on the futures crate	2021-12-06 09:37:12 -05:00

377 Commits (5d66cd0a81d2533ee174e1bfb56bab86f81c1fed)