* fix: `Tombstone::size` must include serialized predicate
* fix: `CachedPartition::size` must include `Arc` heap allocation
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: remove `DecodedParquetFile` from `iox_tests`
* refactor: remove `DecodedParquetFile` from querier
Also pull out all the chunk schema and sort key handling into a function
so that RB chunks and parquet chunks mostly use the same code path.
* refactor: remove `DecodedParquetFile`
* refactor: remove `ParquetFileWithMetadata` usage
* fix: test data consistency
Previously the column data type was exposed using an internal i32 value.
This commit changes the Schema API to use a self-descriptive proto enum
for the column data type.
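A minimal sketch of the idea, assuming hypothetical variant names and i32 values (the real definitions live in the IOx proto files): callers now see a self-descriptive type instead of a bare `i32`.

```rust
/// Hypothetical sketch of a self-descriptive column data type enum replacing a raw
/// i32; variant names and numeric values are illustrative, not the IOx proto.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnDataType {
    I64,
    U64,
    F64,
    Bool,
    String,
    Time,
    Tag,
}

impl ColumnDataType {
    /// Convert from the legacy internal i32 representation, rejecting unknown
    /// values instead of silently passing them through.
    pub fn from_i32(value: i32) -> Option<Self> {
        match value {
            1 => Some(Self::I64),
            2 => Some(Self::U64),
            3 => Some(Self::F64),
            4 => Some(Self::Bool),
            5 => Some(Self::String),
            6 => Some(Self::Time),
            7 => Some(Self::Tag),
            _ => None,
        }
    }
}

fn main() {
    // A caller now sees `Some(Tag)` instead of an opaque `7`.
    assert_eq!(ColumnDataType::from_i32(7), Some(ColumnDataType::Tag));
    assert_eq!(ColumnDataType::from_i32(42), None);
}
```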
* refactor: simplify sort key calculation
* refactor: use schema from catalog instead from file
* refactor: do not request parquet file MD in compactor
* test: ensure that `QueryableParquetChunk` works correctly
* refactor: avoid feeding sort key from struct into same struct
* feat: allow namespace schema query by ID
* refactor: do not use binary parquet file MD in compactor tests
* refactor: do not use in-parquet IOx metadata
* refactor: reduce number of catalog queries
* fix: avoid using `min_time`, which can be negative, for `ChunkId`; use the object store ID (a UUID) instead
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: run fmt
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: compute split times of compaction results based on the max file size
* feat: consider max file size while computing split time
* test: tests for `compute_split_time`
* feat: first step to teach the function `split_the_stream` how to split data into n streams using n-1 input PhysicalExprs
* feat: make StreamSplitNode support a list of expressions
* docs: explain how StreamSplitNode works
* feat: Teach compute_split_time to split a time range into many contiguous ranges and split the compacted result into multiple non-overlapping files based on the config compaction_max_size_bytes (see the sketch below)
* chore: cleanup
* chore: clean up doc
* chore: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
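A rough sketch of the split-time computation referenced above; the function name, signature, and rounding are assumptions, not the actual compactor code.

```rust
/// Illustrative sketch only: given the time range and estimated size of a
/// compacted output, return the split times dividing the range into contiguous,
/// non-overlapping chunks so each output file stays below `max_file_size_bytes`.
fn compute_split_times(
    min_time: i64,
    max_time: i64,
    total_size_bytes: u64,
    max_file_size_bytes: u64,
) -> Vec<i64> {
    // If everything fits into one file, no split is needed.
    if total_size_bytes <= max_file_size_bytes || max_time <= min_time {
        return vec![];
    }

    // Number of output files, rounded up.
    let n = (total_size_bytes + max_file_size_bytes - 1) / max_file_size_bytes;

    // Emit n - 1 split points; the ranges between them are contiguous.
    let width = (max_time - min_time) as u64;
    (1..n).map(|i| min_time + (width * i / n) as i64).collect()
}

fn main() {
    // ~3 files worth of data over [0, 300) => two split points.
    assert_eq!(compute_split_times(0, 300, 300, 100), vec![100, 200]);
    // Fits into one file => no splits.
    assert!(compute_split_times(0, 300, 80, 100).is_empty());
}
```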
The ingester maintains a rough "total memory in use" counter it uses to
try and limit the amount of memory the ingester is using overall.
When a partition is persisted, this total memory usage value is adjusted
to account for releasing the partition memory. Prior to this commit, the
ordering was:
* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Partition memory usage is released back to the total memory counter
* Persistence starts
This meant that the partitions in the process of being persisted were
not accounted for in the ingester's total memory counter, and therefore
we could significantly overrun the configured memory limit.
After this commit, the ordering is:
* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Persistence starts
* Persistence completes
* Partition memory usage is released back to the total memory counter
This ensures persisting partitions are still tracked in the total memory
counter, causing pauses to fire correctly.
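A minimal sketch of the new ordering, using hypothetical names for the memory counter and the persist step (not the actual ingester types):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the ingester's "total memory in use" counter.
struct MemoryTracker {
    total_bytes: AtomicUsize,
}

impl MemoryTracker {
    fn release(&self, bytes: usize) {
        self.total_bytes.fetch_sub(bytes, Ordering::Relaxed);
    }
    fn in_use(&self) -> usize {
        self.total_bytes.load(Ordering::Relaxed)
    }
}

/// Stand-in for writing the partition's buffered data to object storage.
fn do_persist(_partition_bytes: usize) {}

fn persist_partition(partition_bytes: usize, tracker: &MemoryTracker) {
    // Previously the counter was decremented *here*, before persistence started,
    // so partitions in flight were invisible to the memory limit.

    // Now persistence runs while the bytes are still accounted for...
    do_persist(partition_bytes);

    // ...and the memory is only released once persistence has completed, so
    // ingest pauses keep firing while the persist job still holds the data.
    tracker.release(partition_bytes);
}

fn main() {
    let tracker = MemoryTracker {
        total_bytes: AtomicUsize::new(1024),
    };
    persist_partition(1024, &tracker);
    assert_eq!(tracker.in_use(), 0);
}
```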
* feat: conversion from `ParquetFile` to `ParquetFilePath`
* refactor: slim down parquet chunk
- ensure it works without binary parquet metadata
- timestamp range is no longer optional (ensured by the NG type system)
- remove table summary: this is only needed for SOME API users. The
compactor works perfectly well without statistics since it has the
timestamp range, which is sufficient for the current overlap check (we
don't use any other primary key stats at the moment). The querier
currently does NOT use parquet chunks (they were replaced by the read
buffer), but if it does so again in the future it will likely need a
way to fetch and cache the statistics.
- the schema is now provided by the API user since it can be
reconstructed using the NG catalog only (and "wrong" column orders are
tolerated as of #4921); see the sketch below
Ref #4124
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
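A hedged sketch of what the slimmed-down chunk looks like after this refactor; all names and fields are illustrative, not the real `parquet_file` types.

```rust
use std::sync::Arc;

/// Stand-in for the IOx schema type; in the real code the caller supplies it.
struct Schema;

/// No longer optional: the NG type system guarantees a range exists.
struct TimestampRange {
    min: i64,
    max: i64,
}

/// Hypothetical slimmed-down chunk: no decoded parquet metadata, no table
/// summary, schema provided by the API user.
struct ParquetChunk {
    schema: Arc<Schema>,
    timestamp_range: TimestampRange,
}

impl ParquetChunk {
    fn new(schema: Arc<Schema>, timestamp_range: TimestampRange) -> Self {
        Self {
            schema,
            timestamp_range,
        }
    }
}

fn main() {
    let chunk = ParquetChunk::new(Arc::new(Schema), TimestampRange { min: 0, max: 100 });
    // The current overlap check only needs the timestamp range, not full statistics.
    assert!(chunk.timestamp_range.min <= chunk.timestamp_range.max);
    let _ = Arc::clone(&chunk.schema);
}
```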
* fix: use proper sort key in tests
* feat: do not rely on encoded parquet metadata for RB chunks
Ref #4124.
* refactor: allocate fewer strings
* refactor: use upstream PK calculation
* fix: cache expiration w/o a good reason
* refactor: make namespace cache safer to use
* refactor: make partition cache safer to use
* fix: column handling when reading parquet files
This improves/fixes/tests a few aspects when reading parquet files:
- fix usage of `Selection::Some(...)`. This was broken since #4912 but
apparently no test caught that.
- ensure that the order of `Selection::Some(...)` is preserved
- ensure that schema metadata is attached to output batches
- ignore parquet columns that we don't care about (i.e. do not select)
- allow the parquet file to have a different column order than our
internal bookkeeping; this makes it way simpler to read parquet files
w/o scanning the metadata first (see the sketch below)
- extend the test coverage
Ref #4124.
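A hedged sketch of the order-preserving, tolerant projection described above, written against the `arrow` crate; the helper name is made up and this is not the IOx implementation.

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

/// Project a record batch down to the requested columns, preserving the
/// *selection* order and silently skipping columns the file does not contain.
fn project_in_selection_order(batch: &RecordBatch, selection: &[&str]) -> Result<RecordBatch> {
    let schema = batch.schema();
    let mut fields = Vec::new();
    let mut columns: Vec<ArrayRef> = Vec::new();

    for name in selection {
        // Ignore requested columns missing from this particular file.
        if let Ok(idx) = schema.index_of(name) {
            fields.push(schema.field(idx).clone());
            columns.push(Arc::clone(batch.column(idx)));
        }
    }

    RecordBatch::try_new(Arc::new(Schema::new(fields)), columns)
}

fn main() -> Result<()> {
    // The file stores columns in its own order: time, tag.
    let file_schema = Arc::new(Schema::new(vec![
        Field::new("time", DataType::Int64, false),
        Field::new("tag", DataType::Utf8, true),
    ]));
    let batch = RecordBatch::try_new(
        file_schema,
        vec![
            Arc::new(Int64Array::from(vec![1, 2])) as ArrayRef,
            Arc::new(StringArray::from(vec!["a", "b"])) as ArrayRef,
        ],
    )?;

    // The selection asks for tag first, then time, plus a column the file lacks.
    let projected = project_in_selection_order(&batch, &["tag", "time", "missing"])?;
    assert_eq!(projected.schema().field(0).name(), "tag");
    assert_eq!(projected.num_columns(), 2);
    Ok(())
}
```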
* test: even more tests for parquet reader
Query internals are not meant to be used by other crates. Only a
handful of selected interfaces should be used by IOxD and the query
tests. The compactor only used a very small subset, just to read
parquet files back into memory; it should use the official
`parquet_file` interface instead.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Fixes interaction of `maybe_skip_kafka_integration!` and `should_panic`
by ensuring that `maybe_skip_kafka_integration!` panics to skip
`should_panic` tests.
Without that it is not possible to just run `cargo test -p write_buffer`.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
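A hedged sketch of the mechanism (the env var and macro arms are assumptions, not the exact write_buffer helper): when the integration environment is missing, the macro panics with the message the `#[should_panic]` test expects, so the skipped test still passes.

```rust
/// Illustrative sketch only; the real `maybe_skip_kafka_integration!` lives in
/// the IOx test helpers and its exact behaviour may differ.
macro_rules! maybe_skip_kafka_integration {
    // Plain form: return early from the test if no broker address is configured.
    () => {{
        match std::env::var("KAFKA_CONNECT") {
            Ok(addr) => addr,
            Err(_) => {
                eprintln!("skipping Kafka integration test");
                return;
            }
        }
    }};
    // `should_panic` form: panic with the expected message instead of returning,
    // because an early `return` would make a `#[should_panic]` test fail.
    ($panic_msg:expr) => {{
        match std::env::var("KAFKA_CONNECT") {
            Ok(addr) => addr,
            Err(_) => panic!("{}", $panic_msg),
        }
    }};
}

#[test]
fn writes_are_acknowledged() {
    let _addr = maybe_skip_kafka_integration!();
    // real test body would go here
}

#[test]
#[should_panic(expected = "broker unavailable")]
fn writes_panic_when_broker_is_down() {
    // When KAFKA_CONNECT is unset this panics with the expected message and the
    // test still passes; when it is set, the real test body runs.
    let _addr = maybe_skip_kafka_integration!("broker unavailable");
    panic!("broker unavailable"); // stand-in for the real test body
}
```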
* refactor: `TestPartition::update_sort_key` should return an `Arc`
The whole test framework is built around `Arc`s, so let's fix this
consistency issue.
* fix: actually calculate correct column set in test framework
* feat: check expected parquet file schema
While working on the querier I made some mistakes regarding schemas,
and such a check would have greatly improved the debugging experience.
* feat: namespace cache expiration
* fix: improve parquet schema check
* fix: remove clone
Changes the ingester to use the partition key derived in the router and
transmitted over the kafka API boundary.
This should result in no observable behavioural change, but it makes
the system more resilient, as we no longer assume the partitioning
algorithm produces the same value in both the router (where data is
partitioned) and the ingester (where data is persisted, segregated by
partition key).
This is a pre-requisite to allowing the user to specify partitioning
schemes.
The low-level chunk storage shouldn't care about the table name (this is
also true for parquet chunks btw). In fact, the table name is already
only partial information, since it lacks the namespace.
If we need a table name, then the high-level chunk/data management is
responsible for that.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: store per-file column set in catalog
Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking at any file-level metadata.
The querier will use this to directly load parquet files into the read
buffer.
**WARNING: This requires a catalog wipe!**
Ref #4124.
* refactor: use proper `ColumnSet` type
Changes the kafka message wire format to include the partition key for
serialised DML writes.
After this commit, the kafka messages will contain the partition key
for each op, but this information will go unused in the ingester; this
enables us to roll out the producer side before making the value's
presence necessary on the consumer side.
A follow-up PR will change the ingester to utilise this embedded
partition key.
This has the unfortunate side effect of making the partition key part of
the public gRPC write API:
https://github.com/influxdata/influxdb_iox/issues/4866
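A minimal sketch of the rollout shape described above; the struct and field names are illustrative, not the actual DML/wire types.

```rust
/// Illustrative sketch: the serialised DML write now carries the router-derived
/// partition key alongside the payload.
struct DmlWrite {
    namespace: String,
    /// Present on the wire once the producer side is rolled out; the consumer
    /// ignores it until a follow-up change, so it is modelled as optional here.
    partition_key: Option<String>,
    payload: Vec<u8>,
}

fn main() {
    let op = DmlWrite {
        namespace: "org_bucket".to_string(),
        partition_key: Some("2022-06-21".to_string()),
        payload: vec![],
    };

    // During the rollout the ingester simply does not look at the key yet.
    let _ignored_for_now = op.partition_key.as_deref();
    let _ = (op.namespace, op.payload);
}
```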
* chore: reduce proptest features
* chore: remove `grpc-router`
This crate is currently unused and we don't have immediate plans to use
it. And it's still in Git history, so it can always be restored.
* chore: `cargo update`
* refactor(querier): split ingester partitions into chunks
With the new wire protocol the ingester can now transmit multiple
snapshots per partition with different schemas. This changes the
querier to reflect this: it now uses the individual snapshots as chunks
for the query engine instead of a single partition.
The schema handling was changed so that instead of enforcing a
table-wide schema, we now use the snapshot-specific projections. This
means we
do not need to create all-NULL columns any longer because the batches
within the chunks now always have the correct schema.
* refactor: "disassembler" -> "decoder"