In https://github.com/influxdata/influxdb_iox/pull/5754 I added code at
seek() time to check whether the offset exists, and to refuse to seek if
it does not - effectively making the existing read-time check redundant.
I left it in on the assumption that the cases it previously handled
would still work!
Unfortunately this doesn't seem to be the case - both read-ahead-of-data
and read-behind-data requests appear to cause the high_watermark to be
returned as -1, meaning this code never worked?!
This new read-ahead-of-data match arm took priority over the
SequenceNumberNoLongerExists arm, effectively preventing the ingester
from taking the desired remediation (skipping to most recent write, or
erroring, depending on configuration).
Moves the "you've tried to seek into the future!" error to the point at
which the seek attempt was made.
This makes more sense than deferring the seek error until read time, and
makes the cause far easier to determine than it is at read time (where
the read response error contains an invalid high_watermark value of -1,
making it impossible to conclusively determine what has happened).
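A minimal sketch of the seek-time check described above, using
hypothetical type, field, and error names rather than the real write
buffer / rskafka interfaces:

```rust
// Minimal sketch only - the types, fields, and error variants here are
// hypothetical stand-ins for the real write buffer / rskafka interfaces.
#[derive(Debug)]
enum SeekError {
    /// The caller asked to seek beyond the most recent write.
    OffsetAfterWatermark { requested: i64, watermark: i64 },
}

struct PartitionStream {
    offset: i64,
    /// Offset of the next write; offsets beyond this do not exist yet.
    high_watermark: i64,
}

impl PartitionStream {
    fn seek(&mut self, offset: i64) -> Result<(), SeekError> {
        // Fail fast here: at seek time the watermark is known, so the
        // error can name both values unambiguously instead of surfacing
        // later as a read error carrying a high_watermark of -1.
        if offset > self.high_watermark {
            return Err(SeekError::OffsetAfterWatermark {
                requested: offset,
                watermark: self.high_watermark,
            });
        }
        self.offset = offset;
        Ok(())
    }
}
```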
In staging we observed an ingester panic due to the write buffer stream
yielding a WriteBufferErrorKind::SequenceNumberAfterWatermark,
suggesting the ingester was attempting to read from an offset that
exceeds the current max write offset in Kafka (high watermark offset).
This turned out not to be the case - the partition had a single write at
offset 2, and the ingester was attempting to seek to offset 1. The first
read would fail (offset 1 does not exist) and the error handling did not
account for the high watermark not being correctly set (-1 in the
response).
I have no idea why rskafka returns this watermark / doesn't retry /
etc., but this change will allow the ingesters to recover.
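For illustration only, a hedged sketch of the classification pitfall -
the names and the mapping to remediations are hypothetical, but it shows
why a -1 watermark must be treated as "unknown" rather than compared
against the requested offset:

```rust
/// Hypothetical classification helper: the point is that a
/// high_watermark of -1 in an error response means "unknown" and must
/// not be used to conclude the read was ahead of the data.
#[derive(Debug, PartialEq)]
enum ReadFailure {
    /// Watermark was not populated; fall through to the other error
    /// handling rather than panicking.
    UnknownWatermark,
    /// The requested offset is beyond the most recent write.
    AheadOfData,
    /// The requested offset is no longer (or was never) retained.
    SequenceNumberNoLongerExists,
}

fn classify_read_failure(requested_offset: i64, high_watermark: i64) -> ReadFailure {
    if high_watermark < 0 {
        // -1 tells us nothing about where the requested offset sits
        // relative to the data.
        return ReadFailure::UnknownWatermark;
    }
    if requested_offset > high_watermark {
        ReadFailure::AheadOfData
    } else {
        ReadFailure::SequenceNumberNoLongerExists
    }
}
```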
Remove each entry from the partition cache when it is hit, as each
partition should be looked up at most once.
This amortises the memory usage of the cache, as it is progressively
"drained" of hot partitions.
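A minimal sketch of the drain-on-hit behaviour, assuming a simple map
keyed by some partition identity (the real key and value types will
differ):

```rust
use std::collections::HashMap;

// Sketch only: a pre-warmed map of partition data keyed by an arbitrary
// partition identity.
struct PrewarmedPartitions<K, V> {
    entries: HashMap<K, V>,
}

impl<K: std::hash::Hash + Eq, V> PrewarmedPartitions<K, V> {
    /// Return AND remove the cached entry: each partition is resolved at
    /// most once, so a hit never needs to be served twice, and removing
    /// it lets the cache shrink as hot partitions are drained out of it.
    fn take(&mut self, key: &K) -> Option<V> {
        self.entries.remove(key)
    }
}
```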
Cache the 10,000 most recent partitions at startup, and share them
across all shards.
At the time of this commit there are approximately 8,000 partitions per
day, per ingester, so this should cache all of the partitions created so
far in a given day at startup.
This commit implements a PartitionCache decorator over the
PartitionProvider abstraction.
When an ingester starts up, the internal data structures are empty and
are lazily initialised for each namespace / table / partition as they
are observed in the stream of DML ops.
This lazy initialisation includes resolving the Partition ID and last
persisted sequence number offset value from the catalog for each
partition in each table in each namespace for which an op is observed -
this occurs in the hot path, while blocking ingest for a shard.
Because resolving each partition requires a catalog query, this can
cause a spike in queries against the catalog and unnecessarily slow
ingester recovery - we're effectively lazily warming a cache of
PartitionData in the hot path!
Instead this cache can be used to pre-warm the N most recently created
partitions (which are likely to have ongoing writes) at startup to
eliminate the hot-path overhead and associated catalog queries.
NOTE: unlike most of the other hot-path queries, partition persist
offset resolution cannot be eliminated by changes to the Kafka wire
format.
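A hedged sketch of the decorator shape, with simplified synchronous
signatures and hypothetical fields (the real PartitionProvider trait is
async and carries more identifiers):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Simplified, hypothetical signatures - only the decorator shape matters.
#[derive(Clone)]
struct PartitionData {
    partition_id: i64,
    max_persisted_sequence_number: Option<i64>,
}

trait PartitionProvider: Send + Sync {
    fn get_partition(&self, table_id: i64, partition_key: &str) -> PartitionData;
}

/// Read-through decorator: serve pre-warmed entries from memory, falling
/// back to the inner (catalog-backed) provider on a miss.
struct PartitionCache<T> {
    inner: T,
    prewarmed: Mutex<HashMap<(i64, String), PartitionData>>,
}

impl<T: PartitionProvider> PartitionProvider for PartitionCache<T> {
    fn get_partition(&self, table_id: i64, partition_key: &str) -> PartitionData {
        let key = (table_id, partition_key.to_string());
        // Hit: no catalog round-trip in the hot path.
        if let Some(hit) = self.prewarmed.lock().unwrap().get(&key).cloned() {
            return hit;
        }
        // Miss: resolve against the catalog as before.
        self.inner.get_partition(table_id, partition_key)
    }
}
```

On a hit there is no catalog round-trip; on a miss behaviour is
unchanged, so the decorator is purely an optimisation over the
catalog-backed provider.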
Lifts the PartitionProvider initialisation higher in the stack to a
point where a single instance can be used across all shards an ingester
manages.
This is a pre-requisite for sharing a cache of Partitions across all
shards.
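Illustrative only - the point is simply that the caller now constructs
one provider and each shard holds a cheap Arc clone of it:

```rust
use std::sync::Arc;

// Marker stub standing in for the real trait, just to keep this
// fragment self-contained; the names below are illustrative.
trait PartitionProvider: Send + Sync {}

struct ShardData {
    partition_provider: Arc<dyn PartitionProvider>,
}

/// One provider (and therefore one cache) is constructed by the caller,
/// and every shard shares that same instance.
fn build_shards(n_shards: usize, provider: Arc<dyn PartitionProvider>) -> Vec<ShardData> {
    (0..n_shards)
        .map(|_| ShardData {
            partition_provider: Arc::clone(&provider),
        })
        .collect()
}
```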
Adds a Partition::most_recent_n() method to the catalog interface,
returning the N most recent partitions for a given set of shards.
The most recently created partitions are likely to be currently "hot"
for writes, and are cheap to list.
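A hypothetical sketch of the shape of such a method (the real catalog
interface is async and uses its own ID, result, and error types):

```rust
// Hypothetical shape only - the contract is "up to N of the newest
// partitions, restricted to the given shards, newest first".
struct Partition {
    id: i64,
    shard_id: i64,
    table_id: i64,
    partition_key: String,
}

trait PartitionRepo {
    /// Roughly:
    ///   SELECT * FROM partition WHERE shard_id = ANY($shards)
    ///   ORDER BY id DESC LIMIT $n
    /// Recently created partitions are cheap to list (an ordered index
    /// scan) and are the ones most likely to still be receiving writes.
    fn most_recent_n(&mut self, n: usize, shards: &[i64]) -> Vec<Partition>;
}
```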
Marks many internal data structures as non-pub.
Many remain pub as they're used across tests / by multiple callers
"peeking", but this limits the scope of false sharing in the future.
Move the initialisation of ShardData (an internal ingester data
structure) into the ingester itself.
Previously callers would initialise the ingester state, and pass it into
the IngesterData constructor.
* feat: send only needed projection columns from querier to ingester in case of normal SQL queries
* refactor: push column indices down until we need to convert them to strings
* fix: make the test deterministic
* test: test for the projection pushdown
* test: add asserts for the proj pushdown test
* test: implement projection pushdown for partitions of MockIngesterConnection
* chore: cleanup
* chore: address review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: address review comments
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Removes the "how" of initialising a per-partition buffer structure
(PartitionData) from the per-table buffer (TableData).
This is a cleaner separation of concerns - a table buffer is responsible
for addressing and initialising per-table partitions as necessary, and
buffering of ops for them. It does not have to be concerned with the
series of steps necessary to look up the various bits of data in order
to construct a PartitionData.
This abstract provider can be layered up to provide more complex
behaviours - I intend to add a read-through cache impl that decorates
the catalog impl added in this commit, which should eliminate most
partition queries at ingester startup by utilising the indirection added
here.
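A hedged sketch of the resulting shape, again with simplified
synchronous, hypothetical signatures: the per-table buffer only
addresses partitions and delegates the "how" of constructing a
PartitionData to the injected provider.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Sketch only - it illustrates the separation of concerns, not the real
// ingester types.
#[derive(Clone)]
struct PartitionData {
    partition_id: i64,
}

trait PartitionProvider: Send + Sync {
    /// "How" a partition is resolved (catalog lookups, caching, ...)
    /// lives behind this trait.
    fn get_partition(&self, table_id: i64, partition_key: &str) -> PartitionData;
}

/// The per-table buffer addresses partitions and buffers ops for them;
/// it delegates construction of PartitionData to the injected provider.
struct TableData {
    table_id: i64,
    partition_provider: Arc<dyn PartitionProvider>,
    partitions: HashMap<String, PartitionData>,
}

impl TableData {
    fn partition_for(&mut self, partition_key: &str) -> &mut PartitionData {
        let (table_id, provider) = (self.table_id, Arc::clone(&self.partition_provider));
        self.partitions
            .entry(partition_key.to_string())
            // Only on first observation of a partition key does the
            // provider (catalog, cache, ...) get consulted.
            .or_insert_with(|| provider.get_partition(table_id, partition_key))
    }
}
```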