Commit Graph

9088 Commits (85d6efafe19627a0107a0d044490e9fcaaa4dbd6)

Author SHA1 Message Date
Dom Dwyer 85d6efafe1 refactor: snapshot_to_persisting redundant ID
Partition::snapshot_to_persisting() passes the ID of the partition it is
calling `snapshot_to_persisting()` on. The partition already knows what
its ID is, so at best it's redundant, and at worst, inconsistent with
the actual ID.
2022-09-16 17:08:08 +02:00
dependabot[bot] 099dda430e
chore(deps): Bump digest from 0.10.3 to 0.10.5 (#5655)
Bumps [digest](https://github.com/RustCrypto/traits) from 0.10.3 to 0.10.5.
- [Release notes](https://github.com/RustCrypto/traits/releases)
- [Commits](https://github.com/RustCrypto/traits/compare/digest-v0.10.3...digest-v0.10.5)

---
updated-dependencies:
- dependency-name: digest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-16 14:03:54 +00:00
kodiakhq[bot] 2c9f9ec52b
Merge pull request #5657 from influxdata/dom/per-partition-read-offset
perf: O(1) partition persist mark discovery
2022-09-16 12:13:31 +00:00
Dom Dwyer ce0d189260 perf: O(1) partition persist mark discovery
Changes the ingest code path to eliminate scanning the parquet_files
table to discover the last persisted offset per partition, instead
utilising the new persisted_sequence_number field on the Partition
itself to read the same value.

This lookup blocks ingest for the shard, so removing the expensive query
from the ingest hot path should improve catch-up time after a
restart/deployment.
2022-09-16 14:06:42 +02:00
kodiakhq[bot] 69c9e7b5ff
Merge pull request #5650 from influxdata/cn/partition-estimates-size
refactor: Clear up responsibilities of different parts of the compactor
2022-09-15 18:50:50 +00:00
kodiakhq[bot] 1c0b6997c1
Merge branch 'main' into cn/partition-estimates-size 2022-09-15 18:43:36 +00:00
Carol (Nichols || Goulding) f5497a3a3d
refactor: Extract a conversion for convenience in tests 2022-09-15 12:48:36 -04:00
kodiakhq[bot] 609707c2d5
Merge pull request #5652 from influxdata/dom/nullable-partition-persist
refactor(db): NULLable persisted_sequence_number
2022-09-15 16:27:37 +00:00
kodiakhq[bot] 4fe5311d8b
Merge branch 'main' into dom/nullable-partition-persist 2022-09-15 16:20:54 +00:00
Dom Dwyer 66bf0ff272 refactor(db): NULLable persisted_sequence_number
Makes the partition.persisted_sequence_number column in the catalog DB
NULLable. 0 is a valid persisted sequence number.
2022-09-15 18:19:39 +02:00
Marco Neumann e346433914
refactor: concurrent table scan for "table names" (#5649)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 15:39:00 +00:00
Carol (Nichols || Goulding) dcab9d0ffc
refactor: Combine relevant data with the FilterResult state
This encodes the result directly and has the FilterResult hold only the
relevant data to the state. So no longer any need to create or check for
empty vectors or 0 budget_bytes. Also creates a new type after checking
the filter result state and handling the budget, as actual compaction
doesn't need to care about that.

This could still use more refactoring to become a clearer pipeline of
different states, but I think this is a good start.
2022-09-15 11:13:18 -04:00
Carol (Nichols || Goulding) e57387b8e4
refactor: Extract an inner function so partition isn't needed in tests 2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding) a284cebb51
refactor: Store estimated bytes on the CompactorParquetFile 2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding) 70094aead0
refactor: Make estimating bytes a responsibility of the Partition
Table columns for a partition don't change, so rather than carrying
around table columns for the partition and parquet files to look up
repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track
of its column types and be able to estimate bytes given a number of rows
from a parquet file.
2022-09-15 11:10:14 -04:00
kodiakhq[bot] f718cfd71c
Merge pull request #5648 from influxdata/dom/per-partition-persist-markers
feat: store per partition persist markers
2022-09-15 14:59:33 +00:00
Dom Dwyer f4cc9a6984 docs: partition persist visibility invariants
Document the invariants (and non-invariants) of
Partition.persisted_sequence_number.
2022-09-15 16:10:35 +02:00
Dom Dwyer 234d460fcb chore: rename update_persisted_sequence_number fn 2022-09-15 16:10:35 +02:00
Dom Dwyer f91d802107 feat: store per-partition persist markers
Changes the ingester to record the per-partition, maximum persisted
sequencer offsets to the catalog. This will enable quick O(1) lookup in
the future, but the currently persisted value is only used to assert the
per-partition monotonic persist ordering invariant.
2022-09-15 16:10:35 +02:00
Dom Dwyer 300938f858 refactor: assert partition persistence ordering
Assert the per-shard / per-partition persistence watermarks
monotonically increase, and document the invariant.

NOTE: this is not a new invariant, just a new assertion to validate it.
2022-09-15 16:10:35 +02:00
Dom Dwyer d199a83355 feat(catalog): per-partition persist mark API
Adds the "persisted_sequence_number" field to the Partition model, and
updates the catalog API to read & update it.
2022-09-15 16:10:35 +02:00
Dom Dwyer c5ac17399a refactor(db): persist marker for partition table
Adds a migration to add a column "persisted_sequence_number" that
defines the inclusive upper-bound on sequencer writes materialised and
uploaded to object store for the partition.
2022-09-15 16:10:35 +02:00
Marco Neumann 159250e776
refactor: concurrent table planning in InfluxRPC (#5647)
* refactor: concurrent table planning in InfluxRPC

Some InfluxRPC can scan multiple tables. Prior to this PR we were always
scanning the tables in sequence, adding up potential latencies (catalog,
ingester, object store). There is no reason we need to do this,
"ordinary" SQL queries would not serialize this way either.

So let's scan tables concurrently. This adds concurrency to:

- read filter
- read group
- read window aggregate

There are other query types that could benefit from a similar treatment.
They will be changed in a follow-up.

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* test: explain `Send` assertion

* refactor: change `CONCURRENT_TABLE_JOBS` to 10

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-09-15 13:55:22 +00:00
Marco Neumann 513fdf1e26
feat: split "pruned" metric into "early" and "late" (#5645)
* feat: split "pruned" metric into "early" and "late"

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: explain `PruningMetrics`

* test: try to test pruning

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-09-15 13:42:00 +00:00
Marco Neumann f7b6f81fe1
feat: concurrent chunk creation (#5646)
Create chunks in querier concurrently after we've pre-filtered them.
Chunk creation still may require a bit of cached information (e.g. the
partition sort key) and we can easily fetch these concurrently instead
of in order.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 12:30:02 +00:00
kodiakhq[bot] 1bac7792db
Merge pull request #5644 from influxdata/dom/split-data
refactor: hoist per-partition persistence watermark from buffer
2022-09-15 09:32:05 +00:00
Dom f84ca2a44f
Merge branch 'main' into dom/split-data 2022-09-15 09:58:31 +01:00
Stuart Carnie e5d8f23fcd
chore: Remove variants from Identifier and BindParameter types (#5642)
* chore: Remove variants from Identifier and BindParameter types

This simplifies usage of these types. Display traits have been updated
to properly quote and escape the output, when necessary.

* chore: Fix docs
2022-09-15 06:52:31 +00:00
Nga Tran 7c4c918636
chore: add partition id into panic message (#5641)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 02:21:13 +00:00
Stuart Carnie e6f2a105e5
feat: Improved InfluxQL error messages (#5632)
* chore: Drive by to improve tests and coverage

* chore: Make Error generic, so we can change it

* chore: change visibility

pub(crate) is superfluous, as we are yet to specify
which APIs are public outside the crate in lib.rs

* chore: Introduce crate IResult type

In preparation of adding custom error type

* feat: Initial implementation of custom error type

* chore: Add module docs

* chore: Rename IResult → ParseResult; syntax and expect errors

* chore: ParserResult and error refactoring

* chore: Drive by simplification

* feat: Add custom errors to string parsing

* feat: Added public API to parse a set of statements

* chore: Errors are dyn Display to convey their intent

Errors from the parser are only displayable messages.

* chore: Separate SHOW for improved error handling

By moving SHOW to a separate parser, we can display clearer error
messages when consuming SHOW followed by an unexpected token.

* chore: Docs and cleanup

* chore: Add tests and a specific `ParseError` type

The fields are intentionally not public yet, as we would like clients
of the package to display the message only.

* chore: PR feedback to improve the `ORDER BY` error message
2022-09-15 00:19:03 +00:00
kodiakhq[bot] a5aa871ff8
Merge pull request #5639 from influxdata/cn/always-get-extra-info
refactor: Move fetching of table columns, extra partition info into the method
2022-09-14 17:08:57 +00:00
kodiakhq[bot] 08e2523295
Merge branch 'main' into cn/always-get-extra-info 2022-09-14 17:01:59 +00:00
Dom Dwyer fc17f2ec2d refactor: hoist persistence watermark from buffer
The maximum persisted sequence number is tracked to answer "up to where
has this partition been persisted", used for querying and skipping
writes that have already been applied (though I suspect this is
redundant).

This is a property of the partition, not the actual data buffer, so this
commit hoists it up out of the data buffer and onto the per-partition
data structure, internalising the field in the process (not pub).
2022-09-14 18:07:45 +02:00
Nga Tran 44e12aa512
feat: add needed budget and memory budget into the message for us to diagnose and increase our memory budget as needed (#5640) 2022-09-14 16:06:19 +00:00
Carol (Nichols || Goulding) e16306d21c
refactor: Move fetching of extra partition info into the method because it's always needed 2022-09-14 11:14:17 -04:00
Andrew Lamb 8b273c2a7d
docs: Add comments about how to see debug logs via `cargo test` (#5627)
* docs: Add documentation about how to see debug logs via `cargo test`

* fix: Update test_helpers/src/lib.rs

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Update test_helpers/src/lib.rs

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Update test_helpers/src/lib.rs

* fix: fmt

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 14:16:46 +00:00
Luke Bond b52865e018
feat: garbage collector now cleans up old parquet files (#5588)
* feat: garbage collector now cleans up old parquet files

* chore: clarifying comment in GC

* chore: typos in GC

* chore: typos in GC

* fix: cmdline arg in GC test needs updating after refactor

* fix: use select! on shutdown rx in GC

* fix: recalc cutoff in GC each loop

* chore: add delete_old that returns IDs only, for GC

* chore: use duration in GC args instead of usize days

* chore: GC lister runs forever w/ sleep; tests updated accordingly

* docs: fix link in GC comments to automatic link

* chore: test for delete_old_ids_only; refactor mem impl thereof

* chore: make GC test less flakey

* chore: make GC test less flakey

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 14:09:28 +00:00
dependabot[bot] 7e1f013346
chore(deps): Bump itertools from 0.10.3 to 0.10.4 (#5631)
Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/rust-itertools/itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.3...v0.10.4)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 14:02:14 +00:00
Marco Neumann 2332e5de10
refactor: slightly increase querier namespace cache TTLs (#5635)
This should lower catalog load and eliminate a few costly cache misses.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 13:54:51 +00:00
dependabot[bot] 1353a429d7
chore(deps): Bump tokio from 1.21.0 to 1.21.1 (#5630)
* chore(deps): Bump tokio from 1.21.0 to 1.21.1

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.0 to 1.21.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.0...tokio-1.21.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-09-14 13:22:03 +00:00
dependabot[bot] b4a25fdb0e
chore(deps): Bump thiserror from 1.0.34 to 1.0.35 (#5629)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.34 to 1.0.35.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.34...1.0.35)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 12:54:12 +00:00
Dom 9ed931271a
Merge pull request #5634 from influxdata/dom/split-data
refactor(ingester): split data.rs into submodules
2022-09-14 13:44:01 +01:00
Dom Dwyer ee8cdb48af style(ingester): fmt imports & long strings
Rewrite the imports to be a consistent order; std, external, crate and
merge all crate-level imports into one use statement.
2022-09-14 14:20:19 +02:00
Dom Dwyer 074722eb3e refactor(ingester): split data.rs into modules
Breaks the gigantic data.rs file into sub-modules for Shard, Namespace,
Table, Partition, and finally the actual data buffer used to store
writes.
2022-09-14 14:20:19 +02:00
Andrew Lamb 45d795055a
feat: Support calling influxql/flux selector aggregates from IOx SQL (#5628)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 10:37:17 +00:00
kodiakhq[bot] 674670e17f
Merge pull request #5622 from influxdata/cn/infallible-estimated-bytes
fix: Use `ColumnType` as an enum in more places; make `estimate_arrow_bytes_for_file` infallible
2022-09-14 01:07:05 +00:00
kodiakhq[bot] 85641efa6f
Merge branch 'main' into cn/infallible-estimated-bytes 2022-09-14 01:00:10 +00:00
Luke Bond 51dac55652
Merge pull request #5567 from influxdata/chore/parquetfile-size-trigger
feat: parquetfile size trigger
2022-09-13 16:39:57 +01:00
Luke Bond ee3f172d45 chore: renamed DB migration for billing trigger 2022-09-13 16:29:14 +01:00
Luke Bond c8b545134e chore: add index to speed up billing_summary upsert 2022-09-13 16:22:44 +01:00