Partition::snapshot_to_persisting() passes the ID of the partition it is
calling `snapshot_to_persisting()` on. The partition already knows what
its ID is, so at best it's redundant, and at worst, inconsistent with
the actual ID.
Changes the ingest code path to eliminate scanning the parquet_files
table to discover the last persisted offset per partition, instead
utilising the new persisted_sequence_number field on the Partition
itself to read the same value.
This lookup blocks ingest for the shard, so removing the expensive query
from the ingest hot path should improve catch-up time after a
restart/deployment.
This encodes the result directly and has the FilterResult hold only the
relevant data to the state. So no longer any need to create or check for
empty vectors or 0 budget_bytes. Also creates a new type after checking
the filter result state and handling the budget, as actual compaction
doesn't need to care about that.
This could still use more refactoring to become a clearer pipeline of
different states, but I think this is a good start.
Table columns for a partition don't change, so rather than carrying
around table columns for the partition and parquet files to look up
repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track
of its column types and be able to estimate bytes given a number of rows
from a parquet file.
Changes the ingester to record the per-partition, maximum persisted
sequencer offsets to the catalog. This will enable quick O(1) lookup in
the future, but the currently persisted value is only used to assert the
per-partition monotonic persist ordering invariant.
Assert the per-shard / per-partition persistence watermarks
monotonically increase, and document the invariant.
NOTE: this is not a new invariant, just a new assertion to validate it.
Adds a migration to add a column "persisted_sequence_number" that
defines the inclusive upper-bound on sequencer writes materialised and
uploaded to object store for the partition.
* refactor: concurrent table planning in InfluxRPC
Some InfluxRPC can scan multiple tables. Prior to this PR we were always
scanning the tables in sequence, adding up potential latencies (catalog,
ingester, object store). There is no reason we need to do this,
"ordinary" SQL queries would not serialize this way either.
So let's scan tables concurrently. This add concurrency to:
- read filter
- read group
- read window aggregate
There are other query types that could benefit from a similar treatment.
They will be changed in a follow-up.
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: explain `Send` assertion
* refactor: change `CONCURRENT_TABLE_JOBS` to 10
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: split "pruned" metric into "early" and "late"
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: explain `PruningMetrics`
* test: try to test pruning
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Create chunks in querier concurrently after we've pre-filtered them.
Chunk creation still may require a bit of cached information (e.g. the
partition sort key) and we can easily fetch these concurrently instead
of in order.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Remove variants from Identifier and BindParameter types
This simplifies usage of these types. Display traits have been updated
to properly quote and escape the output, when necessary.
* chore: Fix docs
* chore: Drive by to improve tests and coverage
* chore: Make Error generic, so we can change it
* chore: change visibility
pub(crate) is superfluous, as we are yet to specify
which APIs are public outside the crate in lib.rs
* chore: Introduce crate IResult type
In preparation of adding custom error type
* feat: Initial implementation of custom error type
* chore: Add module docs
* chore: Rename IResult → ParseResult; syntax and expect errors
* chore: ParserResult and error refactoring
* chore: Drive by simplification
* feat: Add custom errors to string parsing
* feat: Added public API to parse a set of statements
* chore: Errors are dyn Display to convey their intent
Errors from the parser are only displayable messages.
* chore: Separate SHOW for improved error handling
By moving SHOW to a separate parser, we can display clearer error
messages when consuming SHOW followed by an unexpected token.
* chore: Docs and cleanup
* chore: Add tests and a specific `ParseError` type
The fields are intentionally not public yet, as we would like clients
of the package to display the message only.
* chore: PR feedback to improve the `ORDER BY` error message
The maximum persisted sequence number is tracked to answer "up to where
has this partition been persisted", used for querying and skipping
writes that have already been applied (though I suspect this is
redundant).
This is a property of the partition, not the actual data buffer, so this
commit hoists it up out of the data buffer and onto the per-partition
data structure, internalising the field in the process (not pub).
* feat: garbage collector now cleans up old parquet files
* chore: clarifying comment in GC
* chore: typos in GC
* chore: typos in GC
* fix: cmdline arg in GC test needs updating after refactor
* fix: use select! on shutdown rx in GC
* fix: recalc cutoff in GD each loop
* chore: add delete_old that returns IDs only, for GC
* chore: use duration in GC args instead of usize days
* chore: GC lister runs forever w/ sleep; tests updated accordingly
* docs: fix link in GC comments to automatic link
* chore: test for delete_old_ids_only; refactor mem impl thereof
* chore: make GC test less flakey
* chore: make GC test less flakey
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This should lower catalog load and eliminate a few costly cache misses.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>