Fix the ingester to track the max persisted sequence number per partition.
Ensure replay takes in data from unpersisted partitions.
Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
* feat: return write_token from HTTP writes to router2
* fix: Update router2/src/dml_handlers/instrumentation.rs
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: Use WriteSummary::default more vigorously
* fix: fix typo and add links to follow on issues
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: compact small contiguous files of the same partition even if they do not overlap
* test: more tests
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: create_or_get_multi for column in catalog now enforces limits
fix: create_or_get_multi for column in catalog now enforces limits
chore: reorder catalog column create fns to be next to each other
test: add failing test for multi col insert w/ limits
test: bend catalog mem impl to match postgres for tests
fix: postgres column insert many column type error checks
chore: clippy
* test: assert column counts in partial column insert test
* chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join
* chore: adding comments to clarify partial failure behaviour of multi col insert
* test: add tests for create_or_get_many columns in catalog
* test: forgot how macros work for a moment
* test: service limit test handles partial update of cols
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if comapction is very backed up, it will be able to work through the backlog without evaluating the entire world.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Abstract
========
We need to be careful w/ tombstone that fall exactly in sequence number range of a parquet file.
Current Bug
===========
Imagine the following order of events:
1. Router creates write at sequence number 1:
- `table,selector=1 payload=1 1`
- `table,selector=2 payload=2 2`
2. Ingester pulls write, waits a bit and persists it to parquet file 1:
- `table,selector=1 payload=1 1`
- `table,selector=2 payload=2 2`
4. Router creates write at sequence number 2:
- `table,selector=1 payload=3 3`
- `table,selector=2 payload=4 4`
5. Ingester pulls write
6. Router create delete at sequencer number 3: full time range, `selector=1`
7. Ingeser pulls delete and creates tombstone 1
8. Router creates write at sequence number 4:
- `table,selector=1 payload=5 5`
- `table,selector=2 payload=6 6`
9. Ingester pulls write
10. Ingester persists parquet file 2:
- `table,selector=2 payload=4 4`
- `table,selector=1 payload=5 5`
- `table,selector=2 payload=6 6`
When reading parquet file 2, the tombstone MUST NOT be applied. Otherwise `table,selector=1 payload=5 5` will be
deleted.
Notes
=====
Technically this issue also applies to files created by the compactor, however the compactor marks tombstones as
processed that fall into the sequence number range. It even does that in a single transaction:
fc4635a334/compactor/src/compact.rs (L821-L861)
Alternative
===========
An alternative solution would be if the ingester would mark tombstones that it materialized during persistence as
"processed" (tombstone 1 for parquet file 2 in the example above). However "processed" markers are currently a mere
optimization and don't affect correctness, which is nice for caching on the querier side as well as reasoning.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Support `SHOW NAMESPACES` in sql repl
* feat: add basic support to clients
* fix: add get_namespaces service test
* fix: proper error handling
* test: end to end test for namespace client
* refactor: Use QuerierDatabase rather than Catalog
* refactor: remove unused function
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.
Closes#4120
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: split up ingester schema test, add mini cluster
* feat: add schema cli test
* feat: add end to end test for querier
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>