Fix the ingester to track the max persisted sequence number per partition.
Ensure replay takes in data from unpersisted partitions.
Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
* fix: create_or_get_multi for column in catalog now enforces limits
fix: create_or_get_multi for column in catalog now enforces limits
chore: reorder catalog column create fns to be next to each other
test: add failing test for multi col insert w/ limits
test: bend catalog mem impl to match postgres for tests
fix: postgres column insert many column type error checks
chore: clippy
* test: assert column counts in partial column insert test
* chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join
* chore: adding comments to clarify partial failure behaviour of multi col insert
* test: add tests for create_or_get_many columns in catalog
* test: forgot how macros work for a moment
* test: service limit test handles partial update of cols
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if comapction is very backed up, it will be able to work through the backlog without evaluating the entire world.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.
Closes#4120
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: remove fully processed tombstones
* test: first few tests
* fix: delete SQL
* fix: test how IN (...) works in PG
* fix: test how IN (?) works in PG
* fix: test how IN (?) works in PG
* fix: dynamically add IN (?, ?, ...)
* fix: dynamically add IN (?, ?, ...) & its dynamic values
* fix: add argument directly in the SQL
* test: more tests for catalog read and update functions
* chore: move a subfunction to make it easier to read)
* test: first test for find_can_compact but disabled due to bug
* test: integration tests and a bug fix for find_and_compact
* chore: cleanup
* refactor: address review comments
* fix: put 2 delete processed tombstones and tombstones in a transaction
Lowercases the error messages in the big iox_catalog Error enum for
better composition of messages (no random capitalisation in
glued-together strings, which is common with wrapped errors).
Set to_delete to the time the file was marked as deleted rather than
true.
Fixes#4059.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: `TombstoneRepo::list_by_table`
* feat: `ParquetFileRepo::list_by_table_not_to_delete`
* refactor: `querier` w/o `db`
Get the `querier` to work w/o relying on `db`. A few notes:
- Testing is kinda shallow, we really need to get `query_tests` working
w/ `querier` (see #3934).
- We still run a sync loop for namespaces, tables and schemas. This will
be a replaced by "update namespace incl. tables and schemas on demand".
Note however that we cannot fetch single tables and schemas on demand
at the moment, because DataFusion doesn't implement async schema
inspection (only `scan` / "give me all the chunks" is async). I think
that's OK for now and we can address this later.
- There is NO cache for parquet files and tombstones at the moment. For
correctness, they need to be fetched in a single transaction (or we
need a kinda tricky sequence number / logical clock tracking) and I am
not sure yet how this makes sense when we have the ingester data wired
up and predicates pushed down to the catalog (see next point). So
let's measure first and then decide on a caching strategy for this.
- Predicates are currently NOT pushed down to the catalog. I'll need to
figure out how to extract time range from generic DataFusion
expressions to make that work (it's easier for InfluxRPC queries, but
they are not tested at the moment, see first point).
Sorry that this commit is kinda huge. I initially planned to only
migrate the chunks away from `db` and leave the tables and schemas for a
follow-up PR, but the DataFusion trait structure (chunks are bound to
their tables) makes this kinda pointless.
Closes#3974.
* docs: explain what we're doing
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: mention tracking issues
* docs: explain what we're doing
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: `TableRepo::get_by_namespace_and_name`
* refactor: rework `TableCache`
- dual cache that can also map table names to IDs
- deal w/ missing tables w/o panics
- set proper timeouts to missing data
For #3974.
* test: extend table cache tests
fix: refactor table & col limit enforcement in catalog into single SQL statement
fix: borked rebase
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Configure the postgres catalog to close unused connections after 1
minute, rather than 500s to introduce a bit of fluidity to pool of
connection acquires.
When created in the catalog, parquet files should always have compaction
level 0. Updating the compaction level should always happen in the
compactor.
Only the catalog should need to know about the initial compaction level
value.
This has the advantages of:
- Not needing to create fake parquet file IDs or fake deleted_at
values that aren't used by create before insertion
- Not needing too many arguments for create
- Naming the arguments so it's easier to see what value is what
argument, especially in tests
- Easier to reuse arguments or parts of arguments by using copies of
params, which makes it easier to see differences, especially in tests
Adds indexes to the JOINed fields to reduce execution cost, as the
TableRepo::get_table_persist_info() is currently by far the most
expensive catalog operation.