* feat: function to get parttion candidates from partition table
* chore: cleanup
* fix: make new_file_at the same value as created_at
* chore: cleanup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: compactor ignores max file count for first file
chore: typo in comment in compactor
* feat: restore special first file in partition compaction logic; add limit
* fix: calculation in compaction max file count
chore: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds a migration to add a column "persisted_sequence_number" that
defines the inclusive upper-bound on sequencer writes materialised and
uploaded to object store for the partition.
* refactor: store per-file column set in catalog
Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.
The querier will use this to directly load parquet files into the read
buffer.
**WARNING: This requires a catalog wipe!**
Ref #4124.
* refactor: use proper `ColumnSet` type
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string
* test: add column with comma
* fix: use new protonuf field to avoid incompactible
* fix: ensure sort_key is an empty array rather than NULL
* refactor: address review comments
* refactor: address more comments
* chore: clearer comments
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* fix: Rename migration so it will be applied after
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
This PR is the first step where we add a new column sort_key_arr whose content we'll manually migrate from sort_key.
When we're done with this, we'll merge https://github.com/influxdata/influxdb_iox/pull/4801/ (whose migration script must be adapted slightly to rename the `sort_key_arr` column back to `sort_key`).
All this must be done while we shut down the ingesters and the compactors.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if comapction is very backed up, it will be able to work through the backlog without evaluating the entire world.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Set to_delete to the time the file was marked as deleted rather than
true.
Fixes#4059.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds indexes to the JOINed fields to reduce execution cost, as the
TableRepo::get_table_persist_info() is currently by far the most
expensive catalog operation.
Allow the caller to set the Postgres schema a migration should be
applied to, rather than restricting the migration to a specific,
hard-coded schema.
BREAKING CHANGE: manually adds a new migration that precedes the
existing migration to ensure the iox_catalog schema exists before
applying the migration. You'll probably have to drop any existing
databases and migrate from scratch:
sqlx database drop; sqlx database create;
* Adds TombstoneId and Tombstone to the iox_catalog with associated interfaces
* Adds SequenceNumber new type for use with Tombstone
* Adds Timestamp new type for use with Tombstone
* Adds constraint to the Postgres schema to enforce tombstone uniqueness by table_id, sequencer_id, and sequence_number
* refactor: ensure sequencers are unique
Adds a unique constraint to ensure only one sequencer record exists for
each Kafka (topic, partition).
* test: use DSN from env for integration tests
Removes the hard-coded DSN, instead sourcing it from the DATABASE_URL
environment variable.
* docs: integration testing for iox_catalog
Documents the required steps in order to run the Postgres integration
tests for the iox_catalog crate.
* feat(iox_catalog): create & list sequencers
Adds support for interacting with the "sequencer" table.
* chore: update lockfile
Running cargo in iox_catalog generates a lockfile diff.
Changed to use the iox_catalog schema in Postgres rather than public.
Updated talbe names to be singular.
Removed the connection_string from query_pool