Commit Graph

66 Commits (dac0db21960c871c298924269d198a8b01849724)

Author SHA1 Message Date
Nga Tran 73f38077b6
feat: add sort_key_ids as array of bigints into catalog partition (#8375)
* feat: add sort_key_ids as array of bigints into catalog partition

* chore: add comments

* chore: remove comments to avoid changing them in the future due to checksum requirement
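A migration of this shape would add the column (a sketch only; the real migration file in the repo may differ, and leaving the column nullable with no default is an assumption):

```sql
-- Sketch: add sort_key_ids as a nullable array of column IDs on the
-- partition table. Nullable, so existing rows need no immediate backfill.
ALTER TABLE partition
    ADD COLUMN sort_key_ids BIGINT[];
```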

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 14:28:30 +00:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
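The change itself is small; a sketch of the migration (dropping the constraint takes only a brief lock and does not rewrite the table, per the linked answer):

```sql
-- Sketch: make parquet_file.partition_id optional so new files can be
-- associated via partition_hash_id alone.
ALTER TABLE parquet_file
    ALTER COLUMN partition_id DROP NOT NULL;
```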

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
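In SQL terms, the lookup has to join through the partition table so either key matches; a hedged sketch (the real query lives in the Rust catalog code, and the exact shape here is an assumption):

```sql
-- Sketch: given a PartitionId (here 3), return its Parquet files
-- regardless of which identifier the file record actually stores.
SELECT pf.*
FROM parquet_file pf
JOIN partition p
  ON pf.partition_id = p.id
  OR pf.partition_hash_id = p.hash_id
WHERE p.id = 3
  AND pf.to_delete IS NULL;
```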

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
Joe-Blount 629f9d20db fix: update new_file_at following all compactions 2023-07-20 13:27:54 -05:00
Fraser Savage e894ea73f7
refactor(catalog): Allow kafka columns to be nullable 2023-07-20 11:18:02 +01:00
Carol (Nichols || Goulding) f20e9e6368
fix: Add index on parquet_file.partition_hash_id for lookup perf 2023-07-10 13:40:03 -04:00
Joe-Blount c2442c31f3 chore: create partition table index for created_at 2023-07-07 16:27:05 -05:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
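A sketch of the partition-side migration described above (the `BYTEA` type and the index name are assumptions; the unique index is what enables lookups by hash):

```sql
-- Sketch: nullable hash column plus a unique index for lookups by hash.
ALTER TABLE partition ADD COLUMN hash_id BYTEA;

CREATE UNIQUE INDEX IF NOT EXISTS partition_hash_id_unique
    ON partition (hash_id)
    WHERE hash_id IS NOT NULL;
```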
2023-06-22 09:01:22 -04:00
Marco Neumann 551e838db3
refactor: remove unused PG indices (#7905)
Similar to #7859. To test index usage, execute the following query on
the writer replica:

```sql
SELECT
    n.nspname                                      AS namespace_name,
    t.relname                                      AS table_name,
    pg_size_pretty(pg_relation_size(t.oid))        AS table_size,
    t.reltuples::bigint                            AS num_rows,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_index i
    INNER JOIN pg_class t               ON t.oid = i.indrelid
    INNER JOIN pg_namespace n           ON n.oid = t.relnamespace
    INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```

Data for eu-west-1 at `2023-05-31T16:30:00Z`:

```text
namespace_name |  table_name  | table_size | num_rows  |            index_name             | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+-----------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_deleted_at_idx       | 6442 MB    | N      |      1693534991 | 21602734184385 |    21694365037
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_partition_delete_idx | 20 MB      | N      |        17854904 |     3087700816 |      384603858
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_partition_idx        | 2325 MB    | N      |      1627977474 | 12604272924323 | 11088781876397
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_pkey                 | 8290 MB    | Y      |       480767174 |      481021514 |      480733966
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_table_delete_idx     | 174 MB     | N      |         1006563 |    24687617719 |      385132581
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_file_table_idx            | 1905 MB    | N      |         9288042 |   351240529272 |          27551
 iox_catalog    | parquet_file | 38 GB      | 146489216 | parquet_location_unique           | 6076 MB    | Y      |       385294957 |         109448 |         109445
```

and at `2023-06-01T13:00:00Z`:

```text
 namespace_name |  table_name  | table_size | num_rows  |            index_name             | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+-----------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_deleted_at_idx       | 6976 MB    | N      |      1693535032 | 21602834620294 |    21736731439
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_partition_delete_idx | 21 MB      | N      |        31468423 |     7397141567 |      677909956
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_partition_idx        | 2464 MB    | N      |      1627977474 | 12604272924323 | 11088781876397
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_pkey                 | 8785 MB    | Y      |       492762975 |      493017342 |      492729691
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_table_delete_idx     | 241 MB     | N      |         1136317 |    24735561304 |      429892231
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_file_table_idx            | 2058 MB    | N      |         9288042 |   351240529272 |          27551
 iox_catalog    | parquet_file | 43 GB      | 152684560 | parquet_location_unique           | 6776 MB    | Y      |       399142416 |         124810 |         124807
```

Due to #7842 and #7894, the following indices are no longer used:

- `parquet_file_partition_idx`
- `parquet_file_table_idx`
2023-06-01 13:45:05 +00:00
Marco Neumann e14305ac33
feat: add index for compactor (#7894)
* fix: migration name

* feat: add index for compactor
2023-05-31 12:29:00 +00:00
Marco Neumann e1c1908a0b
refactor: add `parquet_file` PG index for querier (#7842)
* refactor: add `parquet_file` PG index for querier

Currently the `list_by_table_not_to_delete` catalog query is somewhat
expensive:

```text
iox_catalog_prod=> select table_id, sum((to_delete is NULL)::int) as n from parquet_file group by table_id order by n desc limit 5;
 table_id |  n
----------+------
  1489038 | 7221
  1489037 | 7019
  1491534 | 5793
  1491951 | 5522
  1513377 | 5339
(5 rows)

iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
       min_time, max_time, to_delete, file_size_bytes,
       row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038 AND to_delete IS NULL;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on parquet_file  (cost=46050.91..47179.26 rows=283 width=200) (actual time=464.368..472.514 rows=7221 loops=1)
   Recheck Cond: ((table_id = 1489038) AND (to_delete IS NULL))
   Heap Blocks: exact=7152
   ->  BitmapAnd  (cost=46050.91..46050.91 rows=283 width=0) (actual time=463.341..463.343 rows=0 loops=1)
         ->  Bitmap Index Scan on parquet_file_table_idx  (cost=0.00..321.65 rows=22545 width=0) (actual time=1.674..1.674 rows=7221 loops=1)
               Index Cond: (table_id = 1489038)
         ->  Bitmap Index Scan on parquet_file_deleted_at_idx  (cost=0.00..45728.86 rows=1525373 width=0) (actual time=460.717..460.717 rows=4772117 loops=1)
               Index Cond: (to_delete IS NULL)
 Planning Time: 0.092 ms
 Execution Time: 472.907 ms
(10 rows)
```

I think this may also be because PostgreSQL kinda chooses the wrong
strategy, because it could just look at the existing index and filter
from there:

```text
iox_catalog_prod=> EXPLAIN ANALYZE SELECT id, namespace_id, table_id, partition_id, object_store_id,
       min_time, max_time, to_delete, file_size_bytes,
       row_count, compaction_level, created_at, column_set, max_l0_created_at
FROM parquet_file
WHERE table_id = 1489038;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using parquet_file_table_idx on parquet_file  (cost=0.57..86237.78 rows=22545 width=200) (actual time=0.057..6.994 rows=7221 loops=1)
   Index Cond: (table_id = 1489038)
 Planning Time: 0.094 ms
 Execution Time: 7.297 ms
(4 rows)
```

However, PostgreSQL doesn't know the cardinalities well enough, so
let's add a dedicated index to make the querier faster.
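A sketch of such an index; a partial index on `table_id` restricted to live files serves `list_by_table_not_to_delete` directly, so the planner no longer needs the expensive `BitmapAnd` of two large indexes. The index name is inferred, not confirmed by this message:

```sql
-- Sketch: partial index covering exactly the WHERE clause of the
-- querier's catalog query. CONCURRENTLY avoids blocking writes, but
-- cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS parquet_file_table_delete_idx
    ON parquet_file (table_id)
    WHERE to_delete IS NULL;
```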

* feat: new migration system

* docs: explain dirty migrations
2023-05-31 10:56:32 +00:00
Carol (Nichols || Goulding) 47157015d9
feat: Add columns to store the partition templates 2023-05-24 10:10:34 -04:00
Marco Neumann b71564f455 refactor: remove unused `parquet_file` indices
Remove unused Postgres indices. This lowers database load and also gives
us room to install actually useful indices (see #7842).

To detect which indices are used, I've used the following query (on the
actual write/master replica in eu-central-1):

```sql
SELECT
    n.nspname                                      AS namespace_name,
    t.relname                                      AS table_name,
    pg_size_pretty(pg_relation_size(t.oid))        AS table_size,
    t.reltuples::bigint                            AS num_rows,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_index i
    INNER JOIN pg_class t               ON t.oid = i.indrelid
    INNER JOIN pg_namespace n           ON n.oid = t.relnamespace
    INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```

At `2023-05-23T16:00:00Z`:

```text
 namespace_name |  table_name  | table_size | num_rows  |                    index_name                    | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_deleted_at_idx                      | 5398 MB    | N      |      1693383413 | 21036174283392 |    21336337964
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_created_idx               | 11 GB      | N      |        34190874 |     4749070532 |       61934212
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_idx                       | 2032 MB    | N      |      1612961601 |  9935669905489 |  8611676799872
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_pkey                                | 7135 MB    | Y      |       453927041 |      454181262 |      453894565
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      |               0 |              0 |              0
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_idx         | 8767 MB    | N      |               2 |          30717 |           4860
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_table_idx                           | 1602 MB    | N      |         9136844 |   341839537275 |          27551
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_location_unique                          | 4989 MB    | Y      |       332341872 |           3123 |           3123
```

At `2023-05-24T09:50:00Z` (i.e. nearly 18h later):

```text
 namespace_name |  table_name  | table_size | num_rows  |                    index_name                    | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_deleted_at_idx                      | 5448 MB    | N      |      1693485804 | 21409285169862 |    21364369704
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_created_idx               | 11 GB      | N      |        34190874 |     4749070532 |       61934212
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_idx                       | 2044 MB    | N      |      1615214409 | 10159380553599 |  8811036969123
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_pkey                                | 7189 MB    | Y      |       455128165 |      455382386 |      455095624
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      |               0 |              0 |              0
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_idx         | 8849 MB    | N      |               2 |          30717 |           4860
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_table_idx                           | 1618 MB    | N      |         9239071 |   348304417343 |          27551
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_location_unique                          | 5043 MB    | Y      |       343484617 |           3123 |           3123
```

The cluster currently is under load and all components are running.
Conclusion:

- `parquet_file_deleted_at_idx`: Used, likely by the GC. We could
  probably shrink this index by binning `deleted_at` (within the index,
  not within the actual database table), but let's do this in a later PR.
- `parquet_file_partition_created_idx`: Unused and huge (`created_at` is
  NOT binned). So let's remove it.
- `parquet_file_partition_idx`: Used, likely by the compactor and
  querier because we currently don't have a better index (see #7842 as
  well). This includes deleted files as well, which is somewhat
  pointless. May become obsolete after #7842; not touching for now.
- `parquet_file_pkey`: Primary key. We should probably use the object
  store UUID as a primary key BTW, which would also make the GC faster.
  Not touching for now.
- `parquet_file_shard_compaction_delete_created_idx`: Huge unused index.
  Shards don't exist anymore. Delete it.
- `parquet_file_shard_compaction_delete_idx`: Same as
  `parquet_file_shard_compaction_delete_created_idx`.
- `parquet_file_table_idx`: Used but is somewhat too large because it
  contains deleted files. Might become obsolete after #7842, don't touch
  for now.
- `parquet_location_unique`: See note `parquet_file_pkey`, it's
  pointless to have two IDs here. Not touching for now but this is a
  potential future improvement.

So we remove:

- `parquet_file_partition_created_idx`
- `parquet_file_shard_compaction_delete_created_idx`
- `parquet_file_shard_compaction_delete_idx`
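The removal itself is a sketch like the following (using `CONCURRENTLY` to keep the large `parquet_file` table available is an assumption; it cannot run inside a transaction block):

```sql
-- Sketch: drop the three unused indices identified above.
DROP INDEX CONCURRENTLY IF EXISTS parquet_file_partition_created_idx;
DROP INDEX CONCURRENTLY IF EXISTS parquet_file_shard_compaction_delete_created_idx;
DROP INDEX CONCURRENTLY IF EXISTS parquet_file_shard_compaction_delete_idx;
```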
2023-05-24 12:10:22 +02:00
Marco Neumann d34d23c354
refactor: remove `processed_tombstone` table (#7840)
- the table is unused
- there are no foreign keys or triggers based on this table
- the design is generally not scalable (N*M entries); tombstones should
  rather have a timestamp so we can check whether a parquet file includes
  that information or not (or some other form of serialization mechanism)
- it's currently empty in prod (and never was filled w/ data in any
  cluster)
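Since there are no foreign keys or triggers depending on the table, the migration is presumably as simple as:

```sql
-- Sketch: the table is unused and empty in prod, so a plain drop
-- suffices; nothing references it.
DROP TABLE IF EXISTS processed_tombstone;
```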
2023-05-22 15:56:23 +00:00
Dom Dwyer 61409f062c
refactor(catalog): soft delete namespace column
Adds a "deleted_at" column that will indicate the timestamp at which the
namespace was marked as logically deleted.
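A sketch of the migration; the `BIGINT` type is an assumption (the IOx catalog stores other timestamps as nanosecond integers), not confirmed by this message:

```sql
-- Sketch: nullable soft-delete marker; NULL means "not deleted".
ALTER TABLE namespace ADD COLUMN deleted_at BIGINT;
```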
2023-02-09 11:35:27 +01:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for changing cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
Nga Tran 550cea8bc5
perf: optimize not to update partitions with newly created level 2 files (#6590)
* perf: optimize not to update partitions with newly created level 2 files

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 14:46:58 +00:00
Nga Tran b20226797a
fix: make trigger modification in different file (#6526) 2023-01-06 20:34:48 +00:00
Nga Tran b856edf826
feat: function to get partition candidates from partition table (#6519)
* feat: function to get partition candidates from partition table

* chore: cleanup

* fix: make new_file_at the same value as created_at

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-06 16:20:45 +00:00
Nga Tran 23807df7a9
feat: trigger that updates partition table when a parquet file is created (#6514)
* feat: trigger that updates partition table when a parquet file is created

* chore: simplify epoch of now
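Such a trigger could look roughly like this (function and trigger names are hypothetical; the `new_file_at` column matches the related commits above, which set it to the file's `created_at`):

```sql
-- Sketch: stamp the owning partition whenever a parquet file is created.
CREATE OR REPLACE FUNCTION update_partition_on_new_parquet_file()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE partition
       SET new_file_at = NEW.created_at
     WHERE id = NEW.partition_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_partition_on_parquet_file_insert
    AFTER INSERT ON parquet_file
    FOR EACH ROW
    EXECUTE PROCEDURE update_partition_on_new_parquet_file();
```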
2023-01-05 19:57:23 +00:00
Nga Tran 1088baea3d
chore: index for selecting partitions with parquet files created after a given time (#6496)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-04 18:07:07 +00:00
Luke Bond 6263ca234a chore: delete ns postgres impl, test improvements, fix to mem impl 2022-12-16 10:23:50 +00:00
Luke Bond 7c813c170a
feat: reintroduce compactor first file in partition exception (#6176)
* feat: compactor ignores max file count for first file

chore: typo in comment in compactor

* feat: restore special first file in partition compaction logic; add limit

* fix: calculation in compaction max file count

chore: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 15:58:59 +00:00
Nga Tran a3f2fe489c
refactor: remove retention_duration field from namespace catalog table (#6124) 2022-11-11 20:30:42 +00:00
NGA-TRAN 498851eaf5 feat: add catalog columns needed for retention policy 2022-11-01 15:35:15 -04:00
Dom Dwyer 46bbee5423 refactor: reduce default column limit
Reduces the default number of columns allowed per-table, from 1,000 to
200.
2022-10-14 14:45:48 +02:00
Nga Tran 75ff805ee2
feat: instead of adding num_files and memory budget into the reason text column, let us create different columns for them. We will be able to filter on them easily (#5742)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-26 20:14:04 +00:00
Dom Dwyer 66bf0ff272 refactor(db): NULLable persisted_sequence_number
Makes the partition.persisted_sequence_number column in the catalog DB
NULLable. 0 is a valid persisted sequence number.
2022-09-15 18:19:39 +02:00
Dom Dwyer c5ac17399a refactor(db): persist marker for partition table
Adds a migration to add a column "persisted_sequence_number" that
defines the inclusive upper-bound on sequencer writes materialised and
uploaded to object store for the partition.
2022-09-15 16:10:35 +02:00
Luke Bond ee3f172d45 chore: renamed DB migration for billing trigger 2022-09-13 16:29:14 +01:00
Luke Bond c8b545134e chore: add index to speed up billing_summary upsert 2022-09-13 16:22:44 +01:00
Luke Bond feae712881 fix: parquet_file billing trigger respects to_delete 2022-09-13 16:22:44 +01:00
Luke Bond cc93b2c275 chore: add catalog trigger for billing 2022-09-13 16:22:44 +01:00
Carol (Nichols || Goulding) fbe3e360d2
feat: Record skipped compactions in memory
Connects to #5458.
2022-09-09 15:31:07 -04:00
Nga Tran cbfd37540a
feat: add index on parquet_file(shard_id, compaction_level, to_delete, created_at) (#5544) 2022-09-02 14:27:29 +00:00
Carol (Nichols || Goulding) 8a0fa616cf
fix: Rename columns, tables, indexes and constraints in postgres catalog 2022-09-01 10:00:54 -04:00
Nga Tran a2c82a6f1c
chore: remove min sequence number from the catalog table as we no longer use it (#5178)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 20:47:55 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Nga Tran 92eeb5b232
chore: remove unused sort_key_old from catalog partition (#4944)
* chore: remove unused sort_key_old from catalog partition

* chore: add new line at the end of the SQL file
2022-06-24 15:02:38 +00:00
Marco Neumann 994bc5fefd
refactor: ensure that SQL parquet file column sets are not NULL (#4937)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-24 14:26:18 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protobuf field to avoid incompatibility

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after
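One way such a conversion can be expressed in a single step (a sketch only; the "column with comma" test above is exactly why the real migration has to be more careful than a naive `string_to_array` split):

```sql
-- Sketch: convert sort_key in place from a comma-separated string to a
-- text array. Embedded commas in column names would break this split.
ALTER TABLE partition
    ALTER COLUMN sort_key TYPE TEXT[]
    USING string_to_array(sort_key, ',');
```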

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Marko Mikulicic c09f6f6bc9
chore: Incrementally migrate sort_key to array type (#4826)
This PR is the first step where we add a new column sort_key_arr whose content we'll manually migrate from sort_key.

When we're done with this, we'll merge https://github.com/influxdata/influxdb_iox/pull/4801/ (whose migration script must be adapted slightly to rename the `sort_key_arr` column back to `sort_key`).

All this must be done while we shut down the ingesters and the compactors.
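The two steps described above might be sketched like this (column names per the message; using `string_to_array` for the manual backfill is an assumption):

```sql
-- Step 1 (sketch): add the parallel array column alongside sort_key.
ALTER TABLE partition ADD COLUMN sort_key_arr TEXT[];

-- Step 2 (sketch): backfill it manually from the old string column
-- while ingesters and compactors are shut down.
UPDATE partition
   SET sort_key_arr = string_to_array(sort_key, ',')
 WHERE sort_key IS NOT NULL;
```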

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 11:35:43 +00:00
Marco Neumann 86e8f05ed1
fix: make all catalog IDs 64bit (#4418)
Closes #4365.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 16:49:34 +00:00
kodiakhq[bot] e2439c0a4f
Merge branch 'main' into cn/sort-key-catalog 2022-04-04 16:54:48 +00:00
Dom Dwyer 61bc9c83ad refactor: add table_id index on column_name
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
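A sketch of the fix; the index name is hypothetical, but the shape follows from the message (per-table lookups on `column_name` were doing full scans without it):

```sql
-- Sketch: index the foreign key so per-table column lookups are cheap.
CREATE INDEX IF NOT EXISTS column_name_table_id_idx
    ON column_name (table_id);
```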
2022-04-04 13:04:25 +01:00
Carol (Nichols || Goulding) c9bc70f03a
feat: Add optional sort_key column to partition table
Connects to #4195.
2022-04-01 15:45:51 -04:00
Paul Dix 6479e1fc8e
fix: add indexes to parquet_file (#4198)
Add indexes so the compactor can find candidate partitions and specific partition files quickly.
Limit the number of level 0 files returned for determining candidates. This should ensure that if compaction is very backed up, it will be able to work through the backlog without evaluating the entire world.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 09:59:39 +00:00
Marko Mikulicic 2c47d77a5b
fix: Backfill namespace_id in schema migration (#4177)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 16:31:26 +00:00
Carol (Nichols || Goulding) 5c8a80dca6
fix: Add an index to parquet_file to_delete 2022-03-29 08:15:26 -04:00