# IOx Catalog

This crate contains the code for the IOx Catalog. This includes the definitions of namespaces,
their tables, the columns of those tables and their types, which Parquet files are in object
storage, and delete tombstones. There's also some configuration information that the overall
distributed system uses for operation.

To run this crate's tests, you'll need Postgres installed and running locally. You'll also need to
set the `INFLUXDB_IOX_CATALOG_DSN` environment variable so that sqlx will be able to connect to
your local DB. For example, with the user and password filled in:

```
INFLUXDB_IOX_CATALOG_DSN=postgres://<postgres user>:<postgres password>@localhost/iox_shared
```

You can omit the host part if your Postgres is running on the default unix domain socket (useful on
macOS because, by default, the config installed by `brew install postgres` doesn't listen on a TCP
port):

```
INFLUXDB_IOX_CATALOG_DSN=postgres:///iox_shared
```

You'll then need to create the database. You can do this via the sqlx command line.

```
cargo install sqlx-cli
DATABASE_URL=<dsn> sqlx database create
cargo run -q -- catalog setup
```

This will set up the database based on the files in `./migrations` in this crate. SQLx also creates
a table to keep track of which migrations have been run.

NOTE: **do not** use `sqlx database setup`, because that will create the migration table in the
wrong schema (namespace). Our `catalog setup` code will do that part by using the same sqlx
migration module, but with the right namespace configured.
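
For example, once `catalog setup` has run, you can check which migrations have been applied by
querying that bookkeeping table. This is a sketch that assumes the standard sqlx table name and
columns, living in the `iox_catalog` schema described below:

```sql
-- List applied migrations (assumes the standard sqlx bookkeeping table,
-- created in the iox_catalog schema by `catalog setup`).
SELECT version, description, installed_on, success
FROM iox_catalog._sqlx_migrations
ORDER BY version;
```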

## Migrations

If you need to create and run migrations to add, remove, or change the schema, you'll need the
`sqlx-cli` tool. Install with `cargo install sqlx-cli` if you haven't already, then run `sqlx
migrate --help` to see the commands relevant to migrations.
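
For example, `sqlx migrate add <description>` creates a new timestamped `.sql` file under
`./migrations`. A hypothetical migration adding an index might look like the following (the table
and index names are placeholders, not part of the real schema history):

```sql
-- migrations/20990101000000_add_example_index.sql (hypothetical file name)
-- Adds an index on a placeholder table and column.
CREATE INDEX IF NOT EXISTS example_table_example_column_idx
    ON example_table (example_column);
```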

## Tests

To run the Postgres integration tests, ensure the above setup is complete first.

**CAUTION:** existing data in the database is dropped when the tests run, so use a DIFFERENT
database for your tests than the one in your `INFLUXDB_IOX_CATALOG_DSN`.

* Set the `TEST_INFLUXDB_IOX_CATALOG_DSN=<testdsn>` environment variable in the same way as the
  `INFLUXDB_IOX_CATALOG_DSN` variable above. The integration tests *will* pick up this value if it
  is set in your `.env` file (see the example sketch after this list).
* Set `TEST_INTEGRATION=1`
* Run `cargo test -p iox_catalog`
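
For example, a `.env` file might contain the following (the DSN values are placeholders; note the
separate test database, as cautioned above):

```
# .env (sketch; adjust the DSNs to your local setup)
INFLUXDB_IOX_CATALOG_DSN=postgres:///iox_shared
TEST_INFLUXDB_IOX_CATALOG_DSN=postgres:///iox_shared_test
```

You can then run the integration tests with `TEST_INTEGRATION=1 cargo test -p iox_catalog`.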

## Schema namespace

All IOx catalog tables are created in the `iox_catalog` schema. Remember to set the schema search
path when accessing the database with `psql`.
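
For example, from inside `psql` you can list the catalog tables without touching the search path by
querying `information_schema` directly (the exact set of tables depends on which migrations have
run):

```sql
-- List all tables in the iox_catalog schema.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'iox_catalog'
ORDER BY table_name;
```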

There are several ways to set the default search path, depending on whether you want to do it for
your session, for the database, or for the user.

Setting a default search path for the database or user may interfere with tests (e.g. it may make
some tests pass when they should fail). The safest option is to set the search path on a
per-session basis. As always, there are a few ways to do that:

1. you can type `set search_path to public,iox_catalog;` inside psql.
2. you can add (1) to your `~/.psqlrc`
3. or you can just pass it as a CLI argument with:

   ```
   psql 'dbname=iox_shared options=-csearch_path=public,iox_catalog'
   ```
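
Whichever option you use, you can confirm what is in effect for the current session from inside
`psql`:

```sql
-- Show the search path currently in effect for this session.
SHOW search_path;
```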

## Failed / Dirty Migrations

Migrations might be marked as dirty in prod if they do not run all the way through. In this case,
you have to clean up manually (using a read-write shell):

1. Revert the effect of the migration (e.g. drop created tables, drop created indices); a
   hypothetical example is sketched after this list.
2. Remove the migration from the `_sqlx_migrations` table. E.g. if the version of the migration is 1337, this is:

   ```sql
   DELETE FROM _sqlx_migrations
   WHERE version = 1337;
   ```
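
What step (1) looks like depends entirely on what the failed migration did. For example, if the
dirty migration only created an index, the revert would be a matching `DROP` statement (the index
name here is purely hypothetical):

```sql
-- Hypothetical revert for a dirty migration that had created an index.
DROP INDEX IF EXISTS iox_catalog.example_idx;
```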