influxdb/iox_catalog
Dom Dwyer 2d46a364dc
feat: namespace soft-delete support
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.

Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.

The components treat soft-deleted namespaces differently:

    * router: ignore soft deleted namespaces
    * ingester: accept soft deleted namespaces
    * compactor: accept soft deleted namespaces
    * querier: ignore soft deleted namespaces
    * various gRPC services: ignore soft deleted namespaces

This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.

Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).

Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
2023-02-13 12:01:35 +01:00
migrations refactor(catalog): soft delete namespace column 2023-02-09 11:35:27 +01:00
sqlite/migrations refactor(catalog): soft delete namespace column 2023-02-09 11:35:27 +01:00
src feat: namespace soft-delete support 2023-02-13 12:01:35 +01:00
.gitignore feat: Initial SQLite catalog schema (#6851) 2023-02-06 22:55:14 +00:00
Cargo.toml feat: Initial SQLite catalog schema (#6851) 2023-02-06 22:55:14 +00:00
README.md docs(various): Improve Readability (#4768) 2022-06-02 18:01:06 +00:00
build.rs feat: allow IOx catalog to setup itself (no SQLx CLI required) (#3584) 2022-01-31 15:07:38 +00:00

README.md

IOx Catalog

This crate contains the code for the IOx Catalog. This includes the definitions of namespaces, their tables, the columns of those tables and their types, which Parquet files are in object storage, and delete tombstones. There's also some configuration information that the overall distributed system uses for operation.

To run this crate's tests you'll need Postgres installed and running locally. You'll also need to set the INFLUXDB_IOX_CATALOG_DSN environment variable so that sqlx will be able to connect to your local DB. For example with user and password filled in:

INFLUXDB_IOX_CATALOG_DSN=postgres://<postgres user>:<postgres password>@localhost/iox_shared

You can omit the host part if your Postgres is running on the default unix domain socket (useful on macOS because, by default, the config installed by brew install postgres doesn't listen on a TCP port):

INFLUXDB_IOX_CATALOG_DSN=postgres:///iox_shared
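
The two DSN shapes above differ only in whether credentials and a host are present. A minimal sketch of how such a string is assembled, assuming a hypothetical helper named build_dsn (it is not part of this crate, and it does not percent-encode special characters in the user or password):

```rust
/// Build a Postgres DSN of the shape used by INFLUXDB_IOX_CATALOG_DSN.
///
/// `build_dsn` is a hypothetical helper for illustration only; it is not
/// part of iox_catalog. Special characters in the user or password would
/// need percent-encoding, which this sketch does not do.
fn build_dsn(credentials: Option<(&str, &str)>, host: Option<&str>, database: &str) -> String {
    let mut dsn = String::from("postgres://");

    // Optional "<user>:<password>@" prefix.
    if let Some((user, password)) = credentials {
        dsn.push_str(user);
        dsn.push(':');
        dsn.push_str(password);
        dsn.push('@');
    }

    // An empty host part means "use the default unix domain socket".
    dsn.push_str(host.unwrap_or(""));

    dsn.push('/');
    dsn.push_str(database);
    dsn
}
```

Calling build_dsn(None, None, "iox_shared") yields the unix-socket form shown above, while passing credentials and Some("localhost") yields the TCP form.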

You'll then need to create the database. You can do this via the sqlx command line.

cargo install sqlx-cli
DATABASE_URL=<dsn> sqlx database create
cargo run -q -- catalog setup
cargo run -- catalog topic update iox-shared

This will set up the database based on the files in ./migrations in this crate. SQLx also creates a table to keep track of which migrations have been run.

NOTE: do not use sqlx database setup, because that will create the migration table in the wrong schema (namespace). Our catalog setup code will do that part by using the same sqlx migration module but with the right namespace setup.

Migrations

If you need to create and run migrations to add, remove, or change the schema, you'll need the sqlx-cli tool. Install with cargo install sqlx-cli if you haven't already, then run sqlx migrate --help to see the commands relevant to migrations.

Tests

To run the Postgres integration tests, ensure the above setup is complete first.

CAUTION: existing data in the database is dropped when tests are run, so you should use a DIFFERENT database name for your test database than your INFLUXDB_IOX_CATALOG_DSN database.

  • Set the TEST_INFLUXDB_IOX_CATALOG_DSN=<testdsn> env var, in the same way as the INFLUXDB_IOX_CATALOG_DSN env var above. The integration tests will pick up this value if it is set in your .env file.
  • Set TEST_INTEGRATION=1
  • Run cargo test -p iox_catalog
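
Because the tests drop existing data, it can be worth checking the environment before running them. The following is a minimal sketch of such a check, not the logic the test suite actually uses. The function name integration_test_dsn is hypothetical, and treating any set value of TEST_INTEGRATION as "enabled" is an assumption.

```rust
/// Decide whether the Postgres integration tests should run, and refuse to
/// run them against the main catalog database.
///
/// Hypothetical helper for illustration; not part of iox_catalog.
/// Returns Ok(None) when the integration tests are not enabled,
/// Ok(Some(dsn)) when they may run against `dsn`, and Err otherwise.
fn integration_test_dsn(
    test_integration: Option<&str>, // value of TEST_INTEGRATION, if set
    main_dsn: Option<&str>,         // value of INFLUXDB_IOX_CATALOG_DSN, if set
    test_dsn: Option<&str>,         // value of TEST_INFLUXDB_IOX_CATALOG_DSN, if set
) -> Result<Option<String>, String> {
    // Assumption: any set value of TEST_INTEGRATION opts in.
    if test_integration.is_none() {
        return Ok(None);
    }

    let test_dsn = test_dsn
        .ok_or_else(|| "TEST_INFLUXDB_IOX_CATALOG_DSN is not set".to_string())?;

    // A plain string comparison is a crude check: two different DSN strings
    // can still point at the same database.
    if Some(test_dsn) == main_dsn {
        return Err("test DSN matches INFLUXDB_IOX_CATALOG_DSN; tests would drop its data".to_string());
    }

    Ok(Some(test_dsn.to_string()))
}
```

Each argument can be read with std::env::var("...").ok() and passed in via as_deref().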

Schema namespace

All iox catalog tables are created in an iox_catalog schema. Remember to set the schema search path when accessing the database with psql.

There are several ways to set the default search path, depending on whether you want to set it for your session, for the database, or for the user.

Setting a default search path for the database or user may interfere with tests (e.g. it may make some tests pass when they should fail). The safest option is to set the search path on a per-session basis. As always, there are a few ways to do that:

  1. you can type set search_path to public,iox_catalog; inside psql.
  2. you can add (1) to your ~/.psqlrc
  3. or you can just pass it as a CLI argument with:
     psql 'dbname=iox_shared options=-csearch_path=public,iox_catalog'
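
The session-level options above all reduce to the same comma-separated schema list. As a small sketch, assuming two hypothetical helpers (set_search_path_sql and psql_conninfo, neither of which is part of this crate), the statement for option 1 and the connection string for option 3 can be built like this:

```rust
/// Build the statement typed inside psql (option 1 above).
/// Hypothetical helper for illustration; not part of iox_catalog.
fn set_search_path_sql(schemas: &[&str]) -> String {
    format!("set search_path to {};", schemas.join(","))
}

/// Build the conninfo string passed to psql on the command line (option 3 above).
/// Hypothetical helper for illustration; not part of iox_catalog.
fn psql_conninfo(dbname: &str, schemas: &[&str]) -> String {
    // `options=-c<name>=<value>` sets a run-time parameter for the session.
    format!("dbname={} options=-csearch_path={}", dbname, schemas.join(","))
}
```

With the schemas ["public", "iox_catalog"] and the database iox_shared, these produce the same strings as shown in the list above.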