Commit Graph

64 Commits (936f51013dc765d7e0a19c645f88e7bb97a0aa78)

Author SHA1 Message Date
Marco Neumann 48722783f9
feat: offer metrics for in-mem catalog (#3876)
This can be quite helpful to test certain caching behavior w/o writing
yet-another abstraction layer.
2022-03-01 11:33:54 +00:00
Nga Tran 0e0dc500f6
feat: prepare data to send to querier (#3825)
* feat: changes needed to apply tombstones correctly on the life-cycle ingest bacthes

* refactor: adjust the  design after discussing with Paul

* feat: apply the coming tombstone on all data but persiting one

* chore: fmt

* fix: build on buffer tombstone

* test: delete & write tests for a parition and some cleanup

* feat: No need add processed tombstones for newly created parquet file in the ingester becasue all deletes before that parquet file is created were applied

* chore: cleanup

* feat: intitial implementation for preparing data to send back to the Querier

* feat: full implementation of prepare_data_to_querier

* fix: apply filters for the batches

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: cleanup

* fix: typos in comments

* fix: typos in comments

* fix: typos in comments

* test: create different scenarios and test them

* chore: fix typos

* test: add tests with deletes

* chore: make pub pub(crate)

* chore: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* refactor: address review comments

* fix: keep batches in their arrival order

* refactor: not assign unecessary values to enum

* refactor: use bitflags enum

* fix: use bitflags correctly

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: avoid using use at the end of the function

* chore: merge main to branch

* fix: fix downgrade versions

* refactor: address review comments

* chore: remove unnecessary comments

* refactor: Make the whole test_utils module test-only and bring paths into module scope

Co-authored-by: Paul Dix <paul@pauldix.net>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
2022-03-01 01:00:45 +00:00
Marco Neumann 6e2470bf5f
feat: create CatalogChunk in querier (#3862)
* feat: `NamespaceRepo::get_by_id`

* feat: create `CatalogChunk` in querier
2022-02-28 17:20:38 +00:00
Marco Neumann b213796c98
feat: sync namespaces in querier (#3865)
* feat: `NamespaceRepo::list`

* feat: sync namespaces in querier

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:01:28 +00:00
Dom Dwyer 3579252682 refactor: configurable max catalog connections
Allow the maximum number of catalog (postgres) connections to be
specified as part of the catalog configuration.
2022-02-25 11:23:21 +00:00
Dom Dwyer b07f15bec7 refactor: parallel column resolution
A quick change to perform the ColumnRepo::create_or_get() calls in
parallel (up to a maximum of 3 in-flight at any one time) in order to
mitigate the latency of the call and reduce the overall schema
validation call duration.

The in-flight limit is enforced to avoid starving the DB connection pool
of connections.
2022-02-24 21:04:25 +00:00
Paul Dix 8571c132cc
feat: add method to get table persistence information from catalog (#3848) 2022-02-24 16:18:14 +00:00
Carol (Nichols || Goulding) 252ced7adf
feat: Add row count to the parquet_file record in the catalog (#3847)
Fixes #3842.
2022-02-24 15:20:50 +00:00
Marco Neumann d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it (#3852)
* refactor: less parquet data copying

* feat: `PartitionRepo::get_by_id`

* feat: `TableRepo::get_by_id`

* feat: `ParquetFile::file_size_bytes`

* feat: `ParquetFile::parquet_metadata`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
Dom Dwyer aaf8951927 feat: instrument postgres catalog impl
Wraps the postgres implementation of the catalog with a MetricDecorator.

This is slightly intrusive, with the metrics registry being pushed into
the PostgresCatalog type in order to decorate the impls returned when
calling the Catalog::repositories() and Catalog::start_transaction()
methods (rather than being a pure decorator) in order to use static
dispatch and let the compiler optimise away as much overhead as
possible.
2022-02-23 14:34:26 +00:00
Dom Dwyer a0b43323d6 feat: catalog instrumentation decorator
Adds a MetricDecorator type that wraps a Catalog impl, delegating calls
to the underlying impl and recording per-method call latency broken down
by success/error state.
2022-02-23 14:34:26 +00:00
Marco Neumann 10a1670341
test: catalog `maybe_skip_integration!` and `should_panic` (#3834)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 13:25:30 +00:00
Paul Dix 276d9b123a
feat: Add min_sequence_number tracking for sequencers in ingester (#3785)
Fixes #3702. This pulls the min sequence tracking into the LifecycleManager. Because the number requires looking at all other partitions in memory, this was the most efficient place to put it. The manager updates the sequencer state after it calls persist. The number is meant to be a lower bound on the sequence number. Issue #3783 will add functionality for the ingester to ignore replayed data that has already been persisted.
2022-02-22 21:53:33 +00:00
Nga Tran a91e2eadc7
feat: apply tombstones to the batches of the ingest life-cycle (#3770)
* feat: changes needed to apply tombstones correctly on the life-cycle ingest bacthes

* refactor: adjust the  design after discussing with Paul

* feat: apply the coming tombstone on all data but persiting one

* chore: fmt

* fix: build on buffer tombstone

* test: delete & write tests for a parition and some cleanup

* feat: No need add processed tombstones for newly created parquet file in the ingester becasue all deletes before that parquet file is created were applied

* chore: cleanup

Co-authored-by: Paul Dix <paul@pauldix.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 18:54:21 +00:00
Luke Bond e19609ab7b
feat: routing service protection (#3807)
* chore: db migration for namespace table & column limits

* feat: impl table & column limits in catalog

* chore: improved comment in catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 17:26:37 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Dom Dwyer 2500dc3ac7 fix: add public to schema search path
The _sql_migrations table cannot be created by the `catalog setup`
bootstrap command because it is created before the migrations run, one
of which creates the iox_catalog namespace the catalog operates in.

This change allows the _sql_migrations table to be created in the public
schema to start the migration process, with everything else living in
the iox_catalog schema.
2022-02-21 11:58:55 +00:00
Luke Bond 7969ec682c
feat: catalog now supports dsn-file:// (#3787)
* feat: catalog now supports dsn-file://

* chore: rename fn in catalog PG hotswap code
2022-02-18 16:42:55 +00:00
Dom Dwyer 568a15510e fix: assert on conflicting tombstone creation
If two callers call create_or_get() for a tombstone, providing the same
(table_id, sequencer_id, sequence_number) triplet but a different
predicate / timestamps the catalog MUST NOT silently continue.

As this is unexpected, this behaviour causes a panic.
2022-02-17 16:29:55 +00:00
Dom Dwyer 0cc5c979c6 fix: assert on conflicting partition creation
If two callers call create_or_get() for a partition, providing the same
partition key & table ID, but different sequence numbers the catalog
MUST NOT continue silently.

As this is unexpected, this behaviour causes a panic.
2022-02-17 16:29:55 +00:00
Dom Dwyer 0796bd079b test(iox_catalog): assert no spurious writes
This commit introduces test cases that ensure that for each of the
tombstone & partition repos:

    * create_or_get() is idempotent
    * create_or_get() does not silently drop conflicting writes

The former covers the expected use case: callers issuing potentially
multiple calls to create_or_get() with the same args, causing the
catalog impl to transparently turn the subsequent calls into NOPs.

The latter illustrates an issue where multiple calls to create_or_get()
with differing arguments is silently accepted by the catalog, causing
both callers to believe they have committed the (differing) details they
provided. This test is expected to pass but fails in this commit.
2022-02-17 16:29:55 +00:00
Dom Dwyer 4d54f8b42c refactor: remove migration create schema 2022-02-17 14:41:32 +00:00
Dom Dwyer 44e9eaf92b test(iox_catalog): isolated catalog tests
This commit changes the iox_catalog test harness so that each test is
run in an independent, randomly generated schema to avoid concurrent
tests interfering with each other.

Each test creates a randomly-named schema, grants permissions to the new
schema, runs the full migration stack against the new schema, and then
executes the code under test.
2022-02-17 14:19:01 +00:00
Dom Dwyer 3b378418f7 refactor: do not specify schema in migrations
Allow the caller to set the Postgres schema a migration should be
applied to, rather than restricting the migration to a specific,
hard-coded schema.

BREAKING CHANGE: manually adds a new migration that precedes the
existing migration to ensure the iox_catalog schema exists before
applying the migration. You'll probably have to drop any existing
databases and migrate from scratch:

    sqlx database drop; sqlx database create;
2022-02-17 14:15:58 +00:00
Marko Mikulicic 4655d4b4f1
docs: Improve iox_catalog testing docs (#3760)
* docs: Improve iox_catalog testing docs

* fix: Update iox_catalog/README.md

Co-authored-by: Dom <dom@itsallbroken.com>

Co-authored-by: Dom <dom@itsallbroken.com>
2022-02-16 10:23:53 +00:00
Luke Bond a66e29e5b3
chore: port sqlx-hotswap-pool over from conductor (#3750)
* chore: port sqlx-hotswap-pool over from conductor

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: workspace hack fixes

* fix: unique schema per test db connection

* fix: adjust search path in catalog pg tests to see if it fixes test schema issue

* fix: actually fixed sqlx hotswap pool test

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 16:18:36 +00:00
Marco Neumann 9e7a27b344
fix: default Kafka topic name is `iox-shared` (#3747)
Do NOT use underscores in the Kafka topic because this is not supported
by Kafka. This was initially fixed by #3555 but reverted by #3623.
2022-02-15 12:34:46 +00:00
Marco Neumann c6e374a025
feat: allow catalog access w/o a transaction (#3735)
* feat: allow catalog access w/o a transaction

Now the caller has the full control if they want to use a transaction or
not.

* fix: remove non-transaction-safe `create_many`

* fix: remove unnecessary transactions
2022-02-15 10:15:36 +00:00
Nga Tran d3bd03e37a
feat: Support Projection Pushdown for a QueryableBatch (#3712)
* feat: projection pushdown for QueryableBatch

* chore: clean up and remove unwrap

* fix: Add Sync to a Snafu source to have the code compile

* chore: cleanup and add comments for tests

* refactor: Add tests for scanning non existing columns and fix related bugs

* chore: modify comment to trigger auto check in github work
2022-02-10 19:29:21 +00:00
Marco Neumann 5de4d6203f
refactor: catalog transaction (#3660)
* refactor: catalog Unit of Work (= transaction)

Setup an inteface to handle Units of Work within our catalog. Previously
both the Postgres and the in-mem backend used "mini-transactions on
demand". Now the caller has a clear way to establish boundaries and
gets read and write isolation. A single `Arc<dyn Catalog>` can create as
many `Box<dyn UnitOfWork>` as you like, but note that depending on the
backend you may not scale infinitely (postgres will likely impose
certain limits and the in-mem backend limits concurrency to 1 to keep
things simple).

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: rename Unit of Work to Transaction

* test: improve `test_txn_isolation`

* feat: clearify transaction drop semantics

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:38:33 +00:00
Paul Dix 245676ff5d
feat: add method to catalog to get its partition info by id (#3650)
This is a little bit specific for how things are structured in IngesterData right now. Easy enough to take back out later if/when we restructure.
2022-02-07 20:57:25 +00:00
Nga Tran 17fbeaaade
feat: insert the persisted info into the catalog in one transaction (#3636)
* feat: add ProcessedTombstoneRepo

* feat: add function add_parquet_file_with_tombstones

* fix: remove unecessary use

* feat: handling transaction when adding parquet file and its processed tombstones

* feat: tests update catalog for parquet file and processed tombstones

* fix: make add parquet file & its processed tombstones fully transactional

* chore: cleanup

* test: add integration tests for new catalog update functions

* chore: remove catalog_update.rs

* chore: cleanup

* fix: assert the right values

* fix: create unique namespace

* fix: support non transaction create_many

* test: remove tests that do not work in a transaction

* fix: one more case with unique namespace

* chore: more verification around for better understanding why certain tests fail

* fix: compare difference rather than absolute becasue the DB already has data

* fix: fix the argument provided to SQL

* fix: return non-empty processed tombstones

* fix: insert the right parquet file

* chore: remove unsed file

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:44:15 +00:00
Dom Dwyer 0fd122e365 refactor: "inf" retention const
Adds the iox_catalog::INFINITE_RETENTION_POLICY constant.
2022-02-04 15:35:33 +00:00
Paul Dix ce46bbaada
feat: wire up the write buffer to the ingester process (#3533)
This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data in an in memory structure while creating records in the catalog for any partitions that don't yet exist.

I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up.

There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:47:28 +00:00
Dom Dwyer f9f9beac36 refactor: get_schema_by_name no Option
Fetching a NamespaceSchema always succeeds and never returns None.
2022-02-02 13:04:53 +00:00
Carol (Nichols || Goulding) ea18c71e6d
feat: Create an object store path for a new parquet file 2022-01-31 10:36:32 -05:00
Marco Neumann 74c251febb
feat: allow IOx catalog to setup itself (no SQLx CLI required) (#3584)
* feat: allow IOx catalog to setup itself (no SQLx CLI required)

* refactor: use SQLx macro instead of hand-rolled build script
2022-01-31 15:07:38 +00:00
Dom 32d7c4cbfe
refactor: remove InfluxColumnType::IOx (#3565)
* refactor: remove InfluxColumnType::IOx

Remove unused column variant - see #3554 for context.

* refactor: reserve SEMANTIC_TYPE_IOX name in proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:15:36 +00:00
Dom 9201023ea4
feat: schema validation for MutableBuffer instances (#3554)
* refactor: Debug bounds on Catalog trait

* feat: validate MutableBatch schema

Changes the schema validation code to validate MutableBatch instances
(coming from a pre-parsed LP write, and non-LP-based writes) instead of
parsed LP lines.

* refactor: Send bound on boxed errors

* refactor: clippy

Allow assert_eq!(bool, bool) for readability.

* refactor: no PartialEq<MB Column> for ColumnSchema

Remove the PartialEq<mutable_buffer::Column> for ColumnSchema - it's
definitely more readable as a method call.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 20:55:18 +00:00
Dom ce568ab447
build(iox_catalog): remove unused dependencies (#3552)
* build: don't pull in all of tokio

We already specify the tokio features we need so "full" (all features)
is not necessary.

* build: remove chrono dependency

Appears unused.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:14:42 +00:00
Paul Dix bb893510a0 feat: Add scaffolding for ingester server
* Adds a new ingester command to start an ingester server
* Moves previous ingester server over to handler
* Skeleton for gRPC and HTTP handlers
2022-01-21 18:02:19 -05:00
Paul Dix bfa54033bd refactor: Clean up the Catalog API
This updates the catalog API to make it easier to work with for consumers. I also found a bug in the MemCatalog implementation while refactoring the tests to work with the new API definition. Consumers will now be able to Arc wrap the catalog and use it across awaits.
2022-01-21 16:01:13 -05:00
Paul Dix bfc085c20d feat: add get kafka_topic by name to catalog 2022-01-19 17:32:23 -05:00
Paul Dix 860e5a30ca refactor: update ingester to get sequencer record and not attempt to create 2022-01-19 17:15:10 -05:00
Paul Dix 172d75c6d7 feat: add sequencer get_by_kafka_topic_id_and_partition to catalog 2022-01-19 16:45:06 -05:00
Paul Dix 28db06297f fix: clear postgres schema in test wasn't deleting parquet_file 2022-01-19 16:30:54 -05:00
Paul Dix 41038721e1 feat: Add parquet file records to iox_catalog
* Adds ParquetFile and scaffolding to IOx catalog
* Changed the file_location in parquet_file to object_store_id which is a uuid
2022-01-19 14:14:54 -05:00
Paul Dix f36d66deb7 feat: Add Tombstone to Catalog
* Adds TombstoneId and Tombstone to the iox_catalog with associated interfaces
* Adds SequenceNumber new type for use with Tombstone
* Adds Timestamp new type for use with Tombstone
* Adds constraint to the Postgres schema to enforce tombstone uniqueness by table_id, sequencer_id, and sequence_number
2022-01-18 18:17:21 -05:00
Paul Dix 8067316c33 fix: typo in partitionid description 2022-01-18 17:33:05 -05:00
Paul Dix b1510675ae refactor: add new type for Kafka Partition in Catalog 2022-01-18 15:13:18 -05:00