* feat: Add a way to run ingester with an in-memory catalog from the CLI
If you set the --catalog-dsn string to "mem", an in-memory catalog is
created rather than treating that value as a Postgres connection URL.
Planning on using this in tests, so not documenting.
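For illustration, a hypothetical invocation (the binary and subcommand
names here are assumptions; only the --catalog-dsn flag and the "mem"
value come from this change):
```text
influxdb_iox run ingester --catalog-dsn mem
```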
* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC
Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.
* feat: Add a flight API (handshake only) to ingester
* fix: Create partitions if using file-based write buffer
* fix: Change the server fixture to handle ingester server type
For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.
* feat: Start implementing ingester do_get, namely decoding the query
Skip serialization of the predicate for the moment.
* refactor: Rename ingest protos to ingester to match crate name
* refactor: Rename QueryResults to QueryData
* feat: Move ingester flight client to new querier crate
* fix: Off-by-one error; different starting indexes in sequencers
* fix: Create new CLI argument to pick the catalog type
* fix: Create a CLI option to set the number of topics to auto-create in the write buffer
* fix: Check the arrow flight service's health to tell that the ingester gRPC is up
* fix: Set postgres as the default catalog type
* fix: Return an error rather than panicking if CLI args aren't right
* feat: add ProcessedTombstoneRepo
* feat: add function add_parquet_file_with_tombstones
* fix: remove unnecessary use
* feat: handle the transaction when adding a parquet file and its processed tombstones
* feat: add tests for updating the catalog with parquet files and processed tombstones
* fix: make adding a parquet file and its processed tombstones fully transactional (see the sketch below this list)
* chore: cleanup
* test: add integration tests for new catalog update functions
* chore: remove catalog_update.rs
* chore: cleanup
* fix: assert the right values
* fix: create unique namespace
* fix: support non-transactional create_many
* test: remove tests that do not work in a transaction
* fix: one more case with unique namespace
* chore: add more verification to better understand why certain tests fail
* fix: compare differences rather than absolute values because the DB already has data
* fix: fix the argument provided to SQL
* fix: return non-empty processed tombstones
* fix: insert the right parquet file
* chore: remove unused file
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
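For illustration, a hedged sketch of the fully transactional update
described in the list above, written against plain sqlx; the function,
table, and column names are illustrative rather than the real catalog
schema or API:
```rust
use sqlx::PgConnection;

/// Illustrative only: insert the processed-tombstone rows for a parquet
/// file. The connection is expected to already be inside an open
/// transaction, so the parquet file and all of its processed tombstones
/// either land together or not at all.
async fn add_processed_tombstones(
    conn: &mut PgConnection,
    parquet_file_id: i64,
    tombstone_ids: &[i64],
) -> Result<(), sqlx::Error> {
    for &tombstone_id in tombstone_ids {
        sqlx::query(
            "INSERT INTO processed_tombstone (tombstone_id, parquet_file_id) VALUES ($1, $2)",
        )
        .bind(tombstone_id)
        .bind(parquet_file_id)
        .execute(&mut *conn)
        .await?;
    }
    Ok(())
}
```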
* refactor: remove InfluxColumnType::IOx
Remove unused column variant - see #3554 for context.
* refactor: reserve SEMANTIC_TYPE_IOX name in proto
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
1. Remove `new_empty` logic. It's a leftover from the time when the
`PreservedCatalog` owned the in-memory catalog.
2. Make `db_name` a part of the `PreservedCatalogConfig`.
Store the "maximum persisted timestamp" instead of the "minimum
unpersisted timestamp". This avoids the need to calculate the next
timestamp from the current one (which was done via "max TS + 1ns").
The old calculation was prone to overflow panics. Since the
timestamps in this calculation originate from user-provided data (and
not the wall clock), this was an easy DoS vector that could be triggered
via the following line protocol:
```text
table_1 foo=1 <i64::MAX>
```
which is
```text
table_1 foo=1 9223372036854775807
```
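A minimal Rust sketch of why the old "max TS + 1ns" step could panic;
the function name and the plain-`i64` nanosecond representation are
illustrative, not the actual IOx types:
```rust
/// Illustrative only: derive the first unpersisted timestamp from the
/// maximum persisted one, as the old scheme required.
fn min_unpersisted_nanos(max_persisted_nanos: i64) -> i64 {
    max_persisted_nanos
        .checked_add(1)
        // For a user-supplied i64::MAX (9223372036854775807) there is no
        // next nanosecond, so this panics.
        .expect("timestamp overflow")
}
```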
Bonus points: the timestamp persisted in the partition
checkpoints is now the very same that was used by the split query during
persistence. Consistency FTW!
Fixes #2225.
We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash`, but now
that they do, it's much simpler.
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):
- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint
Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
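As a rough sketch of the shape of these types (field, variant, and type
names are illustrative, not the exact IOx definitions), a delete
predicate reduces to a time range plus a conjunction of simple column
comparisons:
```rust
/// Illustrative sketch of a simplified delete predicate.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct DeletePredicate {
    /// Nanosecond time range the deletion applies to.
    pub range: (i64, i64),
    /// All expressions must match (logical AND).
    pub exprs: Vec<DeleteExpr>,
}

/// A single column comparison, far simpler than a DataFusion `Expr`.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct DeleteExpr {
    pub column: String,
    pub op: Op,
    pub scalar: Scalar,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Op {
    Eq,
    Ne,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Scalar {
    Bool(bool),
    I64(i64),
    // Floats would need a total-order wrapper to keep `Eq`/`Hash`/`Ord`;
    // omitted here for brevity.
    String(String),
}
```
Because everything here is plain data, `Eq`, `Hash`, and `Ord` can be
derived, which is what makes the ordering, de-duplication, and
infallible serialization listed above straightforward.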
Due to the timing of the "persist" lifecycle action, the fact that
delete predicates might arrive at any time, and the fact that we don't
want to hold transaction locks for too long, we should accept delete
predicates for chunks that are currently "persisting" even though that
lifecycle action might fail.
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the expected errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).
**This does NOT contain the actual preservation!**
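A hypothetical sketch of what such a hook might look like on the
in-memory catalog state (trait, method, and type names are assumptions,
and `DeletePredicate` refers to the sketch above):
```rust
use std::sync::Arc;

/// Illustrative chunk address; the real catalog has its own address type.
pub struct ChunkAddr {
    pub table_name: Arc<str>,
    pub partition_key: Arc<str>,
    pub chunk_id: u32,
}

/// Hook the preserved catalog calls while replaying transactions or
/// building checkpoints so the in-memory catalog can track delete
/// predicates.
pub trait CatalogState {
    fn delete_predicate(
        &mut self,
        predicate: Arc<DeletePredicate>,
        chunks: Vec<ChunkAddr>,
    );
}
```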
We changed from the Google timestamp type (which uses variable-sized
integers) to our own fixed-size integer timestamps so that the size of
the parquet metadata does not depend on the timestamp. However, with the
introduction of compression this is the case anyway (since slightly
different timestamps lead to different compression results), and we now
need deterministic timestamps for tests. So there is no point in using
our own timestamp type. Switching back to the variable-sized type also
shrinks the post-compression results a bit.
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.
This change should make it easier to implement new functionality (like
capturing delete predicates).