We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash` but now it's
all way easier.
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):
- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint
Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
Due to the timing of the "persist" lifecycle action and that delete
predicates might arrive at any time + the fact that we don't wanna hold
transaction locks for too long, we should accept delete predicates for
chunks that are currently "persisting" even though that lifecycle action
might fail.
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the wanted errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).
**This does NOT contain the actual preservation!**
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.
This change should make it easier to implement new functionality (like
capturing delete predicates).
Two reasons:
1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
small follow-up PR).
2. `predicate` will have more and more features (like serialization)
which justifies a new home