`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):
- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint
Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
Place a dedicated `Expr` type between datafusion and protobuf. While
this type is currently only used during serialization, it will be used
within `Predicate` in a follow-up change to enable de-duplication of
predicates and avoid the double parsing of datafusion expressions (we
are already somewhat parsing them when we check for valid delete
predicates).
This change looks larger than it is. In practice it just separates the
conversion "datafusion => protobuf" into "datafusion => IOx => protobuf"
and "protobuf => datafusion" into "protobuf => IOx => datafusion". So
(apart from the error types) this is functionally the same.
The motivations are:
1. The API uses a SINGLE predicate and adds that to many chunks. With
`Arc<Vec<...>>` you gain nothing, with `Vec<Arc<...>>` the predicate
is only stored once (in many vectors)
2. While we currently add predicates blindly to all chunks, we can be way
smarter in the future and prune out tables, partitions or even single
chunks (based on statistics). With that, it will be rare that many
chunks share the exact same set of predicates.
3. It would be nice if we could de-duplicate predicates when writing them
to the preserved catalog without needing to repeat the pruning
discussed in point 2. This is way easier to implement whan chunks
exists in `Arc`s.
4. As a side-note: the `Arc<Vec<...>>` wasn't really cloned around but
instead was created many time. So the new version should be more
memory efficient out of the box.
Two reasons:
1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
small follow-up PR).
2. `predicate` will have more and more features (like serialization)
which justifies a new home