Commit Graph

29 Commits (8e9dd227534f57e607c7d6b89fad055bde6e02fe)

Author SHA1 Message Date
Marco Neumann 150e7c222f chore: enable linting for `predicate` crate 2021-10-04 17:58:40 +02:00
Marco Neumann d8dddc358f
fix: typo in error message
Co-authored-by: Nga Tran <ntran@influxdata.com>
2021-10-04 16:56:43 +02:00
Marco Neumann 75ac6e8646 refactor: make `DeletePredicate::range` non-optional 2021-10-04 16:36:20 +02:00
Marco Neumann d1835a3eee fix: doc links 2021-10-04 16:36:20 +02:00
Marco Neumann 5a5a929b9e refactor: introduce `DeletePredicate`
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):

- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint

Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
2021-10-04 16:36:20 +02:00
dependabot[bot] d1f5209869
chore(deps): bump arrow from 5.4.0 to 5.5.0
Bumps [arrow](https://github.com/apache/arrow-rs) from 5.4.0 to 5.5.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/5.5.0/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/5.4.0...5.5.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-04 08:55:38 +00:00
Marco Neumann 51246e3c8d docs: remove outdated comment 2021-10-04 10:44:30 +02:00
Marco Neumann ec0f256ad7 refactor: `Expr` => `DeleteExpr` 2021-10-04 10:28:09 +02:00
Marco Neumann a0bbbdb197 refactor: use dedicated in-memory `Expr` type during IO
Place a dedicated `Expr` type between datafusion and protobuf. While
this type is currently only used during serialization, it will be used
within `Predicate` in a follow-up change to enable de-duplication of
predicates and avoid the double parsing of datafusion expressions (we
are already somewhat parsing them when we check for valid delete
predicates).

This change looks larger than it is. In practice it just separates the
conversion "datafusion => protobuf" into "datafusion => IOx => protobuf"
and "protobuf => datafusion" into "protobuf => IOx => datafusion". So
(apart from the error types) this is functionally the same.
2021-10-04 09:59:14 +02:00
Nga Tran 154dd4460e refactor: address review comments 2021-10-01 16:37:22 -04:00
Nga Tran ee94e9038a test: finalize codin up delete http endpoints and end-to-end tests 2021-10-01 12:15:00 -04:00
Nga Tran 53a45d89b7 feat: support HTTP Delete Endpoints 2021-09-30 18:09:17 -04:00
Andrew Lamb a55a21c644
chore: Update datafusion (#2635)
* chore: Update datafusion and sqlparser

* fix: remove STACK_SIZE workaround

* chore: update datafusion_util

* chore: update predicate

* chore: update query_tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-27 14:13:19 +00:00
Nga Tran e80d8aa391 chore: update latest datafusion and use its lit_timestamp_nano 2021-09-22 11:30:42 -04:00
Nga Tran 2f371b6f79 refactor: address review comments 2021-09-21 14:46:24 -04:00
Nga Tran 85989cc8a3 test: add more delete tests and test scenarios 2021-09-20 18:18:08 -04:00
kodiakhq[bot] c7e6fffaaa
Merge branch 'main' into ntran/delete_scan 2021-09-20 13:29:47 +00:00
Marco Neumann acf698c366 fix: delete predicate sorting 2021-09-20 10:48:32 +02:00
Nga Tran 364d245eae feat: apply negated delete predicates during scan 2021-09-17 16:20:42 -04:00
Nga Tran 60a866ddcb refactor: merge delete predicates into select predicate 2021-09-17 07:52:33 -04:00
Marco Neumann ec943081c7 refactor: `Arc<Vec<...>>` => `Vec<Arc<...>>` for del predicates
The motivations are:

1. The API uses a SINGLE predicate and adds that to many chunks. With
   `Arc<Vec<...>>` you gain nothing, with `Vec<Arc<...>>` the predicate
   is only stored once (in many vectors)
2. While we currently add predicates blindly to all chunks, we can be way
   smarter in the future and prune out tables, partitions or even single
   chunks (based on statistics). With that, it will be rare that many
   chunks share the exact same set of predicates.
3. It would be nice if we could de-duplicate predicates when writing them
   to the preserved catalog without needing to repeat the pruning
   discussed in point 2. This is way easier to implement whan chunks
   exists in `Arc`s.
4. As a side-note: the `Arc<Vec<...>>` wasn't really cloned around but
   instead was created many time. So the new version should be more
   memory efficient out of the box.
2021-09-16 17:16:09 +02:00
kodiakhq[bot] 33cd1cffad
Merge branch 'main' into ntran/delete_read 2021-09-16 13:22:50 +00:00
Nga Tran 61e1eac135 fix: fix the cases of multi[le expressions in delete predicate 2021-09-15 17:00:21 -04:00
Marco Neumann 72021e010f test: test unsupported cases of predicate serialization 2021-09-15 18:18:05 +02:00
Marco Neumann c5ebf4a2e6 refactor: remove unused predicate serialization variants 2021-09-15 18:05:11 +02:00
Marco Neumann 44eb3b994d feat: `Predicate` serialization
Closes #2493.
2021-09-15 16:37:25 +02:00
Nga Tran 63cc7b3fb0 test: more tests to discover what still need to be done 2021-09-14 17:57:30 -04:00
Nga Tran f4f140d3b7 chore: merge main to branch 2021-09-14 13:25:32 -04:00
Marco Neumann bfaba78dc3 refactor: move `predicate` into its own crate
Two reasons:

1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
   small follow-up PR).
2. `predicate` will have more and more features (like serialization)
   which justifies a new home
2021-09-14 17:13:02 +02:00