Commit Graph

282 Commits (5177e69514b5b92ca3241590574c7fd93aedc0e9)

Author SHA1 Message Date
kodiakhq[bot] d72a494198
Merge branch 'main' into crepererum/in_mem_expr_part5 2021-10-05 16:20:24 +00:00
Marco Neumann b8aa4c33ce refactor: use protobuf bytes for transaction UUIDs 2021-10-05 12:27:48 +02:00
Marco Neumann bb7a27e5ed refactor: use proper sets during delete predicate collection
We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash` but now it's
all way easier.
2021-10-05 10:37:34 +02:00
Marco Neumann 28ccf2a8c3 refactor: `TransactionHandle::delete_predicate` cannot fail 2021-10-05 09:41:46 +02:00
Marco Neumann 10c1a72402 refactor: remove unused fields from `DeletePredicate` 2021-10-05 09:29:24 +02:00
Marco Neumann 97881079e8 refactor: make `ChunkOrder` non-zero
This will make it easier to handle missing values.

Helps with #2633.
2021-10-04 17:49:12 +02:00
Marco Neumann 75ac6e8646 refactor: make `DeletePredicate::range` non-optional 2021-10-04 16:36:20 +02:00
Marco Neumann d1835a3eee fix: doc links 2021-10-04 16:36:20 +02:00
Marco Neumann 5a5a929b9e refactor: introduce `DeletePredicate`
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):

- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint

Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
2021-10-04 16:36:20 +02:00
Edd Robinson e72f7e958c test: update expected results 2021-10-04 12:20:21 +01:00
Andrew Lamb 7316f3407a
fix: Reduce log noise when no files are deleted (#2671)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding) 92583aee82 fix: Remove streaming API since we're not streaming anyway 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d05528bcfd refactor: Use s3_request for put requests
Which meant we also needed to change the byte stream to be a closure
that can generate a byte stream
2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies 86cee568d5
feat: use upstream pbjson (#2650)
* feat: use upstream pbjson

* chore: fmt
2021-09-28 16:29:26 +00:00
kodiakhq[bot] b16e7ea91a
Merge branch 'main' into crepererum/issue2518c 2021-09-22 16:09:04 +00:00
Marco Neumann d7b697dfe9 chore: remove unused `object_store` => `tracker` dep 2021-09-22 11:13:40 +02:00
Marco Neumann 981ee0c6df refactor: accept unknown chunks in persisted delete predicates
Due to the timing of the "persist" lifecycle action and that delete
predicates might arrive at any time + the fact that we don't wanna hold
transaction locks for too long, we should accept delete predicates for
chunks that are currently "persisting" even though that lifecycle action
might fail.
2021-09-22 09:29:50 +02:00
Marco Neumann 6682178d6f feat: teach preserved catalog to handle delete predicates 2021-09-20 15:51:14 +02:00
Marco Neumann cef5aeee52 refactor: introduce `ChunkId` type 2021-09-20 13:10:41 +02:00
Marco Neumann acf698c366 fix: delete predicate sorting 2021-09-20 10:48:32 +02:00
Marco Neumann 0c5ba3786b refactor: rename closure to make syntax a bit clearer 2021-09-20 10:48:32 +02:00
Marco Neumann 4c4fd59724 docs: extend comment about (not) cleanup up delete predicates 2021-09-20 10:48:32 +02:00
Marco Neumann 492d991f49 feat: delete catalog pres. catalog <=> in-mem catalog API
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the wanted errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).

**This does NOT contain the actual preservation!**
2021-09-20 10:48:32 +02:00
Marco Neumann 831e55d79e refactor: make error messages more precise 2021-09-20 09:42:55 +02:00
Marco Neumann 9c80d32af5 refactor: use normal google timestamps in parquet metadata again
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
2021-09-20 09:34:03 +02:00
Marco Neumann afc507ae14 feat: compress encoded parquet metadata
Depending on the number of columns, this should safe between 60% and
75%.
2021-09-20 09:33:18 +02:00
Marco Neumann 2820db5583 refactor: split preserved catalog `api` into `core` and `interface`
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.

This change should make it easier to implement new functionality (like
capturing delete predicates).
2021-09-16 10:30:11 +02:00
Raphael Taylor-Davies c66095cad1
feat: remove metrics crate (#2552) 2021-09-15 19:43:33 +00:00
kodiakhq[bot] de732b4273
Merge branch 'main' into crepererum/parquet_file_wo_query 2021-09-15 07:15:19 +00:00
Marco Neumann 509c07330d refactor: decouple `parquet_file` from `query` 2021-09-14 18:26:16 +02:00
kodiakhq[bot] d60aa5940b
Merge branch 'main' into crepererum/chunk_order_type 2021-09-14 16:25:17 +00:00
Marco Neumann bfaba78dc3 refactor: move `predicate` into its own crate
Two reasons:

1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
   small follow-up PR).
2. `predicate` will have more and more features (like serialization)
   which justifies a new home
2021-09-14 17:13:02 +02:00
Marco Neumann becef1c75f refactor: introduce `ChunkOrder` type 2021-09-14 17:10:23 +02:00
Marco Neumann 1d8edd4683 fix: metadata size increased 2021-09-14 13:03:26 +02:00
Marco Neumann 45cb00d8c0 refactor: track chunk order in chunks 2021-09-14 13:00:55 +02:00
Marco Neumann 4769b67d14 feat: API-level code to prune old transaction from catalog 2021-09-14 10:26:38 +02:00
Marco Neumann f93984cd94 refactor: clarify wording
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-09-14 09:43:55 +02:00
Marco Neumann e7edb65b1d feat: show number of stripped bytes in catalog dump 2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies 44918e4afc
feat: migrate chunk metrics (#2491)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-09 16:02:16 +00:00
Marco Neumann 4a863993ec feat: "dump catalog" debug CLI 2021-09-02 08:08:20 +02:00
Marco Neumann 581ee64049 feat: add functions to dump catalog data to text 2021-09-02 08:07:07 +02:00
Marco Neumann 06c941d798 refactor: split up `make_record_batch` 2021-09-01 11:26:05 +02:00
Marco Neumann 6ce586a2ac docs: add docstrings to `PreservedCatalog` members 2021-09-01 11:26:05 +02:00
Marco Neumann 70a5ffeae7 test: allow creation of deterministic chunks and transactions 2021-09-01 11:26:05 +02:00
Marco Neumann 06833110ab test: allow creation of less complex parquet chunks 2021-09-01 11:26:05 +02:00
Marco Neumann 27248850e5 refactor: use `byte::Bytes` for metadata in protobuf messages
That simplifies printing a bit since we `Vec<u8>` prints quite badly.
2021-09-01 11:26:05 +02:00
Marco Neumann a312f81bf2 refactor: move `storage_testing` to `storage::tests` 2021-08-27 15:59:59 +02:00
Marco Neumann a2efe3299d refactor: restructure catalog code in `parquet_file`
No functional change (except for slightly changing error messages). This
will make it easier to add more functionality.
2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding) 7ca177978e fix: Add missing await from a logical merge conflict 2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding) 18ba3b5c59 feat: Create database directories with a generation ID 2021-08-26 09:14:22 -04:00