Commit Graph

350 Commits (ca331503a533b3813673480e027c2c9e47ef17b9)

Author SHA1 Message Date
Marco Neumann 1523e0edcd refactor: clean up preserved catalog interface
1. Remove `new_empty` logic. It's a leftover from the time when the
   `PreservedCatalog` owned the in-memory catalog.
2. Make `db_name` a part of the `PreservedCatalogConfig`.
2021-10-13 13:58:11 +02:00
Raphael Taylor-Davies 8414e6edbb
feat: migrate preserved catalog to TimeProvider (#2722) (#2808)
* feat: migrate preserved catalog to TimeProvider (#2722)

* fix: deterministic catalog prune tests

* fix: failing test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-12 14:43:05 +00:00
Raphael Taylor-Davies 3dfe400e6b
feat: migrate write path to TimeProvider (#2722) (#2807) 2021-10-12 12:09:08 +00:00
Raphael Taylor-Davies b39e01f7ba
feat: migrate PersistenceWindows to TimeProvider (#2722) (#2798) 2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies 06c2c23322
refactor: create PreservedCatalogConfig struct (#2793)
* refactor: create PreservedCatalogConfig struct

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-11 15:43:05 +00:00
Carol (Nichols || Goulding) 5da2f7b1b0
Merge branch 'main' into cn/less-database-name 2021-10-11 10:35:42 -04:00
Raphael Taylor-Davies afe34751e7
refactor: split out schema crate (#2781)
* refactor: split out schema crate

* chore: fix doc
2021-10-11 09:45:08 +00:00
Carol (Nichols || Goulding) 8407735e00 fix: Pass the database name into PreservedCatalog 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 276aef69c9 refactor: Move PreservedCatalog test helper functions to test helpers and use them more 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 3aff4fcb07 refactor: Extract test helper functions for common catalog operations
This will make the next change easier, and I think it makes the tests
easier to read.
2021-10-08 15:25:10 -04:00
kodiakhq[bot] 559a7e0221
Merge branch 'main' into cn/chunk-addr-smaller 2021-10-08 17:26:20 +00:00
Carol (Nichols || Goulding) fbe76935f4 fix: Remove some calls to iox_object_store.database_name 2021-10-08 09:50:14 -04:00
Marco Neumann 64bda1fc08 feat: improve `Debug`/`Display` for test `ChunkId`s 2021-10-08 13:55:56 +02:00
Marco Neumann d3de6bb6e4 refactor: `max_persisted_timestamp` => `flush_timestamp`
There might be data left before this timestamp that wasn't persisted
(e.g. incoming data while the persistence was running).
2021-10-08 12:36:23 +02:00
Marco Neumann 63a932fa37 refactor: "min unpersisted ts" => "max persisted ts"
Store the "maximum persisted timestamp" instead of the "minimum
unpersisted timestamp". This avoids the need to calculate the next
timestamp from the current one (which was done via "max TS + 1ns").

The old calculation was prone to overflow panics. Since the
timestamps in this calculation originate from user-provided data (and
not the wall clock), this was an easy DoS vector that could be triggered
via the following line protocol:

```text
table_1 foo=1 <i64::MAX>
```

which is

```text
table_1 foo=1 9223372036854775807
```

Bonus points: the timestamp persisted in the partition
checkpoints is now the very same that was used by the split query during
persistence. Consistence FTW!

Fixes #2225.
2021-10-08 11:52:49 +02:00
kodiakhq[bot] 7d6be3f500
Merge branch 'main' into crepererum/issue2748 2021-10-07 09:04:18 +00:00
Marco Neumann 63d74be490 refactor: make `ChunkId` a UUID 2021-10-07 10:23:27 +02:00
Marco Neumann 2a52fd90d9 fix: transaction pruning logic for "nothing to do" 2021-10-07 10:14:42 +02:00
kodiakhq[bot] d72a494198
Merge branch 'main' into crepererum/in_mem_expr_part5 2021-10-05 16:20:24 +00:00
Marco Neumann b8aa4c33ce refactor: use protobuf bytes for transaction UUIDs 2021-10-05 12:27:48 +02:00
Marco Neumann bb7a27e5ed refactor: use proper sets during delete predicate collection
We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash` but now it's
all way easier.
2021-10-05 10:37:34 +02:00
Marco Neumann 28ccf2a8c3 refactor: `TransactionHandle::delete_predicate` cannot fail 2021-10-05 09:41:46 +02:00
Marco Neumann 10c1a72402 refactor: remove unused fields from `DeletePredicate` 2021-10-05 09:29:24 +02:00
Marco Neumann 97881079e8 refactor: make `ChunkOrder` non-zero
This will make it easier to handle missing values.

Helps with #2633.
2021-10-04 17:49:12 +02:00
Marco Neumann 75ac6e8646 refactor: make `DeletePredicate::range` non-optional 2021-10-04 16:36:20 +02:00
Marco Neumann d1835a3eee fix: doc links 2021-10-04 16:36:20 +02:00
Marco Neumann 5a5a929b9e refactor: introduce `DeletePredicate`
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):

- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint

Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
2021-10-04 16:36:20 +02:00
Edd Robinson e72f7e958c test: update expected results 2021-10-04 12:20:21 +01:00
Andrew Lamb 7316f3407a
fix: Reduce log noise when no files are deleted (#2671)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding) 92583aee82 fix: Remove streaming API since we're not streaming anyway 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d05528bcfd refactor: Use s3_request for put requests
Which meant we also needed to change the byte stream to be a closure
that can generate a byte stream
2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies 86cee568d5
feat: use upstream pbjson (#2650)
* feat: use upstream pbjson

* chore: fmt
2021-09-28 16:29:26 +00:00
kodiakhq[bot] b16e7ea91a
Merge branch 'main' into crepererum/issue2518c 2021-09-22 16:09:04 +00:00
Marco Neumann d7b697dfe9 chore: remove unused `object_store` => `tracker` dep 2021-09-22 11:13:40 +02:00
Marco Neumann 981ee0c6df refactor: accept unknown chunks in persisted delete predicates
Due to the timing of the "persist" lifecycle action and that delete
predicates might arrive at any time + the fact that we don't wanna hold
transaction locks for too long, we should accept delete predicates for
chunks that are currently "persisting" even though that lifecycle action
might fail.
2021-09-22 09:29:50 +02:00
Marco Neumann 6682178d6f feat: teach preserved catalog to handle delete predicates 2021-09-20 15:51:14 +02:00
Marco Neumann cef5aeee52 refactor: introduce `ChunkId` type 2021-09-20 13:10:41 +02:00
Marco Neumann acf698c366 fix: delete predicate sorting 2021-09-20 10:48:32 +02:00
Marco Neumann 0c5ba3786b refactor: rename closure to make syntax a bit clearer 2021-09-20 10:48:32 +02:00
Marco Neumann 4c4fd59724 docs: extend comment about (not) cleanup up delete predicates 2021-09-20 10:48:32 +02:00
Marco Neumann 492d991f49 feat: delete catalog pres. catalog <=> in-mem catalog API
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the wanted errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).

**This does NOT contain the actual preservation!**
2021-09-20 10:48:32 +02:00
Marco Neumann 831e55d79e refactor: make error messages more precise 2021-09-20 09:42:55 +02:00
Marco Neumann 9c80d32af5 refactor: use normal google timestamps in parquet metadata again
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
2021-09-20 09:34:03 +02:00
Marco Neumann afc507ae14 feat: compress encoded parquet metadata
Depending on the number of columns, this should safe between 60% and
75%.
2021-09-20 09:33:18 +02:00
Marco Neumann 2820db5583 refactor: split preserved catalog `api` into `core` and `interface`
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.

This change should make it easier to implement new functionality (like
capturing delete predicates).
2021-09-16 10:30:11 +02:00
Raphael Taylor-Davies c66095cad1
feat: remove metrics crate (#2552) 2021-09-15 19:43:33 +00:00
kodiakhq[bot] de732b4273
Merge branch 'main' into crepererum/parquet_file_wo_query 2021-09-15 07:15:19 +00:00
Marco Neumann 509c07330d refactor: decouple `parquet_file` from `query` 2021-09-14 18:26:16 +02:00
kodiakhq[bot] d60aa5940b
Merge branch 'main' into crepererum/chunk_order_type 2021-09-14 16:25:17 +00:00
Marco Neumann bfaba78dc3 refactor: move `predicate` into its own crate
Two reasons:

1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
   small follow-up PR).
2. `predicate` will have more and more features (like serialization)
   which justifies a new home
2021-09-14 17:13:02 +02:00
Marco Neumann becef1c75f refactor: introduce `ChunkOrder` type 2021-09-14 17:10:23 +02:00
Marco Neumann 1d8edd4683 fix: metadata size increased 2021-09-14 13:03:26 +02:00
Marco Neumann 45cb00d8c0 refactor: track chunk order in chunks 2021-09-14 13:00:55 +02:00
Marco Neumann 4769b67d14 feat: API-level code to prune old transaction from catalog 2021-09-14 10:26:38 +02:00
Marco Neumann f93984cd94 refactor: clarify wording
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-09-14 09:43:55 +02:00
Marco Neumann e7edb65b1d feat: show number of stripped bytes in catalog dump 2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies 44918e4afc
feat: migrate chunk metrics (#2491)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-09 16:02:16 +00:00
Marco Neumann 4a863993ec feat: "dump catalog" debug CLI 2021-09-02 08:08:20 +02:00
Marco Neumann 581ee64049 feat: add functions to dump catalog data to text 2021-09-02 08:07:07 +02:00
Marco Neumann 06c941d798 refactor: split up `make_record_batch` 2021-09-01 11:26:05 +02:00
Marco Neumann 6ce586a2ac docs: add docstrings to `PreservedCatalog` members 2021-09-01 11:26:05 +02:00
Marco Neumann 70a5ffeae7 test: allow creation of deterministic chunks and transactions 2021-09-01 11:26:05 +02:00
Marco Neumann 06833110ab test: allow creation of less complex parquet chunks 2021-09-01 11:26:05 +02:00
Marco Neumann 27248850e5 refactor: use `byte::Bytes` for metadata in protobuf messages
That simplifies printing a bit since we `Vec<u8>` prints quite badly.
2021-09-01 11:26:05 +02:00
Marco Neumann a312f81bf2 refactor: move `storage_testing` to `storage::tests` 2021-08-27 15:59:59 +02:00
Marco Neumann a2efe3299d refactor: restructure catalog code in `parquet_file`
No functional change (except for slightly changing error messages). This
will make it easier to add more functionality.
2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding) 7ca177978e fix: Add missing await from a logical merge conflict 2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding) 18ba3b5c59 feat: Create database directories with a generation ID 2021-08-26 09:14:22 -04:00
Marco Neumann 026202a05c fix: correctly account for parquet metadata size
We need to hold the parquet metadata in memory so that we're able to
create catalog checkpoints. We used to do that by holding the decoded
structure (provided by the upstream `parquet` crate) in memory and
serializing that data on demand to Apache Thrift.

There are two drawbacks:

1. We did not account for the memory usage of the decoded structures (or
   at least not fully).
2. We actually don't need the decoded data in-memory, since for the
   checkpoint creation we only need to write the serialized data.

So this PR changes our wrapper so it holds the serialized data which is
then only decoded when it's really necessary. Since the serialized data
is a simple byte vector, we can also easily account for the size.

Note that this makes the accounted size of parquet chunks larger.
However this data was always there, we just ignored it up until now. If
the size of the parquet metadata really becomes an issue, we could trait
some CPU time for memory by compressing it.
2021-08-26 13:24:32 +02:00
Andrew Lamb 3ca0d5d42f
Merge branch 'main' into cn/bump 2021-08-19 14:08:49 -04:00
Raphael Taylor-Davies b0e8b75a8a fix: TestCatalogState unique chunk ID 2021-08-19 17:19:12 +01:00
Carol (Nichols || Goulding) 7246f2702a fix: Bump transaction version because of a change in the Parquet files 2021-08-19 09:32:37 -04:00
Raphael Taylor-Davies 5a841600d9 feat: make catalog state test deterministic (#2349) 2021-08-19 14:04:27 +01:00
Carol (Nichols || Goulding) 6390156c0e fix: Remove error types not used anywhere 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) ef0e1a3f60 refactor: Extract a transaction file path type 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) 6d5cb9c117 refactor: Extract a ParquetFilePath to handle paths to parquet files in a db's object store 2021-08-18 11:32:39 -04:00
Ning Sun c012e996ab
refactor: remove display methods, use fmt::Display instead. (#2272)
* refactor: remove display methods, use fmt::Display instead.

Signed-off-by: Ning Sun <sunng@protonmail.com>

* refactor: update a few calls from .display to .to_string()

* fix: consistently use `Path` rather than occasionally `DirsAndFileName`

* fix: fixup for merge conflicts

* fix: update test

* fix: Catch another case or two

* fix: fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-16 18:00:22 +00:00
Carol (Nichols || Goulding) 564238ad8c refactor: Organize uses 2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) ae6b0e669b refactor: Extract a database persister type that wraps object store
Connects to #2193.
2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) daa534ee32 refactor: Incorporate Path parsing into the TransactionFile type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) ee3173efb1 refactor: Simplify implementation of parse_file_path 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) dbd1718fd2 refactor: Use the TransactionKey type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) 7f7a911a9a refactor: Extract a TransactionFile type to manage transaction paths 2021-08-12 09:06:06 -04:00
Dom 3de6b44e23
build: use new rustdoc lint name (#2261)
* fix: nocache feature code rot

The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.

* build: use updated broken_intra_doc_links name

The broken_intra_doc_links lint was renamed
rustdoc::broken_intra_doc_links

https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
Marco Neumann 8721c5fcd6 fix: improve error messages 2021-08-09 10:54:23 +02:00
Marco Neumann 950286e5b7 feat: make replay planning work w/ unordered checkpoints 2021-08-09 10:54:23 +02:00
Andrew Lamb d41b44d312
feat: use zstd compression when writing parquet files (#2218)
* feat: use ZSTD when writing parquet files

* fix: test
2021-08-06 18:45:55 +00:00
Andrew Lamb e92e94caad
chore: Update deps (including arrow 5.1.0, tonic -> 0.5, and prost 0.5) (#2172)
* chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0)

* chore: update all the things

* refactor: Update serving readiness check due to change in Tonic API

* chore: update more deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-05 15:57:38 +00:00
Andrew Lamb 1ccaa433e8
fix: Temporarily disable parquet predicate pushdown (#2164) 2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
kodiakhq[bot] 545222303f
Merge branch 'main' into cn/cc-only 2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding) ad0a9549de fix: Avoid an unnecessary parsing of iox metadata
In one case where ParquetChunk::new was being called, the calling code
had just parsed the IoxMetadata too. In the other case, the calling code
had just *created* the IoxMetadata being parsed. In both cases, this
re-parsing wasn't actually needed; the two bits of info
ParquetChunk::new can be easily passed in.
2021-07-28 14:25:56 -04:00
Carol (Nichols || Goulding) af7866a638 refactor: Remove first/last write times from ParquetFile chunks 2021-07-28 14:12:36 -04:00
Marco Neumann 04e797c706 refactor: pass sequencer numbers directly to DB checkpoint
First of all using a partition checkpoint as some kind of intermediate
representation was kinda a hack because partition checkpoints should
only created for to-be-persisted partitions, not for the others.
API-wise it should only be possible to construct a partition checkpoint
from a flush handle.

Also we were only able to construct partition checkpoints for partitions
that had unpersisted data, otherwise there was no sane way to fill the
`min_unpersisted_timestamp`. We must however scan all partitions no
matter if there is unpersisted data so that we can determine the maximum
seen sequence numbers. This was caught by a replay test resulting in a
catalog state where the last database checkpoint had lower maximum seen
sequence numbers than some partition checkpoint, bailing out with an
error.

So overall it turns out that passing the sequencer numbers directly
instead of wrapping them into a partition checkpoint is the better
implementation.
2021-07-28 17:28:34 +02:00
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: caculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding) 0acb0efbc9 fix: Bump METADATA and TRANSACTION versions 2021-07-26 10:52:42 -04:00
Jake Goulding d928bc84e6 feat: Thread time_of_{first,last}_write through Parquet metadata 2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding) 9604ce7084 fix: Don't pass table name around when it's only returned back
The read_statistics, read_statistics_from_parquet_row_group,
load_parquet_from_store, and load_parquet_from_store_for_chunk functions
weren't ever using table name, they just passed it around and passed it
back.
2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding) 3c794153dd refactor: Organize uses 2021-07-23 13:48:15 -04:00
kodiakhq[bot] 5b5453a020
Merge branch 'main' into pd/add-parquet-cache 2021-07-22 20:21:53 +00:00