Commit Graph

304 Commits (e838a22f9252b692554e4d63c109ab8028228c29)

kodiakhq[bot] de732b4273
Merge branch 'main' into crepererum/parquet_file_wo_query 2021-09-15 07:15:19 +00:00
Marco Neumann 509c07330d refactor: decouple `parquet_file` from `query` 2021-09-14 18:26:16 +02:00
kodiakhq[bot] d60aa5940b
Merge branch 'main' into crepererum/chunk_order_type 2021-09-14 16:25:17 +00:00
Marco Neumann bfaba78dc3 refactor: move `predicate` into its own crate
Two reasons:

1. I want to decouple `parquet_file` from `query` (nearly done; needs a
   small follow-up PR).
2. `predicate` will have more and more features (like serialization),
   which justifies a new home.
2021-09-14 17:13:02 +02:00
Marco Neumann becef1c75f refactor: introduce `ChunkOrder` type 2021-09-14 17:10:23 +02:00
Marco Neumann 1d8edd4683 fix: metadata size increased 2021-09-14 13:03:26 +02:00
Marco Neumann 45cb00d8c0 refactor: track chunk order in chunks 2021-09-14 13:00:55 +02:00
Marco Neumann 4769b67d14 feat: API-level code to prune old transaction from catalog 2021-09-14 10:26:38 +02:00
Marco Neumann f93984cd94 refactor: clarify wording
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-09-14 09:43:55 +02:00
Marco Neumann e7edb65b1d feat: show number of stripped bytes in catalog dump 2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies 44918e4afc
feat: migrate chunk metrics (#2491)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-09 16:02:16 +00:00
Marco Neumann 4a863993ec feat: "dump catalog" debug CLI 2021-09-02 08:08:20 +02:00
Marco Neumann 581ee64049 feat: add functions to dump catalog data to text 2021-09-02 08:07:07 +02:00
Marco Neumann 06c941d798 refactor: split up `make_record_batch` 2021-09-01 11:26:05 +02:00
Marco Neumann 6ce586a2ac docs: add docstrings to `PreservedCatalog` members 2021-09-01 11:26:05 +02:00
Marco Neumann 70a5ffeae7 test: allow creation of deterministic chunks and transactions 2021-09-01 11:26:05 +02:00
Marco Neumann 06833110ab test: allow creation of less complex parquet chunks 2021-09-01 11:26:05 +02:00
Marco Neumann 27248850e5 refactor: use `bytes::Bytes` for metadata in protobuf messages
That simplifies printing a bit, since `Vec<u8>` prints quite badly.
2021-09-01 11:26:05 +02:00
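
A minimal sketch of how such a mapping can be set up with `prost-build` (the proto path is hypothetical; `Config::bytes` is the standard prost mechanism for mapping protobuf `bytes` fields to `bytes::Bytes`):

```rust
// build.rs: a hedged sketch, not the actual IOx build script.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    // Map all protobuf `bytes` fields to `bytes::Bytes`, whose Debug output
    // is a compact byte string instead of `Vec<u8>`'s long decimal list.
    config.bytes(&["."]);
    // `proto/catalog.proto` is an illustrative path.
    config.compile_protos(&["proto/catalog.proto"], &["proto/"])?;
    Ok(())
}
```
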
Marco Neumann a312f81bf2 refactor: move `storage_testing` to `storage::tests` 2021-08-27 15:59:59 +02:00
Marco Neumann a2efe3299d refactor: restructure catalog code in `parquet_file`
No functional change (except for slightly changing error messages). This
will make it easier to add more functionality.
2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding) 7ca177978e fix: Add missing await from a logical merge conflict 2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding) 18ba3b5c59 feat: Create database directories with a generation ID 2021-08-26 09:14:22 -04:00
Marco Neumann 026202a05c fix: correctly account for parquet metadata size
We need to hold the parquet metadata in memory so that we're able to
create catalog checkpoints. We used to do that by holding the decoded
structure (provided by the upstream `parquet` crate) in memory and
serializing that data on demand to Apache Thrift.

There are two drawbacks:

1. We did not account for the memory usage of the decoded structures (or
   at least not fully).
2. We actually don't need the decoded data in memory, since for
   checkpoint creation we only need to write the serialized data.

So this PR changes our wrapper so it holds the serialized data which is
then only decoded when it's really necessary. Since the serialized data
is a simple byte vector, we can also easily account for the size.

Note that this makes the accounted size of parquet chunks larger.
However, this data was always there; we just ignored it up until now. If
the size of the parquet metadata really becomes an issue, we could trade
some CPU time for memory by compressing it.
2021-08-26 13:24:32 +02:00
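
A hedged sketch of the wrapper shape the commit above describes (illustrative names, not the real IOx types): the serialized Thrift bytes are held directly, so the accounted size is exact and decoding happens only when the structured form is really needed.

```rust
/// Holds parquet metadata in its serialized (Apache Thrift) form.
pub struct CachedParquetMetaData {
    data: Vec<u8>,
}

impl CachedParquetMetaData {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data }
    }

    /// Exact memory footprint: the struct plus the buffer capacity. This is
    /// what makes the accounted size of parquet chunks larger, and accurate.
    pub fn size(&self) -> usize {
        std::mem::size_of::<Self>() + self.data.capacity()
    }

    /// Checkpoint creation only needs the serialized bytes, so no decoding
    /// happens on that path; a real implementation would decode on demand.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}
```
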
Andrew Lamb 3ca0d5d42f
Merge branch 'main' into cn/bump 2021-08-19 14:08:49 -04:00
Raphael Taylor-Davies b0e8b75a8a fix: TestCatalogState unique chunk ID 2021-08-19 17:19:12 +01:00
Carol (Nichols || Goulding) 7246f2702a fix: Bump transaction version because of a change in the Parquet files 2021-08-19 09:32:37 -04:00
Raphael Taylor-Davies 5a841600d9 feat: make catalog state test deterministic (#2349) 2021-08-19 14:04:27 +01:00
Carol (Nichols || Goulding) 6390156c0e fix: Remove error types not used anywhere 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) ef0e1a3f60 refactor: Extract a transaction file path type 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) 6d5cb9c117 refactor: Extract a ParquetFilePath to handle paths to parquet files in a db's object store 2021-08-18 11:32:39 -04:00
Ning Sun c012e996ab
refactor: remove display methods, use fmt::Display instead. (#2272)
* refactor: remove display methods, use fmt::Display instead.

Signed-off-by: Ning Sun <sunng@protonmail.com>

* refactor: update a few calls from .display to .to_string()

* fix: consistently use `Path` rather than occasionally `DirsAndFileName`

* fix: fixup for merge conflicts

* fix: update test

* fix: Catch another case or two

* fix: fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-16 18:00:22 +00:00
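
The pattern behind this refactor, sketched with a made-up `CatalogPath` stand-in type: implementing `fmt::Display` gives `{}` formatting and `.to_string()` for free, so ad-hoc `display()` methods can go.

```rust
use std::fmt;

struct CatalogPath {
    revision: u64,
    uuid: String,
}

impl fmt::Display for CatalogPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}.txn", self.revision, self.uuid)
    }
}

fn main() {
    let path = CatalogPath { revision: 42, uuid: "abc".into() };
    println!("{path}");       // works with any formatting macro
    let s = path.to_string(); // `ToString` comes for free via `Display`
    assert_eq!(s, "42/abc.txn");
}
```
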
Carol (Nichols || Goulding) 564238ad8c refactor: Organize uses 2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) ae6b0e669b refactor: Extract a database persister type that wraps object store
Connects to #2193.
2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) daa534ee32 refactor: Incorporate Path parsing into the TransactionFile type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) ee3173efb1 refactor: Simplify implementation of parse_file_path 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) dbd1718fd2 refactor: Use the TransactionKey type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) 7f7a911a9a refactor: Extract a TransactionFile type to manage transaction paths 2021-08-12 09:06:06 -04:00
Dom 3de6b44e23
build: use new rustdoc lint name (#2261)
* fix: nocache feature code rot

The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.

* build: use updated broken_intra_doc_links name

The broken_intra_doc_links lint was renamed to
rustdoc::broken_intra_doc_links.

https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
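
Concretely, the rename means crate-level lint attributes change like this (both lint names are real; see the linked rustdoc documentation):

```rust
// Before: the pre-1.52 spelling, which rustc no longer recognizes.
// #![deny(broken_intra_doc_links)]

// After: the lint lives in the `rustdoc` tool namespace.
#![deny(rustdoc::broken_intra_doc_links)]
```
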
Marco Neumann 8721c5fcd6 fix: improve error messages 2021-08-09 10:54:23 +02:00
Marco Neumann 950286e5b7 feat: make replay planning work w/ unordered checkpoints 2021-08-09 10:54:23 +02:00
Andrew Lamb d41b44d312
feat: use zstd compression when writing parquet files (#2218)
* feat: use ZSTD when writing parquet files

* fix: test
2021-08-06 18:45:55 +00:00
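
A minimal sketch of the writer-properties change using the `parquet` crate (at the time of this commit `Compression::ZSTD` was a unit variant; newer crate versions take a `ZstdLevel` argument):

```rust
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;

/// Build writer properties that compress all column chunks with ZSTD.
fn writer_props() -> WriterProperties {
    WriterProperties::builder()
        .set_compression(Compression::ZSTD)
        .build()
}
```
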
Andrew Lamb e92e94caad
chore: Update deps (including arrow 5.1.0, tonic -> 0.5, and prost 0.5) (#2172)
* chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0)

* chore: update all the things

* refactor: Update serving readiness check due to change in Tonic API

* chore: update more deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-05 15:57:38 +00:00
Andrew Lamb 1ccaa433e8
fix: Temporarily disable parquet predicate pushdown (#2164) 2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
kodiakhq[bot] 545222303f
Merge branch 'main' into cn/cc-only 2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding) ad0a9549de fix: Avoid an unnecessary parsing of iox metadata
In one case where ParquetChunk::new was being called, the calling code
had just parsed the IoxMetadata too. In the other case, the calling code
had just *created* the IoxMetadata being parsed. In both cases, this
re-parsing wasn't actually needed; the two bits of info
`ParquetChunk::new` needs can easily be passed in.
2021-07-28 14:25:56 -04:00
Carol (Nichols || Goulding) af7866a638 refactor: Remove first/last write times from ParquetFile chunks 2021-07-28 14:12:36 -04:00
Marco Neumann 04e797c706 refactor: pass sequencer numbers directly to DB checkpoint
First of all, using a partition checkpoint as some kind of intermediate
representation was a bit of a hack, because partition checkpoints should
only be created for to-be-persisted partitions, not for the others.
API-wise it should only be possible to construct a partition checkpoint
from a flush handle.

Also, we were only able to construct partition checkpoints for partitions
that had unpersisted data; otherwise there was no sane way to fill the
`min_unpersisted_timestamp`. We must, however, scan all partitions, whether
or not they hold unpersisted data, so that we can determine the maximum
seen sequence numbers. This was caught by a replay test that produced a
catalog state in which the last database checkpoint had lower maximum seen
sequence numbers than some partition checkpoint, bailing out with an
error.

So overall it turns out that passing the sequencer numbers directly
instead of wrapping them into a partition checkpoint is the better
implementation.
2021-07-28 17:28:34 +02:00
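
An illustrative sketch of the resulting API shape (hypothetical names, not the real IOx signatures): partition checkpoints are only obtainable from a flush handle, while the database checkpoint takes the per-sequencer numbers directly.

```rust
use std::collections::BTreeMap;

type SequencerId = u32;

struct FlushHandle { /* state of a partition that is about to persist */ }

struct PartitionCheckpoint { /* per-partition replay info */ }

impl FlushHandle {
    /// The only way to obtain a partition checkpoint: via the flush handle
    /// of a to-be-persisted partition.
    fn checkpoint(&self) -> PartitionCheckpoint {
        PartitionCheckpoint {}
    }
}

struct DatabaseCheckpoint {
    /// (min unpersisted, max seen) sequence numbers per sequencer, gathered
    /// by scanning *all* partitions and passed in directly; no partition
    /// checkpoints are fabricated along the way.
    sequencer_numbers: BTreeMap<SequencerId, (Option<u64>, u64)>,
}
```
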
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: calculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding) 0acb0efbc9 fix: Bump METADATA and TRANSACTION versions 2021-07-26 10:52:42 -04:00
Jake Goulding d928bc84e6 feat: Thread time_of_{first,last}_write through Parquet metadata 2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding) 9604ce7084 fix: Don't pass table name around when it's only returned back
The read_statistics, read_statistics_from_parquet_row_group,
load_parquet_from_store, and load_parquet_from_store_for_chunk functions
weren't ever using the table name; they just passed it around and passed it
back.
2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding) 3c794153dd refactor: Organize uses 2021-07-23 13:48:15 -04:00
kodiakhq[bot] 5b5453a020
Merge branch 'main' into pd/add-parquet-cache 2021-07-22 20:21:53 +00:00
Paul Dix 88e29dede9 chore: remove extraneous example code from parquet storage 2021-07-22 16:21:13 -04:00
Andrew Lamb 01c79f1a1a
fix: Print all timestamps using RFC3339 format (#2098)
* fix: Use IOx pretty printer rather than arrow pretty printer

* chore: update tests in the query crate

* chore: update influxdb_iox tests

* chore: Update end to end tests

* chore: update query_tests

* chore: update mutable_buffer tests

* refactor: update parquet_file tests

* refactor: update db tests

* chore: update kafka integration test output

* fix: merge conflict
2021-07-22 19:04:52 +00:00
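
A tiny sketch of the formatting change using `chrono` (the nanosecond value is arbitrary):

```rust
use chrono::{TimeZone, Utc};

fn main() {
    let ts = Utc.timestamp_nanos(1_600_000_000_000_000_000);
    // RFC 3339 output instead of a raw integer or arrow's default rendering.
    println!("{}", ts.to_rfc3339()); // 2020-09-13T12:26:40+00:00
}
```
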
Marco Neumann 50241bae9e refactor: do not abuse `uint64::MAX` as sentinel for `None` 2021-07-22 12:51:43 +02:00
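
The shape of that refactor in miniature (illustrative fields, not the actual IOx structs):

```rust
// Before: u64::MAX doubled as "no value", a convention every caller had to
// know about and remember to check.
struct Before {
    max_persisted: u64, // u64::MAX means "none"
}

// After: absence is explicit and enforced by the type system.
struct After {
    max_persisted: Option<u64>,
}
```
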
Paul Dix d95b5df03e refactor: move cache to ObjectStore
Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.
2021-07-21 18:27:56 -04:00
Paul Dix d0ea812041 feat: add skeleton for object store file cache 2021-07-21 18:27:56 -04:00
Marco Neumann 57a9d5ade0 refactor: correctly track "seen" ranges in persistence checkpoints
Now we can handle all these cases:

There are two partitions w/ a single write each:

1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
   during replay (2 in this case, so B would be gone)

1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
   all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
   re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
   something to persist)
2021-07-21 19:19:49 +02:00
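
A hedged sketch of the tracking described above (illustrative types): each partition remembers the highest sequence number it has *seen* per sequencer, even when it has nothing left to persist, so replay neither forgets writes (the first case) nor re-reads them (the second).

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct SeenRanges {
    /// sequencer id -> highest sequence number read so far
    max_seen: BTreeMap<u32, u64>,
}

impl SeenRanges {
    fn record(&mut self, sequencer: u32, sequence: u64) {
        let entry = self.max_seen.entry(sequencer).or_insert(sequence);
        *entry = (*entry).max(sequence);
    }

    /// Replay resumes *after* the last seen sequence number, even if the
    /// checkpointed partition had nothing left to persist.
    fn replay_start(&self, sequencer: u32) -> Option<u64> {
        self.max_seen.get(&sequencer).map(|max| max + 1)
    }
}
```
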
Marco Neumann a5fc1c7d38 fix: collect min AND max in database checkpoints
This is required to correctly handle the following case:

1. There are two partitions A and B w/ a single write each (from the same
   sequencer).
2. We persist A:
   - The partition checkpoint for A will be empty because after persistence
     there will be nothing to replay (the single write is persisted and
     we're ready).
   - The database checkpoint that contains the global minimum of all ranges
     recognizes that for the sequencer there is indeed something left (the
     minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
   with a sequence minimum but (w/o the change in this commit) there is no
   maximum. Only partition checkpoints contain maxima, and the only partition
   checkpoint that was persisted was the one for partition A and that one was
   empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
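
A sketch of the range type this implies (modeled on the commit's description; field names are illustrative): the maximum is always known once a sequencer has been seen, while the minimum is only present when something remains to replay. In the scenario above, partition A's empty checkpoint still contributes a maximum, which is enough to recover partition B.

```rust
#[derive(Clone, Copy)]
struct MinMaxSequence {
    /// Smallest unpersisted sequence number, if anything is left to replay.
    min: Option<u64>,
    /// Largest sequence number seen so far; always known.
    max: u64,
}

impl MinMaxSequence {
    /// Fold another partition's range into a database-wide checkpoint entry.
    fn merge(&mut self, other: Self) {
        self.min = match (self.min, other.min) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        };
        self.max = self.max.max(other.max);
    }
}
```
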
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Jake Goulding 42b56ad657 refactor: Use SNAFU's context instead of `ok_or_else` 2021-07-16 09:59:54 -04:00
Jake Goulding 939d15a21f perf: Avoid clone when an error doesn't occur 2021-07-16 09:59:54 -04:00
Marco Neumann f57ba6afdb
fix: use fixed-size timestamps for parquet metadata (#2032)
This fixes flaky tests that rely on predictable file sizes.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-16 13:14:02 +00:00
Andrew Lamb 0c86d1dccf
feat: Record parquet bytes size in catalog / parquet_file (#2006)
* feat: Store object store size in parquet_file

* fix: update TRANSACTION_VERSION to 8

* refactor: rename os_bytes --> file_size_bytes
2021-07-15 12:07:11 +00:00
Marco Neumann 40047a76bc refactor: `remove_parquet` cannot fail 2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies 1d00fa2fd8
refactor: track memory metrics in catalog (#1995)
* refactor: track memory metrics in catalog

* chore: update comment
2021-07-14 16:23:00 +00:00
Andrew Lamb d35b74c226
fix: Fix doc build warnings (#1945)
* fix: Fix doc build warnings

* refactor: add deny bare_urls to crates

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Andrew Lamb 670826daf9
refactor: make object_store construction interface consistent (#1944)
* refactor: make object_store construction interface consistent

* fix: benchmarks

* fix: doc build

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-12 12:56:36 +00:00
Marco Neumann 18893e76e0 refactor: convert some table name and part. key String to Arcs
This has the (somewhat nice) side effect of shrinking the in-mem
catalog a bit as well, because `ParquetChunk` is now a bit smaller, which
makes the chunk stage enum smaller as well.
2021-07-08 14:34:28 +02:00
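
The gist of the change as a before/after sketch (illustrative struct, not the actual `ParquetChunk` definition): `Arc<str>` is a two-word fat pointer versus `String`'s three words, and cloning it is a reference-count bump instead of a fresh allocation.

```rust
use std::sync::Arc;

// Before: every chunk owned its own copy of the strings.
struct AddrBefore {
    table_name: String,
    partition_key: String,
}

// After: one shared allocation per distinct name, and a smaller struct,
// which in turn shrinks the enum that embeds it.
struct AddrAfter {
    table_name: Arc<str>,
    partition_key: Arc<str>,
}
```
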
Marco Neumann b528ac2b55 feat: store schemas per table
This way we can:

- check for schema matches even for writes going into different
  partitions
- solve #1768 and #1884 in some future PR

Closes #1897.
2021-07-08 09:18:09 +02:00
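
An illustrative sketch of the structure this enables (simplified stand-in types): one schema per *table* means a write into any partition is validated against writes that went into every other partition of that table.

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Debug)]
struct Schema(Vec<(String, String)>); // (column name, column type), simplified

#[derive(Default)]
struct TableSchemas {
    schemas: HashMap<String, Schema>,
}

impl TableSchemas {
    fn check_write(&mut self, table: &str, schema: Schema) -> Result<(), String> {
        match self.schemas.get(table) {
            // A conflicting schema is rejected no matter which partition
            // the write targets.
            Some(existing) if *existing != schema => {
                Err(format!("schema conflict for table {table}"))
            }
            Some(_) => Ok(()),
            None => {
                self.schemas.insert(table.to_owned(), schema);
                Ok(())
            }
        }
    }
}
```
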
Andrew Lamb e6d995cbd8
chore: Update to Rust 1.53.0 (#1922)
* chore: Update to Rust 1.53.0

* fix: Update to latest clippy standards

* fix: bad refactor

* fix: Update escaping

* test: update test output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se, so let's
move them into their own crate. This also means `parquet_file` no
longer depends on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann d96e15c3f7 docs: explain why we store checkpoints in parquet files 2021-07-05 09:42:46 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Marco Neumann ddc9cd49ca chore: bump preserved catalog version 2021-06-29 14:23:06 +02:00
Marco Neumann 3ebb6a3037 refactor: do not capture txn-specific information in parquet files
This helps with #1821.
2021-06-29 14:22:36 +02:00
kodiakhq[bot] eda9532eb2
Merge branch 'main' into crepererum/issue1821-cleanup-lock 2021-06-29 10:48:43 +00:00
Marco Neumann 48df13de05 refactor: use parking lot for catalog cleanup 2021-06-29 12:47:29 +02:00
Marco Neumann f824f235b4
fix: fix info log message
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-29 12:35:05 +02:00
Marco Neumann 778a611fb8 docs: add clarifying comment for rebuild test 2021-06-29 11:58:19 +02:00
Marco Neumann 17f89ea8d0 docs: fix comment about lock downgrade 2021-06-29 11:53:55 +02:00
Marco Neumann 2cd5ce98be refactor: do not pass locks around for catalog cleanup 2021-06-29 10:21:41 +02:00
Marco Neumann 730a23faa3 refactor: improve locking around the parquet file cleanup
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).

This will help with #1821.
2021-06-29 10:20:03 +02:00
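
A sketch of the locking scheme under the stated assumptions (illustrative; the real IOx guard types differ): file writers take the dedicated lock in read mode so they can run in parallel, while the cleanup job takes it in write mode and therefore never races a file that is being written outside a transaction.

```rust
use std::sync::RwLock;

struct Catalog {
    /// Dedicated lock for cleanup, NOT the transaction lock.
    cleanup_lock: RwLock<()>,
}

impl Catalog {
    fn write_parquet_file(&self, write: impl FnOnce()) {
        // Shared guard: many writers may hold this at once.
        let _guard = self.cleanup_lock.read().unwrap();
        write(); // the file cannot be garbage-collected while `_guard` lives
    }

    fn cleanup_unreferenced_files(&self, delete: impl FnOnce()) {
        // Exclusive guard: waits for all in-flight writes to finish.
        let _guard = self.cleanup_lock.write().unwrap();
        delete();
    }
}
```
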
Marco Neumann 6ec24353bf refactor: only rebuild a single txn for pres. catalogs
Stop relying on in-parquet transaction information during catalog
rebuilds. This has some downsides (no fork detection, only a single
transaction, hence no time travel) but will allow us to remove
transaction information from parquet files, so that we can finally move
the actual parquet file storage out of the transaction lock.

This will help with #1821.
2021-06-28 15:10:44 +02:00
Andrew Lamb 0a03605bbc
refactor: pull Channel --> Stream adapter into its own module (#1793)
* refactor: pull Channel --> Stream adapter into its own module

* docs: Update query/src/exec/stream.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
kodiakhq[bot] 59993e8b8f
Merge branch 'main' into crepererum/issue1623 2021-06-23 12:40:05 +00:00
Marco Neumann c395409b51 feat: include UUIDv4 into parquet file names
Change schema from

```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```

to

```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```

So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies the reasoning
about file references in the future and follows what other dataset
formats are usually doing (i.e. never replace files).

Also use `ChunkAddr` where it makes sense.
2021-06-23 14:30:28 +02:00
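
A minimal sketch of the new path construction (the layout comes from the commit message above; the function itself and the `uuid` crate usage are illustrative):

```rust
use uuid::Uuid;

/// Build the object store path for a parquet file. Because a fresh UUIDv4
/// is embedded in the file name, a path can never collide with a leftover
/// file from an old, wiped catalog.
fn parquet_file_path(
    server_id: u32,
    db_name: &str,
    table_name: &str,
    part_key: &str,
    chunk_id: u32,
) -> String {
    format!(
        "{}/{}/data/{}/{}/{}.{}.parquet",
        server_id,
        db_name,
        table_name,
        part_key,
        chunk_id,
        Uuid::new_v4(),
    )
}
```
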
kodiakhq[bot] 70817a474c
Merge branch 'main' into crepererum/issue1740-d 2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies 5cd911c74a
fix: correct row count for object store chunks (#1789) 2021-06-23 12:06:49 +00:00
Marco Neumann 1636f47565 refactor: remove dead code 2021-06-23 10:51:22 +02:00
Marco Neumann cf55df68b5 refactor: remove some `Arc`s around the in-mem catalog
This is for #1740.
2021-06-23 10:51:22 +02:00
Marco Neumann e36b6f9c7a docs: fix intra-doc link 2021-06-23 10:25:05 +02:00
Marco Neumann 67508094b4 fix: double ref
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-23 10:25:05 +02:00
Marco Neumann d2be641864 refactor: make checkpointing easier to use
Don't mix commit+checkpoint in a single call; that would force the caller
to reason about the error type and which of the two operations failed.
Splitting them also makes it easier to create the correct checkpoint data.
2021-06-23 10:25:05 +02:00
Marco Neumann 4a961694ec refactor: make caller sync mem<>OS view during catalog transactions
This is for #1740. Greatly simplifies the integration of the persisted
catalog into the DB.
2021-06-23 10:25:05 +02:00
Marco Neumann d1db0dfaeb refactor: remove type parameter from preserved catalog
For #1740.
2021-06-22 10:53:10 +02:00