Commit Graph

127 Commits (96fb595cc0a4dde98451eb923c2e4300e4af9622)

Author SHA1 Message Date
Marco Neumann 28d1dc4da1 chore: bump preserved catalog version 2021-06-10 16:01:13 +02:00
Marco Neumann 80ee36cd1a refactor: slightly streamline path parsing code in pres. catalog 2021-06-10 15:59:28 +02:00
Marco Neumann 7e7332c9ce refactor: make comparison a bit less confusing 2021-06-10 15:42:21 +02:00
Marco Neumann fd581e2ec9 docs: fix confusing wording in `CatalogState::files` 2021-06-10 15:42:21 +02:00
Marco Neumann be9b3a4853 fix: protobuf lint fixes 2021-06-10 15:42:21 +02:00
Marco Neumann 294c304491 feat: impl catalog checkpointing infrastructure
This implements a way to add checkpoints to the preserved catalog and
speed up replay.

Note: This leaves the "hook it up into the actual DB" for a future PR.

Issue: #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 188cacec54 refactor: use `Arc` to pass `ParquetFileMetaData`
This will be handy when the catalog state must be able to return
metadata objects so that we can create checkpoints, esp. when we use
multi-chunk parquet files in some midterm future.
2021-06-10 15:42:21 +02:00
Marco Neumann c7412740e4 refactor: prepare to read and write multiple file types for catalog
Prepares #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 33e364ed78 feat: add encoding info to transaction protobuf
This should help with #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 4fe2d7af9c chore: enforce `clippy::future_not_send` for `parquet_file` 2021-06-09 18:18:27 +02:00
Andrew Lamb ab0aed0f2e refactor: Remove a layer of channels in parquet read stream (#1648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:47:04 +00:00
Raphael Taylor-Davies 1e7ef193a6 refactor: use field metadata to store influx types (#1642)
* refactor: use field metadata to store influx types

make SchemaBuilder non-consuming

* chore: remove unused variants

* chore: fix lints
2021-06-07 13:26:39 +00:00
Marco Neumann c830542464 feat: add info log when cleanup limit is reached 2021-06-04 11:12:29 +02:00
Marco Neumann 91df8a30e7 feat: limit number of files during storage cleanup
Since the number of parquet files can potentially be unbounded (aka very
very large) and we do not want to hold the transaction lock for too
long and also want to limit memory consumption of the cleanup routine,
let's limit the number of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
Marco Neumann 85139abbbb fix: use structured logging for cleanup logs 2021-06-03 11:23:29 +02:00
Andrew Lamb 32c6ed1f34 refactor: More cleanup related to multi-table chunks (#1604)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Marco Neumann e5b65e10ac test: ensure that `find_last_transaction_timestamp` indeed returns the last timestamp 2021-06-02 10:15:06 +02:00
Marco Neumann 98e413d5a9 fix: do not unwrap broken timestamps in serialized catalog 2021-06-02 10:15:06 +02:00
Marco Neumann fc0a74920f fix: use clearer error text 2021-06-02 09:41:19 +02:00
Marco Neumann 2a0b2698c6 fix: use structured logging
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann 64bf8c5182 docs: add code comment explaining why we parse transaction timestamps
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann 77aeb5ca5d refactor: use protobuf-native Timestamp instead of string 2021-06-02 09:41:19 +02:00
Marco Neumann 9b9400803b refactor!: bump transaction version to 2 2021-06-02 09:41:19 +02:00
Marco Neumann 5f77b7b92b feat: add `parquet_file::catalog::find_last_transaction_timestamp` 2021-06-02 09:41:19 +02:00
Marco Neumann 9aee961e2a test: test loading catalogs from broken protobufs 2021-06-02 09:41:19 +02:00
Marco Neumann 0a625b50e6 feat: store transaction timestamp in preserved catalog 2021-06-02 09:41:19 +02:00
Andrew Lamb d8fbb7b410 refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588)
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API

* fix: remove test that can no longer fail

* fix: update tests + code review comments

* fix: clippy

* fix: clippy

* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb d3711a5591 refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
* refactor: use ParquetExec to read parquet files

* fix: test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb 64328dcf1c feat: cache schema on catalog chunks too (#1575) 2021-06-01 12:42:46 +00:00
Andrew Lamb 00e735ef0d chore: remove unused dependencies (#1583) 2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies db432de137 feat: add distinct count to StatValues (#1568) 2021-05-28 17:41:34 +00:00
kodiakhq[bot] 6098c7cd00 Merge branch 'main' into crepererum/issue1376 2021-05-28 07:13:15 +00:00
Andrew Lamb f3bec93ef1 feat: Cache TableSummary in Catalog rather than computing it on demand (#1569)
* feat: Cache `TableSummary` in catalog Chunks

* refactor: use consistent table summary
2021-05-27 16:03:05 +00:00
Marco Neumann dd2a976907 feat: add a flag to ignore metadata errors during catalog rebuild 2021-05-27 13:10:14 +02:00
Marco Neumann bc7389dc38 fix: fix typo
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann 48307e4ab2 docs: adjust error description to reflect internal errors
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann d6f0dc7059 feat: implement catalog rebuilding from files
Closes #1376.
2021-05-27 12:51:01 +02:00
Marco Neumann 024323912a docs: explain what `PreservedCatalog::wipe` offers 2021-05-27 12:48:41 +02:00
Raphael Taylor-Davies 4fcc04e6c9 chore: enable arrow prettyprint feature (#1566) 2021-05-27 10:28:14 +00:00
Marco Neumann 9f451423d5 feat: log files that are deleted 2021-05-26 12:49:44 +02:00
Marco Neumann 24ec1a472e fix: do NOT delete parquet files that are reachable by time travel 2021-05-26 12:38:54 +02:00
Marco Neumann 5983336366 refactor: rename `parquet_file::{utils => test_utils}` 2021-05-26 11:09:29 +02:00
Marco Neumann d7e3bc569e refactor: shorten time we hold the transaction lock during clean-up 2021-05-26 11:04:57 +02:00
Marco Neumann 18f5dd9ae1 test: ensure transaction lock exists during cleanup planning 2021-05-26 11:04:57 +02:00
Marco Neumann b55eae98da fix: do not delete non-parquet files during catalog-driven cleanup 2021-05-26 11:04:57 +02:00
Marco Neumann 5ed16ff294 refactor: improve error message in `parquet_file::cleanup` 2021-05-26 11:04:57 +02:00
Marco Neumann 14fdf3b7c7 feat: implement object store cleanup core routine 2021-05-26 11:02:40 +02:00
Marco Neumann cc78b5317d feat: add method to get all parquet files from catalog state 2021-05-26 11:02:40 +02:00
Marco Neumann 953114af2e feat: add method to abort catalog transaction 2021-05-26 11:02:40 +02:00
Marco Neumann 92fcd7e940 feat: add a way to get OS, server ID and DB name from catalog 2021-05-26 11:02:40 +02:00