kodiakhq[bot]
b57f397057
Merge branch 'main' into crepererum/checkpoint_during_restore
2021-06-14 13:54:03 +00:00
Marco Neumann
0a7dcc3779
test: adjust read-write parquet test to newest test data
2021-06-14 14:24:24 +02:00
Marco Neumann
d6f6ddfdaa
fix: fix NULL handling in parquet stats
2021-06-14 14:24:09 +02:00
Marco Neumann
eae56630fb
test: add test for all-NULL float column metadata
2021-06-14 13:48:34 +02:00
Marco Neumann
3f9bcf7cd9
fix: fix NaN handling in parquet stats
2021-06-14 13:44:52 +02:00
Marco Neumann
ea96210e98
test: enable unblocked test
2021-06-14 13:44:52 +02:00
Marco Neumann
518f7c6f15
refactor: wrap upstream parquet MD into struct + clean up interface
...
This prevents users of `parquet_file::metadata` from also depending on
`parquet` directly. Furthermore, they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
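A minimal sketch of the wrapper pattern described in the commit body above. The struct layout and the `row_group_count` helper are illustrative assumptions; only the upstream `parquet::file::metadata::ParquetMetaData` type and its `num_row_groups` method are real:

```rust
// Sketch only: the real IoxParquetMetaData lives in parquet_file::metadata and
// its fields and methods may differ.
use parquet::file::metadata::ParquetMetaData;

/// Newtype wrapper so callers depend on this crate instead of `parquet` directly.
pub struct IoxParquetMetaData {
    md: ParquetMetaData,
}

impl IoxParquetMetaData {
    pub fn new(md: ParquetMetaData) -> Self {
        Self { md }
    }

    /// Example of re-exporting just the functionality callers need,
    /// instead of them importing dozens of `parquet` functions.
    pub fn row_group_count(&self) -> usize {
        self.md.num_row_groups()
    }
}
```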
Marco Neumann
030d0d2b9a
feat: create checkpoint during catalog rebuild
2021-06-14 10:55:56 +02:00
Marco Neumann
df866f72e0
refactor: store parquet metadata in chunk
...
This will be useful for #1381.
At the moment we parse schema and stats eagerly and store them alongside
the parquet metadata in memory. Technically this is not required since
it is mostly duplicate data. In the future we might trade off some of
this memory against CPU consumption by parsing schema and stats on
demand.
2021-06-14 10:08:31 +02:00
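A rough sketch of the chunk layout the commit body describes, with eagerly parsed schema and stats stored next to the raw metadata. All type names are stand-ins, not the actual `parquet_file::chunk::Chunk` fields:

```rust
// Sketch of the eager-parsing trade-off described above.
use std::sync::Arc;

struct Schema;             // stand-in for the real schema type
struct TableSummary;       // stand-in for the parsed column statistics
struct IoxParquetMetaData; // stand-in for the wrapped parquet metadata

struct Chunk {
    /// Raw parquet metadata, kept so checkpoints can be written later.
    parquet_metadata: Arc<IoxParquetMetaData>,
    /// Eagerly decoded from the metadata; duplicates information held above
    /// and could instead be derived on demand to save memory.
    schema: Arc<Schema>,
    table_summary: Arc<TableSummary>,
}
```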
Marco Neumann
e6699ff15a
test: ensure that `find_last_transaction_timestamp` considers checkpoints
2021-06-14 10:04:50 +02:00
Marco Neumann
f8a518bbed
refactor: inline `Table` into `parquet_file::chunk::Chunk`
...
Note that the resulting size estimations are different because we were
double-counting `Table`. `mem::size_of::<Self>()` is recursive for
non-boxed types since the child will be part of the parent structure.
Issue: #1295.
2021-06-11 11:54:31 +02:00
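A small, self-contained example (not IOx code) of the `mem::size_of` behavior the commit body refers to: an inline child is counted as part of the parent, while a boxed child contributes only a pointer, so adding the child's size on top of the parent's double-counts it.

```rust
use std::mem::size_of;

struct Table {
    rows: [u64; 8],
}

struct ChunkInline {
    table: Table, // counted as part of ChunkInline
}

struct ChunkBoxed {
    table: Box<Table>, // only the pointer is counted
}

fn main() {
    assert_eq!(size_of::<ChunkInline>(), size_of::<Table>());
    assert_eq!(size_of::<ChunkBoxed>(), size_of::<usize>());
    println!(
        "inline: {} bytes, boxed: {} bytes",
        size_of::<ChunkInline>(),
        size_of::<ChunkBoxed>()
    );
}
```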
Marco Neumann
28d1dc4da1
chore: bump preserved catalog version
2021-06-10 16:01:13 +02:00
Marco Neumann
80ee36cd1a
refactor: slightly streamline path parsing code in pres. catalog
2021-06-10 15:59:28 +02:00
Marco Neumann
7e7332c9ce
refactor: make comparison a bit less confusing
2021-06-10 15:42:21 +02:00
Marco Neumann
fd581e2ec9
docs: fix confusing wording in `CatalogState::files`
2021-06-10 15:42:21 +02:00
Marco Neumann
be9b3a4853
fix: protobuf lint fixes
2021-06-10 15:42:21 +02:00
Marco Neumann
294c304491
feat: impl catalog checkpointing infrastructure
...
This implements a way to add checkpoints to the preserved catalog and
speed up replay.
Note: This leaves the "hook it up into the actual DB" for a future PR.
Issue: #1381.
2021-06-10 15:42:21 +02:00
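A conceptual sketch of how checkpoints speed up replay, assuming a simplified file model. The enum and `replay` function are illustrative only, not the preserved-catalog API:

```rust
enum CatalogFile {
    /// Incremental change set.
    Transaction { revision: u64 },
    /// Full snapshot of the catalog state at this revision.
    Checkpoint { revision: u64 },
}

fn replay(files: &[CatalogFile]) {
    // Start from the newest checkpoint (if any) instead of revision 0 ...
    let start = files
        .iter()
        .filter_map(|f| match f {
            CatalogFile::Checkpoint { revision } => Some(*revision),
            _ => None,
        })
        .max();

    for file in files {
        match (file, start) {
            (CatalogFile::Checkpoint { revision }, Some(cp)) if *revision == cp => {
                println!("load full state from checkpoint {}", revision);
            }
            // ... and only apply the transactions that come after it.
            (CatalogFile::Transaction { revision }, Some(cp)) if *revision > cp => {
                println!("apply transaction {}", revision);
            }
            // Without any checkpoint every transaction must be replayed.
            (CatalogFile::Transaction { revision }, None) => {
                println!("apply transaction {}", revision);
            }
            _ => {}
        }
    }
}

fn main() {
    use CatalogFile::*;
    replay(&[
        Transaction { revision: 1 },
        Transaction { revision: 2 },
        Checkpoint { revision: 2 },
        Transaction { revision: 3 },
    ]);
}
```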
Marco Neumann
188cacec54
refactor: use `Arc` to pass `ParquetFileMetaData`
...
This will be handy when the catalog state must be able to return
metadata objects so that we can create checkpoints, especially once we
use multi-chunk parquet files at some point in the mid-term future.
2021-06-10 15:42:21 +02:00
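A minimal sketch of why passing metadata behind an `Arc` helps here: handing out clones only bumps a reference count, so collecting metadata for a checkpoint never copies the underlying bytes. The `ParquetFileMetaData` and `CatalogState` structs below are stand-ins, not the real types:

```rust
use std::sync::Arc;

struct ParquetFileMetaData {
    /// Serialized metadata, potentially large.
    data: Vec<u8>,
}

struct CatalogState {
    files: Vec<Arc<ParquetFileMetaData>>,
}

impl CatalogState {
    /// Returning `Arc` clones only increments a reference count, so a
    /// checkpoint can gather metadata for every tracked file cheaply.
    fn metadata_for_checkpoint(&self) -> Vec<Arc<ParquetFileMetaData>> {
        self.files.iter().map(Arc::clone).collect()
    }
}

fn main() {
    let state = CatalogState {
        files: vec![Arc::new(ParquetFileMetaData { data: vec![0; 1024] })],
    };
    let for_checkpoint = state.metadata_for_checkpoint();
    assert_eq!(for_checkpoint.len(), 1);
    // Both the catalog state and the checkpoint share the same allocation.
    assert_eq!(Arc::strong_count(&state.files[0]), 2);
}
```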
Marco Neumann
c7412740e4
refactor: prepare to read and write multiple file types for catalog
...
Prepares #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann
33e364ed78
feat: add encoding info to transaction protobuf
...
This should help with #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann
4fe2d7af9c
chore: enforce `clippy::future_not_send` for `parquet_file`
2021-06-09 18:18:27 +02:00
Andrew Lamb
ab0aed0f2e
refactor: Remove a layer of channels in parquet read stream ( #1648 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:47:04 +00:00
Raphael Taylor-Davies
1e7ef193a6
refactor: use field metadata to store influx types ( #1642 )
...
* refactor: use field metadata to store influx types
make SchemaBuilder non-consuming
* chore: remove unused variants
* chore: fix lints
2021-06-07 13:26:39 +00:00
Marco Neumann
c830542464
feat: add info log when cleanup limit is reached
2021-06-04 11:12:29 +02:00
Marco Neumann
91df8a30e7
feat: limit number of files during storage cleanup
...
Since the number of parquet files can potentially be unbounded (i.e. very,
very large), and we neither want to hold the transaction lock for too long
nor let the cleanup routine consume too much memory, let's limit the number
of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
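An illustrative sketch of the bounded collection described above, assuming a synchronous iterator for simplicity (the real routine works against an async object-store listing); the function and its names are hypothetical:

```rust
fn collect_cleanup_candidates<'a>(
    all_files: impl Iterator<Item = &'a str>,
    in_catalog: impl Fn(&str) -> bool,
    limit: usize,
) -> Vec<&'a str> {
    all_files
        .filter(|path| !in_catalog(path)) // only orphaned files are candidates
        .take(limit) // cap work per cleanup pass to bound memory and lock time
        .collect()
}

fn main() {
    let files = ["a.parquet", "b.parquet", "c.parquet", "d.parquet"];
    let in_catalog = |p: &str| p == "a.parquet";
    let candidates = collect_cleanup_candidates(files.iter().copied(), in_catalog, 2);
    assert_eq!(candidates, vec!["b.parquet", "c.parquet"]);
}
```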
Marco Neumann
85139abbbb
fix: use structured logging for cleanup logs
2021-06-03 11:23:29 +02:00
Andrew Lamb
32c6ed1f34
refactor: More cleanup related to multi-table chunks ( #1604 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Marco Neumann
e5b65e10ac
test: ensure that `find_last_transaction_timestamp` indeed returns the last timestamp
2021-06-02 10:15:06 +02:00
Marco Neumann
98e413d5a9
fix: do not unwrap broken timestamps in serialized catalog
2021-06-02 10:15:06 +02:00
Marco Neumann
fc0a74920f
fix: use clearer error text
2021-06-02 09:41:19 +02:00
Marco Neumann
2a0b2698c6
fix: use structured logging
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann
64bf8c5182
docs: add code comment explaining why we parse transaction timestamps
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann
77aeb5ca5d
refactor: use protobuf-native Timestamp instead of string
2021-06-02 09:41:19 +02:00
Marco Neumann
9b9400803b
refactor!: bump transaction version to 2
2021-06-02 09:41:19 +02:00
Marco Neumann
5f77b7b92b
feat: add `parquet_file::catalog::find_last_transaction_timestamp`
2021-06-02 09:41:19 +02:00
Marco Neumann
9aee961e2a
test: test loading catalogs from broken protobufs
2021-06-02 09:41:19 +02:00
Marco Neumann
0a625b50e6
feat: store transaction timestamp in preserved catalog
2021-06-02 09:41:19 +02:00
Andrew Lamb
d8fbb7b410
refactor: Remove last vestiges of multi-table chunks from PartitionChunk API ( #1588 )
...
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API
* fix: remove test that can no longer fail
* fix: update tests + code review comments
* fix: clippy
* fix: clippy
* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb
d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files ( #1580 )
...
* refactor: use ParquetExec to read parquet files
* fix: test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb
64328dcf1c
feat: cache schema on catalog chunks too ( #1575 )
2021-06-01 12:42:46 +00:00
Andrew Lamb
00e735ef0d
chore: remove unused dependencies ( #1583 )
2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies
db432de137
feat: add distinct count to StatValues ( #1568 )
2021-05-28 17:41:34 +00:00
kodiakhq[bot]
6098c7cd00
Merge branch 'main' into crepererum/issue1376
2021-05-28 07:13:15 +00:00
Andrew Lamb
f3bec93ef1
feat: Cache TableSummary in Catalog rather than computing it on demand ( #1569 )
...
* feat: Cache `TableSummary` in catalog Chunks
* refactor: use consistent table summary
2021-05-27 16:03:05 +00:00
Marco Neumann
dd2a976907
feat: add a flag to ignore metadata errors during catalog rebuild
2021-05-27 13:10:14 +02:00
Marco Neumann
bc7389dc38
fix: fix typo
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann
48307e4ab2
docs: adjust error description to reflect internal errors
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann
d6f0dc7059
feat: implement catalog rebuilding from files
...
Closes #1376.
2021-05-27 12:51:01 +02:00
Marco Neumann
024323912a
docs: explain what `PreservedCatalog::wipe` offers
2021-05-27 12:48:41 +02:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature ( #1566 )
2021-05-27 10:28:14 +00:00