Commit Graph

138 Commits (a449d5ef7433fcadcffe5991971c28849be69541)

Author SHA1 Message Date
kodiakhq[bot] b57f397057
Merge branch 'main' into crepererum/checkpoint_during_restore 2021-06-14 13:54:03 +00:00
Marco Neumann 0a7dcc3779 test: adjust read-write parquet test to newest test data 2021-06-14 14:24:24 +02:00
Marco Neumann d6f6ddfdaa fix: fix NULL handling in parquet stats 2021-06-14 14:24:09 +02:00
Marco Neumann eae56630fb test: add test for all-NULL float column metadata 2021-06-14 13:48:34 +02:00
Marco Neumann 3f9bcf7cd9 fix: fix NaN handling in parquet stats 2021-06-14 13:44:52 +02:00
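The NaN fix above matters because every comparison against NaN is false, so a naive min/max fold silently produces wrong statistics. A minimal sketch (the function name and logic are illustrative, not the repository's actual code) of skipping NaNs when computing a column minimum:

```rust
// NaN poisons naive min/max aggregation because every comparison with NaN
// is false; a stats computation has to filter NaNs out explicitly.
fn min_ignoring_nan(values: &[f64]) -> Option<f64> {
    values
        .iter()
        .copied()
        .filter(|v| !v.is_nan())
        .fold(None, |acc, v| {
            Some(match acc {
                Some(m) if m <= v => m,
                _ => v,
            })
        })
}

fn main() {
    assert_eq!(min_ignoring_nan(&[2.0, f64::NAN, 1.0]), Some(1.0));
    // An all-NaN (or all-NULL) column yields no statistics value at all,
    // which is what the all-NULL metadata test below exercises.
    assert_eq!(min_ignoring_nan(&[f64::NAN]), None);
}
```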
Marco Neumann ea96210e98 test: enable unblocked test 2021-06-14 13:44:52 +02:00
Marco Neumann 518f7c6f15 refactor: wrap upstream parquet MD into struct + clean up interface
This prevents users of `parquet_file::metadata` from also depending on
`parquet` directly. Furthermore, they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
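The wrapping described in this commit is the classic newtype pattern: the upstream type stays private and callers see only the curated interface. A hypothetical sketch (the `upstream_parquet` module and `num_rows` field are stand-ins, not the real `parquet` crate API):

```rust
// Stand-in for the upstream `parquet` crate's metadata type.
mod upstream_parquet {
    pub struct ParquetMetaData {
        pub num_rows: i64,
    }
}

// Newtype wrapper: callers depend on `IoxParquetMetaData` only and never
// see the upstream type, so the `parquet` dependency stays internal.
pub struct IoxParquetMetaData(upstream_parquet::ParquetMetaData);

impl IoxParquetMetaData {
    pub fn new(md: upstream_parquet::ParquetMetaData) -> Self {
        Self(md)
    }

    // Only curated accessors are exported, instead of dozens of free functions.
    pub fn row_count(&self) -> i64 {
        self.0.num_rows
    }
}

fn main() {
    let md = IoxParquetMetaData::new(upstream_parquet::ParquetMetaData { num_rows: 42 });
    assert_eq!(md.row_count(), 42);
}
```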
Marco Neumann 030d0d2b9a feat: create checkpoint during catalog rebuild 2021-06-14 10:55:56 +02:00
Marco Neumann df866f72e0 refactor: store parquet metadata in chunk
This will be useful for #1381.

At the moment we parse schema and stats eagerly and store them alongside
the parquet metadata in memory. Technically this is not required since
this is basically duplicate data. In the future we might trade off some
of this memory against CPU consumption by parsing schema and stats on
demand.
2021-06-14 10:08:31 +02:00
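The eager-versus-on-demand trade-off described above can be sketched as follows (the struct shape and the string-based "schema" are purely illustrative):

```rust
// Hypothetical sketch of the eager-parse trade-off: the schema is decoded
// once at construction and cached in memory alongside the raw metadata,
// duplicating what `raw` already encodes.
struct ChunkMetadata {
    raw: Vec<u8>,   // serialized parquet metadata
    schema: String, // eagerly parsed, cached copy
}

impl ChunkMetadata {
    fn new(raw: Vec<u8>) -> Self {
        // Stand-in for real schema decoding.
        let schema = format!("schema({} bytes)", raw.len());
        Self { raw, schema }
    }

    // An on-demand alternative would re-derive the schema from `raw` here,
    // trading CPU time for the memory the cached copy occupies.
    fn schema(&self) -> &str {
        &self.schema
    }
}

fn main() {
    let md = ChunkMetadata::new(vec![0u8; 16]);
    assert_eq!(md.schema(), "schema(16 bytes)");
    assert_eq!(md.raw.len(), 16);
}
```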
Marco Neumann e6699ff15a test: ensure that `find_last_transaction_timestamp` considers checkpoints 2021-06-14 10:04:50 +02:00
Marco Neumann f8a518bbed refactor: inline `Table` into `parquet_file::chunk::Chunk`
Note that the resulting size estimations are different because we were
double-counting `Table`. `mem::size_of::<Self>()` is recursive for
non-boxed types since the child will be part of the parent structure.

Issue: #1295.
2021-06-11 11:54:31 +02:00
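The double-counting described above follows from how `mem::size_of` works: an inline (non-boxed) child is laid out inside the parent, so its bytes are already included in the parent's size, while a boxed child contributes only a pointer. A small demonstration (struct names are illustrative):

```rust
use std::mem::size_of;

struct Table {
    _data: [u8; 64],
}

// Inline child: `Table`'s 64 bytes are part of `Chunk`'s own size.
struct Chunk {
    _table: Table,
}

// Boxed child: only a thin pointer is stored inline.
struct BoxedChunk {
    _table: Box<Table>,
}

fn main() {
    // size_of::<Self>() already covers non-boxed children, so adding the
    // child's size again would double-count it.
    assert_eq!(size_of::<Chunk>(), size_of::<Table>());
    assert_eq!(size_of::<BoxedChunk>(), size_of::<usize>());
}
```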
Marco Neumann 28d1dc4da1 chore: bump preserved catalog version 2021-06-10 16:01:13 +02:00
Marco Neumann 80ee36cd1a refactor: slightly streamline path parsing code in pres. catalog 2021-06-10 15:59:28 +02:00
Marco Neumann 7e7332c9ce refactor: make comparison a bit less confusing 2021-06-10 15:42:21 +02:00
Marco Neumann fd581e2ec9 docs: fix confusing wording in `CatalogState::files` 2021-06-10 15:42:21 +02:00
Marco Neumann be9b3a4853 fix: protobuf lint fixes 2021-06-10 15:42:21 +02:00
Marco Neumann 294c304491 feat: impl catalog checkpointing infrastructure
This implements a way to add checkpoints to the preserved catalog and
speed up replay.

Note: This leaves the "hook it up into the actual DB" for a future PR.

Issue: #1381.
2021-06-10 15:42:21 +02:00
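Why a checkpoint speeds up replay: instead of re-applying every transaction from the beginning, replay starts from the latest checkpointed state and applies only the transactions recorded after it. A hypothetical sketch (the state shape and "apply" step are stand-ins for the real preserved-catalog logic):

```rust
#[derive(Clone, Default, PartialEq, Debug)]
struct CatalogState {
    files: Vec<u32>,
}

// Replay either from scratch, or from a checkpoint taken after `start`
// transactions; only the remaining transactions are applied.
fn replay(checkpoint: Option<(usize, CatalogState)>, txns: &[u32]) -> CatalogState {
    let (start, mut state) = checkpoint.unwrap_or((0, CatalogState::default()));
    for &t in &txns[start..] {
        state.files.push(t); // stand-in for applying one transaction
    }
    state
}

fn main() {
    let txns = [1, 2, 3, 4];
    let full = replay(None, &txns);

    // A checkpoint taken after the first two transactions reaches the same
    // final state while doing half the work.
    let ckpt = CatalogState { files: vec![1, 2] };
    assert_eq!(replay(Some((2, ckpt)), &txns), full);
}
```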
Marco Neumann 188cacec54 refactor: use `Arc` to pass `ParquetFileMetaData`
This will be handy when the catalog state must be able to return
metadata objects so that we can create checkpoints, especially once we
use multi-chunk parquet files in the medium term.
2021-06-10 15:42:21 +02:00
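Passing the metadata behind an `Arc` means cloning is cheap reference-count bookkeeping rather than a deep copy, so the catalog state and a checkpoint writer can share one object. A minimal sketch (the struct and its field are hypothetical):

```rust
use std::sync::Arc;

struct ParquetFileMetaData {
    row_count: usize, // hypothetical payload
}

fn main() {
    let md = Arc::new(ParquetFileMetaData { row_count: 1000 });

    // Cloning an Arc copies only the pointer and bumps the refcount;
    // the metadata itself is shared, not duplicated.
    let for_checkpoint = Arc::clone(&md);
    assert_eq!(Arc::strong_count(&md), 2);
    assert_eq!(for_checkpoint.row_count, 1000);
}
```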
Marco Neumann c7412740e4 refactor: prepare to read and write multiple file types for catalog
Prepares #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 33e364ed78 feat: add encoding info to transaction protobuf
This should help with #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 4fe2d7af9c chore: enforce `clippy::future_not_send` for `parquet_file` 2021-06-09 18:18:27 +02:00
Andrew Lamb ab0aed0f2e
refactor: Remove a layer of channels in parquet read stream (#1648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:47:04 +00:00
Raphael Taylor-Davies 1e7ef193a6
refactor: use field metadata to store influx types (#1642)
* refactor: use field metadata to store influx types

make SchemaBuilder non-consuming

* chore: remove unused variants

* chore: fix lints
2021-06-07 13:26:39 +00:00
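Storing the influx type in field metadata means each schema field carries its logical type as a key/value annotation instead of a separate side structure. A rough sketch of the idea using a plain map (the metadata key and value shown are hypothetical, not the project's actual constants):

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical field-level metadata: the logical InfluxDB column type
    // rides along with the schema field as a key/value pair.
    let mut field_metadata: HashMap<String, String> = HashMap::new();
    field_metadata.insert("influx_column_type".to_string(), "Tag".to_string());

    assert_eq!(
        field_metadata.get("influx_column_type").map(String::as_str),
        Some("Tag")
    );
}
```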
Marco Neumann c830542464 feat: add info log when cleanup limit is reached 2021-06-04 11:12:29 +02:00
Marco Neumann 91df8a30e7 feat: limit number of files during storage cleanup
Since the number of parquet files is potentially unbounded (i.e. very
large), and we neither want to hold the transaction lock for too long
nor let the cleanup routine consume too much memory, let's limit the
number of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
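The bounded collection described above amounts to capping how many candidates one cleanup pass gathers while the lock is held. A hypothetical sketch (function name and file naming are illustrative):

```rust
// Cap how many candidate files a single cleanup pass collects, so the
// transaction lock is held briefly and memory use stays bounded; the
// remaining files are picked up by later passes.
fn collect_cleanup_candidates(
    all_files: impl Iterator<Item = String>,
    limit: usize,
) -> Vec<String> {
    all_files.take(limit).collect()
}

fn main() {
    let files = (0..1_000).map(|i| format!("file-{}.parquet", i));
    let batch = collect_cleanup_candidates(files, 100);
    assert_eq!(batch.len(), 100);
    assert_eq!(batch[0], "file-0.parquet");
}
```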
Marco Neumann 85139abbbb fix: use structured logging for cleanup logs 2021-06-03 11:23:29 +02:00
Andrew Lamb 32c6ed1f34
refactor: More cleanup related to multi-table chunks (#1604)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Marco Neumann e5b65e10ac test: ensure that `find_last_transaction_timestamp` indeed returns the last timestamp 2021-06-02 10:15:06 +02:00
Marco Neumann 98e413d5a9 fix: do not unwrap broken timestamps in serialized catalog 2021-06-02 10:15:06 +02:00
Marco Neumann fc0a74920f fix: use clearer error text 2021-06-02 09:41:19 +02:00
Marco Neumann 2a0b2698c6 fix: use structured logging
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann 64bf8c5182 docs: add code comment explaining why we parse transaction timestamps
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann 77aeb5ca5d refactor: use protobuf-native Timestamp instead of string 2021-06-02 09:41:19 +02:00
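The string-to-Timestamp switch in this group of commits trades a parse-on-every-read representation for typed fields, which also removes a class of unwrap panics on malformed input (see the "do not unwrap broken timestamps" fix above). A hypothetical sketch of the contrast; the struct mirrors protobuf's well-known `Timestamp` shape (`seconds`/`nanos`), and the string parser is a deliberately simplified stand-in:

```rust
// Typed timestamp in the protobuf well-known-type shape.
#[derive(Debug, PartialEq)]
struct PbTimestamp {
    seconds: i64,
    nanos: i32,
}

// The string form must be parsed and validated on every read, and can fail.
fn parse_string_ts(s: &str) -> Option<PbTimestamp> {
    let seconds: i64 = s.parse().ok()?;
    Some(PbTimestamp { seconds, nanos: 0 })
}

fn main() {
    // The typed field is usable directly: no parse step, no parse failure.
    let native = PbTimestamp { seconds: 1_622_620_879, nanos: 0 };
    assert_eq!(native.seconds, 1_622_620_879);

    // A broken string timestamp surfaces as a recoverable error rather than
    // an unwrap panic.
    assert!(parse_string_ts("not-a-timestamp").is_none());
}
```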
Marco Neumann 9b9400803b refactor!: bump transaction version to 2 2021-06-02 09:41:19 +02:00
Marco Neumann 5f77b7b92b feat: add `parquet_file::catalog::find_last_transaction_timestamp` 2021-06-02 09:41:19 +02:00
Marco Neumann 9aee961e2a test: test loading catalogs from broken protobufs 2021-06-02 09:41:19 +02:00
Marco Neumann 0a625b50e6 feat: store transaction timestamp in preserved catalog 2021-06-02 09:41:19 +02:00
Andrew Lamb d8fbb7b410
refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588)
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API

* fix: remove test that can no longer fail

* fix: update tests + code review comments

* fix: clippy

* fix: clippy

* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
* refactor: use ParquetExec to read parquet files

* fix: test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb 64328dcf1c
feat: cache schema on catalog chunks too (#1575) 2021-06-01 12:42:46 +00:00
Andrew Lamb 00e735ef0d
chore: remove unused dependencies (#1583) 2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies db432de137
feat: add distinct count to StatValues (#1568) 2021-05-28 17:41:34 +00:00
kodiakhq[bot] 6098c7cd00
Merge branch 'main' into crepererum/issue1376 2021-05-28 07:13:15 +00:00
Andrew Lamb f3bec93ef1
feat: Cache TableSummary in Catalog rather than computing it on demand (#1569)
* feat: Cache `TableSummary` in catalog Chunks

* refactor: use consistent table summary
2021-05-27 16:03:05 +00:00
Marco Neumann dd2a976907 feat: add a flag to ignore metadata errors during catalog rebuild 2021-05-27 13:10:14 +02:00
Marco Neumann bc7389dc38 fix: fix typo
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann 48307e4ab2 docs: adjust error description to reflect internal errors
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann d6f0dc7059 feat: implement catalog rebuilding from files
Closes #1376.
2021-05-27 12:51:01 +02:00
Marco Neumann 024323912a docs: explain what `PreservedCatalog::wipe` offers 2021-05-27 12:48:41 +02:00
Raphael Taylor-Davies 4fcc04e6c9
chore: enable arrow prettyprint feature (#1566) 2021-05-27 10:28:14 +00:00