influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	b8aa4c33ce	refactor: use protobuf bytes for transaction UUIDs	2021-10-05 12:27:48 +02:00
Marco Neumann	10c1a72402	refactor: remove unused fields from `DeletePredicate`	2021-10-05 09:29:24 +02:00
Marco Neumann	97881079e8	refactor: make `ChunkOrder` non-zero This will make it easier to handle missing values. Helps with #2633.	2021-10-04 17:49:12 +02:00
Marco Neumann	75ac6e8646	refactor: make `DeletePredicate::range` non-optional	2021-10-04 16:36:20 +02:00
Marco Neumann	d1835a3eee	fix: doc links	2021-10-04 16:36:20 +02:00
Marco Neumann	5a5a929b9e	refactor: introduce `DeletePredicate` `DeletePredicate` is a simpler version of `Predicate` that is based on IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will allow us to do a couple of things (in follow-up changes): - Order and de-duplicate delete predicates - Normalize predicates - Infallible serialization - Smaller memory footprint Note that this change only affects delete expressions. Query expressions that are supported via the API are not changed. The query subsystem also still uses the full-featured expressions/predicates (delete expressions/predicates are converted to the more powerful DataFusion version on-the-fly).	2021-10-04 16:36:20 +02:00
Edd Robinson	e72f7e958c	test: update expected results	2021-10-04 12:20:21 +01:00
Andrew Lamb	7316f3407a	fix: Reduce log noise when no files are deleted (#2671) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding)	92583aee82	fix: Remove streaming API since we're not streaming anyway	2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding)	d05528bcfd	refactor: Use s3_request for put requests Which meant we also needed to change the byte stream to be a closure that can generate a byte stream	2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies	86cee568d5	feat: use upstream pbjson (#2650) * feat: use upstream pbjson * chore: fmt	2021-09-28 16:29:26 +00:00
kodiakhq[bot]	b16e7ea91a	Merge branch 'main' into crepererum/issue2518c	2021-09-22 16:09:04 +00:00
Marco Neumann	d7b697dfe9	chore: remove unused `object_store` => `tracker` dep	2021-09-22 11:13:40 +02:00
Marco Neumann	981ee0c6df	refactor: accept unknown chunks in persisted delete predicates Due to the timing of the "persist" lifecycle action and that delete predicates might arrive at any time + the fact that we don't wanna hold transaction locks for too long, we should accept delete predicates for chunks that are currently "persisting" even though that lifecycle action might fail.	2021-09-22 09:29:50 +02:00
Marco Neumann	6682178d6f	feat: teach preserved catalog to handle delete predicates	2021-09-20 15:51:14 +02:00
Marco Neumann	cef5aeee52	refactor: introduce `ChunkId` type	2021-09-20 13:10:41 +02:00
Marco Neumann	acf698c366	fix: delete predicate sorting	2021-09-20 10:48:32 +02:00
Marco Neumann	0c5ba3786b	refactor: rename closure to make syntax a bit clearer	2021-09-20 10:48:32 +02:00
Marco Neumann	4c4fd59724	docs: extend comment about (not) cleaning up delete predicates	2021-09-20 10:48:32 +02:00
Marco Neumann	492d991f49	feat: delete catalog pres. catalog <=> in-mem catalog API First step towards #2518. Creates the Rust API to communicate delete predicates between the preserved catalog and the in-memory catalog and adds tests ensuring that the in-mem catalog produces the wanted errors as well as correct checkpoints (similar to how this is done for the parquet file tracking already). This does NOT contain the actual preservation!	2021-09-20 10:48:32 +02:00
Marco Neumann	831e55d79e	refactor: make error messages more precise	2021-09-20 09:42:55 +02:00
Marco Neumann	9c80d32af5	refactor: use normal google timestamps in parquet metadata again We changed from Google timestamps (which use variable-sized integers) to our own fixed-sized integer timestamps so that the size of the parquet metadata does not depend on the timestamp. However with the introduction of compression this is the case anyways (since slightly different timestamps lead to different compression results) and we now need deterministic timestamps for tests. So there is no point in using our own timestamp type. Switching back to the variable-sized type also shrinks the post-compression results a bit.	2021-09-20 09:34:03 +02:00
Marco Neumann	afc507ae14	feat: compress encoded parquet metadata Depending on the number of columns, this should save between 60% and 75%.	2021-09-20 09:33:18 +02:00
Marco Neumann	2820db5583	refactor: split preserved catalog `api` into `core` and `interface` This makes it clearer which traits and functions users of the preserved catalog must implement. This also splits the error types into smaller enums that are easier to understand. This change should make it easier to implement new functionality (like capturing delete predicates).	2021-09-16 10:30:11 +02:00
Raphael Taylor-Davies	c66095cad1	feat: remove metrics crate (#2552)	2021-09-15 19:43:33 +00:00
kodiakhq[bot]	de732b4273	Merge branch 'main' into crepererum/parquet_file_wo_query	2021-09-15 07:15:19 +00:00
Marco Neumann	509c07330d	refactor: decouple `parquet_file` from `query`	2021-09-14 18:26:16 +02:00
kodiakhq[bot]	d60aa5940b	Merge branch 'main' into crepererum/chunk_order_type	2021-09-14 16:25:17 +00:00
Marco Neumann	bfaba78dc3	refactor: move `predicate` into its own crate Two reasons: 1. I wanna decouple `parquet_file` from `query` (nearly done, needs a small follow-up PR). 2. `predicate` will have more and more features (like serialization) which justifies a new home	2021-09-14 17:13:02 +02:00
Marco Neumann	becef1c75f	refactor: introduce `ChunkOrder` type	2021-09-14 17:10:23 +02:00
Marco Neumann	1d8edd4683	fix: metadata size increased	2021-09-14 13:03:26 +02:00
Marco Neumann	45cb00d8c0	refactor: track chunk order in chunks	2021-09-14 13:00:55 +02:00
Marco Neumann	4769b67d14	feat: API-level code to prune old transaction from catalog	2021-09-14 10:26:38 +02:00
Marco Neumann	f93984cd94	refactor: clarify wording Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-09-14 09:43:55 +02:00
Marco Neumann	e7edb65b1d	feat: show number of stripped bytes in catalog dump	2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies	44918e4afc	feat: migrate chunk metrics (#2491) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-09-09 16:02:16 +00:00
Marco Neumann	4a863993ec	feat: "dump catalog" debug CLI	2021-09-02 08:08:20 +02:00
Marco Neumann	581ee64049	feat: add functions to dump catalog data to text	2021-09-02 08:07:07 +02:00
Marco Neumann	06c941d798	refactor: split up `make_record_batch`	2021-09-01 11:26:05 +02:00
Marco Neumann	6ce586a2ac	docs: add docstrings to `PreservedCatalog` members	2021-09-01 11:26:05 +02:00
Marco Neumann	70a5ffeae7	test: allow creation of deterministic chunks and transactions	2021-09-01 11:26:05 +02:00
Marco Neumann	06833110ab	test: allow creation of less complex parquet chunks	2021-09-01 11:26:05 +02:00
Marco Neumann	27248850e5	refactor: use `byte::Bytes` for metadata in protobuf messages That simplifies printing a bit since `Vec<u8>` prints quite badly.	2021-09-01 11:26:05 +02:00
Marco Neumann	a312f81bf2	refactor: move `storage_testing` to `storage::tests`	2021-08-27 15:59:59 +02:00
Marco Neumann	a2efe3299d	refactor: restructure catalog code in `parquet_file` No functional change (except for slightly changing error messages). This will make it easier to add more functionality.	2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding)	7ca177978e	fix: Add missing await from a logical merge conflict	2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding)	18ba3b5c59	feat: Create database directories with a generation ID	2021-08-26 09:14:22 -04:00
Marco Neumann	026202a05c	fix: correctly account for parquet metadata size We need to hold the parquet metadata in memory so that we're able to create catalog checkpoints. We used to do that by holding the decoded structure (provided by the upstream `parquet` crate) in memory and serializing that data on demand to Apache Thrift. There are two drawbacks: 1. We did not account for the memory usage of the decoded structures (or at least not fully). 2. We actually don't need the decoded data in-memory, since for the checkpoint creation we only need to write the serialized data. So this PR changes our wrapper so it holds the serialized data which is then only decoded when it's really necessary. Since the serialized data is a simple byte vector, we can also easily account for the size. Note that this makes the accounted size of parquet chunks larger. However this data was always there, we just ignored it up until now. If the size of the parquet metadata really becomes an issue, we could trade some CPU time for memory by compressing it.	2021-08-26 13:24:32 +02:00
Andrew Lamb	3ca0d5d42f	Merge branch 'main' into cn/bump	2021-08-19 14:08:49 -04:00
Raphael Taylor-Davies	b0e8b75a8a	fix: TestCatalogState unique chunk ID	2021-08-19 17:19:12 +01:00

279 Commits (e525d75c9a7aba7d6fb633025cee70fc4cc85431)