influxdb

Commit Graph

Author	SHA1	Message	Date
Raphael Taylor-Davies	c5cf03511c	fix: parquet column count statistics (#2124) (#3444) * fix: parquet metadata total_count (#2124) * chore: review feedback Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-11 21:56:24 +00:00
Marco Neumann	f3f6f335a9	chore: upgrade to snafu 0.7 (#3440)	2022-01-11 19:22:36 +00:00
Marco Neumann	37bb7f2120	chore: `cargo update` dependabot currently doesn't work due to https://github.com/dependabot/dependabot-core/issues/4574 Excluded `quote` due to https://github.com/dtolnay/quote/issues/204	2022-01-11 14:57:51 +01:00
Nga Tran	ec8644a39a	refactor: return clearer error message	2021-12-07 12:24:28 -05:00
Nga Tran	561c5ed8e7	refactor: make checking no data happen during reading input stream	2021-12-07 12:03:41 -05:00
Nga Tran	c992c82582	chore: Merge branch 'main' into ntran/compact_os_tests	2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies	5fdaa5b4ab	chore: don't panic with invalid parquet (#3309) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-06 21:15:35 +00:00
Carol (Nichols || Goulding)	7499eac067	fix: Disable uuid serde feature; we're not actually serializing any UUIDs Connects to #3117.	2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding)	02c297e850	fix: Always specify the parking_lot feature of tokio to get potential perf boost	2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding)	0b24b3c227	fix: Use a consistent version specifier when depending on the futures crate	2021-12-06 09:37:12 -05:00
Raphael Taylor-Davies	bca561366b	feat: don't copy parquet files out of disk object store (#3282) (#3293) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-05 16:31:40 +00:00
Raphael Taylor-Davies	11067bfe3f	feat: simplify parquet reader (#3282) (#3291) * feat: simplify parquet reader (#3282) * chore: add back log line	2021-12-03 23:21:58 +00:00
Nga Tran	86f9fe0bcb	refactor: no longer need to create and test no-row-groups parquet files	2021-12-03 15:14:04 -05:00
Nga Tran	152281e428	fix: Capture the right 'no data' while parquet has no data	2021-12-03 12:19:48 -05:00
kodiakhq[bot]	2857b6a990	Merge branch 'main' into er/feat/load_chunk_cli	2021-12-02 20:20:56 +00:00
Edd Robinson	b4ea9887ba	refactor: error name	2021-12-02 20:14:02 +00:00
Carol (Nichols || Goulding)	5d0fd1c603	fix: Allow dead code on fields that are now detected as never read	2021-12-02 11:52:01 -05:00
Edd Robinson	88aedc556e	feat: add FromStr implementation	2021-12-02 12:59:52 +00:00
Nga Tran	bf74608dc8	docs: do not persist if the input stream is empty	2021-12-01 17:53:19 -05:00
Nga Tran	f085af034e	refactor: not persist empty chunk resulting from deleting & deduplicating (#3274) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-12-01 20:57:30 +00:00
Nga Tran	f53cdca010	feat: handling empty compacted stream	2021-11-30 18:13:36 -05:00
Raphael Taylor-Davies	197634ed50	feat: reload chunk back into read buffer (#3209) (#3216) * feat: reload chunk back into read buffer (#3209) * chore: fix logical conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-29 11:34:55 +00:00
kodiakhq[bot]	d16a7759ca	Merge branch 'main' into cn/workspace-hack	2021-11-22 17:05:31 +00:00
Raphael Taylor-Davies	73d60539ad	refactor: use ChunkGenerator in parquet_catalog (#2209) (#3167) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-22 10:29:33 +00:00
Carol (Nichols || Goulding)	9fd4a560f5	feat: Results of running cargo hakari manage-deps	2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies	ca4e0ad13b	refactor: add parquet chunk generator (#2209) (#3163) * refactor: add parquet chunk generator (#2209) * fix: tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-19 12:35:18 +00:00
Carol (Nichols || Goulding)	c8d80e5c28	fix: Change database paths to be under /dbs/ instead of under /[server id]/	2021-11-05 10:14:06 -04:00
Andrew Lamb	1902c4f8a9	chore: Update DataFusion (#3012) * chore: Update DataFusion * fix: restore Cargo.lock Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-11-02 18:06:21 +00:00
Marco Neumann	4c9570b519	refactor: move `catalog` protobuf to `preserved_catalog` This makes it clearer what's going on since the contained messages are only for the preserved part, not the in-mem catalog and its management.	2021-11-01 18:07:25 +01:00
dependabot[bot]	c540b40f05	chore(deps): bump tokio from 1.12.0 to 1.13.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2021-11-01 11:21:59 +00:00
Carol (Nichols || Goulding)	990f768cda	fix: Assign a UUID when creating a database	2021-10-28 13:20:28 -04:00
Carol (Nichols || Goulding)	8198c1ff2a	refactor: Rename IoxObjectStore constructors to better match what server does with Databases	2021-10-28 13:20:27 -04:00
Marco Neumann	bc7244c48e	chore: use Rust edition 2021	2021-10-25 10:58:20 +02:00
Andrew Lamb	a82dc6f5f0	chore: Update datafusion + arrow (#2903) * chore: Update datafusion to latest, arrow to 6.0.0 * fix: Update tests * fix: bubble internal error Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-19 17:14:08 +00:00
Marco Neumann	d8f35d8ee9	chore: remove unused `parquet_file` => `chrono` dep	2021-10-19 14:45:56 +02:00
Marco Neumann	28195b9c0c	chore: new `parquet_catalog` crate	2021-10-14 14:34:59 +02:00
Andrew Lamb	0568452a0c	chore: Update datafusion (#2838) * chore: update datafusion version * refactor: Update to use new datafusion apis * fix: do not upgrade other packages	2021-10-13 20:51:19 +00:00
Marco Neumann	1523e0edcd	refactor: clean up preserved catalog interface 1. Remove `new_empty` logic. It's a leftover from the time when the `PreservedCatalog` owned the in-memory catalog. 2. Make `db_name` a part of the `PreservedCatalogConfig`.	2021-10-13 13:58:11 +02:00
Raphael Taylor-Davies	8414e6edbb	feat: migrate preserved catalog to TimeProvider (#2722) (#2808) * feat: migrate preserved catalog to TimeProvider (#2722) * fix: deterministic catalog prune tests * fix: failing test Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-12 14:43:05 +00:00
Raphael Taylor-Davies	3dfe400e6b	feat: migrate write path to TimeProvider (#2722) (#2807)	2021-10-12 12:09:08 +00:00
Raphael Taylor-Davies	b39e01f7ba	feat: migrate PersistenceWindows to TimeProvider (#2722) (#2798)	2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies	06c2c23322	refactor: create PreservedCatalogConfig struct (#2793) * refactor: create PreservedCatalogConfig struct * chore: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-10-11 15:43:05 +00:00
Carol (Nichols || Goulding)	5da2f7b1b0	Merge branch 'main' into cn/less-database-name	2021-10-11 10:35:42 -04:00
Raphael Taylor-Davies	afe34751e7	refactor: split out schema crate (#2781) * refactor: split out schema crate * chore: fix doc	2021-10-11 09:45:08 +00:00
Carol (Nichols || Goulding)	8407735e00	fix: Pass the database name into PreservedCatalog	2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding)	276aef69c9	refactor: Move PreservedCatalog test helper functions to test helpers and use them more	2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding)	3aff4fcb07	refactor: Extract test helper functions for common catalog operations This will make the next change easier, and I think it makes the tests easier to read.	2021-10-08 15:25:10 -04:00
kodiakhq[bot]	559a7e0221	Merge branch 'main' into cn/chunk-addr-smaller	2021-10-08 17:26:20 +00:00
Carol (Nichols || Goulding)	fbe76935f4	fix: Remove some calls to iox_object_store.database_name	2021-10-08 09:50:14 -04:00
Marco Neumann	64bda1fc08	feat: improve `Debug`/`Display` for test `ChunkId`s	2021-10-08 13:55:56 +02:00
Marco Neumann	d3de6bb6e4	refactor: `max_persisted_timestamp` => `flush_timestamp` There might be data left before this timestamp that wasn't persisted (e.g. incoming data while the persistence was running).	2021-10-08 12:36:23 +02:00
Marco Neumann	63a932fa37	refactor: "min unpersisted ts" => "max persisted ts" Store the "maximum persisted timestamp" instead of the "minimum unpersisted timestamp". This avoids the need to calculate the next timestamp from the current one (which was done via "max TS + 1ns"). The old calculation was prone to overflow panics. Since the timestamps in this calculation originate from user-provided data (and not the wall clock), this was an easy DoS vector that could be triggered via the following line protocol: ```text table_1 foo=1 <i64::MAX> ``` which is ```text table_1 foo=1 9223372036854775807 ``` Bonus points: the timestamp persisted in the partition checkpoints is now the very same that was used by the split query during persistence. Consistence FTW! Fixes #2225.	2021-10-08 11:52:49 +02:00
kodiakhq[bot]	7d6be3f500	Merge branch 'main' into crepererum/issue2748	2021-10-07 09:04:18 +00:00
Marco Neumann	63d74be490	refactor: make `ChunkId` a UUID	2021-10-07 10:23:27 +02:00
Marco Neumann	2a52fd90d9	fix: transaction pruning logic for "nothing to do"	2021-10-07 10:14:42 +02:00
kodiakhq[bot]	d72a494198	Merge branch 'main' into crepererum/in_mem_expr_part5	2021-10-05 16:20:24 +00:00
Marco Neumann	b8aa4c33ce	refactor: use protobuf bytes for transaction UUIDs	2021-10-05 12:27:48 +02:00
Marco Neumann	bb7a27e5ed	refactor: use proper sets during delete predicate collection We no longer need hacky pointer tricks to de-duplicate delete predicates when collecting them for catalog checkpoints. This was once required when the delete predicates didn't implement `Eq` and `Hash` but now it's all way easier.	2021-10-05 10:37:34 +02:00
Marco Neumann	28ccf2a8c3	refactor: `TransactionHandle::delete_predicate` cannot fail	2021-10-05 09:41:46 +02:00
Marco Neumann	10c1a72402	refactor: remove unused fields from `DeletePredicate`	2021-10-05 09:29:24 +02:00
Marco Neumann	97881079e8	refactor: make `ChunkOrder` non-zero This will make it easier to handle missing values. Helps with #2633.	2021-10-04 17:49:12 +02:00
Marco Neumann	75ac6e8646	refactor: make `DeletePredicate::range` non-optional	2021-10-04 16:36:20 +02:00
Marco Neumann	d1835a3eee	fix: doc links	2021-10-04 16:36:20 +02:00
Marco Neumann	5a5a929b9e	refactor: introduce `DeletePredicate` `DeletePredicate` is a simpler version of `Predicate` that is based on IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will allow us to do a couple of things (in follow-up changes): - Order and de-duplicate delete predicates - Normalize predicates - Infallible serialization - Smaller memory footprint Note that this change only affects delete expressions. Query expressions that are supported via the API are not changed. The query subsystem also still uses the full-featured expressions/predicates (delete expressions/predicates are converted to the more powerful DataFusion version on-the-fly).	2021-10-04 16:36:20 +02:00
Edd Robinson	e72f7e958c	test: update expected results	2021-10-04 12:20:21 +01:00
Andrew Lamb	7316f3407a	fix: Reduce log noise when no files are deleted (#2671) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding)	92583aee82	fix: Remove streaming API since we're not streaming anyway	2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding)	d05528bcfd	refactor: Use s3_request for put requests Which meant we also needed to change the byte stream to be a closure that can generate a byte stream	2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies	86cee568d5	feat: use upstream pbjson (#2650) * feat: use upstream pbjson * chore: fmt	2021-09-28 16:29:26 +00:00
kodiakhq[bot]	b16e7ea91a	Merge branch 'main' into crepererum/issue2518c	2021-09-22 16:09:04 +00:00
Marco Neumann	d7b697dfe9	chore: remove unused `object_store` => `tracker` dep	2021-09-22 11:13:40 +02:00
Marco Neumann	981ee0c6df	refactor: accept unknown chunks in persisted delete predicates Due to the timing of the "persist" lifecycle action and that delete predicates might arrive at any time + the fact that we don't wanna hold transaction locks for too long, we should accept delete predicates for chunks that are currently "persisting" even though that lifecycle action might fail.	2021-09-22 09:29:50 +02:00
Marco Neumann	6682178d6f	feat: teach preserved catalog to handle delete predicates	2021-09-20 15:51:14 +02:00
Marco Neumann	cef5aeee52	refactor: introduce `ChunkId` type	2021-09-20 13:10:41 +02:00
Marco Neumann	acf698c366	fix: delete predicate sorting	2021-09-20 10:48:32 +02:00
Marco Neumann	0c5ba3786b	refactor: rename closure to make syntax a bit clearer	2021-09-20 10:48:32 +02:00
Marco Neumann	4c4fd59724	docs: extend comment about (not) cleaning up delete predicates	2021-09-20 10:48:32 +02:00
Marco Neumann	492d991f49	feat: delete catalog pres. catalog <=> in-mem catalog API First step towards #2518. Creates the Rust API to communicate delete predicates between the preserved catalog and the in-memory catalog and adds tests ensuring that the in-mem catalog produces the wanted errors as well as correct checkpoints (similar to how this is done for the parquet file tracking already). This does NOT contain the actual preservation!	2021-09-20 10:48:32 +02:00
Marco Neumann	831e55d79e	refactor: make error messages more precise	2021-09-20 09:42:55 +02:00
Marco Neumann	9c80d32af5	refactor: use normal google timestamps in parquet metadata again We changed from Google timestamps (which use variable-sized integers) to our own fixed-sized integer timestamps so that the size of the parquet metadata does not depend on the timestamp. However with the introduction of compression this is the case anyways (since slightly different timestamps lead to different compression results) and we now need deterministic timestamps for tests. So there is no point in using our own timestamp type. Switching back to the variable-sized type also shrinks the post-compression results a bit.	2021-09-20 09:34:03 +02:00
Marco Neumann	afc507ae14	feat: compress encoded parquet metadata Depending on the number of columns, this should save between 60% and 75%.	2021-09-20 09:33:18 +02:00
Marco Neumann	2820db5583	refactor: split preserved catalog `api` into `core` and `interface` This makes it clearer which traits and functions users of the preserved catalog must implement. This also splits the error types into smaller enums that are easier to understand. This change should make it easier to implement new functionality (like capturing delete predicates).	2021-09-16 10:30:11 +02:00
Raphael Taylor-Davies	c66095cad1	feat: remove metrics crate (#2552)	2021-09-15 19:43:33 +00:00
kodiakhq[bot]	de732b4273	Merge branch 'main' into crepererum/parquet_file_wo_query	2021-09-15 07:15:19 +00:00
Marco Neumann	509c07330d	refactor: decouple `parquet_file` from `query`	2021-09-14 18:26:16 +02:00
kodiakhq[bot]	d60aa5940b	Merge branch 'main' into crepererum/chunk_order_type	2021-09-14 16:25:17 +00:00
Marco Neumann	bfaba78dc3	refactor: move `predicate` into its own crate Two reasons: 1. I wanna decouple `parquet_file` from `query` (nearly done, needs a small follow-up PR). 2. `predicate` will have more and more features (like serialization) which justifies a new home	2021-09-14 17:13:02 +02:00
Marco Neumann	becef1c75f	refactor: introduce `ChunkOrder` type	2021-09-14 17:10:23 +02:00
Marco Neumann	1d8edd4683	fix: metadata size increased	2021-09-14 13:03:26 +02:00
Marco Neumann	45cb00d8c0	refactor: track chunk order in chunks	2021-09-14 13:00:55 +02:00
Marco Neumann	4769b67d14	feat: API-level code to prune old transaction from catalog	2021-09-14 10:26:38 +02:00
Marco Neumann	f93984cd94	refactor: clarify wording Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-09-14 09:43:55 +02:00
Marco Neumann	e7edb65b1d	feat: show number of stripped bytes in catalog dump	2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies	44918e4afc	feat: migrate chunk metrics (#2491) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-09-09 16:02:16 +00:00
Marco Neumann	4a863993ec	feat: "dump catalog" debug CLI	2021-09-02 08:08:20 +02:00
Marco Neumann	581ee64049	feat: add functions to dump catalog data to text	2021-09-02 08:07:07 +02:00
Marco Neumann	06c941d798	refactor: split up `make_record_batch`	2021-09-01 11:26:05 +02:00
Marco Neumann	6ce586a2ac	docs: add docstrings to `PreservedCatalog` members	2021-09-01 11:26:05 +02:00
Marco Neumann	70a5ffeae7	test: allow creation of deterministic chunks and transactions	2021-09-01 11:26:05 +02:00
Marco Neumann	06833110ab	test: allow creation of less complex parquet chunks	2021-09-01 11:26:05 +02:00
Marco Neumann	27248850e5	refactor: use `byte::Bytes` for metadata in protobuf messages That simplifies printing a bit since `Vec<u8>` prints quite badly.	2021-09-01 11:26:05 +02:00
Marco Neumann	a312f81bf2	refactor: move `storage_testing` to `storage::tests`	2021-08-27 15:59:59 +02:00
Marco Neumann	a2efe3299d	refactor: restructure catalog code in `parquet_file` No functional change (except for slightly changing error messages). This will make it easier to add more functionality.	2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding)	7ca177978e	fix: Add missing await from a logical merge conflict	2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding)	18ba3b5c59	feat: Create database directories with a generation ID	2021-08-26 09:14:22 -04:00
Marco Neumann	026202a05c	fix: correctly account for parquet metadata size We need to hold the parquet metadata in memory so that we're able to create catalog checkpoints. We used to do that by holding the decoded structure (provided by the upstream `parquet` crate) in memory and serializing that data on demand to Apache Thrift. There are two drawbacks: 1. We did not account for the memory usage of the decoded structures (or at least not fully). 2. We actually don't need the decoded data in-memory, since for the checkpoint creation we only need to write the serialized data. So this PR changes our wrapper so it holds the serialized data which is then only decoded when it's really necessary. Since the serialized data is a simple byte vector, we can also easily account for the size. Note that this makes the accounted size of parquet chunks larger. However this data was always there, we just ignored it up until now. If the size of the parquet metadata really becomes an issue, we could trade some CPU time for memory by compressing it.	2021-08-26 13:24:32 +02:00
Andrew Lamb	3ca0d5d42f	Merge branch 'main' into cn/bump	2021-08-19 14:08:49 -04:00
Raphael Taylor-Davies	b0e8b75a8a	fix: TestCatalogState unique chunk ID	2021-08-19 17:19:12 +01:00
Carol (Nichols || Goulding)	7246f2702a	fix: Bump transaction version because of a change in the Parquet files	2021-08-19 09:32:37 -04:00
Raphael Taylor-Davies	5a841600d9	feat: make catalog state test deterministic (#2349)	2021-08-19 14:04:27 +01:00
Carol (Nichols || Goulding)	6390156c0e	fix: Remove error types not used anywhere	2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding)	ef0e1a3f60	refactor: Extract a transaction file path type	2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding)	6d5cb9c117	refactor: Extract a ParquetFilePath to handle paths to parquet files in a db's object store	2021-08-18 11:32:39 -04:00
Ning Sun	c012e996ab	refactor: remove display methods, use fmt::Display instead. (#2272) * refactor: remove display methods, use fmt::Display instead. Signed-off-by: Ning Sun <sunng@protonmail.com> * refactor: update a few calls from .display to .to_string() * fix: consistently use `Path` rather than occasionally `DirsAndFileName` * fix: fixup for merge conflicts * fix: update test * fix: Catch another case or two * fix: fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-08-16 18:00:22 +00:00
Carol (Nichols || Goulding)	564238ad8c	refactor: Organize uses	2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding)	ae6b0e669b	refactor: Extract a database persister type that wraps object store Connects to #2193.	2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding)	daa534ee32	refactor: Incorporate Path parsing into the TransactionFile type	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	ee3173efb1	refactor: Simplify implementation of parse_file_path	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	dbd1718fd2	refactor: Use the TransactionKey type	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	7f7a911a9a	refactor: Extract a TransactionFile type to manage transaction paths	2021-08-12 09:06:06 -04:00
Dom	3de6b44e23	build: use new rustdoc lint name (#2261) * fix: nocache feature code rot The MBChunk::snapshot code when using the "nocache" option no longer compiles - this commit updates it to match the not(nocache) code. * build: use updated broken_intra_doc_links name The broken_intra_doc_links lint was renamed rustdoc::broken_intra_doc_links https://doc.rust-lang.org/rustdoc/lints.html	2021-08-11 19:48:51 +00:00
Marco Neumann	8721c5fcd6	fix: improve error messages	2021-08-09 10:54:23 +02:00
Marco Neumann	950286e5b7	feat: make replay planning work w/ unordered checkpoints	2021-08-09 10:54:23 +02:00
Andrew Lamb	d41b44d312	feat: use zstd compression when writing parquet files (#2218) * feat: use ZSTD when writing parquet files * fix: test	2021-08-06 18:45:55 +00:00
Andrew Lamb	e92e94caad	chore: Update deps (including arrow 5.1.0, tonic -> 0.5, and prost 0.5) (#2172) * chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0) * chore: update all the things * refactor: Update serving readiness check due to change in Tonic API * chore: update more deps Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-08-05 15:57:38 +00:00
Andrew Lamb	1ccaa433e8	fix: Temporarily disable parquet predicate pushdown (#2164)	2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding)	9d15798288	fix: Address or allow Clippy warnings new with Rust 1.54	2021-07-30 09:59:59 -04:00
kodiakhq[bot]	545222303f	Merge branch 'main' into cn/cc-only	2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding)	ad0a9549de	fix: Avoid an unnecessary parsing of iox metadata In one case where ParquetChunk::new was being called, the calling code had just parsed the IoxMetadata too. In the other case, the calling code had just created the IoxMetadata being parsed. In both cases, this re-parsing wasn't actually needed; the two bits of info ParquetChunk::new needs can be easily passed in.	2021-07-28 14:25:56 -04:00
Carol (Nichols || Goulding)	af7866a638	refactor: Remove first/last write times from ParquetFile chunks	2021-07-28 14:12:36 -04:00
Marco Neumann	04e797c706	refactor: pass sequencer numbers directly to DB checkpoint First of all using a partition checkpoint as some kind of intermediate representation was kinda a hack because partition checkpoints should only be created for to-be-persisted partitions, not for the others. API-wise it should only be possible to construct a partition checkpoint from a flush handle. Also we were only able to construct partition checkpoints for partitions that had unpersisted data, otherwise there was no sane way to fill the `min_unpersisted_timestamp`. We must however scan all partitions no matter if there is unpersisted data so that we can determine the maximum seen sequence numbers. This was caught by a replay test resulting in a catalog state where the last database checkpoint had lower maximum seen sequence numbers than some partition checkpoint, bailing out with an error. So overall it turns out that passing the sequencer numbers directly instead of wrapping them into a partition checkpoint is the better implementation.	2021-07-28 17:28:34 +02:00
Andrew Lamb	5fb3e00f2a	fix: Properly record total_count and null_count in statistics (#2103) * fix: Properly record total_count and null_count in statistics * fix: fix statistics calculation in mutable_buffer * refactor: expose null counts in read_buffer * refactor: expose null_count in parquet_file * fix: update server crate tests * fix: update query_tests tests * docs: tweak comments * refactor: Use storage_stats rather than adding `null_count` * refactor: rename test data field for clarity * fix: fixup merge conflicts * refactor: rename initial_non_null_count to initial_total_count * refactor: calculate null_count as row_count - to_add	2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding)	0acb0efbc9	fix: Bump METADATA and TRANSACTION versions	2021-07-26 10:52:42 -04:00
Jake Goulding	d928bc84e6	feat: Thread time_of_{first,last}_write through Parquet metadata	2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding)	9604ce7084	fix: Don't pass table name around when it's only returned back The read_statistics, read_statistics_from_parquet_row_group, load_parquet_from_store, and load_parquet_from_store_for_chunk functions weren't ever using table name, they just passed it around and passed it back.	2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding)	3c794153dd	refactor: Organize uses	2021-07-23 13:48:15 -04:00
kodiakhq[bot]	5b5453a020	Merge branch 'main' into pd/add-parquet-cache	2021-07-22 20:21:53 +00:00
Paul Dix	88e29dede9	chore: remove extraneous example code from parquet storage	2021-07-22 16:21:13 -04:00
Andrew Lamb	01c79f1a1a	fix: Print all timestamps using RFC3339 format (#2098) * fix: Use IOx pretty printer rather than arrow pretty printer * chore: update tests in the query crate * chore: update influxdb_iox tests * chore: Update end to end tests * chore: update query_tests * chore: update mutable_buffer tests * refactor: update parquet_file tests * refactor: update db tests * chore: update kafka integration test output * fix: merge conflict	2021-07-22 19:04:52 +00:00
Marco Neumann	50241bae9e	refactor: do not abuse `uint64::MAX` as sentinel for `None`	2021-07-22 12:51:43 +02:00
Paul Dix	d95b5df03e	refactor: move cache to ObjectStore Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.	2021-07-21 18:27:56 -04:00
Paul Dix	d0ea812041	feat: add skeleton for object store file cache	2021-07-21 18:27:56 -04:00
Marco Neumann	57a9d5ade0	refactor: correctly track "seen" ranges in persistence checkpoints Now we can handle all these cases: There are two partitions w/ a single write each: 1. A reads sequence number 1 2. B reads sequence number 2 3. we persist A which only knows the sequences up until 1 => the DB checkpoint needs the global max, otherwise we forget sequences during replay (2 in this case, so B would be gone) 1. B reads sequence number 1 2. A reads sequence number 2 3. we persist A which (w/o this commit) would not track the sequencer at all in this checkpoint (since there is nothing to replay) => we MUST also remember that we already read up until 2, otherwise we'll re-read 2 after replay => the partition checkpoint needs the local seen max (no matter if there's something to persist)	2021-07-21 19:19:49 +02:00
Marco Neumann	a5fc1c7d38	fix: collect min AND max in database checkpoints This is required to correctly handle the following case: 1. There are two partitions A and B w/ a single write each (from the same sequencer). 2. We persist A: - The partition checkpoint for A will be empty because after persistence there will be nothing to replay (the single write is persisted and we're ready). - The database checkpoint that contains the global minimum of all ranges recognizes that for the sequencer there is indeed something left (the minimum sequence number from B). 3. DB restart happens, replay starts 4. We scan all persisted files, figure out that we have a DB checkpoint with a sequence minimum but (w/o the change in this commit) there is no maximum. Only partition checkpoints contain maxima, and the only partition checkpoint that was persisted was the one for partition A and that one was empty (see above). 5. So now how do we recover partition B?	2021-07-21 14:48:29 +02:00
Andrew Lamb	4da8a16c18	chore: update to arrow 5.0 and master datafusion (#2049) * chore: update to arrow 5.0 and master datafusion * fix: Update test for change in object size	2021-07-19 12:49:51 +00:00
Jake Goulding	42b56ad657	refactor: Use SNAFU's context instead of `ok_or_else`	2021-07-16 09:59:54 -04:00
Jake Goulding	939d15a21f	perf: Avoid clone when an error doesn't occur	2021-07-16 09:59:54 -04:00
Marco Neumann	f57ba6afdb	fix: use fixed-size timestamps for parquet metadata (#2032) This fixes flaky tests that rely on predictable file sizes. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-16 13:14:02 +00:00
Andrew Lamb	0c86d1dccf	feat: Record parquet bytes size in catalog / parquet_file (#2006) * feat: Store object store size in parquet_file * fix: update TRANSACTION_VERSION to 8 * refactor: rename os_bytes --> file_size_bytes	2021-07-15 12:07:11 +00:00
Marco Neumann	40047a76bc	refactor: `remove_parquet` cannot fail	2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies	1d00fa2fd8	refactor: track memory metrics in catalog (#1995) * refactor: track memory metrics in catalog * chore: update comment	2021-07-14 16:23:00 +00:00
Andrew Lamb	d35b74c226	fix: Fix doc build warnings (#1945) * fix: Fix doc build warnings * refactor: add deny bare_urls to crates Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-13 08:03:42 +00:00
Andrew Lamb	670826daf9	refactor: make object_store construction interface consistent (#1944) * refactor: make object_store construction interface consistent * fix: benchmarks * fix: doc build Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-12 12:56:36 +00:00
Marco Neumann	18893e76e0	refactor: convert some table name and part. key String to Arcs This has the (somewhat nice) side effect that it shrinks the in-mem catalog a bit as well because now `ParquetChunk` is a bit smaller making the chunk stage enum smaller as well.	2021-07-08 14:34:28 +02:00
Marco Neumann	b528ac2b55	feat: store schemas per table This way we can: - check for schema matches even for writes going into different partitions - solve #1768 and #1884 in some future PR Closes #1897.	2021-07-08 09:18:09 +02:00
Andrew Lamb	e6d995cbd8	chore: Update to Rust 1.53.0 (#1922) * chore: Update to Rust 1.53.0 * fix: Update to latest clippy standards * fix: bad refactor * fix: Update escaping * test: update test output Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-07 18:02:03 +00:00
Marco Neumann	4ca2d3e148	chore: move persistence windows related code into own crate The entire persistence windows data structures (including the checkpoints) have nothing to do with the mutable buffer per se. So let's move them into their own crate. This also makes `parquet_file` no longer depend on `mutable_buffer`.	2021-07-05 10:23:58 +02:00
Marco Neumann	d96e15c3f7	docs: explain why we store checkpoints in parquet files	2021-07-05 09:42:46 +02:00
Marco Neumann	cdab1bed05	feat: persist part+db checkpoint in parquets and catalog This will be required for replay on server startup.	2021-07-05 09:42:46 +02:00
Jacob Marble	0779b0d9bd	feat: add gRPC listener for new write protocol (#1842) * feat: add gRPC listener for new write protocol * chore: clippy happy * chore: lint * chore: cargo fmt --all * chore: cargo clippy * chore: protobuf-lint * chore: more formatting Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:15:12 +00:00
Marco Neumann	4204127b05	refactor: use protobuf for in-parquet metadata	2021-06-30 16:51:37 +02:00
Marco Neumann	ddc9cd49ca	chore: bump preserved catalog version	2021-06-29 14:23:06 +02:00
Marco Neumann	3ebb6a3037	refactor: do not capture txn-specific information in parquet files This helps with #1821.	2021-06-29 14:22:36 +02:00
kodiakhq[bot]	eda9532eb2	Merge branch 'main' into crepererum/issue1821-cleanup-lock	2021-06-29 10:48:43 +00:00
Marco Neumann	48df13de05	refactor: use parking lot for catalog cleanup	2021-06-29 12:47:29 +02:00
Marco Neumann	f824f235b4	fix: fix info log message Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-06-29 12:35:05 +02:00
Marco Neumann	778a611fb8	docs: add clarifying comment for rebuild test	2021-06-29 11:58:19 +02:00
Marco Neumann	17f89ea8d0	docs: fix comment about lock downgrade	2021-06-29 11:53:55 +02:00
Marco Neumann	2cd5ce98be	refactor: do not pass locks around for catalog cleanup	2021-06-29 10:21:41 +02:00
Marco Neumann	730a23faa3	refactor: improve locking around the parquet file cleanup Instead of (ab)using the transaction lock to prevent the cleanup job from removing just-written parquet files, use a dedicated lock. This will later allow us to write parquet files before starting a transaction (i.e. w/o holding the transaction lock). This will help with #1821.	2021-06-29 10:20:03 +02:00
Marco Neumann	6ec24353bf	refactor: only rebuild a single txn for pres. catalogs Stop relying on in-parquet transaction information during catalog rebuilds. This has some downsides (no fork detection, only a single transaction hence no time travel) but will allow that we remove transaction information from parquet files, so that we can finally move the actual parquet file storage out of the transaction lock. This will help with #1821.	2021-06-28 15:10:44 +02:00
Andrew Lamb	0a03605bbc	refactor: pull Channel --> Stream adapater into its own module (#1793 ) * refactor: pull Channel --> Stream adapater into its own module * docs: Update query/src/exec/stream.rs Co-authored-by: Marko Mikulicic <mkm@influxdata.com> Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2021-06-24 10:35:45 +00:00
kodiakhq[bot]	59993e8b8f	Merge branch 'main' into crepererum/issue1623	2021-06-23 12:40:05 +00:00
Marco Neumann	c395409b51	feat: include UUIDv4 into parquet file names Change schema from ```text <server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet ``` to ```text <server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet ``` So parquet files will NEVER be overwritten. This is especially helpful when dealing with old catalog leftovers (i.e. a parquet file that belonged to an old but wiped catalog). It also simplifies the reasoning about file references in the future and follows what other dataset formats are usually doing (i.e. never replace files). Also use `ChunkAddr` where it makes sense.	2021-06-23 14:30:28 +02:00
kodiakhq[bot]	70817a474c	Merge branch 'main' into crepererum/issue1740-d	2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies	5cd911c74a	fix: correct row count for object store chunks (#1789 )	2021-06-23 12:06:49 +00:00
Marco Neumann	1636f47565	refactor: remove dead code	2021-06-23 10:51:22 +02:00
Marco Neumann	cf55df68b5	refactor: remove some `Arc`s around the in-mem catalog This is for #1740.	2021-06-23 10:51:22 +02:00
Marco Neumann	e36b6f9c7a	docs: fix intra-doc link	2021-06-23 10:25:05 +02:00
Marco Neumann	67508094b4	fix: double ref Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2021-06-23 10:25:05 +02:00
Marco Neumann	d2be641864	refactor: make checkpointing easier to use Don't mix commit+checkpoint in a single call so that the caller has to reason about the error type and which of the two operations has failed. Splitting it also makes it easier to create the correct checkpoint data.	2021-06-23 10:25:05 +02:00
Marco Neumann	4a961694ec	refactor: make caller sync mem<>OS view during catalog transactions This is for #1740. Greatly simplifies the integration of the persisted catalog into the DB.	2021-06-23 10:25:05 +02:00
Marco Neumann	d1db0dfaeb	refactor: remove type parameter from preserved catalog For #1740.	2021-06-22 10:53:10 +02:00
Marco Neumann	ff60627500	refactor: make preserved catalog NOT own the in-mem catalog Works towards #1740.	2021-06-21 18:39:43 +02:00
Marco Neumann	881729bd23	refactor: make caller responsible to create checkpoint data This decouples the in-mem and preserved catalog a bit and works towards #1740.	2021-06-21 18:33:23 +02:00
Marco Neumann	aba973a6e1	refactor: make catalog `wipe` a freestanding function It does not interact with the `CatalogState` so users can call this function without that type.	2021-06-21 09:31:23 +02:00
Andrew Lamb	258a6b1956	chore: remove more dead code (#1760 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-18 21:28:22 +00:00
Andrew Lamb	de67bd3efe	refactor: Remove PartitionChunk::table_schema (#1756 ) * refactor: Remove PartitionChunk::table_schema * docs: update comments	2021-06-18 16:13:16 +00:00
Raphael Taylor-Davies	f6dbc8d6f2	refactor: add ChunkAddr to describe location of chunk in catalog (#1745 ) * refactor: add ChunkPath to describe location of chunk in catalog * refactor: rename ChunkPath to ChunkAddr * chore: further renames * chore: even more renames Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-17 12:04:37 +00:00
Marco Neumann	e056d97cf6	test: always test transaction aborts	2021-06-16 11:01:14 +02:00
Marco Neumann	caaf95c6ec	refactor: remove lock from `TestCatalogState`	2021-06-16 10:51:15 +02:00
Marco Neumann	c8c412f6fe	refactor: rework catalog state interface This now allows not only for copy-based transaction handling but also for eager exec and rollbacks. This will be useful to properly implement transaction aborts for the "real" catalog.	2021-06-16 10:51:15 +02:00
Marco Neumann	e064a6bbba	test: add test suite for `CatalogState` impls This makes it easier to check if `CatalogState` correctly implement all features, including transaction aborting.	2021-06-16 10:50:47 +02:00
Andrew Lamb	b756e09904	refactor: Rename parquet_file::Chunk --> ParquetChunk (#1722 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 11:21:49 +00:00
Marco Neumann	64c815dd50	fix: bump catalog version (#1726 ) This should have been done in #1714. Also add a note so that future devs might hopefully not forget. In any case though the code also works w/o this bump, it's just that the error message is a bit less nice ("cannot parse IOxMetadata" instead of "unsupported catalog version"). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 10:26:30 +00:00
Marco Neumann	55fc5e564b	refactor: remove serverID and DB name args from catalog state They are no longer required.	2021-06-15 09:35:41 +02:00
Marco Neumann	776b6c011c	feat: remove path parsing functionality Paths to parquet files are an implementation detail and should not be parsed. Closes #1506.	2021-06-14 16:24:50 +02:00
Marco Neumann	250ccdcdcd	refactor: use `IOxMetadata` instead of path parsing for parquet chunks	2021-06-14 16:24:50 +02:00
Marco Neumann	d51e7a127c	feat: include table name, partition key, and chunk ID in `IoxMetadata`	2021-06-14 16:24:50 +02:00
kodiakhq[bot]	b57f397057	Merge branch 'main' into crepererum/checkpoint_during_restore	2021-06-14 13:54:03 +00:00

487 Commits (c62c7d32b1fad95dc3b4d11fe8c9a7dcd3dae891)