influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols || Goulding)	7246f2702a	fix: Bump transaction version because of a change in the Parquet files	2021-08-19 09:32:37 -04:00
Raphael Taylor-Davies	5a841600d9	feat: make catalog state test deterministic (#2349)	2021-08-19 14:04:27 +01:00
Carol (Nichols || Goulding)	6390156c0e	fix: Remove error types not used anywhere	2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding)	ef0e1a3f60	refactor: Extract a transaction file path type	2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding)	6d5cb9c117	refactor: Extract a ParquetFilePath to handle paths to parquet files in a db's object store	2021-08-18 11:32:39 -04:00
Ning Sun	c012e996ab	refactor: remove display methods, use fmt::Display instead. (#2272) * refactor: remove display methods, use fmt::Display instead. Signed-off-by: Ning Sun <sunng@protonmail.com> * refactor: update a few calls from .display to .to_string() * fix: consistently use `Path` rather than occasionally `DirsAndFileName` * fix: fixup for merge conflicts * fix: update test * fix: Catch another case or two * fix: fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-08-16 18:00:22 +00:00
Carol (Nichols || Goulding)	564238ad8c	refactor: Organize uses	2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding)	ae6b0e669b	refactor: Extract a database persister type that wraps object store Connects to #2193.	2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding)	daa534ee32	refactor: Incorporate Path parsing into the TransactionFile type	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	ee3173efb1	refactor: Simplify implementation of parse_file_path	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	dbd1718fd2	refactor: Use the TransactionKey type	2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding)	7f7a911a9a	refactor: Extract a TransactionFile type to manage transaction paths	2021-08-12 09:06:06 -04:00
Dom	3de6b44e23	build: use new rustdoc lint name (#2261) * fix: nocache feature code rot The MBChunk::snapshot code when using the "nocache" option no longer compiles - this commit updates it to match the not(nocache) code. * build: use updated broken_intra_doc_links name The broken_intra_doc_links lint was renamed rustdoc::broken_intra_doc_links https://doc.rust-lang.org/rustdoc/lints.html	2021-08-11 19:48:51 +00:00
Marco Neumann	8721c5fcd6	fix: improve error messages	2021-08-09 10:54:23 +02:00
Marco Neumann	950286e5b7	feat: make replay planning work w/ unordered checkpoints	2021-08-09 10:54:23 +02:00
Andrew Lamb	d41b44d312	feat: use zstd compression when writing parquet files (#2218) * feat: use ZSTD when writing parquet files * fix: test	2021-08-06 18:45:55 +00:00
Andrew Lamb	e92e94caad	chore: Update deps (including arrow 5.1.0, tonic -> 0.5, and prost 0.5) (#2172) * chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0) * chore: update all the things * refactor: Update serving readiness check due to change in Tonic API * chore: update more deps Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-08-05 15:57:38 +00:00
Andrew Lamb	1ccaa433e8	fix: Temporarily disable parquet predicate pushdown (#2164)	2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding)	9d15798288	fix: Address or allow Clippy warnings new with Rust 1.54	2021-07-30 09:59:59 -04:00
kodiakhq[bot]	545222303f	Merge branch 'main' into cn/cc-only	2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding)	ad0a9549de	fix: Avoid an unnecessary parsing of iox metadata In one case where ParquetChunk::new was being called, the calling code had just parsed the IoxMetadata too. In the other case, the calling code had just created the IoxMetadata being parsed. In both cases, this re-parsing wasn't actually needed; the two bits of info ParquetChunk::new can be easily passed in.	2021-07-28 14:25:56 -04:00
Carol (Nichols || Goulding)	af7866a638	refactor: Remove first/last write times from ParquetFile chunks	2021-07-28 14:12:36 -04:00
Marco Neumann	04e797c706	refactor: pass sequencer numbers directly to DB checkpoint First of all using a partition checkpoint as some kind of intermediate representation was kinda a hack because partition checkpoints should only be created for to-be-persisted partitions, not for the others. API-wise it should only be possible to construct a partition checkpoint from a flush handle. Also we were only able to construct partition checkpoints for partitions that had unpersisted data, otherwise there was no sane way to fill the `min_unpersisted_timestamp`. We must however scan all partitions no matter if there is unpersisted data so that we can determine the maximum seen sequence numbers. This was caught by a replay test resulting in a catalog state where the last database checkpoint had lower maximum seen sequence numbers than some partition checkpoint, bailing out with an error. So overall it turns out that passing the sequencer numbers directly instead of wrapping them into a partition checkpoint is the better implementation.	2021-07-28 17:28:34 +02:00
Andrew Lamb	5fb3e00f2a	fix: Properly record total_count and null_count in statistics (#2103) * fix: Properly record total_count and null_count in statistics * fix: fix statistics calculation in mutable_buffer * refactor: expose null counts in read_buffer * refactor: expose null_count in parquet_file * fix: update server crate tests * fix: update query_tests tests * docs: tweak comments * refactor: Use storage_stats rather than adding `null_count` * refactor: rename test data field for clarity * fix: fixup merge conflicts * refactor: rename initial_non_null_count to initial_total_count * refactor: calculate null_count as row_count - to_add	2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding)	0acb0efbc9	fix: Bump METADATA and TRANSACTION versions	2021-07-26 10:52:42 -04:00
Jake Goulding	d928bc84e6	feat: Thread time_of_{first,last}_write through Parquet metadata	2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding)	9604ce7084	fix: Don't pass table name around when it's only returned back The read_statistics, read_statistics_from_parquet_row_group, load_parquet_from_store, and load_parquet_from_store_for_chunk functions weren't ever using table name, they just passed it around and passed it back.	2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding)	3c794153dd	refactor: Organize uses	2021-07-23 13:48:15 -04:00
kodiakhq[bot]	5b5453a020	Merge branch 'main' into pd/add-parquet-cache	2021-07-22 20:21:53 +00:00
Paul Dix	88e29dede9	chore: remove extraneous example code from parquet storage	2021-07-22 16:21:13 -04:00
Andrew Lamb	01c79f1a1a	fix: Print all timestamps using RFC3339 format (#2098) * fix: Use IOx pretty printer rather than arrow pretty printer * chore: update tests in the query crate * chore: update influxdb_iox tests * chore: Update end to end tests * chore: update query_tests * chore: update mutable_buffer tests * refactor: update parquet_file tests * refactor: update db tests * chore: update kafka integration test output * fix: merge conflict	2021-07-22 19:04:52 +00:00
Marco Neumann	50241bae9e	refactor: do not abuse `uint64::MAX` as sentinel for `None`	2021-07-22 12:51:43 +02:00
Paul Dix	d95b5df03e	refactor: move cache to ObjectStore Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.	2021-07-21 18:27:56 -04:00
Paul Dix	d0ea812041	feat: add skeleton for object store file cache	2021-07-21 18:27:56 -04:00
Marco Neumann	57a9d5ade0	refactor: correctly track "seen" ranges in persistence checkpoints Now we can handle all these cases: There are two partitions w/ a single write each: 1. A reads sequence number 1 2. B reads sequence number 2 3. we persist A which only knows the sequences up until 1 => the DB checkpoint needs the global max, otherwise we forget sequences during replay (2 in this case, so B would be gone) 1. B reads sequence number 1 2. A reads sequence number 2 3. we persist A which (w/o this commit) would not track the sequencer at all in this checkpoint (since there is nothing to replay) => we MUST also remember that we already read up until 2, otherwise we'll re-read 2 after replay => the partition checkpoint needs the local seen max (no matter if there's something to persist)	2021-07-21 19:19:49 +02:00
Marco Neumann	a5fc1c7d38	fix: collect min AND max in database checkpoints This is required to correctly handle the following case: 1. There are two partitions A and B w/ a single write each (from the same sequencer). 2. We persist A: - The partition checkpoint for A will be empty because after persistence there will be nothing to replay (the single write is persisted and we're ready). - The database checkpoint that contains the global minimum of all ranges recognizes that for the sequencer there is indeed something left (the minimum sequence number from B). 3. DB restart happens, replay starts 4. We scan all persisted files, figure out that we have a DB checkpoint with a sequence minimum but (w/o the change in this commit) there is no maximum. Only partition checkpoints contain maxima, and the only partition checkpoint that was persisted was the one for partition A and that one was empty (see above). 5. So now how do we recover partition B?	2021-07-21 14:48:29 +02:00
Andrew Lamb	4da8a16c18	chore: update to arrow 5.0 and master datafusion (#2049) * chore: update to arrow 5.0 and master datafusion * fix: Update test for change in object size	2021-07-19 12:49:51 +00:00
Jake Goulding	42b56ad657	refactor: Use SNAFU's context instead of `ok_or_else`	2021-07-16 09:59:54 -04:00
Jake Goulding	939d15a21f	perf: Avoid clone when an error doesn't occur	2021-07-16 09:59:54 -04:00
Marco Neumann	f57ba6afdb	fix: use fixed-size timestamps for parquet metadata (#2032) This fixes flaky tests that rely on predictable files sizes. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-16 13:14:02 +00:00
Andrew Lamb	0c86d1dccf	feat: Record parquet bytes size in catalog / parquet_file (#2006) * feat: Store object store size in parquet_file * fix: update TRANSACTION_VERSION to 8 * refactor: rename os_bytes --> file_size_bytes	2021-07-15 12:07:11 +00:00
Marco Neumann	40047a76bc	refactor: `remove_parquet` cannot fail	2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies	1d00fa2fd8	refactor: track memory metrics in catalog (#1995) * refactor: track memory metrics in catalog * chore: update comment	2021-07-14 16:23:00 +00:00
Andrew Lamb	d35b74c226	fix: Fix doc build warnings (#1945) * fix: Fix doc build warnings * refactor: add deny bare_urls to crates Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-13 08:03:42 +00:00
Andrew Lamb	670826daf9	refactor: make object_store construction interface consistent (#1944) * refactor: make object_store construction interface consistent * fix: benchmarks * fix: doc build Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-12 12:56:36 +00:00
Marco Neumann	18893e76e0	refactor: convert some table name and part. key String to Arcs This has the (somewhat nice) side effect that it shrinks the in-mem catalog a bit as well because now `ParquetChunk` is a bit smaller making the chunk stage enum smaller as well.	2021-07-08 14:34:28 +02:00
Marco Neumann	b528ac2b55	feat: store schemas per table This way we can: - check for schema matches even for writes going into different partitions - solve #1768 and #1884 in some future PR Closes #1897.	2021-07-08 09:18:09 +02:00
Andrew Lamb	e6d995cbd8	chore: Update to Rust 1.53.0 (#1922) * chore: Update to Rust 1.53.0 * fix: Update to latest clippy standards * fix: bad refactor * fix: Update escaping * test: update test output Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-07 18:02:03 +00:00
Marco Neumann	4ca2d3e148	chore: move persistence windows related code into own crate The entire persistence windows data structures (including the checkpoints) have nothing to do with the mutable buffer per se. So let's move them into their own crate. This also makes `parquet_file` no longer depend on `mutable_buffer`.	2021-07-05 10:23:58 +02:00
Marco Neumann	d96e15c3f7	docs: explain why we store checkpoints in parquet files	2021-07-05 09:42:46 +02:00
Marco Neumann	cdab1bed05	feat: persist part+db checkpoint in parquets and catalog This will be required for replay on server startup.	2021-07-05 09:42:46 +02:00
Jacob Marble	0779b0d9bd	feat: add gRPC listener for new write protocol (#1842) * feat: add gRPC listener for new write protocol * chore: clippy happy * chore: lint * chore: cargo fmt --all * chore: cargo clippy * chore: protobuf-lint * chore: more formatting Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-07-01 16:15:12 +00:00
Marco Neumann	4204127b05	refactor: use protobuf for in-parquet metadata	2021-06-30 16:51:37 +02:00
Marco Neumann	ddc9cd49ca	chore: bump preserved catalog version	2021-06-29 14:23:06 +02:00
Marco Neumann	3ebb6a3037	refactor: do not capture txn-specific information in parquet files This helps with #1821.	2021-06-29 14:22:36 +02:00
kodiakhq[bot]	eda9532eb2	Merge branch 'main' into crepererum/issue1821-cleanup-lock	2021-06-29 10:48:43 +00:00
Marco Neumann	48df13de05	refactor: use parking lot for catalog cleanup	2021-06-29 12:47:29 +02:00
Marco Neumann	f824f235b4	fix: fix info log message Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-06-29 12:35:05 +02:00
Marco Neumann	778a611fb8	docs: add clarifying comment for rebuild test	2021-06-29 11:58:19 +02:00
Marco Neumann	17f89ea8d0	docs: fix comment about lock downgrade	2021-06-29 11:53:55 +02:00
Marco Neumann	2cd5ce98be	refactor: do not pass locks around for catalog cleanup	2021-06-29 10:21:41 +02:00
Marco Neumann	730a23faa3	refactor: improve locking around the parquet file cleanup Instead of (ab)using the transaction lock to prevent the cleanup job from removing just-written parquet files, use a dedicated lock. This will later allow us to write parquet files before starting a transaction (i.e. w/o holding the transaction lock). This will help with #1821.	2021-06-29 10:20:03 +02:00
Marco Neumann	6ec24353bf	refactor: only rebuild a single txn for pres. catalogs Stop relying on in-parquet transaction information during catalog rebuilds. This has some downsides (no fork detection, only a single transaction hence no time travel) but will allow that we remove transaction information from parquet files, so that we can finally move the actual parquet file storage out of the transaction lock. This will help with #1821.	2021-06-28 15:10:44 +02:00
Andrew Lamb	0a03605bbc	refactor: pull Channel --> Stream adapter into its own module (#1793) * refactor: pull Channel --> Stream adapter into its own module * docs: Update query/src/exec/stream.rs Co-authored-by: Marko Mikulicic <mkm@influxdata.com> Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2021-06-24 10:35:45 +00:00
kodiakhq[bot]	59993e8b8f	Merge branch 'main' into crepererum/issue1623	2021-06-23 12:40:05 +00:00
Marco Neumann	c395409b51	feat: include UUIDv4 into parquet file names Change schema from ```text <server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet ``` to ```text <server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet ``` So parquet files will NEVER be overwritten. This is especially helpful when dealing with old catalog leftovers (i.e. a parquet file that belonged to an old but wiped catalog). It also simplifies the reasoning about file references in the future and follows what other dataset formats are usually doing (i.e. never replace files). Also use `ChunkAddr` where it makes sense.	2021-06-23 14:30:28 +02:00
kodiakhq[bot]	70817a474c	Merge branch 'main' into crepererum/issue1740-d	2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies	5cd911c74a	fix: correct row count for object store chunks (#1789)	2021-06-23 12:06:49 +00:00
Marco Neumann	1636f47565	refactor: remove dead code	2021-06-23 10:51:22 +02:00
Marco Neumann	cf55df68b5	refactor: remove some `Arc`s around the in-mem catalog This is for #1740.	2021-06-23 10:51:22 +02:00
Marco Neumann	e36b6f9c7a	docs: fix intra-doc link	2021-06-23 10:25:05 +02:00
Marco Neumann	67508094b4	fix: double ref Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2021-06-23 10:25:05 +02:00
Marco Neumann	d2be641864	refactor: make checkpointing easier to use Don't mix commit+checkpoint in a single call so that the caller has to reason about the error type and which of the two operations has failed. Splitting it also makes it easier to create the correct checkpoint data.	2021-06-23 10:25:05 +02:00
Marco Neumann	4a961694ec	refactor: make caller sync mem<>OS view during catalog transactions This is for #1740. Greatly simplifies the integration of the persisted catalog into the DB.	2021-06-23 10:25:05 +02:00
Marco Neumann	d1db0dfaeb	refactor: remove type parameter from preserved catalog For #1740.	2021-06-22 10:53:10 +02:00
Marco Neumann	ff60627500	refactor: make preserved catalog NOT own the in-mem catalog Works towards #1740.	2021-06-21 18:39:43 +02:00
Marco Neumann	881729bd23	refactor: make caller responsible to create checkpoint data This decouples the in-mem and preserved catalog a bit and works towards #1740.	2021-06-21 18:33:23 +02:00
Marco Neumann	aba973a6e1	refactor: make catalog `wipe` a freestanding function It does not interact with the `CatalogState` so users can call this function without that type.	2021-06-21 09:31:23 +02:00
Andrew Lamb	258a6b1956	chore: remove more dead code (#1760) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-18 21:28:22 +00:00
Andrew Lamb	de67bd3efe	refactor: Remove PartitionChunk::table_schema (#1756) * refactor: Remove PartitionChunk::table_schema * docs: update comments	2021-06-18 16:13:16 +00:00
Raphael Taylor-Davies	f6dbc8d6f2	refactor: add ChunkAddr to describe location of chunk in catalog (#1745) * refactor: add ChunkPath to describe location of chunk in catalog * refactor: rename ChunkPath to ChunkAddr * chore: further renames * chore: even more renames Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-17 12:04:37 +00:00
Marco Neumann	e056d97cf6	test: always test transaction aborts	2021-06-16 11:01:14 +02:00
Marco Neumann	caaf95c6ec	refactor: remove lock from `TestCatalogState`	2021-06-16 10:51:15 +02:00
Marco Neumann	c8c412f6fe	refactor: rework catalog state interface This now allows not only for copy-based transaction handling but also for eager exec and rollbacks. This will be useful to properly implement transaction aborts for the "real" catalog.	2021-06-16 10:51:15 +02:00
Marco Neumann	e064a6bbba	test: add test suite for `CatalogState` impls This makes it easier to check if `CatalogState` correctly implement all features, including transaction aborting.	2021-06-16 10:50:47 +02:00
Andrew Lamb	b756e09904	refactor: Rename parquet_file::Chunk --> ParquetChunk (#1722) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 11:21:49 +00:00
Marco Neumann	64c815dd50	fix: bump catalog version (#1726) This should have been done in #1714. Also add a note so that future devs might hopefully not forget. In any case though the code also works w/o this bump, it's just that the error message is a bit less nice ("cannot parse IOxMetadata" instead of "unsupported catalog version"). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-15 10:26:30 +00:00
Marco Neumann	55fc5e564b	refactor: remove serverID and DB name args from catalog state They are no longer required.	2021-06-15 09:35:41 +02:00
Marco Neumann	776b6c011c	feat: remove path parsing functionality Paths to parquet files are an implementation detail and should not be parsed. Closes #1506.	2021-06-14 16:24:50 +02:00
Marco Neumann	250ccdcdcd	refactor: use `IOxMetadata` instead of path parsing for parquet chunks	2021-06-14 16:24:50 +02:00
Marco Neumann	d51e7a127c	feat: include table name, partition key, and chunk ID in `IoxMetadata`	2021-06-14 16:24:50 +02:00
kodiakhq[bot]	b57f397057	Merge branch 'main' into crepererum/checkpoint_during_restore	2021-06-14 13:54:03 +00:00
Marco Neumann	0a7dcc3779	test: adjust read-write parquet test to newest test data	2021-06-14 14:24:24 +02:00
Marco Neumann	d6f6ddfdaa	fix: fix NULL handling in parquet stats	2021-06-14 14:24:09 +02:00
Marco Neumann	eae56630fb	test: add test for all-NULL float column metadata	2021-06-14 13:48:34 +02:00
Marco Neumann	3f9bcf7cd9	fix: fix NaN handling in parquet stats	2021-06-14 13:44:52 +02:00
Marco Neumann	ea96210e98	test: enable unblocked test	2021-06-14 13:44:52 +02:00
Marco Neumann	518f7c6f15	refactor: wrap upstream parquet MD into struct + clean up interface This prevents users from `parquet_file::metadata` to also depend on `parquet` directly. Furthermore they don't need to import dozens of functions and can instead just use `IoxParquetMetaData` directly.	2021-06-14 13:17:01 +02:00
Marco Neumann	030d0d2b9a	feat: create checkpoint during catalog rebuild	2021-06-14 10:55:56 +02:00
Marco Neumann	df866f72e0	refactor: store parquet metadata in chunk This will be useful for #1381. At the moment we parse schema and stats eagerly and store them alongside the parquet metadata in memory. Technically this is not required since this is basically duplicate data. In the future we might trade-off some of this memory against CPU consumption by parsing schema and stats on demand.	2021-06-14 10:08:31 +02:00
Marco Neumann	e6699ff15a	test: ensure that `find_last_transaction_timestamp` considers checkpoints	2021-06-14 10:04:50 +02:00
Marco Neumann	f8a518bbed	refactor: inline `Table` into `parquet_file::chunk::Chunk` Note that the resulting size estimations are different because we were double-counting `Table`. `mem::size_of::<Self>()` is recursive for non-boxed types since the child will be part of the parent structure. Issue: #1295.	2021-06-11 11:54:31 +02:00
Marco Neumann	28d1dc4da1	chore: bump preserved catalog version	2021-06-10 16:01:13 +02:00
Marco Neumann	80ee36cd1a	refactor: slightly streamline path parsing code in pres. catalog	2021-06-10 15:59:28 +02:00
Marco Neumann	7e7332c9ce	refactor: make comparison a bit less confusing	2021-06-10 15:42:21 +02:00
Marco Neumann	fd581e2ec9	docs: fix confusing wording in `CatalogState::files`	2021-06-10 15:42:21 +02:00
Marco Neumann	be9b3a4853	fix: protobuf lint fixes	2021-06-10 15:42:21 +02:00
Marco Neumann	294c304491	feat: impl catalog checkpointing infrastructure This implements a way to add checkpoints to the preserved catalog and speed up replay. Note: This leaves the "hook it up into the actual DB" for a future PR. Issue: #1381.	2021-06-10 15:42:21 +02:00
Marco Neumann	188cacec54	refactor: use `Arc` to pass `ParquetFileMetaData` This will be handy when the catalog state must be able to return metadata objects so that we can create checkpoints, esp. when we use multi-chunk parquet files in some midterm future.	2021-06-10 15:42:21 +02:00
Marco Neumann	c7412740e4	refactor: prepare to read and write multiple file types for catalog Prepares #1381.	2021-06-10 15:42:21 +02:00
Marco Neumann	33e364ed78	feat: add encoding info to transaction protobuf This should help with #1381.	2021-06-10 15:42:21 +02:00
Marco Neumann	4fe2d7af9c	chore: enforce `clippy::future_not_send` for `parquet_file`	2021-06-09 18:18:27 +02:00
Andrew Lamb	ab0aed0f2e	refactor: Remove a layer of channels in parquet read stream (#1648) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-07 16:47:04 +00:00
Raphael Taylor-Davies	1e7ef193a6	refactor: use field metadata to store influx types (#1642) * refactor: use field metadata to store influx types make SchemaBuilder non-consuming * chore: remove unused variants * chore: fix lints	2021-06-07 13:26:39 +00:00
Marco Neumann	c830542464	feat: add info log when cleanup limit is reached	2021-06-04 11:12:29 +02:00
Marco Neumann	91df8a30e7	feat: limit number of files during storage cleanup Since the number of parquet files can potentially be unbound (aka very very large) and we do not want to hold the transaction lock for too long and also want to limit memory consumption of the cleanup routine, let's limit the number of files that we collect for cleanup.	2021-06-03 17:43:11 +02:00
Marco Neumann	85139abbbb	fix: use structured logging for cleanup logs	2021-06-03 11:23:29 +02:00
Andrew Lamb	32c6ed1f34	refactor: More cleanup related to multi-table chunks (#1604) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-02 17:00:23 +00:00
Marco Neumann	e5b65e10ac	test: ensure that `find_last_transaction_timestamp` indeed returns the last timestamp	2021-06-02 10:15:06 +02:00
Marco Neumann	98e413d5a9	fix: do not unwrap broken timestamps in serialized catalog	2021-06-02 10:15:06 +02:00
Marco Neumann	fc0a74920f	fix: use clearer error text	2021-06-02 09:41:19 +02:00
Marco Neumann	2a0b2698c6	fix: use structured logging Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-06-02 09:41:19 +02:00
Marco Neumann	64bf8c5182	docs: add code comment explaining why we parse transaction timestamps Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-06-02 09:41:19 +02:00
Marco Neumann	77aeb5ca5d	refactor: use protobuf-native Timestamp instead of string	2021-06-02 09:41:19 +02:00
Marco Neumann	9b9400803b	refactor!: bump transaction version to 2	2021-06-02 09:41:19 +02:00
Marco Neumann	5f77b7b92b	feat: add `parquet_file::catalog::find_last_transaction_timestamp`	2021-06-02 09:41:19 +02:00
Marco Neumann	9aee961e2a	test: test loading catalogs from broken protobufs	2021-06-02 09:41:19 +02:00
Marco Neumann	0a625b50e6	feat: store transaction timestamp in preserved catalog	2021-06-02 09:41:19 +02:00
Andrew Lamb	d8fbb7b410	refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588) * refactor: Remove last vestiges of multi-table chunks from PartitionChunk API * fix: remove test that can no longer fail * fix: update tests + code review comments * fix: clippy * fix: clippy * fix: restore test_measurement_fields_error test	2021-06-01 16:12:33 +00:00
Andrew Lamb	d3711a5591	refactor: Use ParquetExec from DataFusion to read parquet files (#1580) * refactor: use ParquetExec to read parquet files * fix: test Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-06-01 14:44:07 +00:00
Andrew Lamb	64328dcf1c	feat: cache schema on catalog chunks too (#1575)	2021-06-01 12:42:46 +00:00
Andrew Lamb	00e735ef0d	chore: remove unused dependencies (#1583)	2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies	db432de137	feat: add distinct count to StatValues (#1568)	2021-05-28 17:41:34 +00:00
kodiakhq[bot]	6098c7cd00	Merge branch 'main' into crepererum/issue1376	2021-05-28 07:13:15 +00:00
Andrew Lamb	f3bec93ef1	feat: Cache TableSummary in Catalog rather than computing it on demand (#1569) * feat: Cache `TableSummary` in catalog Chunks * refactor: use consistent table summary	2021-05-27 16:03:05 +00:00
Marco Neumann	dd2a976907	feat: add a flag to ignore metadata errors during catalog rebuild	2021-05-27 13:10:14 +02:00
Marco Neumann	bc7389dc38	fix: fix typo Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-05-27 12:51:01 +02:00
Marco Neumann	48307e4ab2	docs: adjust error description to reflect internal errors Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2021-05-27 12:51:01 +02:00
Marco Neumann	d6f0dc7059	feat: implement catalog rebuilding from files Closes #1376.	2021-05-27 12:51:01 +02:00
Marco Neumann	024323912a	docs: explain what `PreservedCatalog::wipe` offers	2021-05-27 12:48:41 +02:00
Raphael Taylor-Davies	4fcc04e6c9	chore: enable arrow prettyprint feature (#1566)	2021-05-27 10:28:14 +00:00
Marco Neumann	9f451423d5	feat: log files that are deleted	2021-05-26 12:49:44 +02:00
Marco Neumann	24ec1a472e	fix: do NOT delete parquet files that are reachable by time travel	2021-05-26 12:38:54 +02:00
Marco Neumann	5983336366	refactor: rename `parquet_file::{utils => test_utils}`	2021-05-26 11:09:29 +02:00
Marco Neumann	d7e3bc569e	refactor: shorten time we hold the transaction lock during clean-up	2021-05-26 11:04:57 +02:00
Marco Neumann	18f5dd9ae1	test: ensure transaction lock exists during cleanup planning	2021-05-26 11:04:57 +02:00
Marco Neumann	b55eae98da	fix: do not delete non-parquet files during catalog-driven cleanup	2021-05-26 11:04:57 +02:00
Marco Neumann	5ed16ff294	refactor: improve error message in `parquet_file::cleanup`	2021-05-26 11:04:57 +02:00
Marco Neumann	14fdf3b7c7	feat: implement object store cleanup core routine	2021-05-26 11:02:40 +02:00
Marco Neumann	cc78b5317d	feat: add method to get all parquet files from catalog state	2021-05-26 11:02:40 +02:00
Marco Neumann	953114af2e	feat: add method to abort catalog transaction	2021-05-26 11:02:40 +02:00
Marco Neumann	92fcd7e940	feat: add a way to get OS, server ID and DB name from catalog	2021-05-26 11:02:40 +02:00
Marco Neumann	9daa4d00d6	test: re-organize `parquet_file` test utils a bit	2021-05-26 11:02:39 +02:00
Marco Neumann	38183928c8	refactor: extract path generator for data location	2021-05-26 10:59:40 +02:00
Marco Neumann	19a2733d30	feat: preserve transaction metadata in parquets	2021-05-25 09:56:12 +02:00
Marco Neumann	fe8e6301fe	refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata` Let us pool all metadata handling in a single module, which makes it easier to review.	2021-05-25 09:37:53 +02:00
Marco Neumann	ac83d99f66	feat: add a way to get current revision and UUID from transaction handle	2021-05-25 09:37:53 +02:00
Marco Neumann	fdc553b257	refactor: replace unwrap with expect	2021-05-25 09:37:53 +02:00
Andrew Lamb	c464ffadad	refactor: remove special case timestamp_range in parquet chunk (#1543) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-24 16:19:44 +00:00
Andrew Lamb	14ba25f86d	chore: Update datafusion and use released version of arrow crates (#1546) * chore: Update datafusion and use released version of arrow crate * fix: Update for change in API	2021-05-24 15:37:22 +00:00
Andrew Lamb	27e5b8fabf	refactor: Remove multiple table support from Parquet Chunk (#1541)	2021-05-24 08:40:31 -04:00
Marco Neumann	8bdddfd475	docs: mention that catalog wiping does not delete parquet files	2021-05-20 10:22:20 +02:00
Marco Neumann	b1a06246d6	feat: implement function to wipe a preserved catalog	2021-05-20 10:22:20 +02:00
Marco Neumann	6c405aa6f9	feat: check if preserved catalog exists when creating an empty one	2021-05-20 10:22:20 +02:00
Marco Neumann	c6a6005f65	feat: add `PreservedCatalog.exists`	2021-05-20 10:22:20 +02:00
Raphael Taylor-Davies	37880ee89a	refactor: store chunk IDs only in catalog (#1521) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-20 04:07:14 +00:00
Marco Neumann	8db26485a4	refactor: empty transaction during catalog creation That involves some refactoring which we are going to need anyway for hooking up the "read" path of the catalog into the DB startup, namely: - make `Db::new` require a preserved catalog - introduce a helper function that can provide that - as a consequence, all test-creations of a Db are now async This prepares for #1382.	2021-05-18 17:42:07 +02:00
Marco Neumann	cdf0ada6a6	test: test preserved catalog <-> Db write wiring	2021-05-17 13:57:31 +02:00
Marco Neumann	68729dd5ee	refactor: avoid string allocation	2021-05-17 12:32:34 +02:00
Marco Neumann	adcd8132e7	docs: more comments regarding catalog transaction handling	2021-05-17 12:05:08 +02:00
Marco Neumann	a99d53e771	docs: document `OpenTransaction::handle_action*`	2021-05-17 11:48:51 +02:00
Marco Neumann	4fb800c7a6	refactor: make PreservedCatalog easier to integrate	2021-05-17 11:33:22 +02:00
Marco Neumann	f4d7154746	fix: table summaries must include timestamp as well	2021-05-17 11:33:22 +02:00
Marco Neumann	7cced3242f	feat: add a way to parse infos from parquet paths	2021-05-17 11:33:22 +02:00
Marco Neumann	5969caccb0	feat: return parquet metadata from `write_to_object_store`	2021-05-17 11:33:22 +02:00
Raphael Taylor-Davies	f9178dbb5f	feat: push metrics into catalog (#1488 ) * feat: push metrics into catalog * chore: minor cleanup * fix: include db labels in chunk metric domains * chore: fmt * fix: don't allow dropping moving chunks * chore: further tweaks * chore: review feedback * feat: use new_unregistered() for metric instruments instead of default * chore: use &[KeyValue] instead of &Vec<KeyValue> * refactor: make GauageValue non default constructible	2021-05-14 17:37:39 +00:00
Nga Tran	9583636748	feat: we now can read parquet files from all kinds of object stores	2021-05-12 18:05:34 -04:00
Marco Neumann	795f5bfcb7	refactor: make `StatValues::{min,max}` optional + handle NaNs This will allow us to: - handle all-NULL columns correctly - be in-line with Parquet (where min/max are optional) - handle NaNs at least somewhat sane (they do not "poison" stats anymore)	2021-05-10 17:12:25 +02:00
Nga Tran	c6b933eb63	chore: merge main to branch	2021-05-07 18:40:17 -04:00
Nga Tran	f2c19ec080	refactor: further address Carol's comment	2021-05-07 17:40:40 -04:00
Nga Tran	971500681f	refactor: address Andrew's and Carol's comment	2021-05-07 17:33:19 -04:00
Carol (Nichols \|\| Goulding)	e2cc4634bf	fix: Use PathBuf rather than debug formatting and back to String This is the same fix I made in `54c5f98`, just found a few more spots :)	2021-05-07 15:58:11 -04:00
Nga Tran	31d49db0ed	chore: a little more cleanup	2021-05-07 09:38:41 -04:00
Nga Tran	ba015ee4df	refactor: clean up and add comments	2021-05-07 09:31:41 -04:00
Marco Neumann	1a998d4116	feat: preserve parquet metadata in catalog Closes #1380.	2021-05-07 09:51:44 +02:00
Marco Neumann	c3d523fc4f	refactor: add col prefixes to make_chunk & Co	2021-05-07 09:51:44 +02:00
Marco Neumann	5db504300d	refactor: use parsed paths instead of raw strings for catalog paths	2021-05-07 09:51:44 +02:00
Nga Tran	55bf848bd2	feat: Now we can query directly from files in object store	2021-05-06 18:02:17 -04:00
Andrew Lamb	884baf7329	feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415 ) * feat: add column_type and influxdb_column_type, remove row_count from system.columns * fix: update tests * fix: more test update * fix: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fmt * fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2021-05-06 12:59:30 +00:00
Andrew Lamb	86771ea629	chore: update arrow/datafusion deps (#1433 ) * chore: update datafusion deps * chore: update arrow deps Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-05-05 22:37:31 +00:00
Nga Tran	a5c92fae8a	chore: merge main to branch	2021-05-05 13:48:42 -04:00
Nga Tran	3bdb451529	chore: merge main to branch	2021-05-05 13:18:39 -04:00
Raphael Taylor-Davies	411cf134e9	refactor: explode arrow_deps (#1425 ) * refactor: explode arrow_deps * chore: workaround doctest bug	2021-05-05 16:59:12 +00:00
Nga Tran	2b46f51e5b	chore: address Dom's comment	2021-05-05 12:55:41 -04:00
Nga Tran	a1f3413c89	refactor: move private test helpers to utils module to be used by many modules	2021-05-05 11:41:46 -04:00
Nga Tran	fcb37a0b1d	feat: more testing scenarios for querying parquet files	2021-05-05 10:57:02 -04:00
Marco Neumann	1f42eb89cd	feat: implement parquet metadata handling Closes #1379 and contributes to #1380.	2021-05-05 13:29:16 +02:00
Marco Neumann	056c29aaa2	feat: add a way to retrieve timestamp range from parquet chunk	2021-05-05 13:29:16 +02:00
Marco Neumann	c54109113e	feat: add a way to retrieve storage path from parquet chunks	2021-05-05 13:29:16 +02:00
Marco Neumann	136c35cb88	feat: implement transaction handling for catalog Closes #1253.	2021-05-03 10:04:35 +02:00
Nga Tran	34a3388a49	feat: unload chunks from read buffer but keep them in object store	2021-04-30 16:12:02 -04:00
Nga Tran	e87973babe	refactor: address review comments	2021-04-29 13:15:43 -04:00
Nga Tran	402d9c748c	chore: cargo fmt	2021-04-28 16:52:52 -04:00
Nga Tran	2a2760bd18	feat: complete tests where data in both RUB and OS	2021-04-28 16:14:07 -04:00
Nga Tran	140d96dbea	feat: tests for loading data to object store and make sure we still query read buffer	2021-04-28 15:59:17 -04:00
Marco Neumann	eddc9319ff	docs: deny broken intradoc links	2021-04-27 13:22:28 +02:00
Carol (Nichols \|\| Goulding)	272cdb85ce	fix: Use the ServerId type everywhere, for writing, querying, anything	2021-04-26 18:44:32 +00:00
Carol (Nichols \|\| Goulding)	b8face3335	refactor: Organize use statements	2021-04-26 18:44:32 +00:00
Jake Goulding	67f5ad841d	refactor: Introduce ServerId and CurrentServerId types	2021-04-26 18:44:32 +00:00
Nga Tran	657bfa1b20	refactor: address Andrew's comments	2021-04-16 17:44:46 -04:00
Nga Tran	b3e110a241	refactor: address Jake's comment	2021-04-16 17:27:40 -04:00
Nga Tran	4c23ca8888	feat: full implementation of parquet's read_filter for review	2021-04-16 16:03:24 -04:00
Andrew Lamb	e226b5a820	feat: Use TimestampNanosecondArray for timestamps in IOx (#1230 ) * refactor: Create Arrow arrays using iterators * feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps * feat: add support for timestamp array * fix: update more tests * fix: remove unecessary code Co-authored-by: Edd Robinson <me@edd.io> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-04-16 15:55:33 +00:00
Nga Tran	231ebb54d4	chore: fix a format	2021-04-14 16:32:25 -04:00
Nga Tran	4e2d59d9a5	feat: implement a few more functions as part of supporting query from parquet files	2021-04-14 16:06:47 -04:00
Nga Tran	05bf28ce85	feat: Add 2 main functions table_schema and table_names for Parquet Chunk to lay a foundation for querying it	2021-04-13 18:23:55 -04:00
Nga Tran	4a6d6bd7ad	feat: initial work for querying data from parquet file in object store	2021-04-13 13:57:46 -04:00
Raphael Taylor-Davies	1997324344	feat: mutable buffer snapshotting (#1179 ) * feat: mutable buffer snapshotting * chore: review feedback	2021-04-13 12:14:54 +00:00
Nga Tran	453aeaf1a0	feat: Add tests for writing RB chunks to Object Store	2021-04-09 17:39:23 -04:00
Nga Tran	f501a74aea	refactor: Address review comments	2021-04-07 21:28:03 -04:00
Nga Tran	be6e1e48e4	feat: add writer_id and object_store in Db	2021-04-07 18:36:07 -04:00
Raphael Taylor-Davies	c2355aca6d	feat: add basic memory tracking (#1125 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2021-04-07 15:38:24 +00:00
Nga Tran	6e01fbc382	feat: use TableSummary as metadata for parquet chunk's tables and read buffer's read_filter to get data	2021-04-05 15:37:34 -04:00
Nga Tran	4bdf8963e6	feat: continue building foundation for writing RB chunks to parquet files	2021-04-02 16:06:25 -04:00
Nga Tran	49267114d3	chore: merge main into branch and resolve conflicts	2021-04-01 13:22:49 -04:00
Nga Tran	1463c6645f	feat: Add ChunkState::ObjectStore and rename ParquetChunk to Chunk	2021-04-01 11:53:03 -04:00
Nga Tran	19a453a483	feat: finally have some framework with clear todos for writing a chunk into parquet files	2021-03-31 16:21:53 -04:00
Nga Tran	cd409b471f	feat: continue the implementation	2021-03-30 21:31:51 -04:00
Nga Tran	0bcd52d5c9	feat: Add more changes	2021-03-30 18:31:09 -04:00

579 Commits (c8242c74696bd849e8b296f7b255d909babd7bd5)