Commit Graph

212 Commits (0fe8eda89e66c849e20c04f0fb90e4d3e4879663)

Author SHA1 Message Date
Andrew Lamb 1ccaa433e8
fix: Temporarily disable parquet predicate pushdown (#2164) 2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
kodiakhq[bot] 545222303f
Merge branch 'main' into cn/cc-only 2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding) ad0a9549de fix: Avoid an unnecessary parsing of iox metadata
In one case where ParquetChunk::new was being called, the calling code
had just parsed the IoxMetadata too. In the other case, the calling code
had just *created* the IoxMetadata being parsed. In both cases, this
re-parsing wasn't actually needed; the two bits of info that
ParquetChunk::new needs can easily be passed in.
2021-07-28 14:25:56 -04:00
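
The gist of that refactor, as a minimal sketch with stand-in types (the real `ParquetChunk::new` takes more arguments than this):

```rust
#![allow(dead_code)]

// Stand-in types; the actual IOx structs carry much more information.
struct IoxMetadata {
    table_name: String,
    partition_key: String,
}

struct ParquetChunk {
    table_name: String,
    partition_key: String,
}

impl ParquetChunk {
    // Before: the constructor re-parsed serialized IoxMetadata.
    // After: the caller, which already holds the parsed (or freshly
    // created) metadata, passes the needed values in directly.
    fn new(table_name: String, partition_key: String) -> Self {
        Self {
            table_name,
            partition_key,
        }
    }
}

fn main() {
    let md = IoxMetadata {
        table_name: "cpu".into(),
        partition_key: "2021-07-28".into(),
    };
    // No second parse: hand the already-known fields over.
    let _chunk = ParquetChunk::new(md.table_name, md.partition_key);
}
```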
Carol (Nichols || Goulding) af7866a638 refactor: Remove first/last write times from ParquetFile chunks 2021-07-28 14:12:36 -04:00
Marco Neumann 04e797c706 refactor: pass sequencer numbers directly to DB checkpoint
First of all, using a partition checkpoint as some kind of intermediate
representation was kind of a hack, because partition checkpoints should
only be created for to-be-persisted partitions, not for the others.
API-wise it should only be possible to construct a partition checkpoint
from a flush handle.

Also, we were only able to construct partition checkpoints for partitions
that had unpersisted data; otherwise there was no sane way to fill the
`min_unpersisted_timestamp`. We must, however, scan all partitions, no
matter whether they hold unpersisted data, so that we can determine the
maximum seen sequence numbers. This was caught by a replay test resulting in a
catalog state where the last database checkpoint had lower maximum seen
sequence numbers than some partition checkpoint, bailing out with an
error.

So overall it turns out that passing the sequencer numbers directly
instead of wrapping them into a partition checkpoint is the better
implementation.
2021-07-28 17:28:34 +02:00
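
In sketch form (hypothetical names; the real checkpoint machinery tracks more state than this), the checkpoint builder now consumes per-sequencer numbers directly:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: the database checkpoint accumulates sequence
// numbers per sequencer directly, with no intermediate partition
// checkpoint in between.
#[derive(Debug, Default)]
struct DatabaseCheckpointBuilder {
    /// sequencer id -> maximum sequence number seen so far
    max_seen: BTreeMap<u32, u64>,
}

impl DatabaseCheckpointBuilder {
    /// Record a sequence number for a sequencer, keeping the maximum.
    fn record(&mut self, sequencer_id: u32, sequence_number: u64) {
        let max = self.max_seen.entry(sequencer_id).or_insert(sequence_number);
        *max = (*max).max(sequence_number);
    }
}

fn main() {
    let mut builder = DatabaseCheckpointBuilder::default();
    // Scan ALL partitions, persisted or not, so the database checkpoint
    // never ends up with a lower maximum than some partition checkpoint.
    for (sequencer_id, sequence_number) in [(0, 1), (0, 2), (1, 7)] {
        builder.record(sequencer_id, sequence_number);
    }
    println!("{:?}", builder);
}
```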
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: calculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
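
The last bullet in sketch form (hypothetical names such as `to_add`; the actual statistics structs differ):

```rust
// Derive the null count from the row count and the number of non-null
// values appended (`to_add`), instead of tracking nulls separately.
struct ColumnStatistics {
    total_count: u64,
    null_count: u64,
}

fn stats_for_batch(row_count: u64, to_add: u64) -> ColumnStatistics {
    assert!(to_add <= row_count);
    ColumnStatistics {
        total_count: row_count,
        // Every row without a value for this column is a null.
        null_count: row_count - to_add,
    }
}

fn main() {
    let stats = stats_for_batch(10, 7);
    assert_eq!(stats.total_count, 10);
    assert_eq!(stats.null_count, 3);
}
```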
Carol (Nichols || Goulding) 0acb0efbc9 fix: Bump METADATA and TRANSACTION versions 2021-07-26 10:52:42 -04:00
Jake Goulding d928bc84e6 feat: Thread time_of_{first,last}_write through Parquet metadata 2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding) 9604ce7084 fix: Don't pass table name around when it's only returned back
The read_statistics, read_statistics_from_parquet_row_group,
load_parquet_from_store, and load_parquet_from_store_for_chunk functions
weren't ever using the table name; they just passed it around and
passed it back.
2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding) 3c794153dd refactor: Organize uses 2021-07-23 13:48:15 -04:00
kodiakhq[bot] 5b5453a020
Merge branch 'main' into pd/add-parquet-cache 2021-07-22 20:21:53 +00:00
Paul Dix 88e29dede9 chore: remove extraneous example code from parquet storage 2021-07-22 16:21:13 -04:00
Andrew Lamb 01c79f1a1a
fix: Print all timestamps using RFC3339 format (#2098)
* fix: Use IOx pretty printer rather than arrow pretty printer

* chore: update tests in the query crate

* chore: update influxdb_iox tests

* chore: Update end to end tests

* chore: update query_tests

* chore: update mutable_buffer tests

* refactor: update parquet_file tests

* refactor: update db tests

* chore: update kafka integration test output

* fix: merge conflict
2021-07-22 19:04:52 +00:00
Marco Neumann 50241bae9e refactor: do not abuse `u64::MAX` as sentinel for `None` 2021-07-22 12:51:43 +02:00
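
Sketched, the difference between the sentinel and the explicit `Option`:

```rust
// Sketch: model absence with Option<u64> instead of reserving u64::MAX
// as a magic sentinel value.
fn min_seen_sentinel(values: &[u64]) -> u64 {
    // Sentinel style: u64::MAX doubles as "nothing seen", which callers
    // can silently misuse in comparisons and arithmetic.
    values.iter().copied().min().unwrap_or(u64::MAX)
}

fn min_seen_option(values: &[u64]) -> Option<u64> {
    // Option style: absence is explicit and the compiler forces callers
    // to handle it.
    values.iter().copied().min()
}

fn main() {
    assert_eq!(min_seen_sentinel(&[]), u64::MAX);
    assert_eq!(min_seen_option(&[]), None);
    assert_eq!(min_seen_option(&[3, 1, 2]), Some(1));
}
```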
Paul Dix d95b5df03e refactor: move cache to ObjectStore
Since the consumers of ObjectStore always use the concrete type rather
than the ObjectStoreApi trait, it makes more sense to just change the
concrete type to have a pointer to the cache. This removes the cache from
the ObjectStoreApi trait and changes the ObjectStore to be a regular
struct rather than a tuple around the ObjectStoreIntegration. Future work
will have the server configure the cache on the ObjectStore struct when
its options are set.
2021-07-21 18:27:56 -04:00
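
The described layout, sketched with illustrative names (not the actual IOx types):

```rust
#![allow(dead_code)]
use std::sync::Arc;

struct ObjectStoreFileCache; // stand-in for the real cache type

enum ObjectStoreIntegration {
    InMemory,
    // other backends (S3, GCS, file, ...) elided
}

// A plain struct wrapping the backend enum; the cache is a field here,
// not part of the ObjectStoreApi trait.
struct ObjectStore {
    integration: ObjectStoreIntegration,
    // None until the server configures it from its options.
    cache: Option<Arc<ObjectStoreFileCache>>,
}

impl ObjectStore {
    fn new_in_memory() -> Self {
        Self {
            integration: ObjectStoreIntegration::InMemory,
            cache: None,
        }
    }
}

fn main() {
    let mut store = ObjectStore::new_in_memory();
    // Later, when server options are set:
    store.cache = Some(Arc::new(ObjectStoreFileCache));
    assert!(store.cache.is_some());
}
```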
Paul Dix d0ea812041 feat: add skeleton for object store file cache 2021-07-21 18:27:56 -04:00
Marco Neumann 57a9d5ade0 refactor: correctly track "seen" ranges in persistence checkpoints
Now we can handle all these cases:

There are two partitions w/ a single write each:

1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
   during replay (2 in this case, so B would be gone)

1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
   all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
   re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
   something left to persist)
2021-07-21 19:19:49 +02:00
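
Both cases boil down to a per-sequencer "seen range" along these lines (a sketch: `min` is where replay starts, `max` the highest number ever seen):

```rust
// Sketch of a "seen range" per sequencer: `min` is the smallest sequence
// number still needing replay (None once everything is persisted), while
// `max` must always record the highest sequence number ever seen.
#[derive(Debug)]
struct OptionalMinMaxSequence {
    min: Option<u64>,
    max: u64,
}

fn main() {
    // Second case above: the partition was fully persisted, so there is
    // nothing to replay (min = None), but we still remember that sequence
    // number 2 was consumed; otherwise replay would re-read it.
    let seen = OptionalMinMaxSequence { min: None, max: 2 };
    println!("{:?}", seen);
}
```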
Marco Neumann a5fc1c7d38 fix: collect min AND max in database checkpoints
This is required to correctly handle the following case:

1. There are two partitions A and B w/ a single write each (from the same
   sequencer).
2. We persist A:
   - The partition checkpoint for A will be empty because after persistence
     there will be nothing to replay (the single write is persisted and
     we're ready).
   - The database checkpoint that contains the global minimum of all ranges
     recognizes that for the sequencer there is indeed something left (the
     minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
   with a sequence minimum but (w/o the change in this commit) there is no
   maximum. Only partition checkpoints contain maxima, and the only partition
   checkpoint that was persisted was the one for partition A and that one was
   empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
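
In other words (a sketch with hypothetical names): replay needs both ends of the range, and once partition A's checkpoint is empty only the database checkpoint can supply them.

```rust
use std::ops::RangeInclusive;

// Replay seeks to the minimum unpersisted sequence number and must know
// where "caught up" is, i.e. the maximum seen sequence number.
fn replay_range(min_unpersisted: u64, max_seen: u64) -> RangeInclusive<u64> {
    min_unpersisted..=max_seen
}

fn main() {
    // Database checkpoint: the minimum comes from partition B (still
    // unpersisted), the maximum is the global maximum across partitions.
    assert_eq!(replay_range(1, 2), 1..=2);
}
```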
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Jake Goulding 42b56ad657 refactor: Use SNAFU's context instead of `ok_or_else` 2021-07-16 09:59:54 -04:00
Jake Goulding 939d15a21f perf: Avoid clone when an error doesn't occur 2021-07-16 09:59:54 -04:00
Marco Neumann f57ba6afdb
fix: use fixed-size timestamps for parquet metadata (#2032)
This fixes flaky tests that rely on predictable file sizes.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-16 13:14:02 +00:00
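
One way to get there, as a hedged sketch (the actual change may differ): encode timestamps fixed-width so the serialized metadata, and hence the file size, never depends on the value.

```rust
// Encode a timestamp as a fixed 8-byte integer rather than a
// variable-length string, so the metadata size is always the same.
fn encode_fixed(nanos: i64) -> [u8; 8] {
    nanos.to_le_bytes()
}

fn main() {
    // Same size for every possible timestamp.
    assert_eq!(encode_fixed(0).len(), encode_fixed(i64::MAX).len());
}
```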
Andrew Lamb 0c86d1dccf
feat: Record parquet bytes size in catalog / parquet_file (#2006)
* feat: Store object store size in parquet_file

* fix: update TRANSACTION_VERSION to 8

* refactor: rename os_bytes --> file_size_bytes
2021-07-15 12:07:11 +00:00
Marco Neumann 40047a76bc refactor: `remove_parquet` cannot fail 2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies 1d00fa2fd8
refactor: track memory metrics in catalog (#1995)
* refactor: track memory metrics in catalog

* chore: update comment
2021-07-14 16:23:00 +00:00
Andrew Lamb d35b74c226
fix: Fix doc build warnings (#1945)
* fix: Fix doc build warnings

* refactor: add deny bare_urls to crates

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Andrew Lamb 670826daf9
refactor: make object_store construction interface consistent (#1944)
* refactor: make object_store construction interface consistent

* fix: benchmarks

* fix: doc build

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-12 12:56:36 +00:00
Marco Neumann 18893e76e0 refactor: convert some table name and part. key String to Arcs
This has the (somewhat nice) side effect that it shrinks the in-mem
catalog a bit as well, because `ParquetChunk` is now a bit smaller,
making the chunk stage enum smaller as well.
2021-07-08 14:34:28 +02:00
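
The size effect, sketched:

```rust
#![allow(dead_code)]
use std::sync::Arc;

// Before: every chunk owns its own String copies.
struct ChunkBefore {
    table_name: String,   // 24 bytes inline + its own heap buffer
    partition_key: String,
}

// After: chunks share one allocation per distinct string.
struct ChunkAfter {
    table_name: Arc<str>, // 16 bytes inline, shared heap allocation
    partition_key: Arc<str>,
}

fn main() {
    let table: Arc<str> = Arc::from("cpu");
    let key: Arc<str> = Arc::from("2021-07-08");
    // Many chunks can now point at the same strings.
    let _chunk = ChunkAfter {
        table_name: Arc::clone(&table),
        partition_key: Arc::clone(&key),
    };
    println!(
        "before: {} bytes, after: {} bytes",
        std::mem::size_of::<ChunkBefore>(), // 48 on 64-bit targets
        std::mem::size_of::<ChunkAfter>(),  // 32 on 64-bit targets
    );
}
```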
Marco Neumann b528ac2b55 feat: store schemas per table
This way we can:

- check for schema matches even for writes going into different
  partitions
- solve #1768 and #1884 in some future PR

Closes #1897.
2021-07-08 09:18:09 +02:00
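
A sketch of the per-table schema check (hypothetical types; the real IOx schema handling is richer):

```rust
use std::collections::HashMap;

// Simplified schema: (column name, type name) pairs.
#[derive(Clone, PartialEq)]
struct Schema {
    columns: Vec<(String, &'static str)>,
}

#[derive(Default)]
struct TableSchemas {
    by_table: HashMap<String, Schema>,
}

impl TableSchemas {
    /// Validate a write against the stored schema, inserting it on first use.
    /// The check is per table, independent of the target partition.
    fn check_write(&mut self, table: &str, schema: &Schema) -> Result<(), String> {
        match self.by_table.get(table) {
            None => {
                self.by_table.insert(table.to_string(), schema.clone());
                Ok(())
            }
            Some(existing) if existing == schema => Ok(()),
            Some(_) => Err(format!("schema conflict for table {}", table)),
        }
    }
}

fn main() {
    let mut schemas = TableSchemas::default();
    let s = Schema { columns: vec![("time".into(), "i64")] };
    assert!(schemas.check_write("cpu", &s).is_ok());
    // Same table, different partition, same schema: still OK.
    assert!(schemas.check_write("cpu", &s).is_ok());
}
```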
Andrew Lamb e6d995cbd8
chore: Update to Rust 1.53.0 (#1922)
* chore: Update to Rust 1.53.0

* fix: Update to latest clippy standards

* fix: bad refactor

* fix: Update escaping

* test: update test output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The entire set of persistence windows data structures (including the
checkpoints) has nothing to do with the mutable buffer per se, so let's
move them into their own crate. This also makes `parquet_file` no
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann d96e15c3f7 docs: explain why we store checkpoints in parquet files 2021-07-05 09:42:46 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Marco Neumann ddc9cd49ca chore: bump preserved catalog version 2021-06-29 14:23:06 +02:00
Marco Neumann 3ebb6a3037 refactor: do not capture txn-specific information in parquet files
This helps with #1821.
2021-06-29 14:22:36 +02:00
kodiakhq[bot] eda9532eb2
Merge branch 'main' into crepererum/issue1821-cleanup-lock 2021-06-29 10:48:43 +00:00
Marco Neumann 48df13de05 refactor: use parking lot for catalog cleanup 2021-06-29 12:47:29 +02:00
Marco Neumann f824f235b4
fix: fix info log message
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-29 12:35:05 +02:00
Marco Neumann 778a611fb8 docs: add clarifying comment for rebuild test 2021-06-29 11:58:19 +02:00
Marco Neumann 17f89ea8d0 docs: fix comment about lock downgrade 2021-06-29 11:53:55 +02:00
Marco Neumann 2cd5ce98be refactor: do not pass locks around for catalog cleanup 2021-06-29 10:21:41 +02:00
Marco Neumann 730a23faa3 refactor: improve locking around the parquet file cleanup
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).

This will help with #1821.
2021-06-29 10:20:03 +02:00
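
The locking scheme, sketched with std's `RwLock` and hypothetical names (a separate commit above switches the actual implementation to parking_lot):

```rust
use std::sync::RwLock;

// Parquet writers take the dedicated lock in shared mode, the cleanup
// job in exclusive mode, so cleanup can never observe and delete a file
// that is mid-write but not yet referenced by the catalog.
#[derive(Default)]
struct CleanupLock(RwLock<()>);

impl CleanupLock {
    fn write_parquet_file(&self) {
        let _shared = self.0.read().unwrap(); // many writers may run at once
        // ... write the file to object store, then register it in the catalog ...
    }

    fn cleanup_unreferenced_files(&self) {
        let _exclusive = self.0.write().unwrap(); // waits for in-flight writes
        // ... list object store, delete files the catalog does not reference ...
    }
}

fn main() {
    let lock = CleanupLock::default();
    lock.write_parquet_file();
    lock.cleanup_unreferenced_files();
}
```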
Marco Neumann 6ec24353bf refactor: only rebuild a single txn for pres. catalogs
Stop relying on in-parquet transaction information during catalog
rebuilds. This has some downsides (no fork detection, only a single
transaction, hence no time travel) but will allow us to remove
transaction information from parquet files, so that we can finally move
the actual parquet file storage out of the transaction lock.

This will help with #1821.
2021-06-28 15:10:44 +02:00
Andrew Lamb 0a03605bbc
refactor: pull Channel --> Stream adapter into its own module (#1793)
* refactor: pull Channel --> Stream adapter into its own module

* docs: Update query/src/exec/stream.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
kodiakhq[bot] 59993e8b8f
Merge branch 'main' into crepererum/issue1623 2021-06-23 12:40:05 +00:00
Marco Neumann c395409b51 feat: include UUIDv4 into parquet file names
Change schema from

```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```

to

```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```

So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies the reasoning
about file references in the future and follows what other dataset
formats usually do (i.e. never replace files).

Also use `ChunkAddr` where it makes sense.
2021-06-23 14:30:28 +02:00
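
A sketch of the new path construction (assumes the `uuid` crate with its `v4` feature; the real code builds object store paths, not plain strings):

```rust
use uuid::Uuid;

fn parquet_path(
    server_id: u32,
    db_name: &str,
    table_name: &str,
    part_key: &str,
    chunk_id: u32,
) -> String {
    format!(
        "{}/{}/data/{}/{}/{}.{}.parquet",
        server_id,
        db_name,
        table_name,
        part_key,
        chunk_id,
        Uuid::new_v4(), // random v4 UUID: names can never collide
    )
}

fn main() {
    // Two writes for the "same" chunk still get distinct paths, so a
    // leftover file from a wiped catalog is never overwritten.
    let a = parquet_path(1, "mydb", "cpu", "2021-06-23", 0);
    let b = parquet_path(1, "mydb", "cpu", "2021-06-23", 0);
    assert_ne!(a, b);
}
```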
kodiakhq[bot] 70817a474c
Merge branch 'main' into crepererum/issue1740-d 2021-06-23 12:29:54 +00:00