influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols || Goulding)	30fea67701	fix: Move variables within format strings. Thanks clippy! Changes made automatically using `cargo clippy --fix`.	2023-02-03 13:06:17 -05:00
Nga Tran	b8a80869d4	feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692) * feat: introduce a new way of max_sequence_number for ingester, compactor and querier * chore: cleanup * feat: new column max_l0_created_at to order files for deduplication * chore: cleanup * chore: debug info for chnaging cpu.parquet * fix: update test parquet file Co-authored-by: Marco Neumann <marco@crepererum.net>	2023-01-26 10:52:47 +00:00
Carol (Nichols || Goulding)	43687a86d2	fix: Remove lots of needless borrows that Clippy can now identify Except for in generated code that we don't control.	2022-11-09 10:54:18 -05:00
Marco Neumann	9b48437711	refactor: make influx column type mandatory (#5978) We basically assume everywhere that a column falls into one of the three known categories (time, tag, field), so lets encode this in our type system instead of defining "unknown" as "undefined behavior, may or may not crash". Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-26 11:20:29 +00:00
Marco Neumann	42b89ade03	refactor: use `SendableRecordBatchStream` to write parquets (#5911) Use a proper typed stream instead of peeking the first element. This is more in line with our remaining stack and shall also improve error handling. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-19 12:59:53 +00:00
Marco Neumann	eb5a661ab3	refactor: prep work for #5897 (#5907) * refactor: add ID to `ParquetStorage` * refactor: remove duplicate code * refactor: use dedicated `StorageId`	2022-10-19 11:54:42 +00:00
Dom Dwyer	7698264768	refactor: raise error for no rows in parquet file Previously when attempting to serialise a stream of one or more RecordBatch containing no rows (resulting in an empty file), the parquet serialisation code would panic. This changes the code path to raise an error instead, to support the compactor making multiple splits at once, which may overlap a single chunk: [diagram: a time axis with Chunk 1 spanning it; the chunk's data sits only at its two ends, before Split T1 and after Split T2] In the example above, the chunk has an unusual distribution of write timestamps over the time range it covers, with all data having a timestamp before T1, or after T2. When a running a SplitExec to slice this chunk at T1 and T2, the middle of the resulting 3 subsets will contain no rows. Because we store only the min/max timestamps in the chunk statistics, it is unfortunately impossible to prune one of these split points from the plan ahead of time.	2022-08-30 14:52:31 +02:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Carol (Nichols || Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Carol (Nichols || Goulding)	80b6c5c82f	fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077)	2022-07-08 16:31:52 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent * chore: add correspinding constants for the comapction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Ryan Russell	d279deddad	docs(various): Improve Readability (#4768) Signed-off-by: Ryan Russell <git@ryanrussell.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-02 18:01:06 +00:00
Dom Dwyer	f8b83c5085	test: assert panic behaviour Modifies the existing test added as part of #4695 to ensure a panic is emitted when serialising an empty parquet file.	2022-06-01 16:55:53 +01:00
Nga Tran	16e7a6d596	test: test that hits panic becasue of no column meta data (#4719) * test: test that hits panic becasue of no column meta data * chore: Apply suggestions from code review * chore: run format after applying changes * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: run clippy Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-27 15:27:03 +00:00
Andrew Lamb	dde3c3922c	refactor: use consistent spelling of serialize (#4717)	2022-05-27 14:42:59 +00:00
Nga Tran	ea81152fac	refactor: add partition ID into debug info and panic earlier to identify the bug easier (#4716) * chore: point tests to the new ticket * chore: cleanup * refactor: add partition ID into debug info and panic earlier to identify the bug easier	2022-05-27 12:20:36 +00:00
Nga Tran	09b55a209d	chore: point tests to the new ticket (#4715) * chore: point tests to the new ticket * chore: cleanup	2022-05-27 11:12:55 +00:00
Nga Tran	372b262f37	test: parquet meta decoded tests and more debug info (#4713) * test: reproducer for 4695 * chore: some debug info * test: test with many columns and rows * chore: cleanup and add debug info * chore: cleanup * chore: cleanup * chore: more debug info	2022-05-27 09:53:07 +00:00
Nga Tran	05151d5c69	test: reproducer for 4695 (#4706) * test: reproducer for 4695 * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-26 15:32:30 +00:00
Dom Dwyer	2e6c49be83	refactor: remove IoxMetadata min & max timestamp Removes the min/max timestamp fields from the IoxMetadata proto structure embedded within a Parquet file's metadata. These values are redundant as they already exist within the Parquet column statistics, and precluded streaming serialisation as these removed min/max values were needed before serialising the file.	2022-05-23 16:27:08 +01:00
Dom Dwyer	a142a9eb57	refactor: remove row_count from IoxMetadata Remove the redundant row_count from the IoxMetadata structure that is serialised into the Parquet file. The reasoning is twofold: * The Parquet file's native metadata already contains a row count * Needing to know the number of rows up-front precludes streaming	2022-05-23 16:18:35 +01:00
Dom Dwyer	71555ee55c	test: Parquet metadata integration test Adds two integration tests covering validation of the embedded IOx metadata within the Parquet file metadata, and validation of the derived ParquetFileParams metadata used to populate the catalog.	2022-05-23 16:17:56 +01:00

25 Commits (86dd72ef1f31b62a97d7bd22a69e63dd7b62bc52)