Commit Graph

13 Commits (0c705fecf168b7fb265510240775d9e744e6e15a)

Author SHA1 Message Date
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent

* chore: add correspinding constants for the comapction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Ryan Russell d279deddad
docs(various): Improve Readability (#4768)
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-02 18:01:06 +00:00
Dom Dwyer f8b83c5085 test: assert panic behaviour
Modifies the existing test added as part of #4695 to ensure a panic is
emitted when serialising an empty parquet file.
2022-06-01 16:55:53 +01:00
Nga Tran 16e7a6d596
test: test that hits panic becasue of no column meta data (#4719)
* test: test that hits panic becasue of no column meta data

* chore: Apply suggestions from code review

* chore: run format after applying changes

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: run clippy

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-27 15:27:03 +00:00
Andrew Lamb dde3c3922c
refactor: use consistent spelling of serialize (#4717) 2022-05-27 14:42:59 +00:00
Nga Tran ea81152fac
refactor: add partition ID into debug info and panic earlier to identify the bug easier (#4716)
* chore: point tests to the new ticket

* chore: cleanup

* refactor: add partition ID into debug info and panic earlier to identify the bug easier
2022-05-27 12:20:36 +00:00
Nga Tran 09b55a209d
chore: point tests to the new ticket (#4715)
* chore: point tests to the new ticket

* chore: cleanup
2022-05-27 11:12:55 +00:00
Nga Tran 372b262f37
test: parquet meta decoded tests and more debug info (#4713)
* test: reproducer for 4695

* chore: some debug info

* test: test with many columns and rows

* chore: cleanup and add debug info

* chore: cleanup

* chore: cleanup

* chore: more debug info
2022-05-27 09:53:07 +00:00
Nga Tran 05151d5c69
test: reproducer for 4695 (#4706)
* test: reproducer for 4695

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-26 15:32:30 +00:00
Dom Dwyer 2e6c49be83 refactor: remove IoxMetadata min & max timestamp
Removes the min/max timestamp fields from the IoxMetadata proto
structure embedded within a Parquet file's metadata.

These values are redundant as they already exist within the Parquet
column statistics, and precluded streaming serialisation as these
removed min/max values were needed before serialising the file.
2022-05-23 16:27:08 +01:00
Dom Dwyer a142a9eb57 refactor: remove row_count from IoxMetadata
Remove the redundant row_count from the IoxMetadata structure that is
serialised into the Parquet file.

The reasoning is twofold:

    * The Parquet file's native metadata already contains a row count
    * Needing to know the number of rows up-front precludes streaming
2022-05-23 16:18:35 +01:00
Dom Dwyer 71555ee55c test: Parquet metadata integration test
Adds two integration tests covering validation of the embedded IOx
metadata within the Parquet file metadata, and validation of the derived
ParquetFileParams metadata used to populate the catalog.
2022-05-23 16:17:56 +01:00