Commit Graph

654 Commits (75fe1217e09f2bb52968dc3c04ad13e73c382c93)

Author SHA1 Message Date
Carol (Nichols || Goulding) 423ee71f5e refactor: Remove duplicated lint rules
These get inherited from crate root files, so the lint rules in
src/main.rs apply in this file already.
2020-06-24 16:56:16 -04:00
Andrew Lamb 3bb3f2ddbd
Merge pull request #185 from influxdata/alamb/fix-parquet-nulls
fix: Correctly encode nulls in parquet files
2020-06-24 11:51:31 -04:00
alamb 0fdc6aa745 test: add test for packing null values 2020-06-24 11:34:40 -04:00
alamb 431787fb31 Merge remote-tracking branch 'origin/master' into alamb/fix-parquet-nulls 2020-06-24 11:29:07 -04:00
Andrew Lamb de600b7712
fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-24 09:44:08 -04:00
Andrew Lamb ab22384009
Merge pull request #186 from influxdata/alamb/refactor-parquet-deps
refactor: clean up parquet library deps and remove use of InputReaderAdapter (related to parquet dependencies)
2020-06-24 09:42:44 -04:00
Carol (Nichols || Goulding) 6fb107af68
Merge pull request #178 from influxdata/cn-u64-enc 2020-06-24 08:48:57 -04:00
alamb 2c4a9dba53 fix: cleanup comment + code order 2020-06-23 17:21:20 -04:00
alamb b22423621b refactor: remove InputReaderAdapter 2020-06-23 17:15:02 -04:00
alamb 68ce351a3a refactor: remove direct parquet dependency from delorean_ingest 2020-06-23 16:58:31 -04:00
Andrew Lamb 16bf5887df
fix: Setup parquet column encoding correctly (#182)
FYI @e-dard
2020-06-23 16:42:44 -04:00
alamb c9b24f3762 fix: Correctly encode nulls in parquet files 2020-06-23 12:23:47 -04:00
alamb eee1e9fe77 fix: Setup parquet column encoding correctly 2020-06-23 09:54:16 -04:00
Edd Robinson ec448f361a refactor: enable unsigned block reading 2020-06-23 10:50:32 +01:00
Andrew Lamb 943a6cd299
feat: benchmark for lp->parquet performance (#176) 2020-06-23 05:44:52 -04:00
Andrew Lamb 322a491b9d
perf: Improve line protocol --> parquet conversion performance by ~20% (#177)
* feat: benchmark for lp->parquet performance

* feat: improve parser performance by storing contiguous EscapedStr

* fix: remove all string copies during LP-Parquet conversion

* refactor: Implement from_str as From<&str> only

* refactor: implement Deref instead of as_str

* refactor: Remove ends_with because Deref now makes it work

* refactor: Eq can be derived

* refactor: Remove unused From implementation

* refactor: Replace single-character strings with chars as requested by clippy

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
2020-06-23 05:42:19 -04:00
Andrew Lamb 86a425e5ef
feat: Add support for parsing bool values in line protocol parser (#156)
* feat: Implement boolean support for the line protocol parser

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt+clippy

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-22 16:58:38 -04:00
Carol (Nichols || Goulding) 294163bed0 feat: Implement unsigned encoding 2020-06-22 16:52:24 -04:00
Andrew Lamb 2a42df278a
docs: Initial style guide with idiomatic error handling (#174)
* docs: Initial style guide with idiomatic error handling

* fix: Apply suggestions from code review

Co-authored-by: Paul Dix <paul@influxdata.com>

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: clean up example

To avoid using a different field name

Co-authored-by: Paul Dix <paul@influxdata.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-22 16:41:36 -04:00
Carol (Nichols || Goulding) 1f6effba91
Merge pull request #163 from influxdata/cn-string-enc 2020-06-22 16:10:58 -04:00
Carol (Nichols || Goulding) 89b9dbe9e8 refactor: Slice twice instead of adding 2020-06-22 15:41:49 -04:00
Carol (Nichols || Goulding) 85e442373f test: Verify encoding and decoding invalid UTF-8 2020-06-22 15:41:27 -04:00
Carol (Nichols || Goulding) 264dd96035 test: Add a test for unicode data 2020-06-22 15:33:47 -04:00
Carol (Nichols || Goulding) 683205ad03 refactor: Use `Vec::clear` instead of `Vec::truncate(0)` 2020-06-22 15:32:15 -04:00
Carol (Nichols || Goulding) 1e341a7321 fix: Encode and decode string data as bytes
String data isn't guaranteed to be UTF-8
2020-06-22 15:32:14 -04:00
Carol (Nichols || Goulding) 672d3fe668 fix: Assert that encoded strings' lengths fit in an i32 2020-06-22 15:19:19 -04:00
Carol (Nichols || Goulding) df75db6870 refactor: Remove some unneeded type annotations 2020-06-22 15:17:03 -04:00
Carol (Nichols || Goulding) 8bc25e92bf refactor: Shorten unused cases 2020-06-22 15:15:37 -04:00
Carol (Nichols || Goulding) d7dbf061cb feat: Implement String encoding/decoding
Fixes #148.
2020-06-22 15:15:34 -04:00
Carol (Nichols || Goulding) bf884ff3d3 refactor: Extract a constant for max varint size for 64-bit integers 2020-06-22 14:53:53 -04:00
Carol (Nichols || Goulding) 4a91a8b45f refactor: Remove unneeded lifetime annotations 2020-06-22 14:53:53 -04:00
Carol (Nichols || Goulding) f2fc4a6d43 chore: Remove or change scope for outdated dead_code allows 2020-06-22 14:53:53 -04:00
Edd Robinson 2768b15bf4
Merge pull request #168 from influxdata/er/tsm-parquet
feat: Add support for converting TSM files into Parquet
2020-06-22 19:10:17 +01:00
Edd Robinson b3e78d712d refactor: address PR feedback 2020-06-22 18:56:17 +01:00
Edd Robinson 844625d811 fix: down-sample timestamps to μs 2020-06-22 18:56:17 +01:00
Edd Robinson e507183fbd refactor: cleanup + clippy 2020-06-22 18:56:17 +01:00
Edd Robinson 4bbeac7a1c refactor: extend packers 2020-06-22 18:56:17 +01:00
Edd Robinson 106bd69b5a feat: support converting from TSM->Parquet 2020-06-22 18:56:17 +01:00
Edd Robinson 9006af8961 feat: support converting from BlockType 2020-06-22 18:56:17 +01:00
Edd Robinson 3c24b6e10e refactor: small API change 2020-06-22 18:56:17 +01:00
Edd Robinson 5f40974752 refactor: don't error on string blocks 2020-06-22 18:56:17 +01:00
Edd Robinson 353c7a618b fix: ensure short blocks decode correctly 2020-06-22 18:56:17 +01:00
Edd Robinson 68a1d5355d refactor: simplify block types 2020-06-22 18:56:17 +01:00
Edd Robinson 621f2f91f0 refactor: hoist tsm mapper to delorean_tsm 2020-06-22 18:56:17 +01:00
Edd Robinson f046dbeea0 refactor: organise code in delorean_tsm crate 2020-06-22 18:56:17 +01:00
Edd Robinson 0ca6fdfa5f refactor: StorageError -> TSMError 2020-06-22 18:56:17 +01:00
Edd Robinson 85e0b4ec16 refactor: hoist tsm reader into own crate 2020-06-22 18:56:17 +01:00
Edd Robinson fd9f2ea5b8 refactor: split out index reading and block decoding
This commit splits out the functionality required to read a TSM file's
index, and decode the blocks within the file.
2020-06-22 18:56:17 +01:00
Edd Robinson 6339083b87 feat: implement mapping between blocks and table
This commit implements the ability to map from multiple columns into a
single tabular view, where columns are aligned by their timestamp
components.
2020-06-22 18:56:17 +01:00
Edd Robinson 5418b34fcc feat(tsm): map TSM data model to table model
This commit adds a new type `TSMMeasurementMapper` that will iterate
through a `TSMReader`'s index and collect together all series and blocks
by measurement. These units are called `MeasurementTable`s.
2020-06-22 18:56:17 +01:00