
51 Commits (9a3e0d24a3ddca99288050797c43be8c84d40e0f)

Author SHA1 Message Date
Edd Robinson cb3e948ca0 feat: TO REMOVE - TSM -> Arrow 2020-09-25 10:12:46 +01:00
Edd Robinson b62810676d feat: add support for merging blocks 2020-07-13 10:39:36 +01:00
Edd Robinson 4e66a48ba9
refactor: PR feedback
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:46:08 +01:00
Edd Robinson fd3f482652
refactor: PR feedback
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:45:50 +01:00
Edd Robinson 3d0d24d6fb
refactor: PR feedback
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:45:41 +01:00
Edd Robinson cc7e8e8da0 fix: ensure tables merged correctly 2020-07-08 22:57:15 +01:00
Edd Robinson bd5d39f60c refactor: address PR feedback 2020-07-08 22:57:15 +01:00
Edd Robinson 5755949c01 refactor: PR feedback
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-08 22:57:15 +01:00
Edd Robinson da305596f9 refactor: PR feedback
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-08 22:57:15 +01:00
Edd Robinson 54a61b33fc refactor: remove redundant block type 2020-07-08 22:57:15 +01:00
Edd Robinson 50ef521e6c feat: add support for converting multiple TSM files
This commit extends the ingest crate to support converting multiple TSM
files to a single Parquet file by merging identical measurements across
the TSM files.

This does not yet support merging blocks that overlap.
2020-07-08 22:57:15 +01:00
Edd Robinson fff5577efb refactor: encapsulate mapping logic
This commit moves some of the TSM mapper logic that had leaked into the
TSM->Parquet converter back into the mapper. The refactor allows us to
make some previously public APIs private, whilst still providing a
reasonably flexible API.
2020-07-08 22:57:15 +01:00
Edd Robinson 2be6385ade perf: drain block data more efficiently
This commit reduces copying of block data by replacing an inefficient
`remove` call on vectors with an index tracking approach, leaving the
original vectors in place.

It further refactors some of the mapping code, DRYing things up.

It improves performance of the `map_field_columns` function by 48%.

```
time:   [137.11 us 137.50 us 137.92 us]
change: [-49.095% -48.558% -48.033%] (p = 0.00 < 0.05)
Performance has improved.
```
2020-07-03 10:56:31 +01:00
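The index tracking approach described in commit 2be6385ade can be illustrated with a minimal sketch. It is not the code from `delorean_ingest`: the `BlockCursor` type, its fields, and its methods are hypothetical names chosen for illustration. The idea shown is that, instead of calling `Vec::remove` at the front of a vector (which shifts every remaining element on each call), a cursor records how far the data has been consumed and hands out slices of the untouched vector.

```rust
/// Hypothetical cursor over decoded block values (illustrative only,
/// not a type from delorean_ingest).
struct BlockCursor {
    values: Vec<i64>,
    /// Index of the next unread value; the vector itself is never mutated.
    pos: usize,
}

impl BlockCursor {
    fn new(values: Vec<i64>) -> Self {
        Self { values, pos: 0 }
    }

    /// Returns up to `n` unread values and advances the cursor.
    /// No elements are moved or copied; only the index changes.
    fn take(&mut self, n: usize) -> &[i64] {
        let start = self.pos;
        let end = start.saturating_add(n).min(self.values.len());
        self.pos = end;
        &self.values[start..end]
    }

    /// True once every value has been handed out.
    fn is_exhausted(&self) -> bool {
        self.pos >= self.values.len()
    }
}

fn main() {
    let mut cursor = BlockCursor::new(vec![10, 20, 30, 40, 50]);

    // Consume the block in chunks of two values at a time.
    while !cursor.is_exhausted() {
        let chunk = cursor.take(2);
        println!("{:?}", chunk);
    }
}
```

Repeatedly removing the first element costs time proportional to the remaining length on every call, whereas advancing an index is constant time, which is consistent with the kind of improvement the commit message reports.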
Edd Robinson 08058c8b63 refactor: move mock decoder 2020-07-03 10:56:31 +01:00
Edd Robinson b78b00d30c refactor: update delorean_ingest/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-01 16:27:38 +01:00
Edd Robinson d75ee0cd4d refactor: update delorean_ingest/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-01 15:52:21 +01:00
Edd Robinson b2addf614b refactor: PR feedback
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-07-01 15:52:21 +01:00
Edd Robinson 55bf2a44be test: exercise TSM converter 2020-07-01 15:52:21 +01:00
Edd Robinson 414029b96d refactor: use BlockDecoder trait 2020-07-01 15:52:21 +01:00
Jake Goulding b72767b695 refactor: use SNAFU more idiomatically in delorean_ingest 2020-06-26 13:26:51 -04:00
Jake Goulding a169a80f33 refactor: No need to return &String 2020-06-26 13:12:19 -04:00
alamb d4a2cf1bd8 fix: rename timestamp column "timestamp" -> "time" to be consistent 2020-06-26 08:26:16 -04:00
Edd Robinson d15256e0e7 refactor: address PR feedback 2020-06-26 12:08:42 +01:00
Edd Robinson 99268f5260 test: add coverage for converting tsm file 2020-06-26 11:50:37 +01:00
Edd Robinson 9d889828c3 fix: ensure all rows are emitted for each column 2020-06-26 11:50:37 +01:00
Carol (Nichols || Goulding) 4df99f1a7c style: Enable the clippy warning to use `Self` when recommended
Fixes #158.
2020-06-25 07:38:58 -04:00
Carol (Nichols || Goulding) afcd1efd1e style: Unify lints everywhere
Then fix the failures, mostly by adding derives and then removing some
unneeded (cheap) clones.

Document places where we purposefully don't use the same lints.

Not unifying missing_docs.

👀 https://github.com/rust-lang/cargo/issues/5034
2020-06-25 07:28:42 -04:00
alamb 431787fb31 Merge remote-tracking branch 'origin/master' into alamb/fix-parquet-nulls 2020-06-24 11:29:07 -04:00
Andrew Lamb de600b7712
fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-24 09:44:08 -04:00
alamb 68ce351a3a refactor: remove direct parquet dependency from delorean_ingest 2020-06-23 16:58:31 -04:00
alamb c9b24f3762 fix: Correctly encode nulls in parquet files 2020-06-23 12:23:47 -04:00
Andrew Lamb 322a491b9d
perf: Improve line protocol --> parquet conversion performance by ~20% (#177)
* feat: benchmark for lp->parquet performance

* feat: improve parser performance by storing contiguous EscapedStr

* fix: remove all string copies during LP-Parquet conversion

* refactor: Implement from_str as From<&str> only

* refactor: implement Deref instead of as_str

* refactor: Remove ends_with because Deref now makes it work

* refactor: Eq can be derived

* refactor: Remove unused From implementation

* refactor: Replace single-character strings with chars as requested by clippy

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
2020-06-23 05:42:19 -04:00
Andrew Lamb 86a425e5ef
feat: Add support for parsing bool values in line protocol parser (#156)
* feat: Implement boolean support for the line protocol parser

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt+clippy

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-22 16:58:38 -04:00
Carol (Nichols || Goulding) 1e341a7321 fix: Encode and decode string data as bytes
String data isn't guaranteed to be UTF-8
2020-06-22 15:32:14 -04:00
Edd Robinson 4bbeac7a1c refactor: extend packers 2020-06-22 18:56:17 +01:00
Edd Robinson 106bd69b5a feat: support converting from TSM->Parquet 2020-06-22 18:56:17 +01:00
Edd Robinson 49b5322487 feat: add resize_exact to packers 2020-06-22 11:25:17 +01:00
Edd Robinson c26ac10b3b refactor: update delorean_ingest/src/lib.rs
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson 146000d55b refactor: update delorean_ingest/src/lib.rs
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson cd435d9b51 refactor: update delorean_ingest/src/lib.rs
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson ac7bb6bf68 refactor: make Packer generic 2020-06-22 11:24:29 +01:00
Andrew Lamb ae37548980
feat: Add support for parsing string values in line protocol parser (#155)
* feat: add debug logging on parser error

* feat: Add support for parsing string values in line protocol parser

* fix: Fix comment

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-18 12:44:17 -04:00
Andrew Lamb 3fac49d1ba
fix: encode timestamp values properly in parquet files (#166) 2020-06-18 12:24:55 -04:00
Andrew Lamb d9278263a7
feat: write multiple measurements to multiple parquet files (#138)
* feat: write to a directory of parquet files

* feat: change LineProtocolConverter to push style, move sampling there

* feat: full push mode, write to multiple measurements

* fix: clarify comments on finalize

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: clippy/fmt

* fix: remove whitespace

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: fmt

* fix: make it compile again

* fix: fixup comments

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: remove unnecessary debug implementation

* fix: cleaner comment

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: clearer iterator name

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: clean

* fix: make it compile

* fix: type fix

* fix: whitespace

* fix: more review comments

* fix: more review comments

* fix: code review comments + fmt

* fix: clippy

* fix: Use EscapedStr directly for performance

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-12 17:19:35 -04:00
Andrew Lamb 0415b233ec
refactor: Instantiate the table writer on demand (#128)
* refactor: instantiate ParquetWriter on demand, prep for multi measurements

* fix: doc test

* fix: update names
2020-06-09 16:11:42 -04:00
Andrew Lamb 986e12d62a
refactor: Rename crate line_protocol_schema --> delorean_table_schema (#129)
* refactor: Rename crate line_protocol_schema --> delorean_table_schema

* fix: fmt
2020-06-09 11:56:16 -04:00
Andrew Lamb 8475b6d183
feat: Add parquet writer, hook up conversion in dstool (#124)
* feat: Add parquet writer, hook up conversion in dstool

* fix: use bigger executor for test

* fix: less cloning

* fix: make unsupported messages less pejorative

* fix: fmt

* fix: Rename writer and do not require std::File, add example

* fix: clippy and fmt

* fix: remove unnecessary module in end to end tests

* fix: remove strange use of tempfile

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: cleanup use

* fix: Use more specific error messages

* fix: comment tweak

* fix: touchup temp path creation

* fix: clippy!

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-08 16:25:24 -04:00
Andrew Lamb ca9f9d4cae
feat: Add column packing code (#114)
* feat: Add column packing code

* fix: remove dependency on assert_approx_equal in favor of delorean_test_helpers

* fix: Cleanups from pr comments

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: more cleanup per code review

* fix: pr comments

* fix: remove explicit string creation from caller

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-06 06:04:41 -04:00
Jake Goulding df39eca043 style: Apply standard lints across all crates 2020-06-05 17:02:54 -04:00
Andrew Lamb e43ab6dc31
fix(dstool): extract schema from a sample of input rather than the whole thing (#113)
* fix: extract schema from references

* fix: use a slice reference rather than iterator

* fix: fmt and clippy
2020-06-04 10:25:36 -04:00