Edd Robinson
cb3e948ca0
feat: TO REMOVE - TSM -> Arrow
2020-09-25 10:12:46 +01:00
Edd Robinson
b62810676d
feat: add support for merging blocks
2020-07-13 10:39:36 +01:00
Edd Robinson
4e66a48ba9
refactor: PR feedback
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:46:08 +01:00
Edd Robinson
fd3f482652
refactor: PR feedback
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:45:50 +01:00
Edd Robinson
3d0d24d6fb
refactor: PR feedback
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-09 15:45:41 +01:00
Edd Robinson
cc7e8e8da0
fix: ensure tables merged correctly
2020-07-08 22:57:15 +01:00
Edd Robinson
bd5d39f60c
refactor: address PR feedback
2020-07-08 22:57:15 +01:00
Edd Robinson
5755949c01
refactor: PR feedback
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-08 22:57:15 +01:00
Edd Robinson
da305596f9
refactor: PR feedback
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-08 22:57:15 +01:00
Edd Robinson
54a61b33fc
refactor: remove redundant block type
2020-07-08 22:57:15 +01:00
Edd Robinson
50ef521e6c
feat: add support for converting multiple TSM files
...
This commit extends the ingest crate to support converting multiple TSM
files to a single Parquet file by merging identical measurements across
the TSM files.
This does not yet support merging blocks that overlap.
2020-07-08 22:57:15 +01:00
Edd Robinson
fff5577efb
refactor: encapsulate mapping logic
...
This commit moves some of the TSM mapper logic that had leaked into the
TSM->Parquet converter back into the mapper. The refactor allows us to
make some previously public APIs private, whilst still providing a
reasonably flexible API.
2020-07-08 22:57:15 +01:00
Edd Robinson
2be6385ade
perf: drain block data more efficiently
...
This commit reduces copying of block data by replacing an inefficient
`remove` call on vectors with an index-tracking approach, leaving the
original vectors in place.
It further refactors some of the mapping code, DRYing things up.
It improves performance of the `map_field_columns` function by 48%.
```
time: [137.11 us 137.50 us 137.92 us]
change: [-49.095% -48.558% -48.033%] (p = 0.00 < 0.05)
Performance has improved.
```
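A minimal sketch of the idea, not the crate's actual code: the names `drain_with_remove` and `BlockCursor` are hypothetical and the values are plain `i64`s. Removing from the front of a vector shifts every remaining element on each call, whereas tracking a read index leaves the vector untouched and hands out slices.

```rust
/// Drains up to `n` values by repeatedly removing the first element.
/// Each `remove(0)` shifts all remaining elements left, so this is O(n * len).
fn drain_with_remove(mut values: Vec<i64>, n: usize) -> (Vec<i64>, Vec<i64>) {
    let count = n.min(values.len());
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        out.push(values.remove(0));
    }
    (out, values)
}

/// Hypothetical cursor that tracks how far the block has been read
/// instead of mutating the underlying vector.
struct BlockCursor {
    values: Vec<i64>,
    next: usize,
}

impl BlockCursor {
    fn new(values: Vec<i64>) -> Self {
        Self { values, next: 0 }
    }

    /// Returns the next `n` values (or fewer at the end) as a borrowed
    /// slice and advances the index. Nothing is copied or shifted.
    fn take(&mut self, n: usize) -> &[i64] {
        let start = self.next;
        let end = (start + n).min(self.values.len());
        self.next = end;
        &self.values[start..end]
    }

    /// Number of values not yet handed out.
    fn remaining(&self) -> usize {
        self.values.len() - self.next
    }
}
```

Both approaches yield the same values in the same order; the cursor only changes how much work each read costs.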
2020-07-03 10:56:31 +01:00
Edd Robinson
08058c8b63
refactor: move mock decoder
2020-07-03 10:56:31 +01:00
Edd Robinson
b78b00d30c
refactor: update delorean_ingest/src/lib.rs
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-01 16:27:38 +01:00
Edd Robinson
d75ee0cd4d
refactor: update delorean_ingest/src/lib.rs
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-07-01 15:52:21 +01:00
Edd Robinson
b2addf614b
refactor: PR feedback
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-07-01 15:52:21 +01:00
Edd Robinson
55bf2a44be
test: exercise TSM converter
2020-07-01 15:52:21 +01:00
Edd Robinson
414029b96d
refactor: use BlockDecoder trait
2020-07-01 15:52:21 +01:00
Jake Goulding
b72767b695
refactor: use SNAFU more idiomatically in delorean_ingest
2020-06-26 13:26:51 -04:00
Jake Goulding
a169a80f33
refactor: No need to return &String
2020-06-26 13:12:19 -04:00
alamb
d4a2cf1bd8
fix: rename timestamp column "timestamp" -> "time" to be consistent
2020-06-26 08:26:16 -04:00
Edd Robinson
d15256e0e7
refactor: address PR feedback
2020-06-26 12:08:42 +01:00
Edd Robinson
99268f5260
test: add coverage for converting tsm file
2020-06-26 11:50:37 +01:00
Edd Robinson
9d889828c3
fix: ensure all rows are emitted for each column
2020-06-26 11:50:37 +01:00
Carol (Nichols || Goulding)
4df99f1a7c
style: Enable the clippy warning to use `Self` when recommended
...
Fixes #158.
2020-06-25 07:38:58 -04:00
Carol (Nichols || Goulding)
afcd1efd1e
style: Unify lints everywhere
...
Then fix the failures, mostly by adding derives and then removing some
unneeded (cheap) clones.
Document places where we purposefully don't use the same lints.
Not unifying missing_docs.
👀 https://github.com/rust-lang/cargo/issues/5034
2020-06-25 07:28:42 -04:00
alamb
431787fb31
Merge remote-tracking branch 'origin/master' into alamb/fix-parquet-nulls
2020-06-24 11:29:07 -04:00
Andrew Lamb
de600b7712
fix: Apply suggestions from code review
...
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-24 09:44:08 -04:00
alamb
68ce351a3a
refactor: remove direct parquet dependency from delorean_ingest
2020-06-23 16:58:31 -04:00
alamb
c9b24f3762
fix: Correctly encode nulls in parquet files
2020-06-23 12:23:47 -04:00
Andrew Lamb
322a491b9d
perf: Improve line protocol --> parquet conversion performance by ~20% (#177)
...
* feat: benchmark for lp->parquet performance
* feat: improve parser performance by storing contiguous EscapedStr
* fix: remove all string copies during LP-Parquet conversion
* refactor: Implement from_str as From<&str> only
* refactor: implement Deref instead of as_str
* refactor: Remove ends_with because Deref now makes it work
* refactor: Eq can be derived
* refactor: Remove unused From implementation
* refactor: Replace single-character strings with chars as requested by clippy
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
2020-06-23 05:42:19 -04:00
Andrew Lamb
86a425e5ef
feat: Add support for parsing bool values in line protocol parser (#156)
...
* feat: Implement boolean support for the line protocol parser
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt+clippy
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-22 16:58:38 -04:00
Carol (Nichols || Goulding)
1e341a7321
fix: Encode and decode string data as bytes
...
String data isn't guaranteed to be UTF-8
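As an illustration only (the length-prefixed layout and the names `encode_bytes` and `decode_bytes` are assumptions, not the format used in this repository), the sketch below shows why a byte-oriented round trip is safer: a value containing invalid UTF-8 cannot become a `String`, but it survives unchanged as bytes.

```rust
/// Appends `value` to `out` using a hypothetical layout:
/// a 4-byte big-endian length followed by the raw bytes.
fn encode_bytes(value: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(value.len() as u32).to_be_bytes());
    out.extend_from_slice(value);
}

/// Reads one length-prefixed value from the front of `input`.
/// Returns the value and the unread remainder, or `None` if `input`
/// is too short for the header or for the declared length.
fn decode_bytes(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([input[0], input[1], input[2], input[3]]) as usize;
    let rest = &input[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Callers that need text can still attempt `std::str::from_utf8` on the decoded slice and handle the error explicitly.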
2020-06-22 15:32:14 -04:00
Edd Robinson
4bbeac7a1c
refactor: extend packers
2020-06-22 18:56:17 +01:00
Edd Robinson
106bd69b5a
feat: support converting from TSM->Parquet
2020-06-22 18:56:17 +01:00
Edd Robinson
49b5322487
feat: add resize_exact to packers
2020-06-22 11:25:17 +01:00
Edd Robinson
c26ac10b3b
refactor: update delorean_ingest/src/lib.rs
...
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson
146000d55b
refactor: update delorean_ingest/src/lib.rs
...
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson
cd435d9b51
refactor: update delorean_ingest/src/lib.rs
...
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-22 11:24:29 +01:00
Edd Robinson
ac7bb6bf68
refactor: make Packer generic
2020-06-22 11:24:29 +01:00
Andrew Lamb
ae37548980
feat: Add support for parsing string values in line protocol parser (#155)
...
* feat: add debug logging on parser error
* feat: Add support for parsing string values in line protocol parser
* fix: Fix comment
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-18 12:44:17 -04:00
Andrew Lamb
3fac49d1ba
fix: encode timestamp values properly in parquet files (#166)
2020-06-18 12:24:55 -04:00
Andrew Lamb
d9278263a7
feat: write multiple measurements to multiple parquet files (#138)
...
* feat: write to a directory of parquet files
* feat: change LineProtocolConverter to push style, move sampling there
* feat: full push mode, write to multiple measurements
* fix: clarify comments on finalize
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: clippy/fmt
* fix: remove whitespace
* fix: Apply suggestions from code review
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: fmt
* fix: make it compile again
* fix: fixup comments
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: remove unnecessary debug implementation
* fix: cleaner comment
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: clearer iterator name
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: Apply suggestions from code review
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: clean
* fix: make it compile
* fix: type fix
* fix: whitespace
* fix: more review comments
* fix: more review comments
* fix: code review comments + fmt
* fix: clippy
* fix: Use EscapedStr directly for performance
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-12 17:19:35 -04:00
Andrew Lamb
0415b233ec
refactor: Instantiate the table writer on demand (#128)
...
* refactor: instantiate ParquetWriter on demand, prep for multi measurements
* fix: doc test
* fix: update names
2020-06-09 16:11:42 -04:00
Andrew Lamb
986e12d62a
refactor: Rename crate line_protocol_schema --> delorean_table_schema (#129)
...
* refactor: Rename crate line_protocol_schema --> delorean_table_schema
* fix: fmt
2020-06-09 11:56:16 -04:00
Andrew Lamb
8475b6d183
feat: Add parquet writer, hook up conversion in dstool (#124)
...
* feat: Add parquet writer, hook up conversion in dstool
* fix: use bigger executor for test
* fix: less cloning
* fix: make unsupported messages less pejorative
* fix: fmt
* fix: Rename writer and do not require std::File, add example
* fix: clippy and fmt
* fix: remove unnecessary module in end to end tests
* fix: remove strange use of tempfile
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: cleanup use
* fix: Use more specific error messages
* fix: comment tweak
* fix: touchup temp path creation
* fix: clippy!
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-08 16:25:24 -04:00
Andrew Lamb
ca9f9d4cae
feat: Add column packing code (#114)
...
* feat: Add column packing code
* fix: remove dependency on assert_approx_equal in favor of delorean_test_helpers
* fix: Cleanups from pr comments
* fix: Apply suggestions from code review
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: more cleanup per code review
* fix: pr comments
* fix: remove explicit string creation from caller
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-06 06:04:41 -04:00
Jake Goulding
df39eca043
style: Apply standard lints across all crates
2020-06-05 17:02:54 -04:00
Andrew Lamb
e43ab6dc31
fix(dstool): extract schema from a sample of input rather than the whole thing (#113)
...
* fix: extract schema from references
* fix: use a slice reference rather than iterator
* fix: fmt and clippy
2020-06-04 10:25:36 -04:00