Commit Graph

7931 Commits (3a33e806c7ceab9c31f7412b8ca556455f72c9a7)

Author SHA1 Message Date
Andrew Lamb a106e55fa6
feat: Add parquet metadata dumping (#159)
* feat: Add parquet metadata dumping

* fix: Update delorean_parquet/src/error.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-18 18:34:49 -04:00
Andrew Lamb ae37548980
feat: Add support for parsing string values in line protocol parser (#155)
* feat: add debug logging on parser error

* feat: Add support for parsing string values in line protocol parser

* fix: Fix comment

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-18 12:44:17 -04:00
Andrew Lamb 3fac49d1ba
fix: encode timestamp values properly in parquet files (#166) 2020-06-18 12:24:55 -04:00
Andrew Lamb 91de50a3a7
refactor: Refactor convert command code to have a place for TSM (#164) 2020-06-18 09:57:54 -04:00
Andrew Lamb 2be21dab57
fix: Name benchmark group consistently (#161) 2020-06-17 20:01:17 -04:00
Andrew Lamb 94f1968deb
feat: Improve line protocol parser error recovery, avoid infinite loop (#152)
* feat: Improve line protocol parser error recovery, avoid infinite loop

feat: port splitLines logic to rust line protocol parser

fix: consume trailing optional whitespace after timestamp

test: Add tests for same

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-17 17:44:04 -04:00
Andrew Lamb cf248f2143
feat: upgrade to latest arrow / byteorder (#154) 2020-06-17 12:50:23 -04:00
Andrew Lamb 8f51b8a5c1
fix: Avoid hard coded length in doc example (#146) 2020-06-16 16:38:29 -04:00
Andrew Lamb 7190b07b83
fix: Add additional encoding thoughts to doc (#151) 2020-06-15 10:33:02 -04:00
Andrew Lamb abb3338483
test: add an end to end test for writing multiple parquet files (#145)
* test: add an end to end test for writing multiple parquet files

* fix: whitespace ocd
2020-06-15 07:12:16 -04:00
Andrew Lamb d9278263a7
feat: write multiple measurements to multiple parquet files (#138)
* feat: write to a directory of parquet files

* feat: change LineProtocolConverter to push style, move sampling there

* feat: full push mode, write to multiple measurements

* fix: clarify comments on finalize

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: clippy/fmt

* fix: remove whitespace

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: fmt

* fix: make it compile again

* fix: fixup comments

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: remove unnecessary debug implementation

* fix: cleaner comment

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: clearer iterator name

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: clean

* fix: make it compile

* fix: type fix

* fix: whitespace

* fix: more review comments

* fix: more review comments

* fix: code review comments + fmt

* fix: clippy

* fix: Use EscapedStr directly for performance

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-12 17:19:35 -04:00
Jake Goulding c2c3e2f4c1
Merge pull request #143 from influxdata/more-lints
chore: Apply common lints to recently-created crates
2020-06-12 11:20:15 -04:00
Andrew Lamb 1e2cc310d4
refactor: cleanup trim_leading in the parser, add tests (#142) 2020-06-12 11:04:40 -04:00
Andrew Lamb 2c1180b27a
test: add some additional parser tests (#137) 2020-06-12 11:02:57 -04:00
Jake Goulding b00f3ee977 chore: Apply common lints to recently-created crates 2020-06-12 09:26:18 -04:00
Edd Robinson 82da779203
Merge pull request #135 from influxdata/er/tsm-reader-refactor
refactor: simplify TSMReader API
2020-06-11 19:44:53 +01:00
Carol (Nichols || Goulding) f69ddf9a73
Merge pull request #134 from influxdata/cn-object-store 2020-06-11 14:14:11 -04:00
Carol (Nichols || Goulding) d507713503 refactor: Switch from HashMap to BTreeMap 2020-06-11 13:44:04 -04:00
Edd Robinson 5ff6652cfc refactor: simplify TSMReader API
This commit simplifies the TSMReader API to reduce the amount of mutable
state, and simplify how it's used as an iterator.
2020-06-10 22:42:24 +01:00
Carol (Nichols || Goulding) e3b26c9961 test: Only run GCS and S3 tests if the env vars are set
I don't really like this because the tests will silently not be compiled
if you haven't set the environment variables, so you'd only notice you
weren't running the tests if you looked for those tests' output lines
and saw they weren't there.

Ideally, I'd like to print a warning, but this isn't possible because:

- Anything printed in tests doesn't show up by default
- Cargo's build scripts can't tell whether you're building as a
dependency or building for that crate's tests, so the warning would show
up even if you just depended on delorean_object_store
(https://github.com/rust-lang/cargo/issues/2549)
2020-06-10 17:26:28 -04:00
Carol (Nichols || Goulding) ea1471c503 feat: Add an in-memory object store 2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) 4ad805863f feat: Make GCS list return a stream, by wrapping its still-sync API 2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) d83c410a5c feat: Update to the released version of cloud-storage
My submitted API improvements got merged in!
2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) 8c878cdfd3 docs: Update capabilities 2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) fb5d68654d feat: Change AWS list to stream back batches of object names 2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) d3283b1096 feat: Object storage in S3 and GCS 2020-06-10 17:23:52 -04:00
Andrew Lamb faf3f534ac
refactor: move all dstool code into delorean binary (#131)
* refactor: move all dstool code into delorean binary

* fix: Move code/mods to make it compile and run

* fix: warn if db dir does not exist

* refactor: Match argument subcommands w/ more idiomatic Rust

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

fix: restore hyper logging

fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: update expected code

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-10 16:04:46 -04:00
Andrew Lamb 0415b233ec
refactor: Instantiate the table writer on demand (#128)
* refactor: instantiate ParquetWriter on demand, prep for multi measurements

* fix: doc test

* fix: update names
2020-06-09 16:11:42 -04:00
Andrew Lamb 1bc9517b5d
refactor: Move delorean server code into its own module (#130) 2020-06-09 12:28:56 -04:00
Andrew Lamb 986e12d62a
refactor: Rename crate line_protocol_schema --> delorean_table_schema (#129)
* refactor: Rename crate line_protocol_schema --> delorean_table_schema

* fix: fmt
2020-06-09 11:56:16 -04:00
Andrew Lamb f1a3058b24
feat: Add file / metadata inspection + dumping with dstool (#112)
* feat: Add file / metadata inspection + dumping

* fix: apply some PR review comments

* fix: apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* feat: Add tests, rearrange code into modules, add gzip aware interface

* fix: fix comment and test

* fix: test output and fmt

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-09 10:10:55 -04:00
Jake Goulding 9844cafc5d
refactor: Use SNAFU contexts more idiomatically (#127)
* refactor: Use SNAFU contexts more idiomatically

* fix: restore source error message

Co-authored-by: alamb <andrew@nerdnetworks.org>
2020-06-09 08:25:24 -04:00
Andrew Lamb 8475b6d183
feat: Add parquet writer, hook up conversion in dstool (#124)
* feat: Add parquet writer, hook up conversion in dstool

* fix: use bigger executor for test

* fix: less cloning

* fix: make unsupported messages less pejorative

* fix: fmt

* fix: Rename writer and do not require std::File, add example

* fix: clippy and fmt

* fix: remove unnecessary module in end to end tests

* fix: remove strange use of tempfile

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: cleanup use

* fix: Use more specific error messages

* fix: comment tweak

* fix: touchup temp path creation

* fix: clippy!

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-08 16:25:24 -04:00
Andrew Lamb ca9f9d4cae
feat: Add column packing code (#114)
* feat: Add column packing code

* fix: remove dependency on assert_approx_equal in favor of delorean_test_helpers

* fix: Cleanups from pr comments

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: more cleanup per code review

* fix: pr comments

* fix: remove explicit string creation from caller

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-06 06:04:41 -04:00
Jake Goulding 59c2e7b135
Merge pull request #121 from influxdata/common-lints
style: Apply standard lints across all crates
2020-06-05 22:27:02 -04:00
Andrew Lamb 2200def8ea
feat: Use rust nightly (#123) 2020-06-05 17:45:44 -04:00
Andrew Lamb 81810c2faa
fix: rename measurement.txt to measurement.lp for consistency (#122)
* fix: rename measurement.txt to measurement.lp for consistency

* fix: rename the file
2020-06-05 17:28:06 -04:00
Jake Goulding df39eca043 style: Apply standard lints across all crates 2020-06-05 17:02:54 -04:00
Andrew Lamb e0c38d0976
chore: Add test to check for tsm reading errors, update doc example (#117)
* chore: Add a test that decodes the entire tsm index

* fix: update test and change example to not use hard coded len

* fix: comment cleanup

* fix: clippy

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: fmt/clippy after code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-05 16:22:56 -04:00
Jake Goulding 68fb580b43
style: Re-enable the elided lifetimes lint and move generated types to their own crate (#119)
* refactor: rename the module containing generated types

The nested `delorean` was confusing anyway, and this will make more
sense when we extract a new crate.

* refactor: Move the generated types to their own crate

This allows us to have more lax warnings in that crate alone, keeping
the main crate more strict.

* style: Re-enable elided lifetimes lint in the main crate
2020-06-05 16:22:27 -04:00
Edd Robinson 82d554a288
Merge pull request #111 from influxdata/er-tsm-reader
feat: add InfluxDB 2.x TSM file reader
2020-06-04 16:04:39 +01:00
Edd Robinson 4201f7ebbd refactor: address PR feedback 2020-06-04 15:47:27 +01:00
Andrew Lamb e43ab6dc31
fix(dstool): extract schema from a sample of input rather than the whole thing (#113)
* fix: extract schema from references

* fix: use a slice reference rather than iterator

* fix: fmt and clippy
2020-06-04 10:25:36 -04:00
Edd Robinson 138ff7329d refactor: please the clippy gods 2020-06-04 14:36:43 +01:00
Edd Robinson aeeff5cfb7 docs: add some TSMReader documentation 2020-06-04 14:36:43 +01:00
Edd Robinson 887ffd5977 refactor: remove lifetime to make index re-usable 2020-06-04 14:36:43 +01:00
Edd Robinson 481ce5f136 refactor: use constants for block type 2020-06-04 14:36:43 +01:00
Edd Robinson 49c58f007e chore: ignore tsm files 2020-06-04 14:36:43 +01:00
Edd Robinson 76442b752a refactor: clippy 2020-06-04 14:36:43 +01:00
Edd Robinson e3db077121 feat: add API for series key information 2020-06-04 14:36:43 +01:00