Commit Graph

729 Commits (3cd7d2eda2d45f2aecae297062cf5c51eb78acd9)

Author SHA1 Message Date
Raphael Taylor-Davies 7debe94ee6 feat: add background task tracking (#655) 2021-02-11 10:30:19 +00:00
Jake Goulding 699f4a577f feat: Add an optional Flight client to the IOx client library 2021-02-10 10:30:05 -05:00
Raphael Taylor-Davies 143488fae9 feat: add WAL metadata endpoint (#724) 2021-02-08 16:21:34 +00:00
Raphael Taylor-Davies 29314a6118 feat: consistent global error handling and logging 2021-02-04 13:15:17 +00:00
Andrew Lamb d5ebf9c3da chore: Update deps again (#738) 2021-02-04 06:02:05 -05:00
Jake Goulding a5e09366b0 feat: Export arrow-flight from arrow-deps 2021-02-03 09:56:56 -05:00
Carol (Nichols || Goulding) 0f8ef9c7d5 Merge branch 'main' into cn+jg/osp-types 2021-02-03 09:09:04 -05:00
Andrew Lamb abc26a33c1 chore: Update dependencies (again) (#718)
* chore: Update dependencies (again)

* refactor: update for changes in DataFusion API

* fix: fmt

* fix: clippy
2021-02-02 18:33:01 -05:00
Andrew Lamb 485a59b2f8 feat: Implement logfmt (Heroku) formatted log output (#716)
* feat: add option to output logs formatted via logfmt

* refactor: Apply suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

* fix: add tests for span inclusion

* feat: Also log spans

* fix: bug in normalizer

Co-authored-by: Edd Robinson <me@edd.io>
2021-02-01 16:43:01 -05:00
Carol (Nichols || Goulding) ff6955a433 refactor: Extract a trait for ObjectStoreApi with associated path
This is the promised cleanup. This structure gets rid of a lot of
intermediate structures and encodes through associated types how the
object stores and path types are related.

The enums are still necessary to avoid having generics leak all over
the place, but the object store variants and path variants should always
match because they'll always come from the object store trait
implementations that use the associated types.
2021-02-01 14:56:47 -05:00
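The associated-type pattern described in ff6955a433 can be sketched as follows. This is a minimal, hypothetical illustration, not the IOx code: the store and path type names (`InMemory`, `FileSystem`, `CloudPath`, `DiskPath`, `StorePath`) are invented, and the real trait has more methods. It shows how an associated `Path` type ties each store to its own path representation, while a pair of enums keeps generics out of callers' signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait: each store declares the path type it understands.
pub trait ObjectStoreApi {
    type Path;
    fn new_path(&self, raw: &str) -> Self::Path;
    fn put(&mut self, location: &Self::Path, bytes: Vec<u8>);
    fn get(&self, location: &Self::Path) -> Option<Vec<u8>>;
}

/// Cloud-style path: one flat key string.
pub struct CloudPath(String);
/// Disk-style path: a list of path segments.
pub struct DiskPath(Vec<String>);

#[derive(Default)]
pub struct InMemory {
    data: BTreeMap<String, Vec<u8>>,
}

#[derive(Default)]
pub struct FileSystem {
    data: BTreeMap<Vec<String>, Vec<u8>>,
}

impl ObjectStoreApi for InMemory {
    type Path = CloudPath;
    fn new_path(&self, raw: &str) -> CloudPath {
        CloudPath(raw.to_string())
    }
    fn put(&mut self, location: &CloudPath, bytes: Vec<u8>) {
        self.data.insert(location.0.clone(), bytes);
    }
    fn get(&self, location: &CloudPath) -> Option<Vec<u8>> {
        self.data.get(&location.0).cloned()
    }
}

impl ObjectStoreApi for FileSystem {
    type Path = DiskPath;
    fn new_path(&self, raw: &str) -> DiskPath {
        DiskPath(raw.split('/').map(String::from).collect())
    }
    fn put(&mut self, location: &DiskPath, bytes: Vec<u8>) {
        self.data.insert(location.0.clone(), bytes);
    }
    fn get(&self, location: &DiskPath) -> Option<Vec<u8>> {
        self.data.get(&location.0).cloned()
    }
}

/// Enums keep generics out of callers' signatures. A path is only ever
/// created by `new_path` on the same store, so the variants always match.
pub enum ObjectStore {
    InMemory(InMemory),
    FileSystem(FileSystem),
}

pub enum StorePath {
    InMemory(CloudPath),
    FileSystem(DiskPath),
}

impl ObjectStore {
    pub fn new_path(&self, raw: &str) -> StorePath {
        match self {
            ObjectStore::InMemory(s) => StorePath::InMemory(s.new_path(raw)),
            ObjectStore::FileSystem(s) => StorePath::FileSystem(s.new_path(raw)),
        }
    }

    pub fn put(&mut self, location: &StorePath, bytes: Vec<u8>) {
        match (self, location) {
            (ObjectStore::InMemory(s), StorePath::InMemory(p)) => s.put(p, bytes),
            (ObjectStore::FileSystem(s), StorePath::FileSystem(p)) => s.put(p, bytes),
            _ => unreachable!("path was created by a different kind of store"),
        }
    }

    pub fn get(&self, location: &StorePath) -> Option<Vec<u8>> {
        match (self, location) {
            (ObjectStore::InMemory(s), StorePath::InMemory(p)) => s.get(p),
            (ObjectStore::FileSystem(s), StorePath::FileSystem(p)) => s.get(p),
            _ => unreachable!("path was created by a different kind of store"),
        }
    }
}
```

Because a `StorePath` is only produced by the matching store's `new_path`, the mismatched arms are unreachable in practice, which is the invariant the commit message relies on.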
Andrew Lamb f3bd8bd0e3 chore: update deps (tokio 1.0 and ecosystem) (#707)
* chore: Update arrow + tokio deps

* chore: Use bleeding edge azure

* chore: Update aws + other deps

* fix: fmt

* fix: Switch to in-house version of routerify

* fix: Upgrade to hyper 0.14

The hyper::error module is now private; hyper::Error is the public
re-export

* fix: Upgrade cloud storage to get tokio upgrade

* fix: Upgrade open_telemetry

* fix: Do not call `panic::set_hook` during another panic

Doing so leads to a double panic which aborts the process.

* fix: new h2 error who dis

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2021-01-29 16:11:55 -05:00
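The `panic::set_hook` fix in f3bd8bd0e3 can be illustrated with a small guard. This is a sketch assuming the hook may be installed from code that can run during unwinding (for example a `Drop` impl); `try_install_hook` is a hypothetical helper and the actual change may be structured differently. It relies on `std::thread::panicking` to skip installation instead of triggering the double panic.

```rust
use std::panic;
use std::thread;

/// Installs a logging panic hook unless the current thread is already
/// panicking. Returns whether the hook was installed.
pub fn try_install_hook() -> bool {
    // `panic::set_hook` itself panics when called from a panicking thread,
    // and a second panic during unwinding aborts the process.
    if thread::panicking() {
        return false;
    }
    panic::set_hook(Box::new(|info| {
        eprintln!("panic: {}", info);
    }));
    true
}
```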
Andrew Lamb 8308fad188 chore: update arrow deps again (#699) 2021-01-26 07:55:30 -05:00
Andrew Lamb 9b6fbae7f5 chore: Bump arrow deps 2021-01-23 08:09:46 -05:00
Andrew Lamb a967e2f1dd fix: disallow control characters in Database names (#684) 2021-01-21 17:55:55 -05:00
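The commit subject does not spell out the exact rule, so the following is only a sketch assuming that "control characters" means those for which Rust's `char::is_control` returns true; `validate_database_name` is a hypothetical name.

```rust
/// Hypothetical validation: a database name is rejected if it contains any
/// control character (as classified by `char::is_control`); all other
/// Unicode characters, including spaces and punctuation, are accepted.
pub fn validate_database_name(name: &str) -> Result<(), String> {
    match name.chars().find(|c| c.is_control()) {
        Some(c) => Err(format!(
            "database name {:?} contains control character {:?}",
            name, c
        )),
        None => Ok(()),
    }
}
```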
Andrew Lamb c50f9b1baf Merge branch 'main' into alamb/underscore_in_bucket_names_2 2021-01-21 15:53:27 -05:00
Andrew Lamb 747b96d801 chore: Upgrade arrow dependencies, reduce duplication with upstream (#676) 2021-01-21 08:58:11 -05:00
Andrew Lamb a6b9ff9c91 fix: allow arbitrary characters in org/bucket names 2021-01-20 17:58:15 -05:00
Andrew Lamb 7969808f09 feat: Chunk Migration APIs and query data in the read buffer via SQL (#668)
* feat: Chunk Migration APIs and query data in the read buffer via SQL

* fix: Make code more consistent

* fix: fmt / clippy

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: Remove unnecessary Result and make chunks() infallible

* chore: Apply more suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Edd Robinson <me@edd.io>
2021-01-19 13:28:26 -05:00
Andrew Lamb 71627120b9 refactor: consolidate line protocol schema creation into data_types and port code to use it (#663)
* refactor: consolidate line protocol schema creation into data_types, and port code to use it

refactor: Port mutable buffer to use SchemaBuilder

* fix: doctest

* refactor: remove unnecessary clippyisms

* docs: Improve comments via suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

* refactor: use more idiomatic try_ naming and TryInto trait

* docs: Change from line protocol data model to InfluxDB data model

* refactor: rename LP --> Influx in code

* feat: add support for UInteger type

Co-authored-by: Edd Robinson <me@edd.io>
2021-01-15 17:29:30 -05:00
Carol (Nichols || Goulding) 813092649d fix: Make file behave the same as other object stores with paths 2021-01-15 10:25:05 -05:00
Dom a2c0434554 Merge branch 'main' into dom/iox-api-client 2021-01-14 14:35:17 +00:00
Paul Dix 9377dc1943 chore: address pr feedback 2021-01-13 18:15:35 -05:00
Paul Dix 3b1f045f44 feat: add serialization to wal buffer segments
Adds serialization with compression and checksum for WAL buffer segments.

This required a weird structure where the flatbuffer bytes of ReplicatedWrite were kept as a raw payload. I did this because otherwise each of the replicated writes would have been rebuilt in the segment.

The other thing that isn't ideal is that deserializing a segment actually marshals it into a Rust struct as opposed to keeping the entire thing as raw flatbuffers. We could update this later to have a concept of an open segment (a regular Rust struct) and closed segments that are just the flatbuffers.
2021-01-13 18:15:34 -05:00
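The segment framing described in 3b1f045f44 might look roughly like this. It is a dependency-free sketch under stated assumptions: the checksum is CRC-32 (IEEE), the header layout is invented, the compression step mentioned in the commit is omitted, and the function names are hypothetical. It shows how a reader can detect a corrupted or truncated segment before trying to decode its flatbuffers payload.

```rust
/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), computed bit by bit.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Frames a segment payload (compression omitted in this sketch) as
/// [checksum: u32 LE][length: u32 LE][payload].
pub fn encode_segment(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    TooShort,
    LengthMismatch,
    ChecksumMismatch,
}

/// Validates the header and checksum, returning the payload slice.
pub fn decode_segment(bytes: &[u8]) -> Result<&[u8], DecodeError> {
    if bytes.len() < 8 {
        return Err(DecodeError::TooShort);
    }
    let checksum = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let len = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
    let payload = &bytes[8..];
    if payload.len() != len {
        return Err(DecodeError::LengthMismatch);
    }
    if crc32(payload) != checksum {
        return Err(DecodeError::ChecksumMismatch);
    }
    Ok(payload)
}
```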
Dom 5647bfeb6f refactor: error handling & typed errors
Refactors the API method errors.

The user of the API client needs to be able to distinguish between various error
states when an API request fails. The most ergonomic way of exposing this
information is by returning an error enum that is specific to each API method
(or at least the important ones with well defined failure modes) - currently
only the `create_database()` method has significant error states, so this is
the only one with a specific error type in this impl.

This change defines a bunch of API error codes in the API client, adds them to
the IOx API error response body, and maps them in the API client. Due to error
wrapping, the error code mapping in the IOx server is less exhaustive than I had
hoped, however.
2021-01-13 17:32:12 +00:00
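A method-specific error type of the kind described in 5647bfeb6f could be sketched as below. The numeric codes, variant names, and `map_create_database_error` are hypothetical; the point is that callers match on variants rather than on message strings.

```rust
use std::fmt;

/// Hypothetical numeric error codes carried in the API error response body.
pub const DATABASE_ALREADY_EXISTS: u32 = 100;
pub const INVALID_DATABASE_NAME: u32 = 101;

/// Error type specific to `create_database`, so callers can match on the
/// failure mode instead of parsing message strings.
#[derive(Debug, PartialEq)]
pub enum CreateDatabaseError {
    AlreadyExists,
    InvalidName(String),
    /// Any code this method does not specifically expect.
    Unexpected { code: u32, message: String },
}

impl fmt::Display for CreateDatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CreateDatabaseError::AlreadyExists => write!(f, "database already exists"),
            CreateDatabaseError::InvalidName(msg) => write!(f, "invalid database name: {}", msg),
            CreateDatabaseError::Unexpected { code, message } => {
                write!(f, "unexpected error (code {}): {}", code, message)
            }
        }
    }
}

impl std::error::Error for CreateDatabaseError {}

/// Maps the (code, message) pair from an error response body to the typed error.
pub fn map_create_database_error(code: u32, message: String) -> CreateDatabaseError {
    match code {
        DATABASE_ALREADY_EXISTS => CreateDatabaseError::AlreadyExists,
        INVALID_DATABASE_NAME => CreateDatabaseError::InvalidName(message),
        _ => CreateDatabaseError::Unexpected { code, message },
    }
}
```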
Andrew Lamb 38abe9735f Merge branch 'main' into dom/iox-api-client 2021-01-12 13:19:49 -05:00
Andrew Lamb 2938c8f8fc feat: implement chunk listing and snapshotting in mutable buffer (#641)
* feat: implement chunk listing and snapshotting in mutable buffer

* fix: update to use latest version of string interner and remove custom clone

* docs: fix comment
2021-01-12 12:46:18 -05:00
Dom e06b989fa6 Merge branch 'main' into dom/iox-api-client 2021-01-12 17:10:51 +00:00
Andrew Lamb c1a7778d85 refactor: move id and deps out of query crate (#646) 2021-01-12 11:47:43 -05:00
Dom 62349edb94 feat: create IOx API client
Initialises a new library crate and implements a basic IOx API client.

The API client supports:
	- ping
	- create database

Care has been taken to abstract away the underlying HTTP client used
(reqwest) and avoid leaking it into the public API (error types are a
common leak!). This makes updating the HTTP client and/or swapping it for
something else a backwards-compatible change for end users of the crate.

Outstanding items:
	- move shared API types into a sensible location
	- discriminate between various IOx error responses

The former doesn't need doing until we publish the crate and will likely
be rather invasive and conflict-prone, so I'm aiming to merge this PR and
then move things around in a follow-up.

The latter would allow us to expose error conditions to the user such
that they can take actions to remedy the situation, know if the request
can or should be retried, etc. Currently we expose a string error
message when requests fail, requiring string matching and/or passing the
string higher in the stack (and thus punting the problem to the caller).
It would be very nice to have typed errors, but that is a detail I have
left for later.
2021-01-12 16:38:33 +00:00
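One way to keep the HTTP client's error types out of a public API, as 62349edb94 aims to do, is an opaque error struct that stores the underlying error as a boxed trait object. This is a sketch, not the crate's actual type; `std::io::Error` stands in for a transport error in the usage check.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Public client error (hypothetical). The underlying HTTP library's error is
/// stored as a boxed trait object, so its concrete type never appears in the
/// crate's public API and the HTTP client can be swapped without a breaking change.
#[derive(Debug)]
pub struct Error {
    message: String,
    source: Option<Box<dyn StdError + Send + Sync + 'static>>,
}

impl Error {
    /// Crate-internal constructor: wraps any transport error without exposing its type.
    pub(crate) fn transport<E>(message: impl Into<String>, err: E) -> Self
    where
        E: StdError + Send + Sync + 'static,
    {
        Error {
            message: message.into(),
            source: Some(Box::new(err)),
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        // The cause is visible only as `dyn Error`, e.g. for logging.
        self.source
            .as_deref()
            .map(|e| e as &(dyn StdError + 'static))
    }
}
```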
Dom bdc832d040 refactor: replace config system with structopt
Replaces the hand-rolled config system with a StructOpt managed config struct.

I've got most of it ported across, but the interaction between all the logging
config bits is complex! I've left what is there and hooked in the value from
the config struct (which directly replaces the env var in usage, as it also
sources from the env).
2021-01-11 18:43:14 +00:00
Carol (Nichols || Goulding) b66ad643d5 refactor: Extract panic logging to its own crate for ease of reuse 2021-01-08 12:36:56 -05:00
Edd Robinson 4ce6821d90 feat: implement table_names on 2021-01-08 16:19:19 +00:00
Karsten Jeschkies 2cd383af6f feat: Azure support for object store
Closes #528

This patch adds support for Microsoft Azure Blob Storage. The
implementation requires an account, a key and a container name. They can
be configured via the environment variables `AZURE_STORAGE_ACCOUNT`,
`AZURE_STORAGE_MASTER_KEY` and `AZURE_STORAGE_CONTAINER`.
2021-01-08 16:27:17 +01:00
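Reading the three variables named in 2cd383af6f could look like the sketch below. The struct and function names are hypothetical, and the lookup is passed in as a closure (rather than calling `std::env::var` directly) so the logic can be exercised without touching the process environment.

```rust
/// Hypothetical settings for an Azure Blob Storage object store.
#[derive(Debug, PartialEq)]
pub struct AzureConfig {
    pub account: String,
    pub master_key: String,
    pub container: String,
}

const ACCOUNT: &str = "AZURE_STORAGE_ACCOUNT";
const MASTER_KEY: &str = "AZURE_STORAGE_MASTER_KEY";
const CONTAINER: &str = "AZURE_STORAGE_CONTAINER";

/// Builds the config from a variable lookup; in real use pass
/// `|name: &str| std::env::var(name).ok()`. Returns every missing variable name.
pub fn azure_config_from<F>(lookup: F) -> Result<AzureConfig, Vec<&'static str>>
where
    F: Fn(&str) -> Option<String>,
{
    let account = lookup(ACCOUNT);
    let master_key = lookup(MASTER_KEY);
    let container = lookup(CONTAINER);

    match (account, master_key, container) {
        (Some(account), Some(master_key), Some(container)) => Ok(AzureConfig {
            account,
            master_key,
            container,
        }),
        (account, master_key, container) => {
            // Report all missing variables at once rather than one at a time.
            let mut missing = Vec::new();
            if account.is_none() {
                missing.push(ACCOUNT);
            }
            if master_key.is_none() {
                missing.push(MASTER_KEY);
            }
            if container.is_none() {
                missing.push(CONTAINER);
            }
            Err(missing)
        }
    }
}
```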
Andrew Lamb 8219403fab feat: Instantiate ReadBuffer as part of server creation (#620)
* feat: Instantiate ReadBuffer as part of server creation

* refactor: remove Store from read_buffer
2021-01-07 13:25:42 -05:00
Andrew Lamb c672bb341d feat: Extract SQL planning out of databases (#618) 2021-01-07 13:13:30 -05:00
Carol (Nichols || Goulding) 18ee1b561b feat: Use ObjectStorePath everywhere to feel out the API needed 2021-01-07 10:48:22 -05:00
Paul Dix cf56c1ba9e feat: Add object store path abstraction 2021-01-07 09:19:50 -05:00
Paul Dix 4b40d11e60 feat: Add list_with_delimiter to object store
This adds a new function, list_with_delimiter, to the object store. This commit contains just the implementation for S3, leaving the others to be completed in follow-on commits.

This uses a fixed delimiter to ensure a directory structure is created. The delimiter should depend on the platform and on which object store is used: for the cloud object stores and the in-memory store it should be /, while for the future disk-based implementation it should depend on whether you're running on Windows or Linux.

I didn't use Stream for the return type because I found it difficult to work with and I don't think it actually added anything useful. The returned ListResult struct has the next token, and I prefer that the caller explicitly makes the calls that go over the network so they're more aware of what's going on, whereas a Stream hides that behind the scenes. We can easily add a Stream-based version on top of this existing API if we want.
2021-01-07 09:19:15 -05:00
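The caller-driven paging that 4b40d11e60 describes can be sketched against a toy in-memory store. Everything here is an assumption for illustration: `ListResult`'s fields, the token format (the last entry returned, rather than an opaque S3 continuation token), and the `max_keys` parameter. The fixed `/` delimiter matches the commit's description.

```rust
use std::collections::BTreeSet;

/// Fixed delimiter, as in the commit: '/' suits the cloud and in-memory stores.
const DELIMITER: char = '/';

/// Hypothetical result shape: one page of a delimited listing.
#[derive(Debug, Default, PartialEq)]
pub struct ListResult {
    pub objects: Vec<String>,
    pub common_prefixes: Vec<String>,
    /// `Some` when more entries remain; pass it back to fetch the next page.
    pub next_token: Option<String>,
}

/// Toy store; a BTreeSet keeps keys sorted, so keys sharing a prefix are contiguous.
pub struct InMemoryStore {
    keys: BTreeSet<String>,
}

impl InMemoryStore {
    pub fn new<'a>(keys: impl IntoIterator<Item = &'a str>) -> Self {
        InMemoryStore { keys: keys.into_iter().map(String::from).collect() }
    }

    /// Returns at most `max_keys` (assumed >= 1) objects and common prefixes
    /// under `prefix`. The token is the last entry of the previous page.
    pub fn list_with_delimiter(&self, prefix: &str, token: Option<&str>, max_keys: usize) -> ListResult {
        // Skip everything up to and including the token; if the token is a
        // common prefix, also skip every key underneath it.
        let after_token = |key: &str| match token {
            None => true,
            Some(t) if t.ends_with(DELIMITER) => key > t && !key.starts_with(t),
            Some(t) => key > t,
        };

        let mut result = ListResult::default();
        let mut last: Option<String> = None;
        for key in self.keys.iter().filter(|k| k.starts_with(prefix) && after_token(k.as_str())) {
            let rest = &key[prefix.len()..];
            let (entry, is_prefix) = match rest.find(DELIMITER) {
                // Key lies below a "sub-directory": collapse it to that prefix.
                Some(i) => (key[..prefix.len() + i + 1].to_string(), true),
                None => (key.clone(), false),
            };
            if last.as_deref() == Some(entry.as_str()) {
                continue; // same common prefix as the previous key
            }
            if result.objects.len() + result.common_prefixes.len() == max_keys {
                // Page is full and another entry exists: hand back a token.
                result.next_token = last;
                break;
            }
            if is_prefix {
                result.common_prefixes.push(entry.clone());
            } else {
                result.objects.push(entry.clone());
            }
            last = Some(entry);
        }
        result
    }
}
```

A caller that wants everything loops until `next_token` is `None`, issuing one call per page; that explicit loop is the visibility the commit prefers over a `Stream`.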
Andrew Lamb 9f0ff678f1 feat: Formalizes the config system for IOx, including tests (#608)
* feat: Create configuration system, port IOx to use it

* docs: Apply suggestions from code review

Co-authored-by: Paul Dix <paul@influxdata.com>

* fix: fix test for setting values

Co-authored-by: Paul Dix <paul@influxdata.com>
2020-12-31 07:02:31 -05:00
Paul Dix db6ce0503c chore: Benchmark ReplicatedWrite (#607)
This adds benchmarks to the data_types crate for ReplicatedWrite. This is the first in a series to test benchmarking Flatbuffers vs. JSON for the WAL Segment format.
2020-12-30 12:44:32 -05:00
Andrew Lamb 0d0ec0ce69 chore: Upgrade arrow dependencies (#603)
* chore: Update arrow dependencies to latest

* refactor: Update code to conform to new arrow api
2020-12-28 16:08:09 -05:00
Andrew Lamb 5fa77c32cc feat: Add "Chunks" to the Mutable Buffer (#596)
* refactor: Update docs, remove unused field

* refactor: rename partition -> chunk

* feat: Introduce new partition, which is a holder for Chunks

* refactor: Remove use of wal from mutable database

* refactor: cleanups, remove last direct use of chunks

* fix: delete old benchmarks

* fix: clippy sacrifice

* docs: tidy up comments

* refactor: remove unused error types

* chore: remove commented out tests
2020-12-28 07:10:25 -05:00
Paul Dix 1d200c5c77 chore: move http API over to Routerify
This moves the HTTP API over to Routerify, which has the basic route parsing logic that will enable the API design for IOx.

I had a little trouble with the error handling in Routerify, so I ended up creating a macro for constructing error responses in the HTTP API. I'm not sure what I think of this pattern, so I'm interested in what others think. Another option would be to have two functions for each API endpoint: one, x_handler, with a Routerify function signature, and another, just x, with the Result<Response<Body>, ApplicationError> return type, which would make the ? operator work in those functions. That would eliminate the need for the return_err macro.

I'm happy to refactor to that if people prefer it.
2020-12-24 16:45:20 -06:00
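The two-function alternative mentioned in 1d200c5c77 can be sketched without Routerify or hyper. `Response`, `ApplicationError`, and the handler names are simplified, hypothetical stand-ins (real Routerify handlers are async and take a `Request<Body>`); the sketch only shows how splitting `x` from `x_handler` lets `?` work and centralizes error-to-response conversion.

```rust
use std::convert::Infallible;

/// Simplified stand-in for an HTTP response (hypothetical; not hyper's type).
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// Application errors, each mapping to an HTTP status.
#[derive(Debug)]
pub enum ApplicationError {
    BadRequest(String),
    NotFound(String),
}

impl ApplicationError {
    fn into_response(self) -> Response {
        match self {
            ApplicationError::BadRequest(body) => Response { status: 400, body },
            ApplicationError::NotFound(body) => Response { status: 404, body },
        }
    }
}

/// `x`: the endpoint logic. Returning Result lets `?` propagate errors.
fn get_database(name: Option<&str>, known: &[&str]) -> Result<Response, ApplicationError> {
    let name = name
        .ok_or_else(|| ApplicationError::BadRequest("missing database name".to_string()))?;
    if !known.contains(&name) {
        return Err(ApplicationError::NotFound(format!("database {} not found", name)));
    }
    Ok(Response { status: 200, body: name.to_string() })
}

/// `x_handler`: router-facing wrapper. It converts every ApplicationError into
/// a response in one place, so no `return_err!`-style macro is needed.
pub fn get_database_handler(name: Option<&str>, known: &[&str]) -> Result<Response, Infallible> {
    Ok(get_database(name, known).unwrap_or_else(ApplicationError::into_response))
}
```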
Edd Robinson 199ba68769 refactor: rename segment_store crate to read_buffer 2020-12-22 21:26:04 +00:00
Andrew Lamb 48c43b136c refactor: rename write_buffer --> mutable_buffer (#595)
* refactor: git mv write_buffer mutable_buffer

* refactor: update crate name references

* refactor: update some more references
2020-12-22 10:49:53 -05:00
Andrew Lamb bb96142564 chore: Update arrow dependencies, remove custom min/max implementation (#585)
* chore: Update arrow dependency

* fix: Update code for changes in datafusion

* fix: use arrow version of min_boolean
2020-12-21 12:31:39 -05:00
Edd Robinson 7a40bd5971 perf: use hashbrown raw_entry API
This commit swaps out the std library `HashMap` for the implementation
provided by the `hashbrown` crate. Not only does this allow us to use
the raw entry API, but it increases performance through the use of a
faster, non-cryptographic hashing function. We do not need an
expensive hash function for this code path.

Benchmark times improve by roughly 28-43% (throughput up roughly 39-76%).

Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Collecting 100 samples in estimated 6.5961 s (400 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_20_columns_2_rows_500000
                        time:   [16.502 ms 16.527 ms 16.558 ms]
                        thrpt:  [1.2079 Kelem/s 1.2101 Kelem/s 1.2120 Kelem/s]
                 change:
                        time:   [-40.808% -40.616% -40.428%] (p = 0.00 < 0.05)
                        thrpt:  [+67.863% +68.394% +68.942%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Collecting 100 samples in estimated 5.0698 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_200_columns_2_rows_500000
                        time:   [16.531 ms 16.542 ms 16.555 ms]
                        thrpt:  [12.081 Kelem/s 12.090 Kelem/s 12.099 Kelem/s]
                 change:
                        time:   [-43.304% -43.047% -42.810%] (p = 0.00 < 0.05)
                        thrpt:  [+74.856% +75.582% +76.378%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Collecting 100 samples in estimated 5.2590 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_2000_columns_2_rows_500000
                        time:   [17.497 ms 17.568 ms 17.648 ms]
                        thrpt:  [113.33 Kelem/s 113.84 Kelem/s 114.30 Kelem/s]
                 change:
                        time:   [-38.468% -38.188% -37.880%] (p = 0.00 < 0.05)
                        thrpt:  [+60.978% +61.782% +62.518%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  12 (12.00%) high severe
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Collecting 100 samples in estimated 7.0471 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000: Analyzing
segment_read_group_all_time_vary_cardinality/cardinality_20000_columns_3_rows_500000
                        time:   [23.305 ms 23.320 ms 23.336 ms]
                        thrpt:  [857.05 Kelem/s 857.64 Kelem/s 858.20 Kelem/s]
                 change:
                        time:   [-35.933% -35.778% -35.648%] (p = 0.00 < 0.05)
                        thrpt:  [+55.396% +55.711% +56.087%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Collecting 100 samples in estimated 6.8058 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_2_rows_500000
                        time:   [22.475 ms 22.540 ms 22.622 ms]
                        thrpt:  [884.10 Kelem/s 887.31 Kelem/s 889.87 Kelem/s]
                 change:
                        time:   [-34.249% -34.051% -33.768%] (p = 0.00 < 0.05)
                        thrpt:  [+50.984% +51.633% +52.089%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Collecting 100 samples in estimated 7.0631 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_3_rows_500000
                        time:   [23.683 ms 23.724 ms 23.779 ms]
                        thrpt:  [841.08 Kelem/s 843.02 Kelem/s 844.49 Kelem/s]
                 change:
                        time:   [-34.575% -34.419% -34.241%] (p = 0.00 < 0.05)
                        thrpt:  [+52.070% +52.482% +52.847%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Collecting 100 samples in estimated 5.1007 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000: Analyzing
segment_read_group_all_time_vary_columns/cardinality_20000_columns_4_rows_500000
                        time:   [25.379 ms 25.456 ms 25.545 ms]
                        thrpt:  [782.93 Kelem/s 785.67 Kelem/s 788.06 Kelem/s]
                 change:
                        time:   [-37.254% -36.988% -36.701%] (p = 0.00 < 0.05)
                        thrpt:  [+57.981% +58.699% +59.373%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe

Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Collecting 100 samples in estimated 5.7756 s (400 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_250000
                        time:   [14.404 ms 14.411 ms 14.419 ms]
                        thrpt:  [1.3870 Melem/s 1.3878 Melem/s 1.3885 Melem/s]
                 change:
                        time:   [-28.007% -27.893% -27.798%] (p = 0.00 < 0.05)
                        thrpt:  [+38.500% +38.683% +38.903%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Collecting 100 samples in estimated 6.9256 s (300 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_500000
                        time:   [23.191 ms 23.299 ms 23.419 ms]
                        thrpt:  [854.02 Kelem/s 858.42 Kelem/s 862.40 Kelem/s]
                 change:
                        time:   [-32.647% -32.302% -31.912%] (p = 0.00 < 0.05)
                        thrpt:  [+46.868% +47.715% +48.471%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  11 (11.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Collecting 100 samples in estimated 6.1544 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_750000
                        time:   [30.813 ms 30.859 ms 30.916 ms]
                        thrpt:  [646.92 Kelem/s 648.10 Kelem/s 649.07 Kelem/s]
                 change:
                        time:   [-37.155% -36.779% -36.436%] (p = 0.00 < 0.05)
                        thrpt:  [+57.322% +58.174% +59.121%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Warming up for 3.0000 s
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Collecting 100 samples in estimated 7.8548 s (200 iterations)
Benchmarking segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000: Analyzing
segment_read_group_all_time_vary_rows/cardinality_20000_columns_2_rows_1000000
                        time:   [39.303 ms 39.349 ms 39.405 ms]
                        thrpt:  [507.55 Kelem/s 508.27 Kelem/s 508.86 Kelem/s]
                 change:
                        time:   [-36.857% -36.699% -36.576%] (p = 0.00 < 0.05)
                        thrpt:  [+57.669% +57.975% +58.371%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  8 (8.00%) high mild
  6 (6.00%) high severe
2020-12-17 17:15:49 +00:00
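`HashMap::raw_entry` is not available on stable std, so the raw-entry half of 7a40bd5971 cannot be shown without the `hashbrown` crate. The sketch below illustrates only the other half, plugging a cheaper non-cryptographic hasher into a map. It uses FNV-1a, which is a swapped-in example and not necessarily the hasher `hashbrown` uses; `FnvHasher` and `FastMap` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: fast and non-cryptographic (no protection against
/// adversarial keys), used here only as an example of a cheap hasher.
pub struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for FnvHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
}

/// A std HashMap that uses the cheap hasher instead of the default SipHash.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;
```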
Edd Robinson 0d60102c74 feat: make group keys comparable and results sortable
This commit provides functionality on top of the `GroupKey` type (a
vector of materialised values), which allows them to be comparable by
implementing `Ord`.

Then, using the `permutation` crate, it is possible to sort all rows in a
result set based on the group keys, which will be useful for testing.
2020-12-17 11:10:26 +00:00
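The approach in 0d60102c74 (an `Ord` group key, then sorting rows by a permutation) can be sketched with the standard library alone. The `Value` variants are hypothetical, and plain index sorting stands in for the `permutation` crate.

```rust
/// Hypothetical materialised value. The derived `Ord` compares the variant
/// first (in declaration order, so Null < Integer < String), then the payload.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Null,
    Integer(i64),
    String(String),
}

/// A group key is a vector of values; `Vec`'s `Ord` is lexicographic.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct GroupKey(pub Vec<Value>);

/// Sorts result rows by group key: computes the sorting permutation once and
/// applies it to both the keys and an aggregate column.
pub fn sort_by_group_keys(keys: &mut Vec<GroupKey>, aggregates: &mut Vec<i64>) {
    assert_eq!(keys.len(), aggregates.len());
    let mut order: Vec<usize> = (0..keys.len()).collect();
    order.sort_by(|&a, &b| keys[a].cmp(&keys[b]));
    let sorted_keys: Vec<GroupKey> = order.iter().map(|&i| keys[i].clone()).collect();
    let sorted_aggs: Vec<i64> = order.iter().map(|&i| aggregates[i]).collect();
    *keys = sorted_keys;
    *aggregates = sorted_aggs;
}
```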
Andrew Lamb a6d2c13888 chore: Update arrow + other dependencies (#540)
* chore: Update arrow + other dependencies

* chore: Update write_buffer and query crate
2020-12-15 08:46:27 -05:00
Dom 2d29b985b4 chore(deps): remove env_logger from ingest
Already using tracing!
2020-12-14 12:06:53 +00:00
Dom 60ee7e1dbb chore(deps): remove unused env_logger 2020-12-14 12:06:53 +00:00
Dom 9d7389dec2 feat(tracing): add Jaeger tracing sink
Adds telemetry / tracing with support for a Jaeger backend, and changes the
logger from env_logger to a tracing subscriber to collect the log entries.

Events are batched and then emitted asynchronously via UDP to the Jaeger
collector using the tokio runtime. There are a bunch of settings (env
vars) related to batch sizes, flush frequency, etc.; they're all using
their default values at the moment (if it ain't broke...). See the docs
for more info:

    https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-environment-variables.md#opentelemetry-environment-variable-specification

This is only part 1 of telemetry - it does NOT propagate traces across RPC
boundaries as we're still defining how all this should work. I've created #541
to track this.

Closes #202 and closes #203.
2020-12-14 12:06:52 +00:00
Edd Robinson 5e138bcded refactor: return groups as vectors 2020-12-10 15:15:34 +00:00
Edd Robinson fe27690ca8 test: add benchmarks for specific read_group path
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.

The benchmarks are in three groups:

* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns needed to be grouped to produce the fixed
  cardinality.
* a final group fixes the number of columns being grouped, the
  cardinality, and instead varies the number of rows in the segment.

Some initial results from my development box are as follows:

```
                        time:   [51.099 ms 51.119 ms 51.140 ms]
                        thrpt:  [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_group_cols/1
                        time:   [93.162 us 93.219 us 93.280 us]
                        thrpt:  [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
                        time:   [571.72 us 572.31 us 572.98 us]
                        thrpt:  [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking segment_read_group_pre_computed_groups_no_predicates_group_cols/3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
                        time:   [1.7292 ms 1.7313 ms 1.7340 ms]
                        thrpt:  [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_rows/250000
                        time:   [562.29 us 565.19 us 568.80 us]
                        thrpt:  [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
                        time:   [561.32 us 561.85 us 562.47 us]
                        thrpt:  [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
                        time:   [573.75 us 574.27 us 574.85 us]
                        thrpt:  [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
                        time:   [586.36 us 586.74 us 587.19 us]
                        thrpt:  [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
```
2020-12-10 15:15:34 +00:00
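The pre-computed bit-set idea behind fe27690ca8 can be sketched as follows: each distinct value of a grouping column owns a bitset of row ids, and a group's rows are the intersection of those bitsets. This is a hypothetical stand-in (`BitsetColumn`, `read_group_sum`), not the read buffer's RLE implementation; it assumes both columns have the same number of rows, and it tries every pair of values, which is one reason the approach suits low-cardinality columns.

```rust
use std::collections::BTreeMap;

/// Hypothetical column that keeps, for each distinct value, a bitset of the
/// row ids holding that value (one bit per row, 64 rows per word).
pub struct BitsetColumn {
    bitsets: BTreeMap<String, Vec<u64>>,
}

impl BitsetColumn {
    pub fn from_values(values: &[&str]) -> Self {
        let words = (values.len() + 63) / 64;
        let mut bitsets: BTreeMap<String, Vec<u64>> = BTreeMap::new();
        for (row, value) in values.iter().enumerate() {
            let bits = bitsets
                .entry(value.to_string())
                .or_insert_with(|| vec![0u64; words]);
            bits[row / 64] |= 1u64 << (row % 64);
        }
        BitsetColumn { bitsets }
    }
}

/// Groups rows by two columns and sums `agg` per group by intersecting the
/// pre-computed bitsets instead of scanning the grouping columns row by row.
/// Empty groups are skipped; output is ordered by (a, b).
pub fn read_group_sum(a: &BitsetColumn, b: &BitsetColumn, agg: &[i64]) -> Vec<((String, String), i64)> {
    let mut out = Vec::new();
    for (value_a, bits_a) in &a.bitsets {
        for (value_b, bits_b) in &b.bitsets {
            let mut sum = 0i64;
            let mut matched = false;
            for (word_idx, (x, y)) in bits_a.iter().zip(bits_b).enumerate() {
                let mut word = x & y; // rows having both values
                while word != 0 {
                    let bit = word.trailing_zeros() as usize;
                    sum += agg[word_idx * 64 + bit];
                    matched = true;
                    word &= word - 1; // clear the lowest set bit
                }
            }
            if matched {
                out.push(((value_a.clone(), value_b.clone()), sum));
            }
        }
    }
    out
}
```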
Edd Robinson 8c45170a15 feat: read group aggregates on RLE columns 2020-12-10 15:15:34 +00:00
Paul Dix fa3ecbd4ed feat: Implement write buffer to Parquet snapshotting (#526)
* feat: Implement write buffer to Parquet snapshotting

This introduces snapshot to the server packages to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow on work will bring the server package into http_routes and rework the snapshot API.
2020-12-08 14:20:43 -05:00
Dom c3a0e893ae test: use flate2 2020-12-01 11:01:10 +00:00
Dom 867aba847a perf(convert): use flate2 for gzip decompression
Switches from `libflate` to `flate2` for the top-level commands (specifically
TSM conversion).
2020-11-30 15:18:25 +00:00
Andrew Lamb 3a9ee88f00 chore: update to latest version of arrow + update code (#486)
* chore: update to latest version of arrow + update code

* chore: Update rust toolchain to match arrow

* fix: clippy
2020-11-25 14:46:35 -05:00
Matt Freitas-Stavola 7e2df1fc59 chore(server): add logs for dropped WAL segments (#478)
* chore(server): add logs for dropped WAL segments

Added logging for dropped writes and old segments in rollover scenarios

Also including a dep on tracing and dev-dep on test_helpers

Refs: #466

* chore(server): Add more context to logs

Minor cleanup around remove_oldest_segment usage

Suggestions from @alamb's review
2020-11-24 16:37:09 -05:00
Andrew Lamb cdb26e60e4 refactor: rename `storage` crate to `query` to better reflect what it is (#475)
* refactor: rename storage --> query

* refactor: update a few more references
2020-11-24 14:19:29 -05:00
Paul Dix 5101e52434
Merge pull request #464 from influxdata/pd-wal_buffer-main
feat: Implement WAL in-memory buffer
2020-11-20 11:16:30 -05:00
Paul Dix 0deee2c0db feat: Implement WAL in-memory buffer
This splits the cluster package out into server and buffer modules. The WAL buffer is in-memory and split into segments. Follow on commits will implement it in the server and add persistence to object storage.
2020-11-19 19:35:17 -05:00
Andrew Lamb cad5f9166b
feat: Port Duration and Window logic to support window aggregates (#460)
* feat: Port enough of Window and Duration to implement window_bounds

* fix: clippy

* fix: Add a few more source links

* fix: Eust --> Rust in comments :(

* fix: add comments about remainder, and add test demonstrating behavior

* fix: Apply suggestions from code review
2020-11-18 09:49:59 -05:00
Andrew Lamb 831a0875d6
chore: update to latest arrow + Rust nightly-2020-11-14 (#454)
* chore: update to latest arrow + Rust nightly-2020-11-14

* chore: update ci

* fix: update for clippy lints

* fix: Allow redundant_field_names in generated types crate

* fix: clippy about try_for_each

* fix: clippy unneeded-collect

* fix: clippy about default values

* fix: clippy matches --> matches!

* fix: clippy sort --> sort_by_key

* fix: clippy about default values again
2020-11-16 11:48:42 -05:00
Andrew Lamb 2fa0e03162
fix: Use datafusion optimizer in IOx query plans (#439)
* chore: update arrow dep to 8e4d9ebef3

* fix: checkin Cargo.lock

* fix: Enable datafusion optimizer, use display_indent_schema
2020-11-11 18:06:21 -05:00
Edd Robinson 0958849956 chore: rename the segment store crate 2020-11-10 16:35:17 +00:00
Edd Robinson ab458b5f17 refactor: address PR feedback 2020-11-06 17:28:27 +00:00
Andrew Lamb 5bb530ccc6
refactor: rename tsm --> influxdb_tsm (#418) 2020-11-05 14:35:38 -05:00
Andrew Lamb b745a180a4
refactor: rename delorean --> InfluxDB IOx (#417) 2020-11-05 13:51:04 -05:00
Andrew Lamb a52e0001c5
refactor: rename all crates that start with `delorean_` in preparation for rename (#415)
* refactor: rename delorean_cluster --> cluster

* refactor: rename delorean_generated_types --> generated_types

* refactor: rename delorean_write_buffer --> write_buffer

* refactor: rename delorean_ingest --> ingest

* refactor: rename delorean_storage --> storage

* refactor: rename delorean_tsm --> tsm

* refactor: rename delorean_test_helpers --> test_helpers

* refactor: rename delorean_arrow --> arrow_deps

* refactor: rename delorean_line_parser --> influxdb_line_protocol
2020-11-05 13:44:36 -05:00
Andrew Lamb 9df6c24493
refactor: rename delorean_mem_qe --> mem_qe (#414) 2020-11-05 09:36:46 -05:00
Andrew Lamb 4f348836fe
refactor: remove delorean_parquet by combining with delorean_ingest (#412) 2020-11-05 09:29:59 -05:00
Andrew Lamb ff824a5477
refactor: rename delorean_wal --> wal, consolidate wal_writer (#411) 2020-11-05 09:25:29 -05:00
Andrew Lamb a3b88d5506
refactor: rename delorean_object_store --> object_store (#413) 2020-11-05 08:56:30 -05:00
Andrew Lamb 8399d2a159
refactor: rename delorean_table to packers (#409) 2020-11-05 08:52:22 -05:00
Andrew Lamb 075ba0d8d1
refactor: remove delorean_table_schema crate and fold it into data_types (#408) 2020-11-05 06:17:20 -05:00
Carol (Nichols || Goulding) 7d25dc8487 fix: Remove unused arrow dependency in delorean_ingest
This wasn't really causing any problems, just confusion, because the old
arrow and its deps were in the Cargo.lock.
2020-11-04 15:34:34 -05:00
Andrew Lamb bf0c58698e
refactor: rename delorean_data_types crate to data_type (#407)
* refactor: rename delorean_data_types crate to data_type - #401

* fix: fmt
2020-11-04 12:33:41 -05:00
Andrew Lamb 9f36914351
chore: Upgrade version of Arrow / DataFusion (3 of 3) + update code for new interfaces (#395) 2020-11-02 11:20:44 -05:00
Paul Dix 1e966b5153 feat: implement API for storing the server configuration in object storage
This adds basic API calls for persisting and loading the server configuration of database rules and host groups to and from object storage. It stores all the data in a single JSON file.
2020-10-26 13:43:43 -06:00
Andrew Lamb ef501871bb
feat: remove partition_store (#387) 2020-10-26 14:39:38 -04:00
Andrew Lamb 4e1e8dbf79
chore: Upgrade version of Arrow/DataFusion (2 of 3) (#391)
* chore: Upgrade version of Arrow/DataFusion (2 of 3)

* fix: Fixup error type usage and use async stream interface

* fix: post merge fixups
2020-10-26 13:49:16 -04:00
Andrew Lamb 88b9f43110
chore: Upgrade version of Arrow/DataFusion (1 of 3) (#390)
* chore: Upgrade version of Arrow/DataFusion

* fix: update code for deps
2020-10-26 11:46:02 -04:00
Andrew Lamb 1004854403
refactor: remove unneeded dependencies, switch to tracing from log (#388) 2020-10-26 06:15:47 -04:00
Andrew Lamb 0ef76db208
feat: implement series_query for write buffer database, tests for same (#360)
* feat: implement series_query for write buffer database, tests for same

* fix: fixup comments

* fix: sort field columns too
2020-10-15 17:23:14 -04:00
Paul Dix 262a988207
Merge pull request #357 from influxdata/pd-cluster_replicate
chore: refactor cluster to use in memory write buffer
2020-10-14 09:43:02 -04:00
Paul Dix 9a345e226c chore: refactor cluster to use in memory write buffer
This refactors cluster to use the in memory write buffer. It removes the injected DatabaseStore as it is no longer needed.
2020-10-14 08:36:49 -04:00
Edd Robinson 6091963d50 test: skip NaN test for now 2020-10-14 13:21:15 +01:00
Edd Robinson 74ed1904c9 feat: fixed encoding for non-null numerics 2020-10-14 13:18:42 +01:00
Andrew Lamb 206df6a325
feat: implement data fusion execution and conversion to series sets (#353) 2020-10-13 16:53:00 -04:00
Paul Dix a80eb0fed3 feat: Store replicated writes
This commit refactors the flatbuffers data types from the wal to a new crate where they can be used by storage, write buffer, and cluster. It also refactors cluster to move the configuration types out to the data types crate so they can be used across storage and elsewhere.

Finally, it adds a new method to store replicated writes on a database in the database trait and implements it.
2020-10-11 15:45:08 -04:00
Paul Dix 996f8905b6 feat: Implement partition templates and key generation
This commit implements partition templates as a struct that can be serialized and deserialized. A template is composed of parts, each of which can be the table name, a column name and its value, a formatted time, or a string column and regex captures of its value.
2020-10-10 11:32:17 -04:00
Paul Dix cceeebb317
Merge pull request #342 from influxdata/pd-cluster-updates
feat: Update cluster with replication and subscriptions
2020-10-09 07:41:32 -04:00
Andrew Lamb 2b8c04f2b4
chore: Update arrow (again) to pick up latest changes to datafusion (#345) 2020-10-09 07:17:02 -04:00
Andrew Lamb a72e608810
feat: enable simd in arrow (#343) 2020-10-08 11:21:22 -04:00
Paul Dix 05dcbd7236 feat: Update cluster with replication and subscriptions
This updates cluster so that the concepts of replication and subscriptions for handling queries are separated. It also adds a flatbuffer structure that can be used as a common format for replication.
2020-10-08 08:40:13 -04:00
Andrew Lamb bc5378c7fe
chore: Update arrow to latest version (#335)
* chore: Update arrow to latest version

* fix: Updates needed by new version of datafusion
2020-10-02 14:46:07 -04:00
Andrew Lamb ff29610e44
refactor: Switch back to https://github.com/apache/arrow (#333) 2020-10-01 16:57:12 -04:00
Andrew Lamb 2b98da593b
feat: write_database support for predicates (#326)
* feat: write_database support for predicates

* fix: temporarily pull in arrow fork to pick up fix for ARROW-10136

* fix: Update mutex usage based on PR feedback

* fix: more mutex polish and use OptionExt

* fix: update comments

* fix: rust-fu the table lookup

* fix: update docs

* fix: more idiomatic rust types

* fix: better usage of reference types
2020-10-01 14:34:53 -04:00
Edd Robinson a2287acb7c
Merge pull request #330 from influxdata/er/feat/segment-store-shell
feat: Segment Store shell
2020-10-01 14:01:45 +01:00
Edd Robinson bd6b0db691 refactor: address PR feedback 2020-10-01 13:13:32 +01:00
Paul Dix fdc86fd186
feat: add some initial framework for clustering (#329) 2020-09-30 14:41:42 -04:00
Andrew Lamb 8a14896487
chore: update version of datafusion (#324)
* chore: update version of datafusion

* chore: Update interfaces to be async
2020-09-30 08:02:15 -04:00
Edd Robinson 2470bdb975 feat: segment store shell 2020-09-30 11:25:59 +01:00
Andrew Lamb da5c74d3c6
feat: storage interface plans + executor (#318)
* feat: storage interface plans + executor

* refactor: less `expect`

* fix: use more idiomatic rust From
2020-09-28 11:41:10 -04:00
Andrew Lamb 0236522dfa
feat: Send panic information to tracing events (#313)
* feat: Send panic information to tracing events

* fix: PR Review improvements

* fix: PR comments

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: more fixes

* fix: clarify /cleanup drop more

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-09-25 14:55:58 -04:00
Edd Robinson ec1aaa3a47 chore: update dependencies 2020-09-25 17:22:48 +01:00
Edd Robinson 9eee0c2852 refactor: make clippy happy 2020-09-25 10:12:46 +01:00
Edd Robinson c42d2dcd79 refactor: rebase with delorean_arrow 2020-09-25 10:12:46 +01:00
Edd Robinson d0f3cae9b3 feat: add tag values schema API 2020-09-25 10:12:46 +01:00
Edd Robinson 47b2f7940b refactor: spike on arrow encoding 2020-09-25 10:12:46 +01:00
Edd Robinson e5f9c7c574 refactor: add encoding trait 2020-09-25 10:12:46 +01:00
alamb 54e9d38589 chore: update the refs to github 2020-09-25 10:12:46 +01:00
alamb 41899203d9 refactor: implement a prototype datafusion integration layer demonstration 2020-09-25 10:12:46 +01:00
alamb 820277a529 feat: load segments from parquet 2020-09-25 10:12:46 +01:00
alamb acfef35a0e feat: load segments from parquet 2020-09-25 10:12:46 +01:00
alamb 7f815099d0 feat: Read from parquet rather than arrow 2020-09-25 10:12:46 +01:00
Edd Robinson a5a8667a42 feat: group by sorting 2020-09-25 10:12:46 +01:00
Edd Robinson 231f429a56 feat: sort group by measurement 2020-09-25 10:12:46 +01:00
Edd Robinson 2387b7c849 feat: add support for group by aggregate 2020-09-25 10:12:46 +01:00
Edd Robinson aba02cb731 feat: basic store 2020-09-25 10:12:46 +01:00
Andrew Lamb 77f58efca7
chore: update Arrow/Parquet/DataFusion versions, consolidate references into new crate (#309)
* chore: consolidate all arrow/parquet/datafusion dependencies

* chore: update datafusion version
2020-09-24 08:46:54 -04:00
Andrew Lamb 498478c066
refactor: rename delorean_storage_interface to delorean_storage (#308) 2020-09-22 17:18:53 -04:00
Andrew Lamb d0f2902c8d
feat: implement tag_keys and measurement_tag_keys (#307)
* feat: implement tag_keys and measurement_tag_keys

* fix: fix timestamp bound evaluation
2020-09-22 16:42:45 -04:00
Jake Goulding 648d42568d feat: Add a benchmark for restoring the WAL 2020-09-18 16:45:01 -04:00
alamb 2418ee5ab0 refactor: move partitioned_store into its own module 2020-09-18 08:12:19 -04:00
Andrew Lamb 642b1b4370
refactor: move write_buffer to delorean_write_buffer crate (#299) 2020-09-18 08:11:48 -04:00
Andrew Lamb d2c24ef7af
refactor: pull storage interface into delorean_storage_interface (#298) 2020-09-18 07:58:19 -04:00
Andrew Lamb 5fe3bfd53c
refactor: extract WalDetails into delorean_wal_writer crate (#297) 2020-09-18 07:47:37 -04:00
Carol (Nichols || Goulding) 596c987956 feat: Compress WAL entries with Snappy
Fixes #276.
2020-09-14 09:42:54 -04:00
Andrew Lamb 82d5f485c3
test: traits for database and tests for http handler (#284)
* test: traits for database and tests for http handler

* refactor: Use generics and trait bounds instead of trait objects

* refactor: Replace trait objects with an associated type

* refactor: Extract an associated Error type on the Database traits

* refactor: Remove some explicit conversions to_string that Snafu takes care of

* docs: add comments

* refactor: move traits into storage module

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
2020-09-11 17:42:00 -04:00
alamb 9b9ff484bb fix: implement escaping 2020-09-11 17:14:35 -04:00
Paul Dix 8ed3a1b440
feat: Initial prototype of WriteBuffer and WAL (#271)
This is the initial prototype of the WriteBuffer and WAL. This does the following:

* accepts a slice of ParsedLine into the DB
* writes those into an in memory structure with tags represented as u32 dictionaries and all field types supported
* persists those writes into the WAL as Flatbuffer blobs (one WAL entry per slice of lines written, or WriteBatch)
* has a method to return a table from the buffer as an Arrow RecordBatch
* recovers the WAL after the database is closed and opened back up again
* has a single test that covers the end-to-end from the DB side
* It doesn't include partitioning yet, although the write_lines method does try to partition on time. That will be changed to something more general, defined by a per-database configuration.
* hooked up to the v2 HTTP write API
* hooked up to a read API which will execute a SQL query against the data in the buffer

This includes a refactor of the WAL:

Refactors the WAL to remove async and threading so that it can be moved higher up. This simplifies the API while keeping just about the same amount of code in PartitionStore to handle the asynchronous writes.

This also modifies the WAL to remove the SideFile implementation, which was causing significant performance problems and write amplification. The downside is that WAL writes are no longer guaranteed to be atomic.

Further, this modifies the WAL to keep the active segment file handle open. Appends no longer have to list the directory contents, find the latest file, and open a file handle to append, which should also improve performance and reduce IOPS.
2020-09-08 14:12:16 -04:00
Carol (Nichols || Goulding) d59702ec79 feat: Make the create bucket HTTP API match the Influx 2.0 API
The `/api/v2/create_bucket` API was delorean-specific for testing
purposes. This change makes it match the [Influx 2.0 API][influx] and
adds a method to the client for creating buckets.

The client will always send an empty array of `retentionRules` because
that is a required parameter for the Influx API. Delorean always ignores
`retentionRules`. The `description` and `rp` parameters are optional and
are never sent.

[influx]: https://v2.docs.influxdata.com/v2.0/api/#operation/PostBuckets

I believe the gRPC create bucket is also delorean-specific and perhaps
not needed, but I'm leaving it in for now with a note.
2020-08-12 10:08:32 -04:00
Edd Robinson 21c0155271 fix: improve pivot for certain sorts 2020-08-04 21:33:58 +01:00
Carol (Nichols || Goulding) 19159138cc fix: Turn off default features of parquet so arrow-flight doesn't repeatedly rebuild
Fixes #261
2020-07-30 09:43:12 -04:00
alamb f946e84a12 chore: revert upgrade parquet dependency to 1.0.0
This reverts commit 25259b4c99.
2020-07-30 07:02:53 -04:00
alamb 25259b4c99 chore: upgrade parquet dependency to 1.0.0 2020-07-28 15:11:35 -04:00
Carol (Nichols || Goulding) 0709f90040 test: Add a mock server test in the client crate for the newline bug 2020-07-27 14:10:54 -04:00
Jake Goulding b72c2ffd73
Merge pull request #253 from influxdata/client-dynamic-data-point 2020-07-24 09:50:11 -04:00
Carol (Nichols || Goulding) c179a7e8b2 fix: Remove generate/seed utilities
These are going to be redone in the fusion repo.
2020-07-22 17:15:30 -04:00
Jake Goulding f8304e6e6b feat: Add a dynamic type to construct data points for ingestion 2020-07-22 17:03:29 -04:00
Andrew Lamb 143c350ecb
Merge pull request #250 from influxdata/alamb/feat-multi-col-stats
feat: Update stats command to handle directories of files
2020-07-20 16:48:31 -04:00
alamb ca1bd79902 feat: Update stats command to handle directories of files 2020-07-17 16:47:11 -04:00
Carol (Nichols || Goulding) 668aefae9b feat: Implement a rudimentary write API in the influx client 2020-07-17 10:28:19 -04:00
Carol (Nichols || Goulding) 7ed24241b5 feat: Set up an InfluxData 2.0 client crate 2020-07-17 10:27:33 -04:00
Carol (Nichols || Goulding) b3a16c080f feat: Update croaring
Jake dug into why the end-to-end tests fail with delorean running in the
Docker image I built, and it appears to be a crash with an illegal
instruction from CRoaring.

We think it's this issue: https://github.com/saulius/croaring-rs/pull/62
which was merged and released, so let's try updating CRoaring.
2020-07-08 08:49:28 -04:00
Edd Robinson 831f647b9d feat: implement escaped tsm key parsing 2020-07-04 08:46:45 -04:00
Edd Robinson 06e9fae845 fix: ignore conflicting field types
Fixes #205.
2020-06-30 18:08:05 +01:00
Andrew Lamb 97a5eb7e19
Merge pull request #197 from influxdata/alamb/log-requests
feat: Log gRPC calls using trace crate, allow custom log levels
2020-06-30 10:47:11 -04:00
alamb 283d6691c6 feat: enable rpc debug tracing, tweaked logging levels, respect RUST_FMT env var 2020-06-29 09:59:22 -04:00
Jake Goulding ad1e3d04bb feat: Add a local filesystem implementation of the object store 2020-06-29 08:48:48 -04:00
Edd Robinson d15256e0e7 refactor: address PR feedback 2020-06-26 12:08:42 +01:00
Edd Robinson 9d889828c3 fix: ensure all rows are emitted for each column 2020-06-26 11:50:37 +01:00
alamb 68ce351a3a refactor: remove direct parquet dependency from delorean_ingest 2020-06-23 16:58:31 -04:00
Carol (Nichols || Goulding) d7dbf061cb feat: Implement String encoding/decoding
Fixes #148.
2020-06-22 15:15:34 -04:00
Edd Robinson 106bd69b5a feat: support converting from TSM->Parquet 2020-06-22 18:56:17 +01:00
Edd Robinson 85e0b4ec16 refactor: hoist tsm reader into own crate 2020-06-22 18:56:17 +01:00
Edd Robinson ac7bb6bf68 refactor: make Packer generic 2020-06-22 11:24:29 +01:00
Jake Goulding bfb0213ac3 feat: Update Rusoto to allow streaming data on uploads 2020-06-19 09:18:44 -04:00
Andrew Lamb 8185c80c03
fix: fix logical merge conflict (#169) 2020-06-18 18:51:25 -04:00
Andrew Lamb ae37548980
feat: Add support for parsing string values in line protocol parser (#155)
* feat: add debug logging on parser error

* feat: Add support for parsing string values in line protocol parser

* fix: Fix comment

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-18 12:44:17 -04:00
Andrew Lamb cf248f2143
feat: upgrade to latest arrow / byteorder (#154) 2020-06-17 12:50:23 -04:00
Carol (Nichols || Goulding) d83c410a5c feat: Update to the released version of cloud-storage
My submitted API improvements got merged in!
2020-06-10 17:23:52 -04:00
Carol (Nichols || Goulding) d3283b1096 feat: Object storage in S3 and GCS 2020-06-10 17:23:52 -04:00
Andrew Lamb faf3f534ac
refactor: move all dstool code into delorean binary (#131)
* refactor: move all dstool code into delorean binary

* fix: Move code/mods to make it compile and run

* fix: warn if db dir does not exist

* refactor: Match argument subcommands w/ more idiomatic rust

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

fix: restore hyper logging

fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: update expected code

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-10 16:04:46 -04:00
Andrew Lamb 0415b233ec
refactor: Instantiate the table writer on demand (#128)
* refactor: instantiate ParquetWriter on demand, prep for multi measurements

* fix: doc test

* fix: update names
2020-06-09 16:11:42 -04:00
Andrew Lamb 986e12d62a
refactor: Rename crate line_protocol_schema --> delorean_table_schema (#129)
* refactor: Rename crate line_protocol_schema --> delorean_table_schema

* fix: fmt
2020-06-09 11:56:16 -04:00
Andrew Lamb f1a3058b24
feat: Add file / metadata inspection + dumping with dstool (#112)
* feat: Add file / metadata inspection + dumping

* fix: apply some PR review comments

* fix: apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* feat: Add tests, rearrange code into modules, add gzip aware interface

* fix: fix comment and test

* fix: test output and fmt

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-09 10:10:55 -04:00
Andrew Lamb 8475b6d183
feat: Add parquet writer, hook up conversion in dstool (#124)
* feat: Add parquet writer, hook up conversion in dstool

* fix: use bigger executor for test

* fix: less cloning

* fix: make unsupported messages less pejorative

* fix: fmt

* fix: Rename writer and do not require std::File, add example

* fix: clippy and fmt

* fix: remove unnecessary module in end to end tests

* fix: remove strange use of tempfile

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: cleanup use

* fix: Use more specific error messages

* fix: comment tweak

* fix: touchup temp path creation

* fix: clippy!

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-08 16:25:24 -04:00
Andrew Lamb ca9f9d4cae
feat: Add column packing code (#114)
* feat: Add column packing code

* fix: remove dependency on assert_approx_equal in favor of delorean_test_helpers

* fix: Cleanups from pr comments

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: more cleanup per code review

* fix: pr comments

* fix: remove explicit string creation from caller

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-06 06:04:41 -04:00
Andrew Lamb 2200def8ea
feat: Use rust nightly (#123) 2020-06-05 17:45:44 -04:00
Jake Goulding 68fb580b43
style: Re-enable the elided lifetimes lint and move generated types to their own crate (#119)
* refactor: rename the module containing generated types

The nested `delorean` was confusing anyway, and this will make more
sense when we extract a new crate.

* refactor: Move the generated types to their own crate

This allows us to have more lax warnings in that crate alone, keeping
the main crate more strict.

* style: Re-enable elided lifetimes lint in the main crate
2020-06-05 16:22:27 -04:00
Edd Robinson 887ffd5977 refactor: remove lifetime to make index re-usable 2020-06-04 14:36:43 +01:00
Edd Robinson e3db077121 feat: add API for series key information 2020-06-04 14:36:43 +01:00
Edd Robinson 413738a264 feat: support org and bucket ID in entries 2020-06-04 14:36:43 +01:00
Andrew Lamb 234b2f5752
feat: Line Protocol Schema extractor (#108)
* feat: schema inference from iterator of parsed lines

* fix: Clean up error handling even more

* fix: fmt

* fix: make a sacrifice to the clippy gods
2020-06-03 18:29:57 -04:00
Andrew Lamb 5d2c5de39d
feat: Structs to represent line protocol schema (#103)
* feat: Structs to represent line protocol schema

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-06-03 08:39:35 -04:00
Andrew Lamb 18b05ce9ef
fix: move test of dstool to its delorean_storage_tool package (#107) 2020-06-02 16:10:30 -04:00
Andrew Lamb 1a2efdfd71
feat: Add dstool command line tool (#102)
* feat: Add dstool command line tool

* clippy

* Update delorean_storage_tool/src/main.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* Update delorean_storage_tool/src/main.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* Add in tests + PR comments

* fmt

* build first then run tests

* actually build before test

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-06-02 07:33:43 -04:00
Jake Goulding 924f20fd50 build: update semver-compatible versions 2020-05-29 13:40:44 -04:00
Jake Goulding d4af54c3de refactor: extract the line protocol parser to a separate crate
This will facilitate reusing the parser for other tasks.
2020-05-26 13:22:34 -04:00
Carol (Nichols || Goulding) f2823ccecd feat: WAL file rollover based on size of file 2020-05-18 14:08:24 -04:00
Carol (Nichols || Goulding) e25a4e1e83 feat: Integrate the WAL with delorean 2020-05-11 15:38:47 -04:00
Jake Goulding 4dd7a8cea8 feat: Introduce a WAL tailored for delorean 2020-05-11 15:38:44 -04:00
Jake Goulding bff4d2d6d9 refactor: Move temporary directory creation to test helpers 2020-05-11 15:26:00 -04:00
Jake Goulding 22136d5431 build: update semver-compatible versions 2020-05-11 15:25:54 -04:00
Jake Goulding e369ada35a refactor: extract a crate with our custom assertions
There's probably an existing crate that we should use directly, but I
haven't found an exact match yet.
2020-05-01 13:04:24 -04:00
Carol (Nichols || Goulding) 7f9eaf51d5
Merge pull request #74 from influxdata/cn-generate-points 2020-04-24 08:08:32 -04:00
Edd Robinson f1d5d50b92
Merge pull request #68 from influxdata/er-block-writer
feat: add Block Type
2020-04-23 22:48:38 +01:00
Carol (Nichols || Goulding) fa69101945 refactor: Move the point utilities into a workspace crate 2020-04-23 11:26:37 -04:00
Carol (Nichols || Goulding) 6791b9598c feat: Utility to take line protocol and make write requests 2020-04-23 11:11:51 -04:00
Jake Goulding 93231c64e0 perf: Use a SmallVec for escaped strings and sets of tags and values
This increases the performance from 56.531 MiB/s to 58.194 MiB/s.
2020-04-08 14:41:42 -04:00
Edd Robinson 9e20743b2c feat: add Block Type
This commit adds a new Block type, which is used to keep track of values
associated with individual blocks, and then serialise them.
2020-04-08 13:37:48 +01:00
Jake Goulding 974a142cc8 build: update semver-compatible versions 2020-04-05 16:35:27 -04:00
Jake Goulding 8629072508 build: Upgrade tonic to 0.2 2020-04-05 16:35:00 -04:00
Jake Goulding 4a28abd4de build: Upgrade assert_cmd to 1.0
This requires that we opt into the serde `derive` feature that is no
longer implicitly added from upstream.
2020-04-05 16:33:37 -04:00
Jake Goulding 48d5d16a1b build: update semver-compatible dependency versions 2020-04-05 16:33:24 -04:00
Carol (Nichols || Goulding) df67b9715a Merge remote-tracking branch 'origin/master' into pd-partiton-store 2020-04-02 11:15:26 -04:00
Jake Goulding 97d11633b8 feature: Use a unique directory per end-to-end test run 2020-04-02 11:06:36 -04:00
Carol (Nichols || Goulding) d9cf5c952a fix: Remove RocksDB code 2020-04-02 09:41:30 -04:00
Jake Goulding 4fd0c6f210 feat: Error when parsing lines with duplicate tags 2020-03-11 22:43:09 -04:00
Jake Goulding 78a53aa391 refactor: Replace the hand-written parser with one built with nom 2020-03-06 10:00:29 -05:00
Jake Goulding 5d3f99da98 refactor: Remove unused failure crate 2020-02-28 16:54:28 -05:00
Edd Robinson 17051717e2 chore: remove dependency: 2020-02-28 12:55:28 +00:00
Edd Robinson 38f23ac07a refactor: merge master in 2020-02-27 14:27:23 +00:00
Carol (Nichols || Goulding) c41652e45b feature: Add the storage gRPC proto definitions 2020-02-24 08:26:28 -05:00
Jake Goulding 3438edd18b feature: Switch from prost to tonic 2020-02-17 16:37:43 -05:00
Jake Goulding 68970f8ff3 build: Update bytes to latest version 2020-02-17 10:48:33 -05:00
Jake Goulding 155bfcbd4f build: Update prost to latest version 2020-02-17 10:48:33 -05:00
Jake Goulding eb3113b820 build: Update dotenv to latest version 2020-02-17 10:48:32 -05:00
Jake Goulding 31df104996 build: Update env_logger to latest version 2020-02-17 10:48:32 -05:00
Jake Goulding e3bfa0f835 build: Update byteorder to latest version 2020-02-17 10:48:32 -05:00
Jake Goulding 04a7f716e2 build: Update dependencies to latest semver-compatible versions 2020-02-17 10:48:32 -05:00
Carol (Nichols || Goulding) 8b1255be9d refactor: Switch to a hyper server 2020-02-14 09:59:09 -05:00
Carol (Nichols || Goulding) 062bbc5a34 Merge remote-tracking branch 'origin/master' into er-encoder-bench 2020-02-14 09:15:24 -05:00
Carol (Nichols || Goulding) 9cce1e4882 test: Add an end-to-end test
This test:
- Runs the server in a thread
- Writes some data
- Reads some data
- Shuts down the server
2020-02-13 10:40:03 -05:00
Edd Robinson 4185307d78 test: add float encoder/decoder benchmarks
This commit adds benchmarks for the float encoder and decoder. The
following scenarios are benchmarked:

- sequential values;
- random values;
- real CPU values (from Telegraf output).

Each scenario is benchmarked with a variety of block sizes.
2020-01-21 15:01:35 +00:00
Edd Robinson 75d378974e fix: cargo lock file 2020-01-07 16:05:46 +00:00
Edd Robinson 80ff911259 feat: automatically create db dir and test dir 2020-01-06 15:55:22 +00:00
Paul Dix 1a851f8d0b Add basic read endpoint
This commit adds a basic read endpoint to pull data out of the database. In order to provide the basic functionality a few things were added:

* Time package with limited support for parsing Flux style durations
* API endpoint at /api/v2/read with query parameters of org_id, bucket_name, predicate, start, and stop

The start and stop query parameters only support relative durations.

The predicate parameter supports what is possible in the parse_predicate method and in the RocksDB implementation (only == comparisons on tags, combined with AND or OR).
2020-01-04 19:07:54 -05:00
Paul Dix 4265e7b11b Update write API endpoint
Updates the actix-web and actix-rt versions to 2.0 and 1.0 respectively. Wires up the write endpoint to create buckets if they don't exist.
2020-01-03 17:35:18 -05:00
Paul Dix 6cd4c5b583 Add basic tag key/value index
This commit brings in a Roaring Bitmap implementation to keep postings lists that map tag key/value pairs to the set of series ids that have those pairs. The croaring implementation was used because its Treemap was required for u64 support for series ids and it was serializable (unlike the other pure Rust roaring implementation).

This doesn't shard the postings lists based on size. It also doesn't implement the time/index levels.

The predicate matching currently only works for a simple key = "value" match.
2019-12-24 13:44:30 -05:00
Paul Dix 617c2960a8 feat(storage): Implement bucket definitions and persistence
This updates the build system to use Prost to build the protobuf objects.

It adds tests for creating, storing and loading bucket definitions.

The tests use an actual on-disk RocksDB implementation to ensure that it's tested all the way to persistence.
2019-12-17 17:01:41 -05:00
Paul Dix 9cadb1bb52 Add server skeleton with Actix and RocksDB 2019-12-12 10:15:16 -05:00
Edd Robinson fb83e9c7fa feat(encoding): add RLE encoder/decoder 2019-12-09 18:04:01 +00:00
Edd Robinson 55d711599e tests: add tests for simple8b 2019-12-04 13:14:37 +00:00
Paul Dix b9b5a815b7 Initial commit with some notes and proto 2019-11-22 16:59:04 -05:00