990 Commits (9a345e226cf533a61c10f005f2a3e4f92046075c)

Author SHA1 Message Date
Paul Dix 9a345e226c chore: refactor cluster to use in memory write buffer
This refactors cluster to use the in memory write buffer. It removes the injected DatabaseStore as it is no longer needed.
2020-10-14 08:36:49 -04:00
Paul Dix 0d6bfd2f29
Merge pull request #356 from influxdata/pd-refactor_cluster_write_buffer
feat: Make table return all columns if none specified for arrow batch
2020-10-14 07:34:09 -04:00
Paul Dix dbc6b7b2d6 feat: Make table return all columns if none specified for arrow batch 2020-10-14 07:28:58 -04:00
Andrew Lamb 1326c831c6
docs: Motivate the use of Arc in SeriesSet and SeriesSetPlan (#354) 2020-10-13 18:11:32 -04:00
Andrew Lamb 206df6a325
feat: implement data fusion execution and conversion to series sets (#353) 2020-10-13 16:53:00 -04:00
Andrew Lamb 246a3d4400
docs: Update comments (#352) 2020-10-12 20:04:34 -04:00
Andrew Lamb 80088ffe37
feat: gRPC plumbing + interface structures for read_filter (#351)
* feat: gRPC plumbing + support structures for read_filter

* fix: cleanup comments
2020-10-12 14:12:53 -04:00
Paul Dix befd386088
Merge pull request #347 from influxdata/pd-partition-key-generation
feat: Implement partition templates and key generation
2020-10-12 08:26:11 -04:00
Paul Dix 77e732cc69
Merge pull request #349 from influxdata/pd-replicate
feat: Store replicated writes
2020-10-12 08:17:42 -04:00
Paul Dix a80eb0fed3 feat: Store replicated writes
This commit refactors the flatbuffers data types from the wal to a new crate where they can be used by storage, write buffer, and cluster. It also refactors cluster to move the configuration types out to the data types crate so they can be used across storage and elsewhere.

Finally, it adds a new method to store replicated writes on a database in the database trait and implements it.
2020-10-11 15:45:08 -04:00
Paul Dix 996f8905b6 feat: Implement partition templates and key generation
This commit implements partition templates as a struct that can be serialized and deserialized. It is composed of parts that can include the table name, a column name and its value, a formatted time, or a string column and regex captures of its value.
2020-10-10 11:32:17 -04:00
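The partition template described in commit 996f8905b6 can be pictured with a minimal sketch. Everything here is hypothetical and not taken from the repository: the names `TemplatePart`, `PartitionTemplate`, and `partition_key`, the `-` and `_` separators, and the choice to skip missing columns. Only the table-name and column parts are modeled; the formatted-time and regex-capture parts, as well as serialization, are left out because they would need external crates.

```rust
use std::collections::BTreeMap;

/// One piece of a partition template (hypothetical names, std-only sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum TemplatePart {
    /// The name of the table the row is written to.
    Table,
    /// A column name paired with the row's value for that column.
    Column(String),
}

/// An ordered list of parts that together describe how to build a key.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PartitionTemplate {
    pub parts: Vec<TemplatePart>,
}

impl PartitionTemplate {
    /// Renders each part in order and joins the results with '-'.
    /// Assumption: a column that is absent from the row is skipped.
    pub fn partition_key(&self, table: &str, row: &BTreeMap<String, String>) -> String {
        let rendered: Vec<String> = self
            .parts
            .iter()
            .filter_map(|part| match part {
                TemplatePart::Table => Some(table.to_string()),
                TemplatePart::Column(name) => {
                    row.get(name).map(|value| format!("{}_{}", name, value))
                }
            })
            .collect();
        rendered.join("-")
    }
}
```

Under these assumptions, a template of `[Table, Column("region")]` applied to a `cpu` row with `region = west` yields the key `cpu-region_west`.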
Paul Dix cceeebb317
Merge pull request #342 from influxdata/pd-cluster-updates
feat: Update cluster with replication and subscriptions
2020-10-09 07:41:32 -04:00
Andrew Lamb 2b8c04f2b4
chore: Update arrow (again) to pick up latest changes to datafusion (#345) 2020-10-09 07:17:02 -04:00
Andrew Lamb aaeb0d4c84
refactor: implement automatic error conversion for errors that do not have lots of context (#341)
* refactor: implement automatic error conversion for errors that do not have lots of context

* fix: implement code review suggestions
2020-10-08 11:21:54 -04:00
Andrew Lamb a72e608810
feat: enable simd in arrow (#343) 2020-10-08 11:21:22 -04:00
Paul Dix 05dcbd7236 feat: Update cluster with replication and subscriptions
This updates cluster so that the concepts of replication and of subscriptions for handling queries are separated. It also adds a flatbuffer structure that can be used as a common format for replication.
2020-10-08 08:40:13 -04:00
Andrew Lamb 5400c55b2a
refactor: apply timestamp predicate in visit code (#340) 2020-10-07 12:33:04 -04:00
Andrew Lamb 9a81bf4d72
feat: implement column_values for write buffer database (#339) 2020-10-07 10:12:28 -04:00
Andrew Lamb 3ba1a95795
refactor: extract "traverse the write buffer structure" into a visitor trait/pattern (#338) 2020-10-06 17:08:46 -04:00
Andrew Lamb 3d670fb556
feat: Implement gRPC routes tag_values and measurement_tag_values (#337) 2020-10-06 17:07:03 -04:00
Andrew Lamb bc5378c7fe
chore: Update arrow to latest version (#335)
* chore: Update arrow to latest version

* fix: Updates needed by new version of datafusion
2020-10-02 14:46:07 -04:00
Paul Dix 1b69a5a79c
refactor: WriteBuffer database and WAL Flatbuffers (#331)
* chore: Refactor write buffer WAL

This commit refactors the WAL to remove partition events and to collapse rows into a single write buffer entry.
This further simplifies the WAL by removing WriteBufferBatch.
Finally, this removes the concept of a partition generation as that is currently not used.

* refactor: WriteBuffer database and WAL Flatbuffers

This refactor updates the WriteBuffer write path significantly. At the public API it takes parsed lines, but then immediately converts them over to a built Flatbuffer byte array, which has also been significantly refactored.

The Flatbuffer structure has been updated so that a WriteBufferBatch contains a vec of WriteBufferEntry. Each of those entries corresponds to a collection of data that is bound for a single partition. The generated partition key is now kept as part of this entry.

Within the WriteBufferEntry you now have a vec of TableWriteBatch, each of which has the table name and a vec of Row. This pulls the table name out of the row, eliminating redundancy for writes that have multiple rows being written into the same table.

The database now has methods to accept the Flatbuffer WriteBufferEntry with updates down the line to Partition and Table.

This also has a nice little performance bump for WAL restore:

wal-restoration/restore_single_entry_single_partition
                        time:   [684.51 us 688.45 us 692.53 us]
                        thrpt:  [1.4440 Melem/s 1.4525 Melem/s 1.4609 Melem/s]
                 change:
                        time:   [-55.913% -55.351% -54.800%] (p = 0.00 < 0.05)
                        thrpt:  [+121.24% +123.97% +126.82%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

wal-restoration/restore_multiple_entry_multiple_partition
                        time:   [8.7483 ms 8.8964 ms 9.0815 ms]
                        thrpt:  [1.3214 Melem/s 1.3489 Melem/s 1.3717 Melem/s]
                 change:
                        time:   [-55.952% -55.166% -54.213%] (p = 0.00 < 0.05)
                        thrpt:  [+118.40% +123.04% +127.02%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

* fix: fmt

Co-authored-by: alamb <andrew@nerdnetworks.org>
2020-10-02 13:52:00 -04:00
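The nesting described in #331 (a WriteBufferBatch holding WriteBufferEntry values, each bound for one partition and holding TableWriteBatch values with a table name and rows) can be mirrored in plain Rust structs. This is only a sketch of the shape, not the generated Flatbuffer code: the field names, the contents of `Row`, and the `build_batch` helper are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// A single row; the field layout is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub values: Vec<(String, String)>,
}

/// All rows for one table; the table name is stored once, not per row.
#[derive(Debug, Clone, PartialEq)]
pub struct TableWriteBatch {
    pub name: String,
    pub rows: Vec<Row>,
}

/// All data bound for a single partition, with its generated key.
#[derive(Debug, Clone, PartialEq)]
pub struct WriteBufferEntry {
    pub partition_key: String,
    pub table_batches: Vec<TableWriteBatch>,
}

/// The top-level container of entries.
#[derive(Debug, Clone, PartialEq)]
pub struct WriteBufferBatch {
    pub entries: Vec<WriteBufferEntry>,
}

/// Hypothetical helper: groups (partition key, table name, row) triples
/// into the nested layout, ordered by partition key and then table name.
pub fn build_batch(writes: Vec<(String, String, Row)>) -> WriteBufferBatch {
    let mut grouped: BTreeMap<String, BTreeMap<String, Vec<Row>>> = BTreeMap::new();
    for (partition_key, table, row) in writes {
        grouped
            .entry(partition_key)
            .or_default()
            .entry(table)
            .or_default()
            .push(row);
    }
    let entries = grouped
        .into_iter()
        .map(|(partition_key, tables)| WriteBufferEntry {
            partition_key,
            table_batches: tables
                .into_iter()
                .map(|(name, rows)| TableWriteBatch { name, rows })
                .collect(),
        })
        .collect();
    WriteBufferBatch { entries }
}
```

Grouping this way shows the redundancy the commit message mentions being removed: two rows written to the same table in the same partition share one `TableWriteBatch` and therefore one copy of the table name.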
Andrew Lamb 45c4f1e24e
refactor: make table_names API consistent with tag_keys API, other cleanups (#327) 2020-10-02 09:42:06 -04:00
Andrew Lamb 0a48c04a9b
refactor: improve predicate conversion code (#325) 2020-10-01 17:26:39 -04:00
Andrew Lamb ff29610e44
refactor: Switch back to https://github.com/apache/arrow (#333) 2020-10-01 16:57:12 -04:00
Andrew Lamb 3d7d4111be
fix: Upgrade the resource class used to run CI tests (#332) 2020-10-01 14:56:32 -04:00
Andrew Lamb 2b98da593b
feat: write_database support for predicates (#326)
* feat: write_database support for predicates

* fix: temporarily pull in arrow fork to pick up fix for ARROW-10136

* fix: Update mutex usage based on PR feedback

* fix: more mutex polish and use OptionExt

* fix: update comments

* fix: rust-fu the table lookup

* fix: update docs

* fix: more idiomatic rust types

* fix: better usage of reference types
2020-10-01 14:34:53 -04:00
Edd Robinson a2287acb7c
Merge pull request #330 from influxdata/er/feat/segment-store-shell
feat: Segment Store shell
2020-10-01 14:01:45 +01:00
Edd Robinson bd6b0db691 refactor: address PR feedback 2020-10-01 13:13:32 +01:00
Edd Robinson 30c1c9c615
refactor: Update delorean_segment_store/src/table.rs 2020-10-01 12:16:36 +01:00
Edd Robinson 10219427e3
refactor: update delorean_segment_store/src/lib.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-10-01 12:11:48 +01:00
Edd Robinson 440e8d71a1 refactor: use some arrow types for value arrays 2020-10-01 11:34:06 +01:00
Edd Robinson 2404429b07 refactor: address PR feedback 2020-10-01 11:22:13 +01:00
Edd Robinson d7ee107a5e chore: please clippy for now 2020-10-01 10:40:54 +01:00
Edd Robinson 4ef18d7b8e refactor: thinking about moving from WB to SS 2020-09-30 22:10:21 +01:00
Edd Robinson b2795f57b9 refactor: wire execution up to store 2020-09-30 21:59:54 +01:00
Paul Dix fdc86fd186
feat: add some initial framework for clustering (#329) 2020-09-30 14:41:42 -04:00
Edd Robinson b0783d11a4 feat: add metadata 2020-09-30 18:58:51 +01:00
Edd Robinson d8e49411ac feat: basic store API 2020-09-30 16:18:39 +01:00
Andrew Lamb 0976498dd9
feat: gRPC Predicate --> DataFusion Predicate conversion (#323) 2020-09-30 08:19:30 -04:00
Andrew Lamb 8a14896487
chore: update version of datafusion (#324)
* chore: update version of datafusion

* chore: Update interfaces to be async
2020-09-30 08:02:15 -04:00
Andrew Lamb 30c7cc6895
feat: Add SchemaPivot node + DataFusion planning plumbing (#320)
* feat: Add SchemaPivot node + DataFusion planning plumbing

* refactor: more idiomatic snafu

* fix: remove hack
2020-09-30 07:52:34 -04:00
Edd Robinson 2470bdb975 feat: segment store shell 2020-09-30 11:25:59 +01:00
Andrew Lamb d40ed663fb
fix: respect `columns` parameter in table_to_arrow (#322) 2020-09-29 07:48:19 -04:00
Andrew Lamb da5c74d3c6
feat: storage interface plans + executor (#318)
* feat: storage interface plans + executor

* refactor: less `expect`

* fix: use more idiomatic rust From
2020-09-28 11:41:10 -04:00
Andrew Lamb d606a1f1cd
refactor: split delorean_write_buffer/src/database.rs into multiple modules (#317) 2020-09-28 06:20:59 -04:00
Carol (Nichols || Goulding) 5f135e922a
Merge pull request #312 from influxdata/cn/small-changes 2020-09-25 21:34:14 -04:00
Andrew Lamb 0236522dfa
feat: Send panic information to tracing events (#313)
* feat: Send panic information to tracing events

* fix: PR Review improvements

* fix: PR comments

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: more fixes

* fix: clarify/cleanup drop more

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2020-09-25 14:55:58 -04:00
Edd Robinson 2d17e8e1ae
Merge pull request #316 from influxdata/er/chore/deps
chore: update dependencies
2020-09-25 17:44:32 +01:00
Edd Robinson ec1aaa3a47 chore: update dependencies 2020-09-25 17:22:48 +01:00