Commit Graph

1343 Commits (1446b5fcfc2e13ac3ea9d48622e76476d466e513)

Author SHA1 Message Date
Andrew Lamb cad5f9166b
feat: Port Duration and Window logic to support window aggregates (#460)
* feat: Port enough of Window and Duration to implement window_bounds

* fix: clippy

* fix: Add a few more source links

* fix: Eust --> Rust in comments :(

* fix: add comments about remainder, and add test demonstrating behavior

* fix: Apply suggestions from code review
2020-11-18 09:49:59 -05:00
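The bullets above mention remainder handling in `window_bounds`. The commit does not show the ported code, so the following is only a minimal sketch, assuming fixed-size windows of `every` nanoseconds shifted by an `offset` (both hypothetical parameter names). It illustrates one place a remainder matters: Rust's `%` keeps the sign of the dividend, so `rem_euclid` is used to keep pre-epoch (negative) timestamps in the correct window. Calendar durations such as months, which the real `Duration` logic may support, are not handled here.

```rust
/// Hypothetical sketch (not the IOx implementation): bounds of the
/// fixed-size window that contains timestamp `t`.
/// All values are nanoseconds; `every` must be positive.
/// Returns `(start, stop)`, with `start` inclusive and `stop` exclusive.
pub fn window_bounds(t: i64, every: i64, offset: i64) -> (i64, i64) {
    assert!(every > 0, "window duration must be positive");
    // `-7 % 5 == -2` in Rust, which would place negative timestamps in the
    // wrong window; `rem_euclid` always yields a value in `0..every`.
    let remainder = (t - offset).rem_euclid(every);
    let start = t - remainder;
    (start, start + every)
}
```

For example, with `every = 5` and no offset, `t = -7` falls in the window `[-10, -5)` rather than `[-5, 0)`.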
Paul Dix b1ae1e8e91
Update README.md 2020-11-17 17:14:45 -05:00
Paul Dix 398030d792
Update README.md 2020-11-17 15:31:34 -05:00
Paul Dix f7627266c1
Update README.md 2020-11-17 13:55:14 -05:00
Andrew Lamb fe663c3534
feat: add cpu_feature_check (#458)
* feat: add cpu_feature_check

* fix: clarify output
2020-11-17 13:28:23 -05:00
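The commit message does not describe what `cpu_feature_check` does. One plausible reading is a runtime report of which SIMD instruction sets the current CPU supports. The sketch below assumes that reading; `cpu_feature_report` and `format_report` are hypothetical names, while `is_x86_feature_detected!` is the standard-library macro for runtime detection on x86/x86_64.

```rust
/// Hypothetical sketch: which SIMD features does the running CPU support?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn cpu_feature_report() -> Vec<(&'static str, bool)> {
    vec![
        ("sse4.2", is_x86_feature_detected!("sse4.2")),
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
    ]
}

/// On non-x86 targets this sketch reports nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn cpu_feature_report() -> Vec<(&'static str, bool)> {
    Vec::new()
}

/// Render the report one feature per line, e.g. "avx2: yes".
pub fn format_report(report: &[(&str, bool)]) -> String {
    report
        .iter()
        .map(|(name, ok)| format!("{}: {}", name, if *ok { "yes" } else { "no" }))
        .collect::<Vec<_>>()
        .join("\n")
}
```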
Edd Robinson 0720cc36d0 refactor: address PR feedback 2020-11-17 15:41:58 +00:00
Edd Robinson 936eb16ce2
refactor: update segment_store/src/column/dictionary/plain.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-11-17 14:55:02 +00:00
Paul Dix 7f21283e79
Merge pull request #456 from influxdata/pd-update-readme
chore: Update README with intro and status
2020-11-17 07:44:24 -05:00
Paul Dix a096f688d3 chore: Update README with intro and status 2020-11-17 07:39:26 -05:00
Edd Robinson 556f4dd343 refactor: tidy up API 2020-11-16 22:16:12 +00:00
Edd Robinson a2338b9348 perf: add SIMD-enabled method of matching equality predicate
This commit adds an alternative implementation of `row_ids_equal` for
the `Plain` dictionary encoding, which uses SIMD intrinsics to improve
the performance of identifying all rows in the column containing a
specified `u32` integer.

The approach is as follows. First, the integer constant of interest is
packed into a 256 bit SIMD register. Then the column is iterated over
in chunks of size 8 (thus, 256 bits at a time). The expectation is that
for a column using this encoding it is likely most values will not match
an equality predicate, so the happy path is to compare the packed
register against each chunked register. This is done using the
`_mm256_cmpeq_epi32`[1] intrinsic, which returns a mask where each 32
bits is `0xFFFFFFFF` if the two values at that location in the register
are equal, or `0x00000000` otherwise.
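The comparison step above can be modelled in portable Rust to make the lane semantics concrete. This is not SIMD code; it is a hypothetical model (`cmpeq_epi32_model` and `broadcast` are illustrative names) of what `_mm256_cmpeq_epi32` computes on eight 32-bit lanes, assuming the constant is "packed" by broadcasting it into every lane.

```rust
/// Broadcast `v` into all eight 32-bit lanes, modelling how the integer
/// constant of interest is packed into a 256-bit register.
pub fn broadcast(v: u32) -> [u32; 8] {
    [v; 8]
}

/// Portable model of `_mm256_cmpeq_epi32`: compare eight 32-bit lanes
/// pairwise; each result lane is all ones (0xFFFF_FFFF) when the lanes are
/// equal and all zeros otherwise.
pub fn cmpeq_epi32_model(a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    let mut mask = [0u32; 8];
    for lane in 0..8 {
        mask[lane] = if a[lane] == b[lane] { 0xFFFF_FFFF } else { 0 };
    }
    mask
}
```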

Because the expectation is that most values don't match the id we want,
we check if all 32-bit values in this 256-bit mask register are `0`. If
the register's values are not all 0 then the register is inspected to
determine the locations where values match. The offsets of these values
are used to determine the row id to add to the result set.

On my laptop, benchmarking indicates that the SIMD implementation
increases throughput (finding all matching rows) by
~100%-390%.

This SIMD implementation is used automatically if the CPU supports
AVX2 instructions; otherwise, the non-SIMD implementation is used as a
fallback.
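The steps described in this commit message can be sketched end to end as follows. This is a sketch under assumptions, not the commit's code: the free-function signatures and the scalar helper are hypothetical, and the commit names only `_mm256_cmpeq_epi32`, so the use of `_mm256_testz_si256` for the all-zero check and `_mm256_movemask_ps` for locating matching lanes is an assumed (but standard) choice. Trailing rows that do not fill a full 8-value chunk are checked one at a time.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback (hypothetical helper): all rows whose encoded id equals `id`.
pub fn row_ids_equal_scalar(encoded: &[u32], id: u32) -> Vec<u32> {
    let mut rows = Vec::new();
    for (row, &v) in encoded.iter().enumerate() {
        if v == id {
            rows.push(row as u32);
        }
    }
    rows
}

/// AVX2 path. Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn row_ids_equal_avx2(encoded: &[u32], id: u32) -> Vec<u32> {
    let mut rows = Vec::new();
    // Broadcast the id into all eight 32-bit lanes of a 256-bit register.
    let needle = _mm256_set1_epi32(id as i32);
    let chunks = encoded.chunks_exact(8);
    let tail = chunks.remainder();
    for (chunk_idx, chunk) in chunks.enumerate() {
        // Unaligned load of eight u32 values (256 bits).
        let values = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        // Lane-wise compare: 0xFFFFFFFF where equal, 0 otherwise.
        let mask = _mm256_cmpeq_epi32(values, needle);
        // Happy path: the mask is all zeros, so nothing in this chunk matches.
        if _mm256_testz_si256(mask, mask) == 1 {
            continue;
        }
        // One bit per 32-bit lane, taken from each lane's sign bit.
        let bits = _mm256_movemask_ps(_mm256_castsi256_ps(mask)) as u32;
        for lane in 0..8 {
            if bits & (1 << lane) != 0 {
                // The lane offset within the chunk gives the row id.
                rows.push((chunk_idx * 8 + lane) as u32);
            }
        }
    }
    // Rows after the last full chunk are compared one by one.
    let base = encoded.len() - tail.len();
    for (i, &v) in tail.iter().enumerate() {
        if v == id {
            rows.push((base + i) as u32);
        }
    }
    rows
}

/// Use the AVX2 path when the CPU supports it, else the scalar fallback.
pub fn row_ids_equal(encoded: &[u32], id: u32) -> Vec<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { row_ids_equal_avx2(encoded, id) };
        }
    }
    row_ids_equal_scalar(encoded, id)
}
```

Because the dispatch happens at runtime via `is_x86_feature_detected!`, a single binary can run on machines with and without AVX2, and both paths return row ids in ascending order.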

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_cmpeq_epi32&expand=774
2020-11-16 22:12:25 +00:00
Edd Robinson 25af7674ca perf: benchmark plain dictionary encoding 2020-11-16 22:12:25 +00:00
Edd Robinson d54c30147e refactor: expose public API 2020-11-16 22:12:25 +00:00
Edd Robinson fc881776dd feat: implement size and cardinality 2020-11-16 22:12:25 +00:00
Edd Robinson 43373cb650 feat: implement size on Dictionary encoding 2020-11-16 22:12:25 +00:00
Edd Robinson 1252d1b2f4 feat: wire up Plain dictionary encoder 2020-11-16 22:12:25 +00:00
Edd Robinson 94d37a9ff2 refactor: rename Column StringEncoding::RLE to RLEDictionary 2020-11-16 22:12:25 +00:00
Edd Robinson 59512bff74 feat: implement materialisation functionality 2020-11-16 22:12:25 +00:00
Edd Robinson 04505bf818 feat: implement row_ids_filter 2020-11-16 22:12:25 +00:00
Edd Robinson a6627aa5db feat: implement push on plain dict encoding 2020-11-16 22:12:25 +00:00
Edd Robinson d8f382e5b7 feat: skeleton dictionary plain encoding 2020-11-16 22:12:25 +00:00
Edd Robinson bcd8a63556 refactor: introduce dictionary enum and wire tests 2020-11-16 22:12:25 +00:00
Edd Robinson b2c69dff1d refactor: create dictionary module 2020-11-16 22:12:25 +00:00
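The commits above build up a "plain" dictionary encoding (push, size and cardinality, `row_ids_filter`, materialisation). Their messages do not spell out the data layout, so this is a hypothetical sketch assuming the common design: each distinct string is assigned a `u32` id, and the column stores one id per row. `PlainDictionary` and its methods are illustrative names; ids here are assigned in insertion order, which may differ from the real encoder.

```rust
use std::collections::BTreeMap;

/// Hypothetical plain dictionary encoding: one u32 id stored per row.
#[derive(Default)]
pub struct PlainDictionary {
    entry_ids: BTreeMap<String, u32>, // value -> id
    entries: Vec<String>,             // id -> value
    encoded: Vec<u32>,                // one id per row
}

impl PlainDictionary {
    /// Append `value` as a new row, assigning an id the first time it is seen.
    pub fn push(&mut self, value: &str) {
        let id = match self.entry_ids.get(value) {
            Some(&id) => id,
            None => {
                let id = self.entries.len() as u32;
                self.entry_ids.insert(value.to_string(), id);
                self.entries.push(value.to_string());
                id
            }
        };
        self.encoded.push(id);
    }

    /// Number of distinct values in the column.
    pub fn cardinality(&self) -> usize {
        self.entries.len()
    }

    /// Rows whose value equals `value`; empty if the value was never pushed.
    pub fn row_ids_equal(&self, value: &str) -> Vec<u32> {
        match self.entry_ids.get(value) {
            Some(&id) => self
                .encoded
                .iter()
                .enumerate()
                .filter(|&(_, &v)| v == id)
                .map(|(row, _)| row as u32)
                .collect(),
            None => Vec::new(),
        }
    }

    /// Materialise (decode) the value stored at `row`.
    pub fn value(&self, row: usize) -> &str {
        &self.entries[self.encoded[row] as usize]
    }
}
```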
Andrew Lamb 597933622d
fix: improve error messages with more context (#455) 2020-11-16 16:40:29 -05:00
Andrew Lamb 831a0875d6
chore: update to latest arrow + Rust nightly-2020-11-14 (#454)
* chore: update to latest arrow + Rust nightly-2020-11-14

* chore: update ci

* fix: update for clippy lints

* fix: Allow redundant_field_names in generated types crate

* fix: clippy about try_for_each

* fix: clippy unneeded collect

* fix: clippy about default values

* fix: clippy matches --> matches!

* fix: clippy sort --> sort_by_key

* fix: clippy about default values again
2020-11-16 11:48:42 -05:00
Arve Knudsen cc6394d68a
fix: return error from binding HTTP server address instead of panicking (#453)
Signed-off-by: Andrew Lamb <alamb@influxdata.com>
2020-11-16 10:59:47 -05:00
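The fix above replaces a panic on bind failure with a returned error. A minimal std-only sketch of that pattern follows; `BindError` and `bind_listener` are hypothetical names, and the actual server code may use a different error type and HTTP library.

```rust
use std::fmt;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical error carrying the address that could not be bound.
#[derive(Debug)]
pub struct BindError {
    addr: SocketAddr,
    source: std::io::Error,
}

impl fmt::Display for BindError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unable to bind to {}: {}", self.addr, self.source)
    }
}

impl std::error::Error for BindError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Bind a listener and return the failure to the caller instead of
/// panicking (e.g. via `unwrap()`), keeping the address as context.
pub fn bind_listener(addr: SocketAddr) -> Result<TcpListener, BindError> {
    TcpListener::bind(addr).map_err(|source| BindError { addr, source })
}
```

The caller can then report something like "unable to bind to 127.0.0.1:8080: address in use" and exit cleanly.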
Andrew Lamb 87626a3635
feat: Update storage protobuf definitions, add stubs for read_window_aggregate (#444)
* feat: Update storage protobuf definitions, add stubs for read_window_aggregate

* refactor: Extract the features field in a clearer way

* docs: Add provenance information to service.proto
2020-11-12 07:07:42 -05:00
Andrew Lamb b9f347c2bc
fix: Update git branch ref from master --> main to reflect new default branch (#445) 2020-11-12 07:03:43 -05:00
Andrew Lamb 659da9264a
chore: Update predicate protobuf definitions (#443) 2020-11-11 18:06:39 -05:00
Andrew Lamb 2fa0e03162
fix: Use datafusion optimizer in IOx query plans (#439)
* chore: update arrow dep to 8e4d9ebef3

* fix: checkin Cargo.lock

* fix: Enable datafusion optimizer, use display_indent_schema
2020-11-11 18:06:21 -05:00
Andrew Lamb bcbf06be09
refactor: split protobuf definitions into multiple files, matching influxdb (#442) 2020-11-11 15:20:53 -05:00
Andrew Lamb 33f3ca8b6d
feat: Print message to stdout when the server is ready (#432) 2020-11-11 06:41:54 -05:00
Andrew Lamb 986436300b
fix: update the branch reference in nightly ci image to main (#438) 2020-11-11 06:40:38 -05:00
Edd Robinson c79a47b8fa
Merge pull request #431 from influxdata/er/perf/rle_equality_opt
perf: enable row ID bitset optimisation for equality predicates
2020-11-11 09:57:38 +00:00
Edd Robinson 27160e35c3
Merge pull request #435 from influxdata/er/refactor/packers
refactor: change String variant to Bytes
2020-11-11 09:56:41 +00:00
Edd Robinson 4edbe171c8 refactor: change UtfString variant to String 2020-11-11 09:50:14 +00:00
Edd Robinson c6439e46a9 refactor: change String variant to Bytes 2020-11-10 22:31:14 +00:00
Andrew Lamb 141527425d
fix: log errors from spawned tokio async tasks (#423) 2020-11-10 16:54:26 -05:00
Edd Robinson 26c0d0a7f4
Merge pull request #434 from influxdata/er/feat/packers-string
feat: add String support to Packers
2020-11-10 21:36:31 +00:00
Edd Robinson 8254ce0d6a feat: add string support to Packers 2020-11-10 18:23:33 +00:00
Edd Robinson 8f26270d44 perf: optimise equality predicate on rle column 2020-11-10 17:18:14 +00:00
Carol (Nichols || Goulding) 5bfb2c2533
fix: Add LICENSE (#430) 2020-11-10 12:10:07 -05:00
Carol (Nichols || Goulding) 572ff1947a
Merge pull request #428 from influxdata/back-to-the-future 2020-11-10 12:00:27 -05:00
Carol (Nichols || Goulding) a0a4ca235f
Merge pull request #429 from influxdata/update-readme
fix: Improve docs based on feedback from @rbetts
2020-11-10 11:58:22 -05:00
Carol (Nichols || Goulding) 5bd807e44c fix: Restore test assertion for now 2020-11-10 11:57:24 -05:00
Carol (Nichols || Goulding) c096682ab8 fix: Improve docs based on feedback from @rbetts 2020-11-10 11:53:44 -05:00
Carol (Nichols || Goulding) 05b60b8fd0 fix: Remove remaining mentions of Delorean 2020-11-10 11:47:42 -05:00
Edd Robinson ab4844b65b test: add benchmark for selecting RLE col 2020-11-10 16:43:36 +00:00
Edd Robinson b6dc9b53b6
Merge pull request #427 from influxdata/er/segment_store
chore: rename the segment store crate
2020-11-10 16:39:31 +00:00
Carol (Nichols || Goulding) eca083d5d1
Merge pull request #426 from influxdata/create-dir
fix: Create the database directory if it doesn't already exist
2020-11-10 11:38:19 -05:00