Andrew Lamb
cad5f9166b
feat: Port Duration and Window logic to support window aggregates ( #460 )
...
* feat: Port enough of Window and Duration to implement window_bounds
* fix: clippy
* fix: Add a few more source links
* fix: Eust --> Rust in comments :(
* fix: add comments about remainder, and add test demonstrating behavior
* fix: Apply suggestions from code review
2020-11-18 09:49:59 -05:00
Paul Dix
b1ae1e8e91
Update README.md
2020-11-17 17:14:45 -05:00
Paul Dix
398030d792
Update README.md
2020-11-17 15:31:34 -05:00
Paul Dix
f7627266c1
Update README.md
2020-11-17 13:55:14 -05:00
Andrew Lamb
fe663c3534
feat: add cpu_feature_check ( #458 )
...
* feat: add cpu_feature_check
* fix: clarify output
2020-11-17 13:28:23 -05:00
Edd Robinson
0720cc36d0
refactor: address PR feedback
2020-11-17 15:41:58 +00:00
Edd Robinson
936eb16ce2
refactor: update segment_store/src/column/dictionary/plain.rs
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-11-17 14:55:02 +00:00
Paul Dix
7f21283e79
Merge pull request #456 from influxdata/pd-update-readme
...
chore: Update README with intro and status
2020-11-17 07:44:24 -05:00
Paul Dix
a096f688d3
chore: Update README with intro and status
2020-11-17 07:39:26 -05:00
Edd Robinson
556f4dd343
refactor: tidy up API
2020-11-16 22:16:12 +00:00
Edd Robinson
a2338b9348
perf: add SIMD-enabled method of matching equality predicate
...
This commit adds an alternative implementation of `row_ids_equal` for
the `Plain` dictionary encoding, which uses SIMD intrinsics to improve
the performance of identifying all rows in the column containing a
specified `u32` integer.
The approach is as follows. First, the integer constant of interest is
packed into a 256 bit SIMD register. Then the column is iterated over
in chunks of size 8 (thus, 256 bits at a time). The expectation is that
for a column using this encoding it is likely most values will not match
an equality predicate, so the happy path is to compare the packed
register against each chunked register. This is done using the
`_mm256_cmpeq_epi32`[1] intrinsic, which returns a mask where each 32
bits is `0xFFFFFFFF` if the two values at that location in the register
are equal, or `0x00000000` otherwise.
Because the expectation is that most values don't match the id we want,
we check if all 32-bit values in this 256-bit mask register are `0`. If
the register's values are not all 0 then the register is inspected to
determine the locations where values match. The offsets of these values
are used to determine the row id to add to the result set.
On my laptop, benchmarking indicates that the SIMD implementation
increases throughput performance (finding all matching rows) by
~100%-390%.
This SIMD implementation is used automatically if the CPU supports
AVX2 instructions; otherwise a non-SIMD implementation is used as a
fallback.
[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_cmpeq_epi32&expand=774
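
The approach described above can be sketched roughly as follows. This is an
illustrative sketch, not the code from this commit: the helper names
`row_ids_equal_scalar` and `row_ids_equal_avx2`, the `Vec<u32>` result type,
and the use of `_mm256_testz_si256` and `_mm256_movemask_ps` to inspect the
mask are assumptions. The commit message itself only names
`_mm256_cmpeq_epi32`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback (hypothetical helper): row ids whose value equals `id`.
fn row_ids_equal_scalar(values: &[u32], id: u32) -> Vec<u32> {
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| **v == id)
        .map(|(i, _)| i as u32)
        .collect()
}

/// AVX2 variant (hypothetical helper). The caller must ensure AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn row_ids_equal_avx2(values: &[u32], id: u32) -> Vec<u32> {
    let mut dst = Vec::new();

    // Pack the constant into all eight 32-bit lanes of a 256-bit register.
    let needle = _mm256_set1_epi32(id as i32);

    let chunks = values.chunks_exact(8);
    let remainder = chunks.remainder();
    let tail_start = values.len() - remainder.len();

    for (chunk_idx, chunk) in chunks.enumerate() {
        // Load 8 x u32 (256 bits); the unaligned load makes no alignment assumption.
        let lanes = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);

        // Each 32-bit lane of `mask` is 0xFFFFFFFF on a match, 0x00000000 otherwise.
        let mask = _mm256_cmpeq_epi32(lanes, needle);

        // Happy path: every lane is zero, so nothing in this chunk matched.
        if _mm256_testz_si256(mask, mask) == 1 {
            continue;
        }

        // Collapse the mask to one bit per lane, then record matching offsets.
        let bits = _mm256_movemask_ps(_mm256_castsi256_ps(mask)) as u32;
        for lane in 0..8 {
            if bits & (1 << lane) != 0 {
                dst.push((chunk_idx * 8 + lane) as u32);
            }
        }
    }

    // Values that do not fill a whole 8-wide chunk are checked one by one.
    for (i, v) in remainder.iter().enumerate() {
        if *v == id {
            dst.push((tail_start + i) as u32);
        }
    }
    dst
}

/// Uses the AVX2 path when the CPU supports it, else the scalar path.
pub fn row_ids_equal(values: &[u32], id: u32) -> Vec<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just checked at runtime.
            return unsafe { row_ids_equal_avx2(values, id) };
        }
    }
    row_ids_equal_scalar(values, id)
}
```

Calling `row_ids_equal(&values, 3)` returns the positions in `values` that
hold `3`, in ascending order, whichever path is taken. Detecting the feature
at runtime means a single binary can run on CPUs with and without AVX2.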
2020-11-16 22:12:25 +00:00
Edd Robinson
25af7674ca
perf: benchmark plain dictionary encoding
2020-11-16 22:12:25 +00:00
Edd Robinson
d54c30147e
refactor: expose public API
2020-11-16 22:12:25 +00:00
Edd Robinson
fc881776dd
feat: implement size and cardinality
2020-11-16 22:12:25 +00:00
Edd Robinson
43373cb650
feat: implement size on Dictionary encoding
2020-11-16 22:12:25 +00:00
Edd Robinson
1252d1b2f4
feat: wire up Plain dictionary encoder
2020-11-16 22:12:25 +00:00
Edd Robinson
94d37a9ff2
refactor: rename Column StringEncoding::RLE to RLEDictionary
2020-11-16 22:12:25 +00:00
Edd Robinson
59512bff74
feat: implement materialisation functionality
2020-11-16 22:12:25 +00:00
Edd Robinson
04505bf818
feat: implement row_ids_filter
2020-11-16 22:12:25 +00:00
Edd Robinson
a6627aa5db
feat: implement push on plain dict encoding
2020-11-16 22:12:25 +00:00
Edd Robinson
d8f382e5b7
feat: skeleton dictionary plain encoding
2020-11-16 22:12:25 +00:00
Edd Robinson
bcd8a63556
refactor: introduce dictionary enum and wire tests
2020-11-16 22:12:25 +00:00
Edd Robinson
b2c69dff1d
refactor: create dictionary module
2020-11-16 22:12:25 +00:00
Andrew Lamb
597933622d
fix: improve error messages with more context ( #455 )
2020-11-16 16:40:29 -05:00
Andrew Lamb
831a0875d6
chore: update to latest arrow + Rust nightly-2020-11-14 ( #454 )
...
* chore: update to latest arrow + Rust nightly-2020-11-14
* chore: update ci
* fix: update for clippy lints
* fix: Allow redundant_field_names in generated types crate
* fix: clippy about try_for_each
* fix: clippy unneeded-collect
* fix: clippy about default values
* fix: clippy matches --> matches!
* fix: clippy sort --> sort_by_key
* fix: clippy about default values again
2020-11-16 11:48:42 -05:00
Arve Knudsen
cc6394d68a
fix: return error from binding HTTP server address instead of panicking ( #453 )
...
Signed-off-by: Andrew Lamb <alamb@influxdata.com>
2020-11-16 10:59:47 -05:00
Andrew Lamb
87626a3635
feat: Update storage protobuf definitions, add stubs for read_window_aggregate ( #444 )
...
* feat: Update storage protobuf definitions, add stubs for read_window_aggregate
* refactor: Extract the features field in a clearer way
* docs: Add provenance information to service.proto
2020-11-12 07:07:42 -05:00
Andrew Lamb
b9f347c2bc
fix: Update git branch ref from master --> main to reflect new default branch ( #445 )
2020-11-12 07:03:43 -05:00
Andrew Lamb
659da9264a
chore: Update predicate protobuf definitions ( #443 )
2020-11-11 18:06:39 -05:00
Andrew Lamb
2fa0e03162
fix: Use datafusion optimizer in IOx query plans ( #439 )
...
* chore: update arrow dep to 8e4d9ebef3
* fix: checkin Cargo.lock
* fix: Enable datafusion optimizer, use display_indent_schema
2020-11-11 18:06:21 -05:00
Andrew Lamb
bcbf06be09
refactor: split protobuf definitions into multiple files, matching influxdb ( #442 )
2020-11-11 15:20:53 -05:00
Andrew Lamb
33f3ca8b6d
feat: Print message to stdout when the server is ready ( #432 )
2020-11-11 06:41:54 -05:00
Andrew Lamb
986436300b
fix: update the branch reference in nightly ci image to main ( #438 )
2020-11-11 06:40:38 -05:00
Edd Robinson
c79a47b8fa
Merge pull request #431 from influxdata/er/perf/rle_equality_opt
...
perf: enable row ID bitset optimisation for equality predicates
2020-11-11 09:57:38 +00:00
Edd Robinson
27160e35c3
Merge pull request #435 from influxdata/er/refactor/packers
...
refactor: change String variant to Bytes
2020-11-11 09:56:41 +00:00
Edd Robinson
4edbe171c8
refactor: change UtfString variant to String
2020-11-11 09:50:14 +00:00
Edd Robinson
c6439e46a9
refactor: change String variant to Bytes
2020-11-10 22:31:14 +00:00
Andrew Lamb
141527425d
fix: log errors from spawned tokio async tasks ( #423 )
2020-11-10 16:54:26 -05:00
Edd Robinson
26c0d0a7f4
Merge pull request #434 from influxdata/er/feat/packers-string
...
feat: add String support to Packers
2020-11-10 21:36:31 +00:00
Edd Robinson
8254ce0d6a
feat: add string support to Packers
2020-11-10 18:23:33 +00:00
Edd Robinson
8f26270d44
perf: optimise equality predicate on rle column
2020-11-10 17:18:14 +00:00
Carol (Nichols || Goulding)
5bfb2c2533
fix: Add LICENSE ( #430 )
2020-11-10 12:10:07 -05:00
Carol (Nichols || Goulding)
572ff1947a
Merge pull request #428 from influxdata/back-to-the-future
2020-11-10 12:00:27 -05:00
Carol (Nichols || Goulding)
a0a4ca235f
Merge pull request #429 from influxdata/update-readme
...
fix: Improve docs based on feedback from @rbetts
2020-11-10 11:58:22 -05:00
Carol (Nichols || Goulding)
5bd807e44c
fix: Restore test assertion for now
2020-11-10 11:57:24 -05:00
Carol (Nichols || Goulding)
c096682ab8
fix: Improve docs based on feedback from @rbetts
2020-11-10 11:53:44 -05:00
Carol (Nichols || Goulding)
05b60b8fd0
fix: Remove remaining mentions of Delorean
2020-11-10 11:47:42 -05:00
Edd Robinson
ab4844b65b
test: add benchmark for selecting RLE col
2020-11-10 16:43:36 +00:00
Edd Robinson
b6dc9b53b6
Merge pull request #427 from influxdata/er/segment_store
...
chore: rename the segment store crate
2020-11-10 16:39:31 +00:00
Carol (Nichols || Goulding)
eca083d5d1
Merge pull request #426 from influxdata/create-dir
...
fix: Create the database directory if it doesn't already exist
2020-11-10 11:38:19 -05:00