Commit Graph

60 Commits (21110dc23317273fecc56b796799df42d28434e9)

Author SHA1 Message Date
Dom 6f473984d0 style: wrap comments
Runs rustfmt with the new config.
2020-12-11 18:22:26 +00:00
Carol (Nichols || Goulding) fdf82be70b docs: Improving the column_name_and_column description 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) be8c266d3a refactor: Remove one whole lifetime from ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) b620c37ecb refactor: Remove one use of the input lifetime in ReadGroupResult 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383cd7cf38 refactor: Simplify lifetimes by returning col names from seg store 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 383b601e10 fix: Some of the slice lifetimes aren't needed
Some are, though
2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 2abb9abfbc refactor: Elide some more lifetimes 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) 4dbc77b441 fix: Change lifetimes on ReadGroupResults too 2020-12-10 15:22:45 -05:00
Carol (Nichols || Goulding) f98f45e49f fix: Correct and clarify lifetimes around the segment store 2020-12-10 15:22:45 -05:00
Edd Robinson 7e04a6eaab refactor: address more PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 90b112c652 refactor: address PR feedback 2020-12-10 15:15:34 +00:00
Edd Robinson 138031d5b1 test: add test case for multiple aggregates 2020-12-10 15:15:34 +00:00
Edd Robinson 1d1414e9cf refactor: use for loop 2020-12-10 15:15:34 +00:00
Edd Robinson 34faf72968 refactor: commented code 2020-12-10 15:15:34 +00:00
Edd Robinson 5e138bcded refactor: return groups as vectors 2020-12-10 15:15:34 +00:00
Edd Robinson cc80a73768 perf: replace map with vector 2020-12-10 15:15:34 +00:00
Edd Robinson e1b57aaec4 perf: copy as needed 2020-12-10 15:15:34 +00:00
Edd Robinson 99003b0a6a perf: check intersection cardinality before allocating
Because `bitset.and()` allocates a new bitset regardless of the resulting
cardinality, we would otherwise allocate more bitsets than necessary. This
change first checks whether we actually need to make the allocation.

It improves `read_group` performance by ~2X.
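The pattern can be sketched with a toy bitset; this is a minimal illustration only, and `Bitset`, `and`, and `and_cardinality` here are hypothetical stand-ins rather than the segment store's actual types:

```rust
/// Toy bitset over u64 words (hypothetical stand-in for the real type).
#[derive(Clone, Debug, PartialEq)]
struct Bitset {
    words: Vec<u64>,
}

impl Bitset {
    fn new(num_bits: usize) -> Self {
        Self { words: vec![0; (num_bits + 63) / 64] }
    }

    fn set(&mut self, bit: usize) {
        self.words[bit / 64] |= 1 << (bit % 64);
    }

    /// Cardinality of the intersection, computed without allocating.
    fn and_cardinality(&self, other: &Self) -> u32 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a & b).count_ones())
            .sum()
    }

    /// Intersection; allocates a new bitset.
    fn and(&self, other: &Self) -> Self {
        Self {
            words: self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect(),
        }
    }
}

/// Only allocate the intersection bitset when it is non-empty.
fn intersect_if_non_empty(a: &Bitset, b: &Bitset) -> Option<Bitset> {
    if a.and_cardinality(b) == 0 {
        return None; // happy path: skip the allocation entirely
    }
    Some(a.and(b))
}
```

Counting set bits in the AND of the words is cheap relative to allocating and populating a whole new bitset, which is why the empty-intersection happy path pays off.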

```
segment_read_group_pre_computed_groups_no_predicates_cardinality/2000
                        time:   [57.917 ms 58.286 ms 58.700 ms]
                        thrpt:  [34.072 Kelem/s 34.313 Kelem/s 34.532 Kelem/s]
                 change:
                        time:   [-59.703% -59.357% -59.057%] (p = 0.00 < 0.05)
                        thrpt:  [+144.24% +146.05% +148.16%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson fe27690ca8 test: add benchmarks for specific read_group path
This commit adds benchmarks to track the performance of `read_group`
when aggregating across columns that support pre-computed bit-sets of
row_ids for each distinct column value. Currently this is limited to the
RLE columns, and only makes sense when grouping by low-cardinality
columns.

The benchmarks are in three groups:

* one group fixes the number of rows in the segment but varies the
  cardinality (that is, how many groups the query produces).
* another group fixes the cardinality and the number of rows but varies
  the number of columns needed to be grouped to produce the fixed
  cardinality.
* a final group fixes the number of columns being grouped, the
  cardinality, and instead varies the number of rows in the segment.

Some initial results from my development box are as follows:

```
                        time:   [51.099 ms 51.119 ms 51.140 ms]
                        thrpt:  [39.108 Kelem/s 39.125 Kelem/s 39.140 Kelem/s]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_group_cols/1
                        time:   [93.162 us 93.219 us 93.280 us]
                        thrpt:  [10.720 Kelem/s 10.727 Kelem/s 10.734 Kelem/s]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_group_cols/2
                        time:   [571.72 us 572.31 us 572.98 us]
                        thrpt:  [3.4905 Kelem/s 3.4946 Kelem/s 3.4982 Kelem/s]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
Benchmarking segment_read_group_pre_computed_groups_no_predicates_group_cols/3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
segment_read_group_pre_computed_groups_no_predicates_group_cols/3
                        time:   [1.7292 ms 1.7313 ms 1.7340 ms]
                        thrpt:  [1.7301 Kelem/s 1.7328 Kelem/s 1.7349 Kelem/s]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

segment_read_group_pre_computed_groups_no_predicates_rows/250000
                        time:   [562.29 us 565.19 us 568.80 us]
                        thrpt:  [439.52 Melem/s 442.33 Melem/s 444.61 Melem/s]
Found 18 outliers among 100 measurements (18.00%)
  6 (6.00%) high mild
  12 (12.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/500000
                        time:   [561.32 us 561.85 us 562.47 us]
                        thrpt:  [888.93 Melem/s 889.92 Melem/s 890.76 Melem/s]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/750000
                        time:   [573.75 us 574.27 us 574.85 us]
                        thrpt:  [1.3047 Gelem/s 1.3060 Gelem/s 1.3072 Gelem/s]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
segment_read_group_pre_computed_groups_no_predicates_rows/1000000
                        time:   [586.36 us 586.74 us 587.19 us]
                        thrpt:  [1.7030 Gelem/s 1.7043 Gelem/s 1.7054 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
```
2020-12-10 15:15:34 +00:00
Edd Robinson 596e20ac92 feat: add from String implementation 2020-12-10 15:15:34 +00:00
Edd Robinson 10552eb51b refactor: create collection of ReadGroupResult type 2020-12-10 15:15:34 +00:00
Edd Robinson 8c45170a15 feat: read group aggregates on RLE columns 2020-12-10 15:15:34 +00:00
Edd Robinson 8fd211798a refactor: aggregate sum can return a Scalar 2020-12-10 15:15:34 +00:00
Edd Robinson 6d2b69d4a3 feat: add column properties
Column properties can be used to determine what abilities a column has
at runtime, which will vary depending on the encoding used.
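The idea can be sketched as follows; `ColumnProperties`, `has_pre_computed_row_ids`, and the encoding variants here are illustrative names, not necessarily the crate's actual API:

```rust
/// Hypothetical sketch of runtime column properties.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnProperties {
    // Whether the encoding can serve pre-computed row id sets for each
    // distinct value (useful for grouping without scanning).
    has_pre_computed_row_ids: bool,
}

#[derive(Debug)]
enum StringEncoding {
    RleDictionary,
    Plain,
}

impl StringEncoding {
    /// Callers inspect properties at runtime to pick an execution path.
    fn properties(&self) -> ColumnProperties {
        match self {
            // RLE dictionary encodings maintain row id sets per value.
            StringEncoding::RleDictionary => ColumnProperties {
                has_pre_computed_row_ids: true,
            },
            StringEncoding::Plain => ColumnProperties {
                has_pre_computed_row_ids: false,
            },
        }
    }
}
```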
2020-12-10 15:15:34 +00:00
Edd Robinson e4b8fb3387 refactor: use Cow for group row ids 2020-12-10 15:15:34 +00:00
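The `Cow` approach in the commit above lets pre-computed row id sets be borrowed straight from the encoding while freshly computed sets are owned. A hedged sketch, with hypothetical names:

```rust
use std::borrow::Cow;

/// Row ids for a group: borrowed when the encoding already holds a
/// pre-computed set, owned when an intersection had to be computed.
/// (Illustrative only; the real signature differs.)
fn group_row_ids<'a>(
    pre_computed: Option<&'a Vec<u32>>,
    compute: impl FnOnce() -> Vec<u32>,
) -> Cow<'a, Vec<u32>> {
    match pre_computed {
        Some(ids) => Cow::Borrowed(ids), // zero-copy path
        None => Cow::Owned(compute()),   // allocate only when required
    }
}
```

Downstream code treats both cases uniformly via `Deref`, so only the paths that genuinely need a new allocation pay for one.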
Edd Robinson f7f87164b4 refactor: initial read_group skeleton 2020-12-10 15:15:34 +00:00
Edd Robinson c199d59c04 refactor: improve aggregate support 2020-12-10 15:15:34 +00:00
Edd Robinson c259a461c1 feat: extend dictionary column API
Add methods for getting distinct row ids for values and for getting
logical values.
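The shape of those two methods can be sketched with a toy dictionary column; the names and the flat representation here are illustrative, not the actual RLE/plain encodings:

```rust
use std::collections::BTreeMap;

/// Toy dictionary encoding: distinct values stored once, rows store ids.
struct DictionaryColumn {
    dictionary: Vec<String>, // distinct values; id = index
    encoded: Vec<u32>,       // per-row dictionary ids
}

impl DictionaryColumn {
    /// Distinct row ids per value: value -> row ids containing it.
    fn distinct_row_ids(&self) -> BTreeMap<&str, Vec<u32>> {
        let mut out = BTreeMap::new();
        for (row_id, &id) in self.encoded.iter().enumerate() {
            out.entry(self.dictionary[id as usize].as_str())
                .or_insert_with(Vec::new)
                .push(row_id as u32);
        }
        out
    }

    /// Materialise the logical (decoded) value for a row.
    fn logical_value(&self, row_id: usize) -> &str {
        &self.dictionary[self.encoded[row_id] as usize]
    }
}
```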
2020-12-10 15:15:34 +00:00
Edd Robinson 254dfc14d8 refactor: apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-12-03 11:47:41 +00:00
Edd Robinson 4f32778596 refactor: implement ReadFilterResults type
The `ReadFilterResults` type encapsulates results from multiple
segments. It implements `Display` to allow visualisation of results from
segments in a `select` call.
2020-12-03 11:23:12 +00:00
Edd Robinson 7ad0b4ad9a refactor: encapsulate read filter results in type
This commit also adds `Display` and `Debug` implementations for
`ReadFilterResult`. These can be used for visualising the contents of
the result of a `read_filter` call on a segment.

The former trait elides the column names.
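The `Display`/`Debug` split described above can be sketched like this; the field names are hypothetical:

```rust
use std::fmt;

/// Sketch: `Debug` prints the column names, `Display` elides them and
/// prints only the row data.
struct ReadFilterResult<'a> {
    column_names: Vec<&'a str>,
    rows: Vec<Vec<&'a str>>,
}

impl fmt::Debug for ReadFilterResult<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Header line with column names, then delegate to Display.
        writeln!(f, "{}", self.column_names.join(","))?;
        fmt::Display::fmt(self, f)
    }
}

impl fmt::Display for ReadFilterResult<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for row in &self.rows {
            writeln!(f, "{}", row.join(","))?;
        }
        Ok(())
    }
}
```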
2020-12-03 11:23:09 +00:00
Edd Robinson 381c3038aa refactor: update segment_store/src/segment.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-12-02 19:13:00 +00:00
Edd Robinson 4dc5cc46a9 refactor: DRY up the predicate logic 2020-12-02 17:59:45 +00:00
Edd Robinson ab83288067 refactor: segment doesn't require time range 2020-12-02 17:56:59 +00:00
Edd Robinson 9dc9e505ff refactor: add From<&str> implementation for Value 2020-11-30 13:33:56 +00:00
Edd Robinson 681b0f0660 refactor: implement From trait for Value
This commit adds a set of helper `From` trait implementations for
numerical scalar types.
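Such helper `From` implementations might look like the following on a simplified `Value`; the real enum in the segment store has a different set of variants:

```rust
/// Simplified sketch of a logical value enum.
#[derive(Debug, PartialEq)]
enum Value<'a> {
    String(&'a str),
    I64(i64),
    U64(u64),
    F64(f64),
}

impl<'a> From<&'a str> for Value<'a> {
    fn from(v: &'a str) -> Self {
        Value::String(v)
    }
}

// A macro avoids repeating the impl for every numerical scalar type.
macro_rules! impl_from_scalar {
    ($t:ty, $variant:ident) => {
        impl From<$t> for Value<'_> {
            fn from(v: $t) -> Self {
                Value::$variant(v)
            }
        }
    };
}

impl_from_scalar!(i64, I64);
impl_from_scalar!(u64, U64);
impl_from_scalar!(f64, F64);
```

With these in place, call sites and tests can write `Value::from(22_i64)` or pass literals via `.into()` instead of naming variants explicitly.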
2020-11-30 13:28:34 +00:00
Edd Robinson dfdf7082d9 refactor: remove non-64-bit Scalar types
Supports: #501

This commit removes scalar types that are not 64-bit, since we don't
plan to expose these datatypes outside of a column.
2020-11-30 13:14:25 +00:00
Edd Robinson ccc84de894 refactor: remove logical f32 type
Supports: #501

This commit removes the logical `f32` type.
2020-11-30 12:52:36 +00:00
Edd Robinson 8d1d653193 refactor: reduce set of supported logical types
Supports: #501

This commit removes logical integer types other than `i64` and `u64`.
2020-11-30 12:52:31 +00:00
Edd Robinson a260dc37b1 refactor: remove some lifetimes 2020-11-25 14:27:18 +00:00
Edd Robinson 0720cc36d0 refactor: address PR feedback 2020-11-17 15:41:58 +00:00
Edd Robinson 936eb16ce2 refactor: update segment_store/src/column/dictionary/plain.rs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2020-11-17 14:55:02 +00:00
Edd Robinson 556f4dd343 refactor: tidy up API 2020-11-16 22:16:12 +00:00
Edd Robinson a2338b9348 perf: add SIMD-enabled method of matching equality predicate
This commit adds an alternative implementation of `row_ids_equal` for
the `Plain` dictionary encoding, which uses SIMD intrinsics to improve
the performance of identifying all rows in the column containing a
specified `u32` integer.

The approach is as follows. First, the integer constant of interest is
packed into a 256 bit SIMD register. Then the column is iterated over
in chunks of size 8 (thus, 256 bits at a time). The expectation is that
for a column using this encoding it is likely most values will not match
an equality predicate, so the happy path is to compare the packed
register against each chunked register. This is done using the
`_mm256_cmpeq_epi32`[1] intrinsic, which returns a mask where each 32
bits is `0xFFFFFFFF` if the two values at that location in the register
are equal, or `0x00000000` otherwise.

Because the expectation is that most values don't match the id we want,
we check if all 32-bit values in this 256-bit mask register are `0`. If
the register's values are not all 0 then the register is inspected to
determine the locations where values match. The offsets of these values
are used to determine the row id to add to the result set.

On my laptop, benchmarking indicates that the SIMD implementation
increases throughput performance (finding all matching rows) by
~100%-390%.

This SIMD implementation is used automatically if the CPU supports AVX2
instructions; otherwise a non-SIMD implementation is used as a fallback.

[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_cmpeq_epi32&expand=774
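The approach described above can be sketched with `std::arch` intrinsics; this is an illustrative sketch, not the repository's actual implementation, and the function names are hypothetical:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback: all row ids whose encoded value equals `value`.
fn row_ids_eq_scalar(col: &[u32], value: u32) -> Vec<u32> {
    col.iter()
        .enumerate()
        .filter_map(|(i, &v)| (v == value).then(|| i as u32))
        .collect()
}

/// AVX2 path: compare 8 encoded values per iteration against a packed
/// register holding the wanted id; skip chunks with no matches.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn row_ids_eq_avx2(col: &[u32], value: u32) -> Vec<u32> {
    let mut out = Vec::new();
    let packed = _mm256_set1_epi32(value as i32);
    let chunks = col.chunks_exact(8);
    let remainder = chunks.remainder();
    for (chunk_idx, chunk) in chunks.enumerate() {
        let vals = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        // Each 32-bit lane becomes 0xFFFFFFFF where the values are equal.
        let mask = _mm256_cmpeq_epi32(vals, packed);
        // Happy path: no lane matched, so nothing to inspect.
        if _mm256_testz_si256(mask, mask) == 1 {
            continue;
        }
        // One mask bit per byte; 4 set bits per matching u32 lane.
        let bits = _mm256_movemask_epi8(mask) as u32;
        for lane in 0..8 {
            if bits & (0xF << (lane * 4)) != 0 {
                out.push((chunk_idx * 8 + lane) as u32);
            }
        }
    }
    // Handle the tail that doesn't fill a 256-bit register.
    let base = col.len() - remainder.len();
    for (i, &v) in remainder.iter().enumerate() {
        if v == value {
            out.push((base + i) as u32);
        }
    }
    out
}

/// Dispatch: use AVX2 when the CPU supports it, else fall back.
fn row_ids_eq(col: &[u32], value: u32) -> Vec<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { row_ids_eq_avx2(col, value) };
        }
    }
    row_ids_eq_scalar(col, value)
}
```

Runtime feature detection keeps the binary portable: only CPUs that report AVX2 take the intrinsic path, everything else uses the scalar loop.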
2020-11-16 22:12:25 +00:00
Edd Robinson 25af7674ca perf: benchmark plain dictionary encoding 2020-11-16 22:12:25 +00:00
Edd Robinson d54c30147e refactor: expose public API 2020-11-16 22:12:25 +00:00
Edd Robinson fc881776dd feat: implement size and cardinality 2020-11-16 22:12:25 +00:00
Edd Robinson 43373cb650 feat: implement size on Dictionary encoding 2020-11-16 22:12:25 +00:00
Edd Robinson 1252d1b2f4 feat: wire up Plain dictionary encoder 2020-11-16 22:12:25 +00:00
Edd Robinson 94d37a9ff2 refactor: rename Column StringEncoding::RLE to RLEDictionary 2020-11-16 22:12:25 +00:00