* chore(server): add logs for dropped WAL segments
Added logging for dropped writes and old segments in rollover scenarios
Also including a dep on tracing and dev-dep on test_helpers
Refs: #466
* chore(server): Add more context to logs
Minor cleanup around remove_oldest_segment usage
Suggestions from @alamb's review
This splits the cluster package out into server and buffer modules. The WAL buffer is in-memory and split into segments. Follow-on commits will implement it in the server and add persistence to object storage.
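For readers unfamiliar with the idea, here is a minimal sketch of a segmented in-memory buffer with rollover. It is an illustration only, not the code in this commit: the `Buffer` and `Segment` types, their fields, and the byte-based limits are all assumptions. Only the name `remove_oldest_segment` is taken from the messages above, and its signature here is guessed. The sketch returns the ids of dropped segments where the real code would log them.

```rust
use std::collections::VecDeque;

/// Hypothetical segment: a run of writes plus their total size in bytes.
#[derive(Debug, Default)]
struct Segment {
    id: u64,
    writes: Vec<Vec<u8>>,
    size: usize,
}

/// Hypothetical in-memory buffer split into segments.
#[derive(Debug)]
struct Buffer {
    max_size: usize,     // cap on bytes held across all segments
    segment_size: usize, // close the open segment once it reaches this size
    current_size: usize, // bytes currently held
    closed: VecDeque<Segment>,
    open: Segment,
}

impl Buffer {
    fn new(max_size: usize, segment_size: usize) -> Self {
        Self {
            max_size,
            segment_size,
            current_size: 0,
            closed: VecDeque::new(),
            open: Segment::default(),
        }
    }

    /// Appends a write and returns the ids of any segments dropped to make room.
    fn append(&mut self, write: Vec<u8>) -> Vec<u64> {
        let mut dropped = Vec::new();
        // Make room by dropping the oldest closed segments first.
        while self.current_size + write.len() > self.max_size {
            match self.remove_oldest_segment() {
                Some(id) => dropped.push(id), // the real code would log this
                None => break,                // nothing left to drop
            }
        }
        self.current_size += write.len();
        self.open.size += write.len();
        self.open.writes.push(write);
        // Roll over to a fresh segment once the open one is full.
        if self.open.size >= self.segment_size {
            let next = Segment { id: self.open.id + 1, ..Segment::default() };
            let full = std::mem::replace(&mut self.open, next);
            self.closed.push_back(full);
        }
        dropped
    }

    /// Removes the oldest closed segment, if any, and returns its id.
    fn remove_oldest_segment(&mut self) -> Option<u64> {
        let seg = self.closed.pop_front()?;
        self.current_size -= seg.size;
        Some(seg.id)
    }
}
```

With a 10-byte cap and 4-byte segments, the third 4-byte write in this sketch drops the oldest segment to stay under the cap.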
* feat: Port enough of Window and Duration to implement window_bounds
* fix: clippy
* fix: Add a few more source links
* fix: Eust --> Rust in comments :(
* fix: add comments about remainder, and add test demonstrating behavior
* fix: Apply suggestions from code review
This commit adds an alternative implementation of `row_ids_equal` for
the `Plain` dictionary encoding, which uses SIMD intrinsics to improve
the performance of identifying all rows in the column containing a
specified `u32` integer.
The approach is as follows. First, the integer constant of interest is
packed into a 256 bit SIMD register. Then the column is iterated over
in chunks of size 8 (thus, 256 bits at a time). The expectation is that
for a column using this encoding, most values will likely not match
an equality predicate, so the happy path is to compare the packed
register against each chunked register. This is done using the
`_mm256_cmpeq_epi32`[1] intrinsic, which returns a mask where each 32
bits is `0xFFFFFFFF` if the two values at that location in the register
are equal, or `0x00000000` otherwise.
Because the expectation is that most values don't match the id we want,
we check if all 32-bit values in this 256-bit mask register are `0`. If
the register's values are not all 0 then the register is inspected to
determine the locations where values match. The offsets of these values
are used to determine the row id to add to the result set.
On my laptop, benchmarking indicates that the SIMD implementation
increases throughput performance (finding all matching rows) by
~100%-390%.
This SIMD implementation is used automatically if the CPU supports
AVX2 instructions; otherwise a non-SIMD implementation is used as a
fallback.
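The approach and the runtime fallback described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this commit: it works on a plain `&[u32]` slice rather than the `Plain` dictionary encoding. The function names `row_ids_equal_scalar` and `row_ids_equal_avx2`, the `Vec<u32>` result type, and the use of `_mm256_testz_si256` and `_mm256_movemask_ps` to inspect the mask are all assumptions. Only `_mm256_cmpeq_epi32` is named in the description.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback: collect the row ids whose value equals `id`.
fn row_ids_equal_scalar(values: &[u32], id: u32) -> Vec<u32> {
    let mut out = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        if v == id {
            out.push(i as u32);
        }
    }
    out
}

/// AVX2 version: compare eight `u32` values (256 bits) per iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn row_ids_equal_avx2(values: &[u32], id: u32) -> Vec<u32> {
    let mut out = Vec::new();
    // Pack the constant of interest into all eight 32-bit lanes.
    let needle = _mm256_set1_epi32(id as i32);

    let chunks = values.chunks_exact(8);
    let tail = chunks.remainder();
    for (i, chunk) in chunks.enumerate() {
        // Unaligned load of 8 x u32.
        let lanes = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        // Each lane of `mask` is 0xFFFFFFFF on a match, 0x00000000 otherwise.
        let mask = _mm256_cmpeq_epi32(lanes, needle);
        // Happy path: testz returns 1 when every bit of the mask is zero.
        if _mm256_testz_si256(mask, mask) == 1 {
            continue;
        }
        // Reduce the mask to one bit per lane, then walk the set bits.
        let mut bits = _mm256_movemask_ps(_mm256_castsi256_ps(mask)) as u32;
        while bits != 0 {
            let lane = bits.trailing_zeros();
            out.push((i * 8) as u32 + lane);
            bits &= bits - 1; // clear the lowest set bit
        }
    }

    // Handle the final partial chunk (fewer than 8 values) without SIMD.
    let base = values.len() - tail.len();
    for (j, &v) in tail.iter().enumerate() {
        if v == id {
            out.push((base + j) as u32);
        }
    }
    out
}

/// Picks the AVX2 path when the CPU supports it, else the scalar path.
pub fn row_ids_equal(values: &[u32], id: u32) -> Vec<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was just checked at runtime.
            return unsafe { row_ids_equal_avx2(values, id) };
        }
    }
    row_ids_equal_scalar(values, id)
}
```

Calling `row_ids_equal(&column, 7)` returns the same row ids on either path; only throughput should differ.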
[1] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_cmpeq_epi32&expand=774
* feat: Update storage protobuf definitions, add stubs for read_window_aggregate
* refactor: Extract the features field in a clearer way
* docs: Add provenance information to service.proto