- only convert dictionary types that we really want to convert (instead
of blindly converting all types)
- handle missing / NULL columns
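For illustration only, a minimal sketch of this kind of selective conversion using the `arrow` crate; the `normalize_batch` helper, the Utf8-dictionary-only rule, and the NULL-fill for missing columns are assumptions made for the example, not the actual IOx code:

```rust
use std::sync::Arc;

use arrow::array::{new_null_array, ArrayRef};
use arrow::compute::cast;
use arrow::datatypes::{DataType, Schema};
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

/// Hypothetical helper: project `batch` onto `target_schema`, casting only
/// dictionary-of-Utf8 columns back to plain Utf8 and filling columns missing
/// from the batch with NULLs.
fn normalize_batch(batch: &RecordBatch, target_schema: Arc<Schema>) -> Result<RecordBatch> {
    let columns: Vec<ArrayRef> = target_schema
        .fields()
        .iter()
        .map(|field| match batch.schema().index_of(field.name()) {
            // column missing from this batch => all-NULL column
            Err(_) => Ok(new_null_array(field.data_type(), batch.num_rows())),
            Ok(idx) => {
                let col = batch.column(idx);
                match col.data_type() {
                    // only convert the dictionary types we care about ...
                    DataType::Dictionary(_, value) if value.as_ref() == &DataType::Utf8 => {
                        cast(col, &DataType::Utf8)
                    }
                    // ... and leave everything else untouched
                    _ => Ok(Arc::clone(col)),
                }
            }
        })
        .collect::<Result<_>>()?;

    RecordBatch::try_new(target_schema, columns)
}
```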
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: use stored sort key to deduplicate data
* refactor: verify if one is a super sort key of the other
* test: unit tests for scan and deduplication plans
* fix: typo
* refactor: refactor and add comments
* feat: cache partition sort key to read during planning as needed
* test: tests for query plans with different overlap groups
* chore: cleanup
* chore: resolve merge conflicts
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: improve `IngesterData` public interface
* feat: impl `Debug` for `Test{Namespace,Sequencer}`
* refactor: trait interface for `LifecycleHandle`
This is required to mock the lifecycle for query tests.
* refactor: trait for partitioner
Add the "http_request_limit_rejected" metric that is incremented once
each time a request is dropped due to the simultaneous request service
protection limit being exceeded.
This metric will allow us to effectively alert on router saturation, and
the increase rate/second will help inform us in how much capacity needs
adding.
This commit adds a service protection limit, allowing the router to
restrict the number of simultaneous HTTP requests to a hard-coded limit
of 200.
By dropping additional requests once the router is serving the
configured maximum, the router is prevented from accepting more and more
requests that stack up waiting on the catalog / postgres connection
pool. Currently, buffering all these stalled connections consumes so much
memory that the router is OOM killed. After this commit, a portion of
requests will be satisfied and the rest will be dropped early with an
HTTP 503 Service Unavailable response.
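As a rough sketch of the mechanism (not the router's actual implementation), a `tokio` semaphore can bound the number of in-flight requests, with a plain counter standing in for the "http_request_limit_rejected" metric; the `RequestLimit` type and its methods are hypothetical:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

use tokio::sync::{Semaphore, SemaphorePermit};

/// Hypothetical request-limit state; the real router wires this into its
/// HTTP handler and the `metric` registry instead of a bare atomic.
struct RequestLimit {
    /// Grants up to `max_in_flight` concurrent requests (e.g. 200).
    permits: Semaphore,
    /// Stand-in for the "http_request_limit_rejected" counter.
    rejected: AtomicU64,
}

impl RequestLimit {
    fn new(max_in_flight: usize) -> Self {
        Self {
            permits: Semaphore::new(max_in_flight),
            rejected: AtomicU64::new(0),
        }
    }

    /// Try to admit a request. Returns `None` (the caller responds with
    /// HTTP 503 Service Unavailable) once the limit is reached, so stalled
    /// requests no longer pile up waiting on the catalog connection pool.
    fn try_admit(&self) -> Option<SemaphorePermit<'_>> {
        match self.permits.try_acquire() {
            Ok(permit) => Some(permit),
            Err(_) => {
                self.rejected.fetch_add(1, Ordering::Relaxed);
                None
            }
        }
    }
}
```

Because the permit is held for the lifetime of the request, dropping it at the end of the response automatically frees the slot for the next caller.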
* refactor: document and improve `MockIngesterConnection`
* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries
* fix: mark "IngsterPartition" chunks as unsorted
* fix: "group by" queries may require sorted comparison
* refactor: re-export a few more types from querier
* fix: ensure that test parquet files are de-duped
* test: chunks in ingester stage
* docs: explain test code
* refactor: grouping overlaps now uses the same overlap function in both compactor and deduplication
* chore: commit missing file
* chore: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Expose data generation tool for wider use
* feat: Add a step for retrieving the server metrics
* refactor: Copy the OG end-to-end metrics test to NG
* feat: Rewrite the NG end-to-end metrics test
This is still broken because the row timestamp metrics don't exist in NG.
* fix: Test metrics relevant to NG
* refactor: Move the data generator to the test helper crate
* refactor: Extract a ReadFilter request builder into the test helper crate
* refactor: Make test helper request builder able to build other gRPC requests
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
* feat: Copy end-to-end logging test to NG
This was created with:
cp influxdb_iox/tests/end_to_end_cases/influxdb_ioxd.rs influxdb_iox/tests/end_to_end_ng_cases/logging.rs
* feat: Port logging test to NG end-to-end tests
And re-enable it; it was previously ignored.
* fix: Specify that an in-memory catalog should be used for the logging test
* fix: Check for gRPC instead of HTTP
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit allows the LP consumer to correctly handle line protocol
writes that duplicate one or more fields or tags within a single line:
    table v=2,bananas=42,v=3,platanos=24
          ▲              ▲
          └───────┬──────┘
                  │
          duplicate field "v"
This change implements the following logic when processing fields:
    IF field is duplicated
        IF all duplicate field values are of the same data type
            use last occurrence ("last write wins")
        ELSE
            return an error
Any duplication of tags is rejected, and using a field name as both a
field and a tag remains forbidden (unchanged in this PR; a previously
agreed breaking change from TSM).
See https://github.com/influxdata/influxdb_iox/issues/4326 for context.
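A simplified sketch of that rule; the `FieldValue` enum and `dedup_fields` helper are illustrative stand-ins for the real line-protocol parser types, not the code in this PR:

```rust
use std::collections::HashMap;

/// Simplified stand-in for a parsed line-protocol field value.
#[derive(Debug, Clone, PartialEq)]
enum FieldValue {
    I64(i64),
    F64(f64),
    Bool(bool),
    String(String),
}

impl FieldValue {
    /// Coarse "data type" used to decide whether duplicates are compatible.
    fn type_name(&self) -> &'static str {
        match self {
            Self::I64(_) => "i64",
            Self::F64(_) => "f64",
            Self::Bool(_) => "bool",
            Self::String(_) => "string",
        }
    }
}

/// Collapse duplicate fields within one line: last write wins when all
/// duplicates share a data type, otherwise the whole line is rejected.
fn dedup_fields(
    fields: Vec<(String, FieldValue)>,
) -> Result<HashMap<String, FieldValue>, String> {
    let mut out: HashMap<String, FieldValue> = HashMap::new();
    for (name, value) in fields {
        if let Some(previous) = out.get(&name) {
            if previous.type_name() != value.type_name() {
                return Err(format!(
                    "field '{name}' duplicated with conflicting types {} and {}",
                    previous.type_name(),
                    value.type_name()
                ));
            }
        }
        // same type (or first occurrence): keep the later value
        out.insert(name, value);
    }
    Ok(out)
}

#[test]
fn last_write_wins_for_same_type() {
    // table v=2,bananas=42,v=3,platanos=24
    let fields = vec![
        ("v".to_string(), FieldValue::I64(2)),
        ("bananas".to_string(), FieldValue::I64(42)),
        ("v".to_string(), FieldValue::I64(3)),
        ("platanos".to_string(), FieldValue::I64(24)),
    ];
    let deduped = dedup_fields(fields).unwrap();
    assert_eq!(deduped["v"], FieldValue::I64(3));
}
```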
* refactor: use `info!` rather than `println!` in end-to-end tests
* chore: change from println to info in end_to_end_ng_cases too
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: column summary conversion for "unknown" TS
Both IOx and DataFusion have the same data model for min/max statistics:
`Option<Option<i64>>` (or any other inner type)
The interpretation is:
1. **`None`:** Value unknown.
2. **`Some(None)`:** Value known to be NULL.
3. **`Some(Some(x))`:** Value known and non-NULL.
The bug was that during the conversion from the IOx statistics type to
the DataFusion statistics type for timestamps, case 1 was converted into
case 2.
Up until now this didn't make a difference because timestamps were
basically known all the time, but during the development of NG there are
cases where the timestamps are unknown (this might change, but the query
engine should be correct w/o assuming that).
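A minimal illustration of the fix using plain `Option<Option<i64>>` values rather than the real IOx / DataFusion statistics types; the exact shape of the old bug is an assumption, but it shows how flattening the outer `None` collapses case 1 into case 2:

```rust
/// Buggy shape: an *unknown* value (case 1) becomes a *known NULL* (case 2)
/// because the outer `None` is flattened away.
fn convert_buggy(min: Option<Option<i64>>) -> Option<Option<i64>> {
    Some(min.flatten())
}

/// Fixed shape: all three cases are passed through unchanged.
fn convert_fixed(min: Option<Option<i64>>) -> Option<Option<i64>> {
    min
}

#[test]
fn unknown_stays_unknown() {
    assert_eq!(convert_fixed(None), None);                      // 1. still unknown
    assert_eq!(convert_fixed(Some(None)), Some(None));          // 2. known NULL
    assert_eq!(convert_fixed(Some(Some(42))), Some(Some(42)));  // 3. known value
    assert_eq!(convert_buggy(None), Some(None));                // the old, wrong behavior
}
```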
* docs: explain test
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>