influxdb/ingester/tests
Marco Neumann 6729b5681a
fix(ingester): re-transmit schema over flight if it changes (#7812)
* fix(ingester): re-transmit schema over flight if it changes

Fixes https://github.com/influxdata/idpe/issues/17408 .

So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is also almost always the case however when there's a persist job
running (I think) it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if the contain any data). Now the current implementation munches
all these snapshots into a single stream, and hands them over to arrow
flight which has a high-perf encode routine (i.e. it does not re-check
every single schema) so it sends the schema once and then sends the data
for every batch (the data only, schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused why on
earth some batches have a different column count compared to the schema.

For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:

- remove the stream that isn't really as stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
  according to the schema (because this check is likely cheaper than
  re-transmitting the schema for every `RecordBatch`)
- adjust a bunch of testing code to cope with this

* refactor: nicify code

* test: adjust test
2023-05-23 14:27:11 +00:00
..
write.rs fix(ingester): re-transmit schema over flight if it changes (#7812) 2023-05-23 14:27:11 +00:00