Commit Graph

10 Commits (6246275c4adb30ebacc21eb1bb38f92facb42260)

Author SHA1 Message Date
Dom Dwyer b79b120788
refactor: per-partition summary statistics
Provide row count & timestamp min/max statistics on a per-partition
basis.

This commit builds on the FSM summary statistics, merging all FSM
statistics across all data within the PartitionData (in various states)
and making them available to the caller.
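
A minimal sketch of what merging such per-state statistics could look like (the `SummaryStats` type and its fields are hypothetical, not the actual IOx API):

```rust
/// Hypothetical summary statistics for one set of buffered rows.
#[derive(Debug, Clone, Copy)]
struct SummaryStats {
    /// Number of rows covered by these statistics.
    row_count: usize,
    /// Minimum and maximum timestamp observed, in nanoseconds.
    ts_min: i64,
    ts_max: i64,
}

impl SummaryStats {
    /// Merge two sets of statistics by summing the row counts and
    /// widening the timestamp range.
    fn merge(self, other: Self) -> Self {
        Self {
            row_count: self.row_count + other.row_count,
            ts_min: self.ts_min.min(other.ts_min),
            ts_max: self.ts_max.max(other.ts_max),
        }
    }
}

/// Fold the statistics of all data in a partition (buffering,
/// snapshots, persisting, ...) into one per-partition summary.
fn partition_stats(all: impl IntoIterator<Item = SummaryStats>) -> Option<SummaryStats> {
    all.into_iter().reduce(SummaryStats::merge)
}
```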
2023-07-25 14:44:38 +02:00
Carol (Nichols || Goulding) c2606ff3ac
test: Add and use methods creating arbitrary TransitionPartitionId and PartitionHashIds 2023-07-17 09:56:55 -04:00
Fraser Savage a2ca5ca17c
Merge branch 'main' into savage/hook-up-wal-reference-counter-actor 2023-07-17 10:49:45 +01:00
Dom Dwyer 787f9a57dc
test: projection with and without "time"
Add test cases driving projection logic for BufferTree queries with and
without the "time" column.
2023-07-13 14:42:52 +02:00
Dom Dwyer 7d0e3637ed
perf(ingester): projection pushdown to data source
Prior to this change, projection pushdown was implemented as a filter,
which meant a query using it would take the following steps:

    * Query arrives
    * Find necessary partition data
    * Copy all the partition data into a RecordBatch
    * Filter that RecordBatch to apply the projection
    * Return results to caller

This is far from ideal, as the underlying partition data is copied in
its entirety and then the unneeded columns discarded - a pure waste!

After this PR, the projection is pushed down to the point of RecordBatch
generation:

    * Query arrives
    * Find necessary partition data
    * Copy only the projected columns to a RecordBatch
    * Return results to the caller

This minimises the amount of data copying, which for large amounts of
data should lead to a meaningful performance improvement when querying
for a subset of columns. It also uses a slightly more efficient
projection implementation, making a single pass over the columns (still
O(n), but with a lower constant overhead).
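
A rough sketch of projection at the point of RecordBatch generation, using the arrow crate; the `PartitionBuffer` type and its column map are illustrative stand-ins, not the actual IOx data structures:

```rust
use std::{collections::HashMap, sync::Arc};

use arrow::{
    array::ArrayRef,
    datatypes::{Field, Schema},
    error::ArrowError,
    record_batch::RecordBatch,
};

/// Hypothetical in-memory column store for one partition.
struct PartitionBuffer {
    /// Column name -> column data.
    columns: HashMap<String, ArrayRef>,
}

impl PartitionBuffer {
    /// Build a RecordBatch containing only the projected columns, in a
    /// single pass, rather than materialising every column and then
    /// filtering the unwanted ones out afterwards.
    fn to_record_batch(&self, projection: &[&str]) -> Result<RecordBatch, ArrowError> {
        let mut fields = Vec::with_capacity(projection.len());
        let mut arrays = Vec::with_capacity(projection.len());

        for name in projection {
            // Skip columns this partition does not contain.
            if let Some(array) = self.columns.get(*name) {
                fields.push(Field::new(*name, array.data_type().clone(), true));
                arrays.push(Arc::clone(array));
            }
        }

        RecordBatch::try_new(Arc::new(Schema::new(fields)), arrays)
    }
}
```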
2023-07-05 13:44:11 +02:00
Fraser Savage 246c2b0749
refactor(ingester): Accept a predicate as parameter to `query_exec`
This will allow the ingester to apply a predicate when serving a query
and only stream back data that satisfies the predicate.
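
A hedged sketch of the shape of such a signature (types simplified to stand-ins and the method shown as a free function; not the actual trait definition):

```rust
/// Simplified stand-ins for the real ingester types.
pub struct NamespaceId(pub i64);
pub struct TableId(pub i64);
pub struct Predicate; // e.g. a set of filter expressions
pub struct QueryResponse; // a stream of RecordBatches in reality
#[derive(Debug)]
pub struct QueryError;

/// With the predicate parameter, the ingester can filter rows before
/// streaming them back instead of returning all buffered table data.
pub async fn query_exec(
    namespace_id: NamespaceId,
    table_id: TableId,
    columns: Vec<String>,
    predicate: Option<Predicate>,
) -> Result<QueryResponse, QueryError> {
    // Hypothetical body: find the buffered data, apply `predicate` if
    // one was supplied, and stream back only the matching rows.
    let _ = (namespace_id, table_id, columns, predicate);
    Ok(QueryResponse)
}
```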
2023-07-03 17:24:56 +02:00
Carol (Nichols || Goulding) afcd2d859d
refactor: Use test constants in more places
So that when I change the type of PartitionIds to TransitionPartitionId,
I don't have to update all these places that just need an arbitrary
partition ID or related values.

These test constants probably didn't exist when these tests
were created.
2023-06-26 17:25:14 -04:00
Carol (Nichols || Goulding) d991e12fbb
feat: Send PartitionHashId from ingesters to queriers 2023-06-22 09:01:22 -04:00
Marco Neumann 6729b5681a
fix(ingester): re-transmit schema over flight if it changes (#7812)
* fix(ingester): re-transmit schema over flight if it changes

Fixes https://github.com/influxdata/idpe/issues/17408 .

So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is also almost always the case; however, when there's a persist job
running (I think), it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if they contain any data). Now the current implementation merges
all these snapshots into a single stream and hands them over to arrow
flight, which has a high-perf encode routine (i.e. it does not re-check
every single schema), so it sends the schema once and then sends the data
for every batch (the data only, schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused about why
on earth some batches have a different column count compared to the
schema.

For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:

- remove the stream that isn't really a stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
  according to the schema, because this check is likely cheaper than
  re-transmitting the schema for every `RecordBatch` (see the sketch
  after this list)
- adjust a bunch of testing code to cope with this
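
A minimal standalone sketch of that schema-based chunking (not the
actual IOx helper):

```rust
use arrow::record_batch::RecordBatch;

/// Split a partition's batches into runs of consecutive `RecordBatch`es
/// sharing the same schema, so that each run can be encoded as one
/// Flight stream with a single schema message.
fn chunk_by_schema(batches: Vec<RecordBatch>) -> Vec<Vec<RecordBatch>> {
    let mut chunks: Vec<Vec<RecordBatch>> = vec![];

    for batch in batches {
        let same_schema = chunks
            .last()
            .map_or(false, |run| run[0].schema() == batch.schema());

        if same_schema {
            // Same schema as the current run: no schema re-transmission.
            chunks.last_mut().expect("just checked").push(batch);
        } else {
            // Schema changed: start a new run (and a new schema message).
            chunks.push(vec![batch]);
        }
    }

    chunks
}
```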

* refactor: nicify code

* test: adjust test
2023-05-23 14:27:11 +00:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00