Commit Graph

100 Commits (196c589ef64f73677eb3e89e60b219f862bde19a)

Author SHA1 Message Date
Dom Dwyer 7d0e3637ed
perf(ingester): projection pushdown to data source
Prior to this change projection pushdown was implemented as a filter,
which meant a query using it would take the following steps:

    * Query arrives
    * Find necessary partition data
    * Copy all the partition data into a RecordBatch
    * Filter that RecordBatch to apply the projection
    * Return results to caller

This is far from ideal, as the underlying partition data is copied in
its entirety and then the unneeded columns discarded - a pure waste!

After this PR, the projection is pushed down to the point of RecordBatch
generation:

    * Query arrives
    * Find necessary partition data
    * Copy only the projected columns to a RecordBatch
    * Return results to the caller

This minimises the amount of data copying, which for large amounts of
data should lead to a meaningful performance improvement when querying
for a subset of columns. It also uses a slightly more efficient
projection implementation by using a single pass over the columns (still
O(n) but less constant overhead).
2023-07-05 13:44:11 +02:00
Dom Dwyer a17bd3bded
refactor: don't Arc-wrap RecordBatch instances
RecordBatch are internally ref-counted, so don't Arc wrap them again.
2023-07-05 13:44:09 +02:00
dependabot[bot] 990044dcb2
chore(deps): Bump indexmap from 1.9.3 to 2.0.0 (#8073)
* chore(deps): Bump indexmap from 1.9.3 to 2.0.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.3 to 2.0.0.
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.9.3...2.0.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-26 08:52:51 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Dom Dwyer 7bddba8eae
ci: add missing lints to schema
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:40 +02:00
kayagokalp 81eb663122 refactor: accept impl Into<String> for schema methods 2023-05-11 01:44:14 +03:00
Stuart Carnie 19c5aa9ac8
feat: Add API to find the field type by a name 2023-04-27 08:05:42 +10:00
Dom Dwyer c5bb88e173
chore: remove unused dependencies
Some crates import dependencies they never use.
2023-04-18 12:07:13 +02:00
Andrew Lamb f46d06d56f
chore: Update DataFusion + arrow ecosystem to 37 (#7544)
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1

* refactor: Update for FieldRef and other API changes

* fix: Update field size calculation

* fix: Use `NullBuffer` directly

* fix: remove outdated comment

* chore: Update test for tonic

* chore: Run cargo hakari tasks

* chore: cargo update

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-14 12:43:01 +00:00
Stuart Carnie 08ef689d21
feat: Teach InfluxQL how to plan an aggregate query (#7230)
* feat: Display failed query

Allows a user to immediately identify the failed query.

* feat: API improvements to InfluxQL parser

* feat: Extend `SchemaProvider` trait to query for UDFs

* fix: We don't want the parser to panic on overflows

* fix: ensure `map_type` maps the timestamp data type

* feat: API to map a InfluxQL duration expression to a DataFusion interval

* chore: Copied APIs from DataFusion SQL planner

These APIs are private but useful for InfluxQL planning.

* feat: Initial aggregate query support

* feat: Add an API to fetch a field by name

* chore: Fixes to handling NULLs in aggregates

* chore: Add ability to test expected failures for InfluxQL

* chore: appease rustfmt and clippy 😬

* chore: produce same error as InfluxQL

* chore: appease clippy

* chore: Improve docs

* chore: Simplify aggregate and raw planning

* feat: Add support for GROUP BY TIME(stride, offset)

* chore: Update docs

* chore: remove redundant `is_empty` check

Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>

* chore: PR feedback to clarify purpose of function

* chore: The series_sort can't be empty, as `time` is always added

This was originally intended as an optimisation when executing an
aggregate query that did not group by time or tags, as it will produce
N rows, where N is the number of measurements queried.

* chore: update comment for clarity

---------

Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
2023-03-23 01:13:15 +00:00
Stuart Carnie 2b74f07fe5
feat: Support `GROUP BY` with tags in raw `SELECT` queries (#7109)
* chore: Normalise name of Call expression to lowercase

Simplifies matching functions in planner, as they are guaranteed to be
lowercase.

This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.

* chore: Ensure aggregate functions fail gracefully.

* feat: GROUP BY tag support

* feat: Ensure schema-level metadata is propagated

Requires: https://github.com/apache/arrow-rs/issues/3779

* chore: Add some tests to validate GROUP BY output

* chore: Add clarifying comment

* chore: Declare message in flight.proto

The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.

Added tests to validate the metadata is encoded correctly.

* chore: Placate linters

* chore: Use correct column in test cases

* chore: Add `is_projected` to the TagKeyColumn message

`is_projected` is necessary to inform a client whether it should include
the tag key is used exclusively for the group key (false) or also
projected in the `SELECT` column list.

* refactor: Move constants to `schema` crate per PR feedback

* chore: rustfmt 🙄

* chore: Update docs for InfluxQlMetadata

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-03-07 22:40:23 +00:00
Marco Neumann 999a5dae03
refactor: sort key cleanups (#7113)
* refactor: remove unused `ColumnSort`

* refactor: remove invalid assertion

It is true that time SHOULD be the last sort key, but we absoletely
don't require that, esp. not in the query tier. The ingester will
currently always produce sort keys where time is last, but if we ever
going to deal w/ external data sources like bulk loaded parquet files,
this may not always be the case.

Found while constructing some edge case tests.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 16:08:21 +00:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Raphael Taylor-Davies d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756)
* chore: update DataFusion

* fix: test

* chore: format

* chore: clippy

* chore: update arrow

* chore: arrow upgrade fallout

* chore: Run cargo hakari tasks

* chore: remove failing warm compaction test

* fix: flight error propagation

* chore: update parquet size

* fix: Update error message

* chore: Update parquet metadata test

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
kodiakhq[bot] 9a0db0ec30
Merge branch 'main' into alamb-patch-1 2023-01-06 11:16:52 +00:00
Andrew Lamb f0141bcb41 fix: fmt 2023-01-06 06:13:40 -05:00
Andrew Lamb 544f5c5bff
fix: Update schema/src/lib.rs
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2023-01-06 06:13:04 -05:00
Raphael Taylor-Davies e1036a0c63
refactor: cleanup schema boxing (#6511)
* refactor: cleanup Schema boxing

* chore: clippy
2023-01-06 10:57:39 +00:00
Andrew Lamb ba844ddf03
docs: Update docstring on Schema 2023-01-06 05:49:49 -05:00
Carol (Nichols || Goulding) 46ff8854ec
fix: Use code backticks around invalid HTML tags in doc strings 2022-12-21 16:36:17 -05:00
Dom Dwyer e76b107332
feat(ingester2): persist back-pressure
This commit causes an ingester2 instance to stop accepting new writes
when at least one persist queue is full. Writes continue to be rejected
until the persist workers have processed enough outstanding persist
tasks to drain the queues to half of their capacity, at which point
writes are accepted again.

When a write is rejected, the ingester returns a "resource exhausted"
RPC code to the caller.

Checking if the system is in a healthy state for writes is extremely
cheap, as it is on the hot path for all writes.
2022-12-14 17:17:17 +01:00
Andrew Lamb 9175f4a0b5
chore: Upgrade datafusion to get correct support for multi-part identifiers (#6349)
* test: add tests for periods in measurement names

* chore: Update Datafusion

* chore: Update for changed APIs

* chore: Update expected plan output

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 11:27:13 +00:00
Andrew Lamb 14a9bc92e9
Revert "Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)" (#6294)" (#6296)
This reverts commit b7e52c0d8d.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 14:20:43 +00:00
Andrew Lamb b7e52c0d8d
Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)" (#6294)
This reverts commit 039a45ddd1.
2022-12-01 11:38:42 +00:00
Andrew Lamb 039a45ddd1
chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)
* chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0`

* chore: Update thrift to 0.17

* fix: use workspace arrow-flight in ingester2

* chore: Update for API changes

* fix: test

* chore: Update hakari

* chore: Update hakari again

* chore: Update trace_exporters to latest thrift

* fix: update test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 14:12:30 +00:00
kodiakhq[bot] 05d7d1495e
Merge branch 'main' into dependabot/cargo/hashbrown-0.13.1 2022-11-11 21:26:40 +00:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text 2022-11-11 15:46:04 -05:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Marco Neumann 6369d88633
refactor: enforce name of the one-and-only time column (#5982)
* refactor: enforce name of the one-and-only time column

We currently only support a single time dimension and some parts of
other stack rely on the name of the time column. So lets enforce the
name (note that `schema::try_from_arrow` already checks for duplicate
column, so we are now left with a single dimension).

* refactor: mark a few errors as "internal"

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-27 12:42:49 +00:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Marco Neumann 9b48437711
refactor: make influx column type mandatory (#5978)
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so lets encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-26 11:20:29 +00:00
Marco Neumann 5fd98ab483
refactor: enforce nullability for (known) IOx data types (#5972)
We don't support non-null tags, non-null fields, or nullable timestamps. Let's
just remove this from `schema` so that this never happens on accident.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-25 14:42:50 +00:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Marco Neumann 3e4db81bc6 refactor: make `SchemaBuilder::field` fallible
It would be nice if the IOx data type would not be optional and this is
a prep clean-up to achieve that.
2022-10-24 18:12:42 +02:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight`  to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Nga Tran 84b10b28b2
feat: send only needed projection columns from querier to ingester in… (#5678)
* feat: send only needed projection columns from querier to ingester in case of normal SQL queries

* refactor: push column index down until we need to convert them strings

* fix: make the test deterministic

* test: test for the projection pushdown

* test: add asserts for the proj pushdown test

* test: implement projection pushdown for partitions of MockIngesterConnection

* chore: cleanup

* chore: address review comments

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-26 17:19:20 +00:00
dependabot[bot] ea1e822e3b
chore(deps): Bump itertools from 0.10.4 to 0.10.5 (#5707)
Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.4 to 0.10.5.
- [Release notes](https://github.com/rust-itertools/itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/commits)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 08:15:59 +00:00
Nga Tran 7c4c918636
chore: add parttion id into panic message (#5641)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 02:21:13 +00:00
dependabot[bot] 7e1f013346
chore(deps): Bump itertools from 0.10.3 to 0.10.4 (#5631)
Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.3 to 0.10.4.
- [Release notes](https://github.com/rust-itertools/itertools/releases)
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.3...v0.10.4)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 14:02:14 +00:00
Andrew Lamb 1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591)
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0

* fix: enable dynamic comparison flag

* chore: derive Eq for clippy

* chore: update explain plans

* chore: Update sizes for ReadBuffer encoding

* chore: update more tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
YIXIAO SHI 52ae60bf2e
chore: fix comment typo (#5551)
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Luke Bond 10fee5535a
feat: import schema updates iox catalog (#5385)
* feat: import schema updates iox catalog

- renamed import/schema module to aggregate_tsm_schema to not conflic
  with schema crate
- fetch schema from iox catalog, and validate/merge/create as needed

chore: add catalog dsn config to import schema command
chore: import schema command connects to catalog
chore: import schema merge validation errors return non-zero code
chore: simplified and tidies import update catalog code

chore: tests and refactoring of import schema catalog update

* chore: require retention on ns creation in import

* chore: fixed bad test in import schema validation

* chore: friendlier errors & more tests in import schema catalog update
2022-08-16 11:05:27 +00:00
Marco Neumann 90fec1365f
feat: intern schemas during query planning (#5215)
* feat: intern schemas during query planning

Helps with #5202.

* refactor: `SchemaMerger::build` shall return an `Arc`

* feat: `SchemaMerger::with_interner`

* refactor: hash-based schema interning
2022-08-11 12:28:51 +00:00
Andrew Lamb ce3e2c3a15
chore: make terminology in iox_query::Provider consistent (remove super notation) (#5349)
* chore: make terminology in iox_query::Provider consistent (remove super notation)

* refactor: be more specific about *which* sort key is meant

* refactor: rename another sort_key --> output_sort_key

* refactor: rename additional sort_key to output_sort_key

* refactor: rename sort_key --> chunk_sort_key

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-10 10:59:47 +00:00