Commit Graph

65 Commits (7202dddab6d9ede46c74664c0675fe349da2fd13)

Author SHA1 Message Date
dependabot[bot] 2277fcf08a
chore(deps): Bump serde_json from 1.0.85 to 1.0.86
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.85 to 1.0.86.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.85...v1.0.86)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-10 01:42:37 +00:00
Andrew Lamb 8013781ac2
feat: rewrite missing column references to NULL (#5818)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-07 18:05:54 +00:00
Marco Neumann c4c83e0840
fix: query error propagation (#5801)
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
  semantics

Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.

Fixes #5799.
2022-10-06 08:54:01 +00:00
Andrew Lamb 56a1c579a1
refactor: Change influxdb_iox client to use http rather than grpc for write (#5756)
* refactor: Change influxdb_iox client to use http rather than grpc for write

* refactor: remove custom variants

* refactor: consolidate more
2022-09-29 11:12:51 +00:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight`  to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Marco Neumann 7e00426d49
refactor: concurrent table scan for "tag values" (#5671)
Ref #5668.
2022-09-19 14:11:51 +00:00
Andrew Lamb 1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591)
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0

* fix: enable dynamic comparison flag

* chore: derive Eq for clippy

* chore: update explain plans

* chore: Update sizes for ReadBuffer encoding

* chore: update more tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Marco Neumann 8933f47ec1
refactor: make `QueryChunk::partition_id` non-optional (#5614)
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
-- mostly surprising -- conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(iox_query when dealing with ingester chunks).

[^1]: This is even true when the chunk belongs to a parquet file that is not
      yet added to the catalog, contrary to what a comment in the ingester
      stated. The catalog and data model used by the querier are two totally
      different things.
2022-09-12 13:52:51 +00:00
Marco Neumann b676049358
fix: apply selection in `TestChunk::read_filter` (#5613)
* fix: apply selection in `TestChunk::read_filter`

TBH I have no idea how this worked so well before, but the chunks are
expected to apply the given selection. This is because
`IOxReadFilterNode::execute` will wrap the `QueryChunk::read_filter`
output into a `SchemaAdapterStream` and this one expects that there are
no input columns that are absent in the output schema (i.e. it will only
add null columns, it won't remove any). Funnily the `SchemaAdapterStream`
error will blame DataFusion for the mess.

* test: make `test_storage_rpc_tag_values_grouped_by_measurement_and_tag_key` a bit harder

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 13:10:37 +00:00
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8

* fix: "must use"
2022-09-06 14:13:28 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Andrew Lamb e0ea335b70
fix: Support RegExMatch and RegExNotMatch predicates on `_field` (#5301)
* test: add tests for regex_match_on_field

* feat: more general `_field` predicate handling

* fix: remove old comment

* fix: update tests

* fix: improve test a little more

* fix: fmt

* fix: Update predicate/src/rpc_predicate/field_rewrite.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* fix: Handle predicates that can not be evaluated

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 19:42:16 +00:00
dependabot[bot] e8231b2986
chore(deps): Bump serde_json from 1.0.82 to 1.0.83 (#5297)
* chore(deps): Bump serde_json from 1.0.82 to 1.0.83

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.82 to 1.0.83.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.82...v1.0.83)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 14:28:29 +00:00
Marco Neumann 87bdabb38a
feat: log external span for query gRPC requests (#5187)
* feat: log external span for query gRPC requests

This should simplify the correlation with our binlog data.

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:53:12 +00:00
Sam Arnold 3fbe860bb9
fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231)
InfluxQL queries can send (technically incorrect) ranges like this, meaning all time
but excluding the max nanosecond time.

Since this is an important case, we should handle it specially and use the optimized
'all time' handling for meta queries even though this is technically wrong in that
it does not filter out column names / measurement names at MAX_NANO_TIME exactly.

Closes: https://github.com/influxdata/conductor/issues/1072

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:24:26 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Andrew Lamb fbf672015e
refactor: Reduce ceremony requried to create a `Span` from `SpanContext` (#5181)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 11:19:38 +00:00
Marko Mikulicic 5a0af921c8
chore: Roll forward: Sync ReadWindowAggregate API: TagKeyMetaNames (#5186)
This reverts commit 5d02c755687ef041f5f45dbfc3e633a833284edb.
2022-07-22 10:44:06 +00:00
Marco Neumann baee020efe
refactor: improve query gRPC logging (#5185)
- ensure that logging is done BEFORE the DB/namespace is requested (i.e.
  any actual work is done) but AFTER the query semaphore is acquired
- simplify tag key decoding (so that the logging statements are simpler
  to write)
2022-07-22 10:30:45 +00:00
Marko Mikulicic 07cdb99192
chore: Revert "Sync ReadWindowAggregate API: TagKeyMetaNames" (#5184)
We're noticing a possible regression (OOMs) in our testing cluster that roughly correlates with this.
2022-07-22 09:26:42 +00:00
Marco Neumann 0f54281d24 feat: trace namespace cache
For #5129.
2022-07-21 16:10:06 +02:00
Marko Mikulicic 21d033eafd fix: Sync ReadWindowAggregate API: TagKeyMetaNames
The storage API has been updated in https://github.com/influxdata/idpe/pull/12868
in January, but since we forked the `.proto` files we never noticed.
2022-07-21 15:07:04 +02:00
Marko Mikulicic c20288f60e
fix: Add TagKeyMetaNamesCapability capability (#5160)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:52:40 +00:00
Marko Mikulicic b8236e2b9d
fix: Fix SeriesKey sort order for special _measurement and _field (#5150)
* fix: Fix SeriesKey sort order for special _measurement and _field

* fix: Update expected test output

* fix: Update more tests

* fix: Re-sort tag key when using binary encoding

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-20 08:45:17 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Marco Neumann f0bd278652
feat: add tracing to instrumented semaphores (#5130)
This will allow us to easily see how much time we spend during query
processing waiting for the query semaphore.

Ref #5129.
2022-07-15 07:50:28 +00:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Sam Arnold e193913ed3
fix: optimize field columns for all-time predicates (#5046)
* fix: optimize field columns for all-time predicates

Also fix timestamp range to allow selecting points at MAX_NANO_TIME

* fix: clamp end to MIN_NANO_TIME for safety

* refactor: add contains_all method to TimestampRange
2022-07-06 12:01:28 +00:00
dependabot[bot] 2b527bbf64
chore(deps): Bump regex from 1.5.6 to 1.6.0 (#5048)
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.6 to 1.6.0.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.6...1.6.0)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-06 10:25:28 +00:00
Sam Arnold 03f456d8fd
fix: optimize tag_keys to go only to schema when predicate is empty (#4985)
* docs: fix comment

* test: add test for delete behaviour

* fix: tag_keys optimization for empty predicate

Also need to eliminate 'true' predicates from simplified predicate so
is_empty works correctly.

* refactor: use lit instead of spelling out literal true

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:45:25 +00:00
dependabot[bot] 40a8525520
chore(deps): Bump serde_json from 1.0.81 to 1.0.82 (#4992)
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.81 to 1.0.82.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.81...v1.0.82)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-30 09:54:08 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Marco Neumann 66623fe0cd
feat: expose query semaphore metrics (#4836)
The groundwork for that was already done, just needed a bit of wiring.
This might help us to judge timeouts.
2022-06-13 09:36:50 +00:00
Andrew Lamb 2ec7764fdd
refactor: rename builder like predicate methods to be `with_` (#4808)
* refactor: rename builder like predicate methods to be `with_`

* fix: merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 11:26:03 +00:00
Andrew Lamb afc1c12062
refactor: consolidate `PredicateBuilder` into `Predicate` (#4799)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-08 12:21:24 +00:00
Andrew Lamb 8e96a2721d
chore: Update datafusion (again) (#4788)
* chore: Update datafusion

* chore: Update imports

* refactor: update API usage

* refactor: clean up some uses of binary_expr

* fix: remove unused export

* fix: update explain output

* chore: update more explain tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-07 08:17:56 +00:00
dependabot[bot] e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
Marco Neumann f7cbd5d490
test: query limits (#4769)
* test: query limits

This was left out of #4760.

* test: additional debugging

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-03 07:30:30 +00:00
Marco Neumann 9e30a3eb29
refactor: rework querier concurrency limiting (#4760)
* refactor: rework querier concurrency limiting

With #4752 we introduced a concurrency limit into the querier. It works
by drawing permits from a central semaphore whenever we create a
`QuerierNamespace`. This however only limits concurrency during query
planning and not query execution, because the objects contained within
the plan (chunks and some metadata) neither reference the permit nor the
`QuerierNamespace`.

Now one approach to fix that would be to wire up the permit all the down
into all the query-related data structures. This however is very fiddly
and potentially will get lost at some point, because as soon as we
transform these data structures -- e.g. into streams -- the permit might
get lost again. This will be potentially query-dependent and very hard
to debug.

So instead we reverse the approach and track the permits at the upper
layer of the stack: the gRPC service entry points. There we also need to
be careful -- e.g. when we return streams to tonic -- but it's way
easier to review that then the deeply nested object hierarchy that is
involved with queries. Also the separation of concerns is a bit clearer,
because why would a "chunk" care about the "query concurrency" as a
whole.

* refactor: improve gRPC permit keeping and prepare tests
2022-06-02 09:49:58 +00:00
Andrew Lamb 257aaa7e7b
fix: Support `_field != <name>` predicates (#4721)
* fix: Support `_field != <name>` predicates

* fix: update test

* fix: add negative test

* fix: improve comments

* refactor: make `add_include` and `add_exclude` infallible

* chore: add type annotations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 16:04:53 +00:00
Carol (Nichols || Goulding) b52a3586a7
fix: Turn cargo doc warnings into errors (#4710)
* fix: Correct intra-doc links

* fix: Turn cargo doc warnings into errors

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-28 11:24:22 +00:00
dependabot[bot] 5c033b462e
chore(deps): Bump regex from 1.5.5 to 1.5.6 (#4655)
Bumps [regex](https://github.com/rust-lang/regex) from 1.5.5 to 1.5.6.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.5.5...1.5.6)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-23 08:39:01 +00:00
Marco Neumann 52346642a0
ci: fix cargo deny (#4629)
* ci: fix cargo deny

* chore: downgrade `socket2`, version 0.4.5 was yanked

* chore: rename `query` to `iox_query`

`query` is already taken on crates.io and yanked and I am getting tired
of working around that.
2022-05-18 09:38:35 +00:00
Andrew Lamb 3a33e806c7
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619)
* chore: Update datafusion deps

* chore: update arrow/parquet/arrow flight deps

* chore: Run cargo hakari tasks

* chore: Update location of utils

* chore: Update some more APIs

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-17 14:13:03 +00:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 0541c6e40f
fix: Remove data_types crate where it's no longer used 2022-05-06 14:45:39 -04:00