Commit Graph

8121 Commits (9328ba8c4508b8a25529b24b438f7d76a78288df)

Author SHA1 Message Date
Carol (Nichols || Goulding) 9328ba8c45
feat: Use new extra loading info to load read buffer chunks into cache 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 054c25de50
refactor: Add more methods to DecodedParquetFile
I'm tired of trying to remember which info is on which metadata.
2022-06-02 09:22:44 -04:00
Marco Neumann 9e30a3eb29
refactor: rework querier concurrency limiting (#4760)
* refactor: rework querier concurrency limiting

With #4752 we introduced a concurrency limit into the querier. It works
by drawing permits from a central semaphore whenever we create a
`QuerierNamespace`. This however only limits concurrency during query
planning and not query execution, because the objects contained within
the plan (chunks and some metadata) neither reference the permit nor the
`QuerierNamespace`.

Now one approach to fix that would be to wire up the permit all the down
into all the query-related data structures. This however is very fiddly
and potentially will get lost at some point, because as soon as we
transform these data structures -- e.g. into streams -- the permit might
get lost again. This will be potentially query-dependent and very hard
to debug.

So instead we reverse the approach and track the permits at the upper
layer of the stack: the gRPC service entry points. There we also need to
be careful -- e.g. when we return streams to tonic -- but it's way
easier to review that then the deeply nested object hierarchy that is
involved with queries. Also the separation of concerns is a bit clearer,
because why would a "chunk" care about the "query concurrency" as a
whole.

* refactor: improve gRPC permit keeping and prepare tests
2022-06-02 09:49:58 +00:00
Andrew Lamb 1472ec272f
refactor: consolidate duplicate testing logic (#4708)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 20:02:13 +00:00
Andrew Lamb a37c553545
refactor: Split up rpc_predicate module a bit (#4763)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 19:56:11 +00:00
Andrew Lamb 7328cc6a9a
docs: Update readme (#4765)
* docs: Update readme

* fix: Update README.md

Co-authored-by: Nga Tran <nga-tran@live.com>

Co-authored-by: Nga Tran <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 19:50:06 +00:00
kodiakhq[bot] b714269b13
Merge pull request #4754 from influxdata/cn/extra-cache-system
feat: Add an Extra type to Cacher Loader to specify extra information…
2022-06-01 18:11:46 +00:00
kodiakhq[bot] 7ad3a50dd4
Merge branch 'main' into cn/extra-cache-system 2022-06-01 18:06:09 +00:00
kodiakhq[bot] f3fb040294
Merge pull request #4756 from influxdata/dom/fix-e2e-db-fire
test(e2e): do not mangle prod database
2022-06-01 16:38:34 +00:00
kodiakhq[bot] 51114a0a56
Merge branch 'main' into dom/fix-e2e-db-fire 2022-06-01 16:32:41 +00:00
Dom Dwyer 1caeb04869 test(e2e): do not mangle prod database
Unset the all env vars for the following CLI e2e tests:

    * default_mode_is_run_all_in_one
    * default_run_mode_is_all_in_one

This prevents them from executing against the "prod" catalog, running
migrations and inserting values to the prod database specified in the
prod DSN env (INFLUXDB_IOX_CATALOG_DSN).
2022-06-01 17:12:12 +01:00
kodiakhq[bot] 5a52954d0a
Merge pull request #4759 from influxdata/dom/ignored-metadata-test
refactor: always panic for empty parquet files
2022-06-01 16:11:29 +00:00
kodiakhq[bot] 69da424a41
Merge branch 'main' into dom/ignored-metadata-test 2022-06-01 16:05:43 +00:00
Andrew Lamb 257aaa7e7b
fix: Support `_field != <name>` predicates (#4721)
* fix: Support `_field != <name>` predicates

* fix: update test

* fix: add negative test

* fix: improve comments

* refactor: make `add_include` and `add_exclude` infallible

* chore: add type annotations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 16:04:53 +00:00
Dom Dwyer f8b83c5085 test: assert panic behaviour
Modifies the existing test added as part of #4695 to ensure a panic is
emitted when serialising an empty parquet file.
2022-06-01 16:55:53 +01:00
Dom Dwyer 0bfc11f4a1 refactor: always panic for empty parquet files
Moves the panic into the child call to_parquet() so all code paths are
covered (i.e. not serialising into memory via to_parquet_bytes()).
2022-06-01 16:54:36 +01:00
kodiakhq[bot] 507e153c5a
Merge pull request #4699 from influxdata/dom/silly-config
refactor: warn for silly object store configs
2022-06-01 15:48:10 +00:00
Dom Dwyer 60de97ac26 test(e2e): ensure "partition pull" writes files
Adds a test case covering the "remote partition pull" command configured
with file-based object storage.
2022-06-01 16:41:57 +01:00
Dom Dwyer 6d647fb7a9 refactor: warn for silly object store configs
Warn when downloading files to an in-memory object store.

The "remote partition pull" command downloads parquet files from an
object store via a router, and saves them locally. It's pretty unlikely
the user intends to download those files to memory of the CLI process
which then exits when the pull is complete, throwing away the downloaded
files, but this is the default.
2022-06-01 16:41:57 +01:00
kodiakhq[bot] 1ca58b2b70
Merge pull request #4757 from influxdata/dom/use-constructors-plz
refactor: constructor for ParquetFileWithTombstone
2022-06-01 15:34:02 +00:00
Dom Dwyer 9ae58c89b6 refactor: constructor for ParquetFileWithTombstone
Use a constructor to initialise a ParquetFileWithTombstone struct,
rather than making the fields pub.

This allows IDEs to "go to" places where this is constructed when
browsing the code, but also keeps the type closed for modification of
internals (SOLID).
2022-06-01 15:58:06 +01:00
Nga Tran f0e477fcee
chore: let aggressively increase compactor job size and concurrency level (#4747)
* chore: let aggressively increase compactor job size and concurrency level

* chore: Apply suggestions from code review

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 14:32:36 +00:00
Carol (Nichols || Goulding) 39fc19e946
test: Exercise the Extra type in the cache system tests 2022-06-01 09:19:52 -04:00
Carol (Nichols || Goulding) 37347f2389
feat: Add an Extra type to Cacher Loader to specify extra information for loading entries 2022-06-01 08:58:19 -04:00
Andrew Lamb 2886149afc
chore: naming / comment cleanups from namespace semaphore (#4753) 2022-06-01 12:46:38 +00:00
Marco Neumann 446d94487d
feat: add tooling to instrument async semaphores (#4751)
* feat: add tooling to instrument async semaphores

Ref #4739.

* test: improve `test_permits_acquired_and_holders_acquired`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 12:19:23 +00:00
Marco Neumann ebeccf037c
feat: limit querier concurrency by limiting number of active namespaces (#4752)
This is a rather quick fix for prod. On the mid-term we probably wanna
rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd w/ IOx into the same pod.
2022-06-01 11:59:35 +00:00
dependabot[bot] e638385782
chore(deps): Bump pbjson-types from 0.3.1 to 0.3.2 (#4750)
Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson-types
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 09:02:37 +00:00
dependabot[bot] 21ec05a6ee
chore(deps): Bump pbjson from 0.3.1 to 0.3.2 (#4749)
Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 08:12:46 +00:00
dependabot[bot] 043dee43c8
chore(deps): Bump pbjson-build from 0.3.1 to 0.3.2 (#4748)
Bumps [pbjson-build](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson-build
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 07:54:13 +00:00
Marco Neumann c91dbe062e
test: "optimize" ingesterrecord batches in query tests (#4700)
* test: "optimize" ingesterrecord batches in query tests

It seems that I had the right idea in #4656 but wasn't able to trigger
https://github.com/influxdata/conductor/issues/955 because the query
tests do not "optimize" the record batches in the same way the actual
gRPC implementation does. If we apply the same transformation we indeed
end up with the same error.

* fix: all batches within the ingester flight response must have same schema

* refactor: simplify and reuse code

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 07:37:11 +00:00
Nga Tran 79220720be
chore: increase size of a compactor job and level of concurrency (#4746)
* fix: let us not compact no-data

* fix: split time must be greater min_time, too

* fix: resolve merge conflict

* chore: increase size of a compactor job and level of concurrency

Co-authored-by: Dom <dom@itsallbroken.com>
2022-05-31 19:57:06 +00:00
Nga Tran dfd35c05a1
fix: let us not compact no-data (#4744)
* fix: let us not compact no-data

* fix: split time must be greater min_time, too

* fix: resolve merge conflict

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-31 17:02:14 +00:00
kodiakhq[bot] 51fd20c769
Merge pull request #4745 from influxdata/dom/metadata-test-with-schema
test: fix test_metadata_from_parquet_metadata
2022-05-31 16:42:46 +00:00
Dom Dwyer 5d74ae2ac1 test: fix test_metadata_from_parquet_metadata
Changes the test_metadata_from_parquet_metadata test to embed the IOx
metadata before asserting it can be read back.
2022-05-31 17:34:04 +01:00
kodiakhq[bot] f7c198ebb5
Merge pull request #4729 from influxdata/dom/consistent-sort-key
refactor(compactor): consistent sort key
2022-05-31 16:24:45 +00:00
kodiakhq[bot] 2d08478e1b
Merge branch 'main' into dom/consistent-sort-key 2022-05-31 16:15:08 +00:00
Marco Neumann 988bd38e93
refactor: remove unused code (#4742) 2022-05-31 11:36:02 +00:00
Marco Neumann 5a95da7327
refactor: do NOT use ANY file IO for parquet reading (#4741) 2022-05-31 11:18:24 +00:00
Marco Neumann 2bf03e57bf
feat: limit tmp parquet file count and size (#4737)
Fixes #4736.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-31 08:56:15 +00:00
dependabot[bot] 01fc550034
chore(deps): Bump parking_lot from 0.12.0 to 0.12.1 (#4732)
Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.0 to 0.12.1.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Amanieu/parking_lot/compare/0.12.0...0.12.1)

---
updated-dependencies:
- dependency-name: parking_lot
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 08:01:58 +00:00
dependabot[bot] 5b5f0efef5
chore(deps): Bump rayon from 1.5.2 to 1.5.3 (#4731)
Bumps [rayon](https://github.com/rayon-rs/rayon) from 1.5.2 to 1.5.3.
- [Release notes](https://github.com/rayon-rs/rayon/releases)
- [Changelog](https://github.com/rayon-rs/rayon/blob/master/RELEASES.md)
- [Commits](https://github.com/rayon-rs/rayon/compare/v1.5.2...v1.5.3)

---
updated-dependencies:
- dependency-name: rayon
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:50:42 +00:00
dependabot[bot] 642aef103d
chore(deps): Bump comfy-table from 5.0.1 to 6.0.0 (#4730)
Bumps [comfy-table](https://github.com/nukesor/comfy-table) from 5.0.1 to 6.0.0.
- [Release notes](https://github.com/nukesor/comfy-table/releases)
- [Changelog](https://github.com/Nukesor/comfy-table/blob/main/CHANGELOG.md)
- [Commits](https://github.com/nukesor/comfy-table/compare/v5.0.1...v6.0.0)

---
updated-dependencies:
- dependency-name: comfy-table
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-31 07:10:37 +00:00
Dom Dwyer 70864b9f48 refactor: always use correct chunk sort key
Don't use the same sort key for all files - sort keys may grow over
time, and the information is already at hand.
2022-05-30 17:41:41 +01:00
Dom Dwyer 6aa2a6958a refactor: assert consistent parquet file metadata
Assert consistent metadata when evaluating candidate parquet files for
compaction.

Asserts all files have the same:
    * Sequencer ID
    * Namespace ID
    * Table ID
    * Partition ID
    * Sort key
2022-05-30 17:41:41 +01:00
Dom Dwyer 0f16d6cabb refactor: consistent SortKey source
Changes the compaction logic to always reference the same SortKey
instance, rather than repeatedly querying for it.

The Partition metadata is always read from the catalog as part of
compact_partition(), where it previously threw away all metadata except
the sort key, which was passed into compact(). Then compact() would
always re-query the catalog to look up just the sort key again, and mix
up the two instances during use - one passed into the fn, one freshly
queried within the fn.

Now the Partition metadata is resolved in compact_partition() as it was
previously, but the entire Partition reference is passed to compact(),
and this is consistently used do access the sort key. This also removes
a catalog query per compaction call.
2022-05-30 17:41:41 +01:00
Marco Neumann 79c054ffc9
fix: do NOT block in parquet file IO (#4727)
* fix: do NOT block in parquet file IO

I think for historical reason we were using blocking IO to read parquet
files. With the current streaming `SendableRecordStream` approach this
is technically NOT required anymore.

Now one might think that the sync-async dance that we did is kinda
harmless, but looking at our producition querier I think it is really
bad. The querier seems to be stuck but looking at `strace` and other
health signal it seems it is not entirely dead. Looking at GDB
backtraces it seems that nearly all threads are busy in
`download_and_scan_parquet`. Looking at the tokio docs
(<https://docs.rs/tokio/1.18.2/tokio/task/fn.spawn_blocking.html>)
for `spawn_blocking` (which is used to start the sync download) this
makes sense: tokio only starts replacement threads for the current
runtime thread (which calls `spawn_blocking`) if this does NOT exceed the
runtime thread limit. However we set the runtime thread limit to the
number of CPU cores available to IOx, so this is a limiting factor. This
means that there are only a few threads left to do actual work (I've
seen postgres data flowing back and forth for example) but tokio is not
able to use its full potential anymore. This is esp. bad because the
sync code in `download_and_scan_parquet` then uses `futures` `block_on`
functionality to call back into async code, so it waits for tokio
itself.

The change is rather simple: just use async task spawns.

* fix: use async IO to write stream to temp file

* fix: do not block tokio thread during parquet file reading

* refactor: ensure parquet IO tasks are cancelled if they are not needed anymore

There is no REAL way to cancel sync tasks, but at least we can try our
best.
2022-05-30 13:32:20 +00:00
Andrew Lamb d0903b11bb
refactor: reduce test duplication in `querier/src/table/mod.rs` (#4698)
* refactor: reduce test duplication in `querier/src/table/mod.rs`

* fix: Apply suggestions from code review

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: Update querier/src/table/test_util.rs

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>

* fix: use now_nanos()

* refactor: Add TestQuerierTable

* refactor: rename functions for explicitness

Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2022-05-30 12:56:09 +00:00
Paul Dix 6af32b7750
feat: add concurrency limit for ingester queries (#4703)
I've defaulted it to 20, we can adjust as needed.

Closes #4657

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-30 10:22:17 +00:00
dependabot[bot] 73168f7989
chore(deps): Bump flate2 from 1.0.23 to 1.0.24 (#4726)
Bumps [flate2](https://github.com/rust-lang/flate2-rs) from 1.0.23 to 1.0.24.
- [Release notes](https://github.com/rust-lang/flate2-rs/releases)
- [Commits](https://github.com/rust-lang/flate2-rs/commits)

---
updated-dependencies:
- dependency-name: flate2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-30 08:26:12 +00:00