Commit Graph

8136 Commits (c73011d4c4be25d99b5f471877c7b2697a9e8694)

Author SHA1 Message Date
Andrew Lamb c73011d4c4
docs: update logging (#4766)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-03 17:54:45 +00:00
Andrew Lamb 2e752157b9
docs: Clean up docs (#4764)
* docs: Remove outdated instructions for running OG

* docs: clarify docs / readme

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-03 14:21:14 +00:00
Andrew Lamb 40d3a09296
docs: Add some comments to InstrumentedAsyncOwnedSemaphorePermit (#4775) 2022-06-03 11:08:16 +00:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
dependabot[bot] 9a21292db8
chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-03 09:10:40 +00:00
dependabot[bot] 73a7e6f0a5
chore(deps): Bump syn from 1.0.95 to 1.0.96 (#4773)
Bumps [syn](https://github.com/dtolnay/syn) from 1.0.95 to 1.0.96.
- [Release notes](https://github.com/dtolnay/syn/releases)
- [Commits](https://github.com/dtolnay/syn/compare/1.0.95...1.0.96)

---
updated-dependencies:
- dependency-name: syn
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-03 08:02:56 +00:00
Marco Neumann f7cbd5d490
test: query limits (#4769)
* test: query limits

This was left out of #4760.

* test: additional debugging

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-03 07:30:30 +00:00
Marco Neumann 81730fd0ff
feat: add owned versions of instrumented semaphores (#4770)
Owned versions will be required to instrument the query concurrency
limiter.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-03 07:20:54 +00:00
Ryan Russell d279deddad
docs(various): Improve Readability (#4768)
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-02 18:01:06 +00:00
Nga Tran 79895b995c
chore: add debug info to see how many concurrent partitions are being compacted in each cycle (#4772)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-02 15:19:08 +00:00
kodiakhq[bot] 1f87dde95f
Merge pull request #4722 from influxdata/cn/convert
feat: Cache read buffer chunks
2022-06-02 14:15:46 +00:00
Carol (Nichols || Goulding) 9d9c5d3692
fix: Take backoff config as an argument to be consistent with the other caches 2022-06-02 09:50:48 -04:00
Carol (Nichols || Goulding) 76b40ac6a1
refactor: Make the type alias into a struct 2022-06-02 09:26:11 -04:00
Carol (Nichols || Goulding) 715c65dfef
docs: Clarify a comment about what is an Arc 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 879dd7cec4
test: LRU behavior of the read buffer chunk cache 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 9328ba8c45
feat: Use new extra loading info to load read buffer chunks into cache 2022-06-02 09:22:44 -04:00
Carol (Nichols || Goulding) 054c25de50
refactor: Add more methods to DecodedParquetFile
I'm tired of trying to remember which info is on which metadata.
2022-06-02 09:22:44 -04:00
Marco Neumann 9e30a3eb29
refactor: rework querier concurrency limiting (#4760)
* refactor: rework querier concurrency limiting

With #4752 we introduced a concurrency limit into the querier. It works
by drawing permits from a central semaphore whenever we create a
`QuerierNamespace`. This however only limits concurrency during query
planning and not query execution, because the objects contained within
the plan (chunks and some metadata) neither reference the permit nor the
`QuerierNamespace`.

Now one approach to fix that would be to wire the permit all the way down
into all the query-related data structures. This however is very fiddly
and error-prone: as soon as we transform these data structures -- e.g.
into streams -- the permit might get lost again. Such losses would
potentially be query-dependent and very hard to debug.

So instead we reverse the approach and track the permits at the upper
layer of the stack: the gRPC service entry points. There we also need to
be careful -- e.g. when we return streams to tonic -- but it's way
easier to review that than the deeply nested object hierarchy that is
involved with queries. Also the separation of concerns is a bit clearer,
because why would a "chunk" care about the "query concurrency" as a
whole.

* refactor: improve gRPC permit keeping and prepare tests
2022-06-02 09:49:58 +00:00
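The approach described in #4760, taking the permit at the gRPC entry point and keeping it alive for as long as the returned response stream exists, can be sketched with the standard library alone. This is a minimal, blocking illustration under stated assumptions: `Semaphore`, `OwnedPermit`, `PermitStream` and `handle_query` are hypothetical names, a plain `Iterator` stands in for the stream handed to tonic, and the real querier uses an async, instrumented semaphore rather than this `Mutex`/`Condvar` one.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical counting semaphore (blocking, std-only stand-in for an async one).
struct Semaphore {
    available: Mutex<usize>,
    cond: Condvar,
}

/// A permit that owns an `Arc` to its semaphore, so it can outlive the acquiring call.
struct OwnedPermit {
    sem: Arc<Semaphore>,
}

impl Semaphore {
    fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self { available: Mutex::new(permits), cond: Condvar::new() })
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire_owned(self: &Arc<Self>) -> OwnedPermit {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.cond.wait(n).unwrap();
        }
        *n -= 1;
        OwnedPermit { sem: Arc::clone(self) }
    }

    fn available_permits(&self) -> usize {
        *self.available.lock().unwrap()
    }
}

impl Drop for OwnedPermit {
    /// Returning the permit wakes one waiter, if any.
    fn drop(&mut self) {
        *self.sem.available.lock().unwrap() += 1;
        self.sem.cond.notify_one();
    }
}

/// Response "stream" that carries the permit until it is drained or dropped.
struct PermitStream<I> {
    inner: I,
    _permit: OwnedPermit,
}

impl<I: Iterator> Iterator for PermitStream<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

/// Hypothetical entry point: acquire at the top, hand the permit to the returned stream.
fn handle_query(sem: &Arc<Semaphore>, rows: Vec<u64>) -> PermitStream<std::vec::IntoIter<u64>> {
    let permit = sem.acquire_owned();
    // ... planning would happen here while the permit is held ...
    PermitStream { inner: rows.into_iter(), _permit: permit }
}
```

Because the permit is a field of the stream, it is released only when the stream is dropped, i.e. after execution rather than after planning; the chunks inside the plan never need to know about it.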
Andrew Lamb 1472ec272f
refactor: consolidate duplicate testing logic (#4708)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 20:02:13 +00:00
Andrew Lamb a37c553545
refactor: Split up rpc_predicate module a bit (#4763)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 19:56:11 +00:00
Andrew Lamb 7328cc6a9a
docs: Update readme (#4765)
* docs: Update readme

* fix: Update README.md

Co-authored-by: Nga Tran <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 19:50:06 +00:00
kodiakhq[bot] b714269b13
Merge pull request #4754 from influxdata/cn/extra-cache-system
feat: Add an Extra type to Cacher Loader to specify extra information…
2022-06-01 18:11:46 +00:00
kodiakhq[bot] 7ad3a50dd4
Merge branch 'main' into cn/extra-cache-system 2022-06-01 18:06:09 +00:00
kodiakhq[bot] f3fb040294
Merge pull request #4756 from influxdata/dom/fix-e2e-db-fire
test(e2e): do not mangle prod database
2022-06-01 16:38:34 +00:00
kodiakhq[bot] 51114a0a56
Merge branch 'main' into dom/fix-e2e-db-fire 2022-06-01 16:32:41 +00:00
Dom Dwyer 1caeb04869
test(e2e): do not mangle prod database
Unset all env vars for the following CLI e2e tests:

    * default_mode_is_run_all_in_one
    * default_run_mode_is_all_in_one

This prevents them from executing against the "prod" catalog, running
migrations and inserting values into the prod database specified in the
prod DSN env var (INFLUXDB_IOX_CATALOG_DSN).
2022-06-01 17:12:12 +01:00
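The commit message does not show how the variables are unset. One std-only way, assumed here rather than taken from the IOx test code, is `std::process::Command::env_clear`, which drops explicitly set variables and stops the child from inheriting any from the parent, `INFLUXDB_IOX_CATALOG_DSN` included. The helper name `isolate_from_prod` is hypothetical.

```rust
use std::process::Command;

/// Hypothetical helper: strip every inherited env var (including
/// INFLUXDB_IOX_CATALOG_DSN) so the CLI under test cannot reach a real catalog.
fn isolate_from_prod(cmd: &mut Command) -> &mut Command {
    // env_clear() removes explicitly set vars and disables inheritance from the parent.
    cmd.env_clear()
}
```

Variables a test genuinely needs can be added back afterwards with `Command::env`.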
kodiakhq[bot] 5a52954d0a
Merge pull request #4759 from influxdata/dom/ignored-metadata-test
refactor: always panic for empty parquet files
2022-06-01 16:11:29 +00:00
kodiakhq[bot] 69da424a41
Merge branch 'main' into dom/ignored-metadata-test 2022-06-01 16:05:43 +00:00
Andrew Lamb 257aaa7e7b
fix: Support `_field != <name>` predicates (#4721)
* fix: Support `_field != <name>` predicates

* fix: update test

* fix: add negative test

* fix: improve comments

* refactor: make `add_include` and `add_exclude` infallible

* chore: add type annotations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 16:04:53 +00:00
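One way to picture the include/exclude bookkeeping behind `_field = <name>` and `_field != <name>` predicates is a selection holding an optional include set plus an exclude set, where adding either kind of predicate cannot fail. This is a hypothetical sketch: only the names `add_include`/`add_exclude` come from the commit; `FieldSelection` and `selects` are illustrative, and it assumes several `_field = <name>` predicates are OR-ed together.

```rust
use std::collections::BTreeSet;

/// Hypothetical field filter assembled from `_field` predicates.
#[derive(Debug, Default)]
struct FieldSelection {
    /// `None` means no `_field = <name>` predicate was seen, so every field is allowed.
    include: Option<BTreeSet<String>>,
    /// Fields named by `_field != <name>` predicates.
    exclude: BTreeSet<String>,
}

impl FieldSelection {
    /// `_field = <name>`; assumes several such predicates are OR-ed together.
    fn add_include(&mut self, name: &str) {
        self.include
            .get_or_insert_with(BTreeSet::new)
            .insert(name.to_string());
    }

    /// `_field != <name>`; like `add_include`, this cannot fail.
    fn add_exclude(&mut self, name: &str) {
        self.exclude.insert(name.to_string());
    }

    /// A field is selected if it passes the include set (when present) and is not excluded.
    fn selects(&self, field: &str) -> bool {
        let included = self.include.as_ref().map_or(true, |s| s.contains(field));
        included && !self.exclude.contains(field)
    }
}
```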
Dom Dwyer f8b83c5085
test: assert panic behaviour
Modifies the existing test added as part of #4695 to ensure a panic is
emitted when serialising an empty parquet file.
2022-06-01 16:55:53 +01:00
Dom Dwyer 0bfc11f4a1
refactor: always panic for empty parquet files
Moves the panic into the child call to_parquet() so all code paths are
covered, including those that do not serialise into memory via
to_parquet_bytes().
2022-06-01 16:54:36 +01:00
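The refactor can be sketched as a single serialisation function that owns the empty-input check, with the in-memory variant delegating to it. This is a hypothetical illustration: byte vectors stand in for record batches and no real parquet encoding happens; only the names `to_parquet()` and `to_parquet_bytes()` come from the commit.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a record batch.
type Batch = Vec<u8>;

/// Serialises `batches` into any writer. The empty-input panic lives here, so every
/// caller is covered, whether it writes to memory or elsewhere.
fn to_parquet<W: Write>(batches: &[Batch], mut sink: W) -> io::Result<usize> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    assert!(total > 0, "attempted to serialise an empty parquet file");
    for b in batches {
        sink.write_all(b)?;
    }
    Ok(total)
}

/// In-memory convenience wrapper; it needs no check of its own.
fn to_parquet_bytes(batches: &[Batch]) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    to_parquet(batches, &mut buf)?;
    Ok(buf)
}
```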
kodiakhq[bot] 507e153c5a
Merge pull request #4699 from influxdata/dom/silly-config
refactor: warn for silly object store configs
2022-06-01 15:48:10 +00:00
Dom Dwyer 60de97ac26
test(e2e): ensure "partition pull" writes files
Adds a test case covering the "remote partition pull" command configured
with file-based object storage.
2022-06-01 16:41:57 +01:00
Dom Dwyer 6d647fb7a9
refactor: warn for silly object store configs
Warn when downloading files to an in-memory object store.

The "remote partition pull" command downloads parquet files from an
object store via a router, and saves them locally. It's pretty unlikely
the user intends to download those files into the memory of the CLI
process, which then exits when the pull is complete and throws away the
downloaded files, yet this is the default.
2022-06-01 16:41:57 +01:00
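A sketch of the warning logic, assuming a hypothetical `ObjectStoreKind` enum for the CLI's object store setting; the variant names and the message text are illustrative, not IOx's actual configuration types.

```rust
/// Hypothetical object store settings for the CLI; the real config type differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ObjectStoreKind {
    Memory,
    File,
    S3,
}

/// Returns a warning if pulled parquet files would only live in this process's memory
/// and therefore vanish when the command exits.
fn pull_destination_warning(kind: ObjectStoreKind) -> Option<&'static str> {
    match kind {
        ObjectStoreKind::Memory => Some(
            "downloading parquet files into an in-memory object store; \
             they will be discarded when this command exits",
        ),
        ObjectStoreKind::File | ObjectStoreKind::S3 => None,
    }
}
```

Returning the message instead of logging it directly keeps the check easy to unit test; the caller would pass it to whatever logger the CLI uses.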
kodiakhq[bot] 1ca58b2b70
Merge pull request #4757 from influxdata/dom/use-constructors-plz
refactor: constructor for ParquetFileWithTombstone
2022-06-01 15:34:02 +00:00
Dom Dwyer 9ae58c89b6
refactor: constructor for ParquetFileWithTombstone
Use a constructor to initialise a ParquetFileWithTombstone struct,
rather than making the fields pub.

This allows IDEs to "go to" places where this is constructed when
browsing the code, but also keeps the type closed for modification of
internals (SOLID).
2022-06-01 15:58:06 +01:00
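A minimal sketch of the constructor pattern, assuming hypothetical `ParquetFile` and `Tombstone` types and field names, since the commit does not show the struct's contents. Privacy in Rust is per module, so the fields are only closed to code outside the module that defines the struct.

```rust
/// Hypothetical catalog types; the commit does not show the real ones.
#[derive(Debug)]
pub struct ParquetFile {
    pub id: i64,
}

#[derive(Debug)]
pub struct Tombstone {
    pub id: i64,
}

mod compaction {
    use super::{ParquetFile, Tombstone};

    /// Fields are private to this module: other code must use `new` and the accessors.
    #[derive(Debug)]
    pub struct ParquetFileWithTombstone {
        data: ParquetFile,
        tombstones: Vec<Tombstone>,
    }

    impl ParquetFileWithTombstone {
        /// The single construction site that an IDE can "go to" from every caller.
        pub fn new(data: ParquetFile, tombstones: Vec<Tombstone>) -> Self {
            Self { data, tombstones }
        }

        pub fn parquet_file_id(&self) -> i64 {
            self.data.id
        }

        pub fn tombstone_count(&self) -> usize {
            self.tombstones.len()
        }
    }
}

use compaction::ParquetFileWithTombstone;
```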
Nga Tran f0e477fcee
chore: let's aggressively increase compactor job size and concurrency level (#4747)
* chore: let's aggressively increase compactor job size and concurrency level

* chore: Apply suggestions from code review

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 14:32:36 +00:00
Carol (Nichols || Goulding) 39fc19e946
test: Exercise the Extra type in the cache system tests 2022-06-01 09:19:52 -04:00
Carol (Nichols || Goulding) 37347f2389
feat: Add an Extra type to Cacher Loader to specify extra information for loading entries 2022-06-01 08:58:19 -04:00
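The `Extra` idea can be sketched with a synchronous trait: the key identifies the cache entry, while the extra value is handed to the loader only on a miss and never becomes part of the key. `Loader`, `Cache` and `ChunkLoader` here are hypothetical simplifications; the real IOx cache system is async and its trait signatures differ.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical loader trait with an extra, non-key input for cache misses.
trait Loader {
    type K;
    type V;
    /// Information the caller already has that helps to build the value.
    type Extra;

    fn load(&self, k: Self::K, extra: Self::Extra) -> Self::V;
}

/// Minimal cache: consults the loader only when the key is absent.
struct Cache<L: Loader> {
    loader: L,
    entries: HashMap<L::K, L::V>,
}

impl<L> Cache<L>
where
    L: Loader,
    L::K: Hash + Eq + Clone,
    L::V: Clone,
{
    fn new(loader: L) -> Self {
        Self { loader, entries: HashMap::new() }
    }

    /// On a hit, `extra` is ignored; on a miss it is forwarded to the loader.
    fn get(&mut self, k: L::K, extra: L::Extra) -> L::V {
        if let Some(v) = self.entries.get(&k) {
            return v.clone();
        }
        let v = self.loader.load(k.clone(), extra);
        self.entries.insert(k, v.clone());
        v
    }
}

/// Hypothetical loader: key = parquet file id, extra = bytes the caller already holds.
struct ChunkLoader {
    loads: Cell<usize>,
}

impl Loader for ChunkLoader {
    type K = i64;
    type V = usize; // stand-in for a converted read buffer chunk (here: its size)
    type Extra = Vec<u8>;

    fn load(&self, _k: i64, extra: Vec<u8>) -> usize {
        self.loads.set(self.loads.get() + 1);
        extra.len()
    }
}
```

Keeping the extra data out of the key means two callers with the same file id share one entry even if they pass different extra values.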
Andrew Lamb 2886149afc
chore: naming / comment cleanups from namespace semaphore (#4753) 2022-06-01 12:46:38 +00:00
Marco Neumann 446d94487d
feat: add tooling to instrument async semaphores (#4751)
* feat: add tooling to instrument async semaphores

Ref #4739.

* test: improve `test_permits_acquired_and_holders_acquired`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 12:19:23 +00:00
Marco Neumann ebeccf037c
feat: limit querier concurrency by limiting number of active namespaces (#4752)
This is a rather quick fix for prod. In the medium term we probably want to
rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd with IOx into the same pod.
2022-06-01 11:59:35 +00:00
dependabot[bot] e638385782
chore(deps): Bump pbjson-types from 0.3.1 to 0.3.2 (#4750)
Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson-types
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 09:02:37 +00:00
dependabot[bot] 21ec05a6ee
chore(deps): Bump pbjson from 0.3.1 to 0.3.2 (#4749)
Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 08:12:46 +00:00
dependabot[bot] 043dee43c8
chore(deps): Bump pbjson-build from 0.3.1 to 0.3.2 (#4748)
Bumps [pbjson-build](https://github.com/influxdata/pbjson) from 0.3.1 to 0.3.2.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/compare/0.3.1...0.3.2)

---
updated-dependencies:
- dependency-name: pbjson-build
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-01 07:54:13 +00:00
Marco Neumann c91dbe062e
test: "optimize" ingester record batches in query tests (#4700)
* test: "optimize" ingester record batches in query tests

It seems that I had the right idea in #4656 but wasn't able to trigger
https://github.com/influxdata/conductor/issues/955 because the query
tests do not "optimize" the record batches in the same way the actual
gRPC implementation does. If we apply the same transformation we indeed
end up with the same error.

* fix: all batches within the ingester flight response must have same schema

* refactor: simplify and reuse code

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 07:37:11 +00:00
Nga Tran 79220720be
chore: increase size of a compactor job and level of concurrency (#4746)
* fix: let us not compact no-data

* fix: split time must be greater than min_time, too

* fix: resolve merge conflict

* chore: increase size of a compactor job and level of concurrency

Co-authored-by: Dom <dom@itsallbroken.com>
2022-05-31 19:57:06 +00:00
Nga Tran dfd35c05a1
fix: let us not compact no-data (#4744)
* fix: let us not compact no-data

* fix: split time must be greater than min_time, too

* fix: resolve merge conflict

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-31 17:02:14 +00:00
kodiakhq[bot] 51fd20c769
Merge pull request #4745 from influxdata/dom/metadata-test-with-schema
test: fix test_metadata_from_parquet_metadata
2022-05-31 16:42:46 +00:00
Dom Dwyer 5d74ae2ac1
test: fix test_metadata_from_parquet_metadata
Changes the test_metadata_from_parquet_metadata test to embed the IOx
metadata before asserting it can be read back.
2022-05-31 17:34:04 +01:00