8589 Commits (964062b40cd31be7064733867b978b36e5219e34)

Author SHA1 Message Date
Stuart Carnie 964062b40c
feat: Add information about profiling using Instruments on macOS (#5275) 2022-08-03 08:45:33 +00:00
Marko Mikulicic a4e2f880be
feat: Expose a C API for the IOx LP parser (#5267)
It can be useful to call the IOx LP parser from other processes, for example from Go.
I used it to run an online comparison of the IOx and InfluxDB Go LP parsers in order to identify compatibility
issues.
2022-08-02 15:44:41 +00:00
Marco Neumann 8e2443d879
feat: use two RAM pools in querier (#5271)
Quick-and-dirty implementation of a RAM pool split to see whether it has
any effect. I expect querier performance to improve because large read
buffers can no longer evict precious metadata.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 15:14:26 +00:00
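
A minimal sketch of the pool split described in #5271 above. It is not the querier's actual cache code: the `Pool` and `QuerierCaches` types, the FIFO eviction policy, and the byte limits are all assumptions made for illustration. The point it shows is that each pool evicts only from its own entries, so a large read buffer cannot push out metadata.

```rust
use std::collections::VecDeque;

/// Hypothetical byte-budgeted pool with FIFO eviction (an assumption;
/// the real querier cache may use a different policy).
pub struct Pool {
    limit: usize,
    used: usize,
    entries: VecDeque<(String, usize)>,
}

impl Pool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0, entries: VecDeque::new() }
    }

    /// Insert an entry, evicting the oldest entries of THIS pool until it fits.
    /// Entries larger than the whole pool are not stored.
    pub fn insert(&mut self, key: &str, size: usize) {
        if size > self.limit {
            return;
        }
        while self.used + size > self.limit {
            match self.entries.pop_front() {
                Some((_, evicted_size)) => self.used -= evicted_size,
                None => break,
            }
        }
        self.entries.push_back((key.to_string(), size));
        self.used += size;
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.iter().any(|(k, _)| k.as_str() == key)
    }
}

/// Two independent pools: one for metadata, one for large read buffers.
pub struct QuerierCaches {
    pub metadata: Pool,
    pub data: Pool,
}
```

With a single shared pool, inserting a second large buffer could evict the schema entry; with the split, only the older buffer is evicted.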
Nga Tran 4812db9887
feat: fewer buckets but larger ranges for compaction duration histogram (#5259)
* chore: reduce log info

* feat: fewer buckets but larger ranges for compaction duration histogram

* chore: Apply suggestions from code review

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* chore: run fmt after applying reviewer's suggestions

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 14:19:30 +00:00
Andrew Lamb 9c9658ca38
test(influxdb_line_protocol): add value verification test (#5270) 2022-08-02 11:18:09 +00:00
Marco Neumann ee491cbbfc
fix: re-enable querier read buffer cache (#5268)
This reverts commit 82913743f1 / #5252.

I misjudged the cache hit ratio for the RB, see
https://github.com/influxdata/k8s-infra/pull/4548

So let's bring back the RB cache until we have some form of parquet
cache in place.
2022-08-02 08:37:30 +00:00
dependabot[bot] e57ae07db7
chore(deps): Bump serde from 1.0.140 to 1.0.141 (#5260)
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.140 to 1.0.141.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.140...v1.0.141)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 07:52:06 +00:00
Marco Neumann a8f6d579c8
feat: add metric for predicate-based cache entry removal (#5257) 2022-08-02 07:44:53 +00:00
Marco Neumann fec6b18d80
feat: add metric for TTL cache expiration (#5256)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 07:00:30 +00:00
Marko Mikulicic 7d15bd6029
Merge pull request #5265 from influxdata/lp_fp
fix: Fix bug and incompatibility in floating point parsing of scientific notation
2022-08-02 06:23:16 +02:00
Marko Mikulicic 84a856069b fix: Scientific notation without + or -
Closes #5264
2022-08-02 05:46:28 +02:00
Marko Mikulicic a926996485 fix: Negative scientific notation without decimal parts
Closes #5263
2022-08-02 05:40:55 +02:00
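
The two fixes above presumably concern literals such as `1e5` (exponent without a sign) and `-1e5` (negative mantissa without a decimal part). As a rough illustration of the expected values, the sketch below uses Rust's standard `f64` parser, which accepts these forms. It is not the IOx line protocol parser, and `parse_float` is a hypothetical helper.

```rust
/// Hypothetical helper that parses a float literal with the Rust standard
/// library. This is NOT the IOx line protocol parser; it only illustrates
/// which values the scientific-notation forms are expected to produce.
pub fn parse_float(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}
```

For example, `parse_float("1e5")` yields `Some(100000.0)` and `parse_float("-1e5")` yields `Some(-100000.0)`.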
Nga Tran 8f1b6f2465
chore: reduce log info (#5254)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 16:00:34 +00:00
Marco Neumann 82913743f1
refactor: disable querier read buffer cache (#5252)
Let's try and see how this performs in prod.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 15:43:22 +00:00
Marco Neumann bb172f8fa8
refactor: bump batch size (#5251)
This is what DataFusion uses by default and I don't see a reason why we
should use such small batch sizes.

The effect is probably only visible in certain filter-aggregate queries
that don't focus on a single series (because there we likely end up with
1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp.
when the filter key is not the first sort key.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 13:49:58 +00:00
Marco Neumann b12ebe1109
fix: do not panic on invalid timestamp ranges (#5249)
Timestamp ranges come from "untrusted" inputs (via gRPC) and must not
lead to panics. The only case where this could happen is `start >
end`. Let's just set `start = end` in this case. Reasoning:

- Semantically this is a sound range, since it is only a somewhat
  degenerate case of "empty".
- We already allow `start = end` to represent "empty" ranges.
- We already clamp (and therefore modify) `start` to the valid range.

Fixes https://github.com/influxdata/conductor/issues/1080.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 13:35:34 +00:00
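
The handling described in #5249 above can be sketched as follows. This is an illustration under stated assumptions, not the actual IOx code: the `TimestampRange` type here is a hypothetical stand-in, the half-open `[start, end)` semantics are assumed, and the clamping of `start` to the valid range mentioned in the commit message is left out.

```rust
/// Hypothetical stand-in for a timestamp range built from untrusted input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimestampRange {
    start: i64,
    end: i64,
}

impl TimestampRange {
    /// Build a range without panicking on `start > end`.
    /// In that case `start` is set to `end`, which yields an empty range.
    pub fn new(start: i64, end: i64) -> Self {
        let start = start.min(end);
        Self { start, end }
    }

    /// A range with `start == end` contains no timestamps.
    pub fn is_empty(&self) -> bool {
        self.start == self.end
    }

    /// Half-open containment check (assumed semantics): `start <= t < end`.
    pub fn contains(&self, t: i64) -> bool {
        self.start <= t && t < self.end
    }
}
```

With this constructor, `TimestampRange::new(10, 5)` is the same value as `TimestampRange::new(5, 5)` and matches no timestamp.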
Marco Neumann 3ae89de324
refactor: increase batch size when reading parquet (#5250)
* refactor: increase batch size when reading parquet

This reduced our overhead when reading parquet files quite a lot.

In an internal benchmark, this reduces the time to perform a single
series aggregation of a rather large series from 58s to 48s with cold
caches. No real difference could be measured for warm caches (~21ms for
both).

This should also help the compactor since the record batches should be
larger.

* refactor: ensure that parquet row group size is in-sync

Ensure that we use the same row group size for reading and writing
parquet files. This is the same value as upstream currently uses as a
default, but let's make sure we don't diverge from that:

3032a521c9/parquet/src/file/properties.rs (L65)

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 10:31:26 +00:00
dependabot[bot] de340cfa53
chore(deps): Bump sqlparser from 0.18.0 to 0.19.0 (#5237)
Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.18.0 to 0.19.0.
- [Release notes](https://github.com/sqlparser-rs/sqlparser-rs/releases)
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:58:32 +00:00
dependabot[bot] fbd39844d8
chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:30:33 +00:00
dependabot[bot] d0db73d8de
chore(deps): Bump clap from 3.2.15 to 3.2.16 (#5239)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.15 to 3.2.16.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.15...v3.2.16)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:00:47 +00:00
dependabot[bot] a687844ddf
chore(deps): Bump bytes from 1.2.0 to 1.2.1 (#5244)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/commits)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 07:50:06 +00:00
Andrew Lamb 7cc8486e5a
fix: remove left over `deb!` macro (#5224)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-30 15:33:02 +00:00
Marco Neumann 0e9695f202
feat: add a few helpful compactor debug logs (#5235)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 17:38:33 +00:00
Marco Neumann 87bdabb38a
feat: log external span for query gRPC requests (#5187)
* feat: log external span for query gRPC requests

This should simplify the correlation with our binlog data.

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:53:12 +00:00
Sam Arnold 3fbe860bb9
fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231)
InfluxQL queries can send (technically incorrect) ranges like this, meaning all time
but excluding the max nanosecond time.

Since this is an important case, we should handle it specially and use the optimized
'all time' handling for meta queries even though this is technically wrong in that
it does not filter out column names / measurement names at MAX_NANO_TIME exactly.

Closes: https://github.com/influxdata/conductor/issues/1072

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:24:26 +00:00
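
A minimal sketch of the check described in #5231 above. The constant values and the `is_all_time` function are assumptions for illustration; the real IOx definitions may differ.

```rust
// Assumed placeholder bounds; the actual IOx constants may have other values.
pub const MIN_NANO_TIME: i64 = i64::MIN + 2;
pub const MAX_NANO_TIME: i64 = i64::MAX - 1;

/// Hypothetical check: treat the half-open range `[start, end)` as "all time"
/// when it spans from the minimum to the maximum nanosecond time.
///
/// Strictly speaking, `[MIN_NANO_TIME, MAX_NANO_TIME)` excludes
/// `MAX_NANO_TIME` itself, but the range is still accepted so that the
/// optimized all-time path can be used.
pub fn is_all_time(start: i64, end: i64) -> bool {
    start <= MIN_NANO_TIME && end >= MAX_NANO_TIME
}
```

Any range that starts later or ends earlier, for example `[0, MAX_NANO_TIME)`, is not treated as all time and keeps its regular time filter.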
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Nga Tran fcce00bf09
feat: run many compact partitions in parallel (#5230)
* feat: run many compact partitions in parallel

* refactor: Use rust futures fu to run compactor jobs in parallel

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 20:55:45 +00:00
Andrew Lamb 7eebe061a6
fix: reduce log verbosity for `found compaction candidates` message (#5225)
* fix: reduce log verbosity

* refactor: sleep for a sec if no work, print debug

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 19:35:31 +00:00
Marko Mikulicic 9da8062a16
fix: Fix typo in log message (#5222)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 15:34:37 +00:00
Marco Neumann 9a9a1a4777
feat: limit per-table chunk data for every query (#5223)
* feat: `QueryChunk::as_any`

* feat: allow `ChunkPruner::prune_chunks` to fail

* feat: limit per-table chunk data for every query

Closes #5211.

* fix: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 13:20:05 +00:00
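
One way to picture the per-table limit from #5223 above is a fallible check over the sizes of the chunks selected for a table. The sketch below is an assumption-laden illustration: `TooMuchData` and `check_table_limit` are hypothetical names, and the real limit may be measured or enforced differently.

```rust
/// Hypothetical error returned when a query would touch too much data.
#[derive(Debug, PartialEq, Eq)]
pub struct TooMuchData {
    pub actual_bytes: usize,
    pub limit_bytes: usize,
}

/// Hypothetical check: sum the sizes of all chunks selected for one table
/// and fail if the total exceeds the configured limit.
pub fn check_table_limit(
    chunk_sizes: &[usize],
    limit_bytes: usize,
) -> Result<(), TooMuchData> {
    let actual_bytes: usize = chunk_sizes.iter().sum();
    if actual_bytes > limit_bytes {
        Err(TooMuchData { actual_bytes, limit_bytes })
    } else {
        Ok(())
    }
}
```

Returning a `Result` mirrors the idea that pruning is allowed to fail, so the query can be rejected early instead of loading all chunks.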
Marko Mikulicic 6d01a9ad68
fix: Clarify error msg: line number is 1-based (#5219)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 08:37:28 +00:00
Marco Neumann d7ab7362fd
refactor: avoid schema copies in `select_schema` (#5214)
This massively helps with #5202.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 08:30:26 +00:00
Marco Neumann 85c186f5b8
feat: cache projected chunk schemas in querier (#5213)
* feat: cache projected chunk schemas in querier

Ref #5202.

* refactor: simplify size calculations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 08:23:20 +00:00
Marko Mikulicic 2bbc419f95
fix: Tell which column failed typecheck (#5220) 2022-07-27 08:16:15 +00:00
Marko Mikulicic de22b6b080
test: Add a test for LP quoting (#5210)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-26 20:08:37 +00:00
dependabot[bot] 727f0152d0
chore(deps): Bump tokio from 1.20.0 to 1.20.1 (#5209)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.20.0 to 1.20.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.20.0...tokio-1.20.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-26 15:36:25 +00:00
Andrew Lamb bbcf4ec64e
fix: Run compactor streams in parallel to avoid deadlock (#5212)
* fix: run compaction streams at once

* fix: make it compile

* fix: improve wording

* fix: use task to write parquet files in parallel
2022-07-26 12:17:38 +00:00
Marco Neumann 7d16ac7de0
docs: extend profiling guide (#5205)
* docs: extend profiling guide

More tools.

* chore: fix docs lint for `localhost` links

* docs: do not duplicate tracing docs

* refactor: clean up `lint_docs` and strip anchors from relative links
2022-07-26 09:40:42 +00:00
dependabot[bot] 246b294b05
chore(deps): Bump clap from 3.2.14 to 3.2.15 (#5208)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.14 to 3.2.15.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/v3.2.15/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.14...v3.2.15)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-26 08:44:02 +00:00
Nga Tran d05f383a98
refactor: reduce compacting size and compacted file size to prevent compactor from waiting for reading a large file forever (#5206)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-25 20:08:11 +00:00
Marco Neumann 5f15d97dd1
chore: always pass `ROARING_ARCH` (#5203)
* chore: always pass `ROARING_ARCH`

Always pass the `ROARING_ARCH` that we would use for our prod builds.
Otherwise this can easily be missed during testing, profiling or build
system changes (e.g. should we ever move away from our `Dockerfile`).

This feature was introduced with Rust/Cargo 1.56.

* docs: explain env passing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-25 16:15:39 +00:00
Marco Neumann 614ca5ca96
refactor: allocate less memory for tracing (#5200)
While I could not find evidence that these allocations are a problem,
the metadata and links of spans are rarely used so we shouldn't pay for
them even for heavily traced applications.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-25 12:52:29 +00:00
Andrew Lamb e4dc8c2067
refactor: rename garbage collector crates for consistency (#5196)
* refactor: rename garbage collector crates for consistency

* fix: cargo fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-25 12:44:37 +00:00
Marco Neumann 96c3a05481
feat: add debug log to `with_clear_timestamp_if_max_range` (#5199) 2022-07-25 11:43:54 +00:00
Andrew Lamb 66af2bdd88
refactor: Split up `delete_three_delete_three_chunks.sql` test case (#5197)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 20:57:31 +00:00
Carol (Nichols || Goulding) f4d0f13689
feat: split large compactions (#5195)
* feat: Split large compactions into multiple compacted files

Connects to #5121

* refactor: Extract update catalog function and error type

* refactor: Share physical plan to object store streaming

And only differ in the logical plan building based on split times in
different compaction cases.

* fix: Test for a split time equal to the max time and don't split then
2022-07-22 20:35:31 +00:00
Nga Tran 69640c0ba5
feat: Different branch to hook up new compaction algorithm (#5194)
* chore: cherry pick the first 3 commits of branch cn/connect-new-compaction

* fix: modify the test to work correctly with compactor running

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 19:29:47 +00:00
Andrew Lamb 465a69c41d
chore: Update datafusion (#5193)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 18:07:38 +00:00
Carol (Nichols || Goulding) 94343b1f27
fix: compute_split_time returns one value when min_time = max_time (#5192)
* test: Document the behavior of compute_split_time when min time = max time

* fix: compute_split_time returns one value when min_time = max_time

Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 17:29:50 +00:00
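
The edge case fixed in #5192 above can be sketched as follows. This is a simplified, hypothetical version: the function name, its signature, and the even splitting by a `parts` count are assumptions, and the real `compute_split_time` may take different inputs. Only the `min_time = max_time` behavior comes from the commit title.

```rust
/// Hypothetical sketch: compute split points that divide `[min_time, max_time]`
/// into `parts` roughly equal pieces. When there is nothing to split
/// (`min_time >= max_time` or a single part), exactly one value is returned.
pub fn compute_split_times(min_time: i64, max_time: i64, parts: u32) -> Vec<i64> {
    if min_time >= max_time || parts <= 1 {
        return vec![max_time];
    }

    // Use i128 so that `max_time - min_time` cannot overflow.
    let span = (max_time as i128) - (min_time as i128);
    let mut split_times = Vec::new();
    for i in 1..parts {
        let t = (min_time as i128 + span * i as i128 / parts as i128) as i64;
        // Skip duplicates and never split exactly at `max_time`.
        if t < max_time && split_times.last() != Some(&t) {
            split_times.push(t);
        }
    }

    if split_times.is_empty() {
        split_times.push(max_time);
    }
    split_times
}
```

For example, splitting `[0, 100]` into four parts gives `[25, 50, 75]`, while `min_time = max_time = 5` gives the single value `[5]`.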
Nga Tran bbe07fcc79
feat: metrics for selection partition candidates for compaction (#5190)
* feat: metrics for selection partition candidates for compaction

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: remove unused metric labels

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 15:25:53 +00:00