Commit Graph

921 Commits (105e3542991aef5f1654fe260f9fa53a32f622f2)

Author SHA1 Message Date
Andrew Lamb c100737a81 chore: Do not send dictionary encoded data to clients 2023-01-26 06:35:15 -05:00
Nga Tran b8a80869d4
feat: introduce a new way of max_sequence_number for ingester, compactor and querier (#6692)
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier

* chore: cleanup

* feat: new column max_l0_created_at to order files for deduplication

* chore: cleanup

* chore: debug info for chnaging cpu.parquet

* fix: update test parquet file

Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-01-26 10:52:47 +00:00
Marco Neumann ed694d3be4
feat: introduce scratchpad store for compactor (#6706)
* feat: introduce scratchpad store for compactor

Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:

- **fewer IO ops:** DataFusion's streaming IO requires slightly more
  IO requests (at least 2 per file) due to the way it is optimized to
  read as little as possible. It first reads the metadata and then
  decides which content to fetch. In the compaction case this is (esp.
  w/o delete predicates) EVERYTHING. So in contrast to the querier,
  there is no advantage of this approach. In contrary this easily adds
  100ms latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
  run multiple compaction steps to deal with them) it is kinda pointless
  to upload an intermediate result just to download it again. The
  scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
  DataFusion jobs because we don't wanna blow up the whole process by
  having too much in-flight arrow data at the same time. However while
  we perform the actual computation, we were waiting for object store
  IO. This was limiting our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
  implement #6645.

Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.

Closes #6650.

* fix: panic on shutdown

* refactor: remove shadow scratchpad (for now)

* refactor: make scratchpad safe to use
2023-01-26 10:03:08 +00:00
Andrew Lamb 7853a19953
feat: JDBC integration tests with FlightSQL (#6693)
* feat: basic JDBC integration test

* fix: do not run test without env set

* docs: add maven link

* refactor: clean up java with switch statement
2023-01-25 22:21:18 +00:00
Carol (Nichols || Goulding) 57b5b639d6
test: Port all field columns query_tests to end-to-end tests (#6707)
* test: Port a test that's not actually supported through the full gRPC API

* test: Port remaining field column/measurement fields tests

* test: Remove unsupported measurement predicate and clarify purposes of tests

Andrew confirmed that the only way to invoke a Measurement Fields
request is with a measurement/table name specified: <0249b5018e/generated_types/protos/influxdata/platform/storage/service.proto (L43)>

so testing with a `_measurement` predicate is not valid.

I thought this test would become redundant with some other tests, but
they're actually still different enough; I took this opportunity to
better highlight the differences in the test names.

* refactor: Move all measurement fields tests to their own file

* test: Remove field columns tests that are now covered in end-to-end measurement fields tests
2023-01-25 19:49:29 +00:00
Carol (Nichols || Goulding) 4658510102
fix: For Ingester2, persist a particular namespace on demand and share MiniClusters
This should hopefully help CI from running out of Postgres
connections 😬

The old architecture will still need to be non-shared and persist
everything.
2023-01-25 10:36:56 -05:00
Carol (Nichols || Goulding) f310e01b1a
test: Start of porting InfluxRpc query_tests
Make a new trait, `InfluxRpcTest`, that types can implement to define
how to run a test on a specific Storage gRPC API. `InfluxRpcTest` takes
care of iterating through the two architectures, running the setups, and
creating the custom test step.

Implementers of the trait can define aspects of the tests that differ
per run, to make the parameters of the test clearer and highlight what
different tests are testing.
2023-01-25 10:27:42 -05:00
Andrew Lamb 0c55a0f257
feat: Implement basic prepared statement support in IOx (#6667)
* feat: allow override of flightsql namespace

* feat: Implement DoAction endpoint

* refactor: Remove try_unpack

* fix: remove unused code / more clone
2023-01-25 12:00:43 +00:00
Andrew Lamb 6caf31acf3
chore: Move garbage collection configuration into clap_blocks (#6678)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-25 11:31:48 +00:00
Luke Bond e3fc873b2e
feat: enable object store metrics on ingester2 (#6672)
Signed-off-by: Luke Bond <luke.n.bond@gmail.com>

Signed-off-by: Luke Bond <luke.n.bond@gmail.com>
2023-01-24 01:59:58 +00:00
Andrew Lamb 1b882e0062
fix: `error arrow/ipc: could not read message schema: EOF` (#6668)
* chore: Test for schema from query

* fix: Send schema even for no RecordBatches

* fix: docs
2023-01-23 22:23:34 +00:00
Nga Tran 411b3db928
fix: Get shard id from a constant (topic, shard_index) to avoid error of shard_id FK violation (#6658)
* fix: ake shard_id FK always 1

* fix: use const shard_index to read its ID

* refactor: read shard_id during compactor initiation
2023-01-22 16:49:06 +00:00
Carol (Nichols || Goulding) 6afd782b3f
fix: Move query_tests2 into influxdb_iox/tests so that the code rebuilds 2023-01-19 16:44:31 -05:00
Andrew Lamb 65c020c9f2
refactor: remove iox_arrow_flight use in `influxdb_iox_client ` and `querier` (#6624)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-19 18:48:23 +00:00
Marco Neumann 5e297b4667
refactor: lift up compactor2 CLI args, set mem limit to 8GB (#6631)
- use a single data structure for CLI args (not two)
- set mem limit default to 8GB (same as querier). We can always tune
  this later, but we should not run with "unlimited" to begin with.
2023-01-19 12:21:51 +00:00
kodiakhq[bot] 33168b97f0
Merge branch 'main' into cn/query-tests-grpc 2023-01-18 19:03:51 +00:00
Marco Neumann e72173d58d
feat: very basic compactor2 skeleton (#6614)
Sets up crate and wires up the main binary. No tests yet, no algorithm
framework, just the bare minimum.

Also I decided to not offer a gRPC server in `compactor2` at the moment
and hence did not implement any handle/delegate infrastructure. We add
this later if we need it. This also means compactor2 does NOT provide a
catalog service for now.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-18 16:36:40 +00:00
Carol (Nichols || Goulding) f3b5dcaab7
feat: Reimagining query_tests 2023-01-18 10:24:17 -05:00
dependabot[bot] 0a70e9f43f
chore(deps): Bump rustyline from 10.0.0 to 10.1.0
Bumps [rustyline](https://github.com/kkawakam/rustyline) from 10.0.0 to 10.1.0.
- [Release notes](https://github.com/kkawakam/rustyline/releases)
- [Changelog](https://github.com/kkawakam/rustyline/blob/master/History.md)
- [Commits](https://github.com/kkawakam/rustyline/compare/v10.0.0...v10.1.0)

---
updated-dependencies:
- dependency-name: rustyline
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-16 02:06:24 +00:00
Dom f7ff877582
Merge branch 'main' into cn/ingester-persist-tick 2023-01-13 12:31:45 +00:00
Carol (Nichols || Goulding) f56123bf30
test: Allow integration tests that should_panic to pass if TEST_INTEGRATION isn't set 2023-01-12 15:31:34 -05:00
Carol (Nichols || Goulding) 1c7ffb95df
test: Write a should_panic test that shows ingester is persisting when I thought it wouldn't 2023-01-12 14:55:28 -05:00
Carol (Nichols || Goulding) b989e0893f
test: Make persist-on-demand test a step test and check the number of parquet files 2023-01-12 11:40:46 -05:00
Carol (Nichols || Goulding) 3a2544a7eb
feat: Define a new gRPC service for ingester persist 2023-01-12 11:03:12 -05:00
Carol (Nichols || Goulding) e9b3efb33d
refactor: Extract a method for making requests to the ingester onto MiniCluster 2023-01-12 11:03:10 -05:00
Marco Neumann e2da573dcf
refactor: improve thread naming (#6579)
- name exec driver thread (instead of using the default that `thread::spawn`
  gives us)
- provide number to every worker thread (both for the dedicatd executor
  and for the main runtime)
- shorten thread names (current naming too long for most debug tools)
2023-01-12 14:22:49 +00:00
Dom 1e5b594863
Merge branch 'main' into dom/shutdown-persist 2023-01-12 10:15:34 +00:00
Nga Tran fa0893819c
fix: have warm compaction work with compactor2 (#6571)
* refactor: same function to select partition candidates

* fix: have warm compaction work with compactor2

* fix: format

* chore: cleanup
2023-01-12 02:32:39 +00:00
Carol (Nichols || Goulding) be7c312033
fix: Wait for a particular number of Parquet files, not just any change 2023-01-11 12:11:56 -05:00
Carol (Nichols || Goulding) 7e921e6a23
fix: Make recording num parquet files an explicit test step
To support a case where someone calls WriteLineProtocol twice in
a row to simulate two write requests. The test should be able to
record this state before the two write requests and not twice.
2023-01-11 11:51:56 -05:00
Carol (Nichols || Goulding) 6677ae5c61
test: Record number of Parquet files before a write
Fixes #6506.

Also has the pleasant side effect of making this code simpler and less
hacky-- it now checks the number of Parquet files for the whole
namespace, which is useful in cases where the line protocol writes to
several tables.
2023-01-11 11:51:55 -05:00
Dom Dwyer 303c9e4398
test: fix e2e test
This test was relying on a graceful shutdown of the ingester to drive a
WAL replay, restoring the buffered state at startup.

Now the shutdown causes the data to be persisted and not replayed, this
didn't work.
2023-01-11 17:15:04 +01:00
Stuart Carnie 66047f4372
feat: InfluxQL learns how to plan some InfluxQL queries (#6520)
* feat: InfluxQL learns how to plan some queries

Also added a means to test the planner and execution

* chore: Update module docs

* chore: Document the planner functions

* chore: Update end_to_end_cases crate

* chore: Clarify why `SLIMIT` and `SOFFSET` return `NotImplemented`

* chore: Address lint issues

* chore: Fix rustdoc link issue

* chore: Remove InfluxQL tests from query_tests crate

Will follow conventions established by @carols10cents when
new query_tests crate is merged.

* chore: `now` field

`now` is a DataFusion built-in scalar function

* chore: remove unused code

* chore: Add additional arithmetic expression tests

* chore: Establish pattern for identifying and tracking InfluxQL issues

* chore: Add tests for case sensitivity issues

* chore: group tests into modules and functions

This avoids mass rewriting of insta snapshots as new
tests are added to each function. When tests are added in the middle,
existing snapshots are renamed (-N+1, -N+2, etc) resulting in
having to review numerous additional snapshots.
2023-01-11 02:50:49 +00:00
Nga Tran 62c0f3dbdd
feat: have cold compaction work with Compactor2 (#6542)
* feat: cold

* chore: debug info

* feat: only compact qualified cold partition candidates

* fix: catalog test

* chore: cleanup

* chore: add new config flag for cold partition candidates

* chore: implement display for CompactionType and add tests for max num partitions

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 16:42:57 +00:00
dependabot[bot] b49cc2e35e
chore(deps): Bump tokio from 1.24.0 to 1.24.1 (#6545)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.0 to 1.24.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.24.0...tokio-1.24.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 09:48:44 +00:00
dependabot[bot] 09746c3aea
chore(deps): Bump assert_cmd from 2.0.7 to 2.0.8
Bumps [assert_cmd](https://github.com/assert-rs/assert_cmd) from 2.0.7 to 2.0.8.
- [Release notes](https://github.com/assert-rs/assert_cmd/releases)
- [Changelog](https://github.com/assert-rs/assert_cmd/blob/master/CHANGELOG.md)
- [Commits](https://github.com/assert-rs/assert_cmd/compare/v2.0.7...v2.0.8)

---
updated-dependencies:
- dependency-name: assert_cmd
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-10 01:06:02 +00:00
Paul Dix 828992c9c5
feat: Ingest replica skeleton (#6529)
* feat: Update replication.proto

* Remove the PartitionId in the replicate request as a single replicate request can have the data for many partitions.
* Add namespace_id and table_id to persist complete request to make data easier to lookup in buffer.

* feat: Initial ingest_replica skeleton

A bunch of copy pasta here from ingester2, but this takes out a ton of stuff that isn't used in replicas.
Also lays the groundwork for the simpler buffer structure to keep the data and a basic cache for catalog information that will be required.

* feat: update replication.proto GetPartitionBufferResponse

* chore: PR cleanup

* chore: PR cleanup
2023-01-09 16:53:49 +00:00
Carol (Nichols || Goulding) afaaedc758
test: Reproduce the test for querier caching 2023-01-04 10:15:35 -05:00
Carol (Nichols || Goulding) 9f7dede433
refactor: Reorganize tests to make deleting half the tests easier
When we switch to RPC write mode, the with_kafka module can be deleted
and the kafkaless_rpc_write module can be unwrapped.
2023-01-04 10:15:35 -05:00
Carol (Nichols || Goulding) c464487bb2
test: Ability to set up a querier2 that doesn't connect to any ingesters 2023-01-04 10:06:57 -05:00
Carol (Nichols || Goulding) 08ceb4ee48
test: Check catalog for new Parquet files to know when data is persisted 2023-01-04 10:06:57 -05:00
Carol (Nichols || Goulding) e49bee0c26
test: Make test ingester2 instances either persist very quickly or not at all 2023-01-04 10:06:57 -05:00
Carol (Nichols || Goulding) 96029654ab
test: Add a shared MiniCluster for version 2 services 2023-01-04 10:06:56 -05:00
Andrew Lamb dbe52f1ca1
chore: Upgrade datafusion (#6467)
* chore: Update datafusion

* fix: Update for new apis

* chore: Update expected plan

* fix: Update for new config construction

* chore: update clippy

* fix: Fix error codes

* fix: update another test

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-03 15:29:11 +00:00
dependabot[bot] 0aacef3c59
chore(deps): Bump once_cell from 1.16.0 to 1.17.0 (#6473)
* chore(deps): Bump once_cell from 1.16.0 to 1.17.0

Bumps [once_cell](https://github.com/matklad/once_cell) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Change once_cell version specifier to major.minor for less churn

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
2023-01-02 17:07:15 +00:00
Nga Tran d27e137c39
chore: add debug info for the investigation (#6472) 2022-12-29 23:49:29 +00:00
Carol (Nichols || Goulding) 7c6ccdb6d7
fix: Use keys and values functions. Thanks clippy! 2022-12-21 14:32:35 -05:00
dependabot[bot] ed17069087
chore(deps): Bump num_cpus from 1.14.0 to 1.15.0 (#6451)
Bumps [num_cpus](https://github.com/seanmonstar/num_cpus) from 1.14.0 to 1.15.0.
- [Release notes](https://github.com/seanmonstar/num_cpus/releases)
- [Changelog](https://github.com/seanmonstar/num_cpus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/num_cpus/compare/v1.14.0...v1.15.0)

---
updated-dependencies:
- dependency-name: num_cpus
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-21 15:17:55 +00:00
Andrew Lamb e1059a9009
feat: FlightSQL Milestone 2 basic FlightSQL client and FlightSQL server implementation and plumbing (#6398)
* feat: Add basic Flight and FlightSQL client into IOx codebase

Basic flight end to end test

* fix: Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-20 17:34:00 +00:00
Andrew Lamb 9b22ede3f0
refactor: Make arrow flight client return `futures::Streams` (#6438)
* refactor: Make arrow flight client use futures::Streams

* refactor: concision
2022-12-19 17:09:26 +00:00