Commit Graph

8883 Commits (fc162b9dc2ada841d75626d4716b1dd75c07afcd)

Author SHA1 Message Date
Marko Mikulicic 99daa13897
test: Test dotenvy regression (#5461) 2022-08-24 09:39:55 +00:00
Marko Mikulicic 4beb721a9a
fix: Revert Bump dotenvy from 0.15.1 to 0.15.2 (#5450) (#5455)
This reverts commit 84acbd2fad.

Closes #5454

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-24 09:10:09 +00:00
Nga Tran 3220c6f88b
feat: add file_count_threshold for compacting cold partitions (#5456)
* feat: add file_count_threshold for compacting cold partitions, consistent with the hot case, to make it easier to configure and avoid OOM

* chore: remove unnecessary comments
2022-08-23 20:12:21 +00:00
Dom e1cbc23a0f
Merge pull request #5452 from influxdata/dom/router-sequencer-types
refactor(router): use KafkaPartition in sequencer
2022-08-23 15:17:14 +01:00
Dom Dwyer 9b920f1cbb refactor(router): use KafkaPartition in sequencer
The Sequencer (which will be renamed shortly) is a type that represents
a single sequencer/shard/kafka partition in the router.

In order to minimise confusion with all the various IDs floating around,
we have a KafkaPartition - this commit changes the Sequencer to return
the Kafka partition index as a typed value, rather than a usize to help
eliminate any inconsistencies.

As a side effect of these conversion changes, I've tightened up the
casting to ensure we assert on any overflows - we juggle a lot of
numeric types!
2022-08-23 16:02:11 +02:00
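A minimal sketch of the idea in commit 9b920f1cbb above: wrap the partition index in a newtype and convert with a checked cast that panics on overflow. The `Sequencer` and `KafkaPartition` definitions here are illustrative stand-ins, and the `i32` inner type is an assumption; the real IOx types may differ.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype wrapping a Kafka partition index.
/// The inner `i32` is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct KafkaPartition(i32);

impl KafkaPartition {
    pub fn new(index: i32) -> Self {
        Self(index)
    }

    pub fn get(&self) -> i32 {
        self.0
    }
}

/// Illustrative stand-in for the router's per-partition type.
pub struct Sequencer {
    id: usize,
}

impl Sequencer {
    pub fn new(id: usize) -> Self {
        Self { id }
    }

    /// Return the partition index as a typed value instead of a bare `usize`.
    ///
    /// Panics if the index does not fit in an `i32`, rather than silently
    /// truncating as an `as` cast would.
    pub fn kafka_partition(&self) -> KafkaPartition {
        let index = i32::try_from(self.id).expect("partition index overflows i32");
        KafkaPartition::new(index)
    }
}
```

With a typed return value, passing a partition index where some other numeric ID is expected becomes a compile error rather than a runtime surprise.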
kodiakhq[bot] 9bd2b9aa12
Merge pull request #5451 from influxdata/dom/router-sharding-api
feat: sharder API definition
2022-08-23 12:19:34 +00:00
kodiakhq[bot] 8edd886bb9
Merge branch 'main' into dom/router-sharding-api 2022-08-23 12:12:39 +00:00
dependabot[bot] 84acbd2fad
chore(deps): Bump dotenvy from 0.15.1 to 0.15.2 (#5450)
Bumps [dotenvy](https://github.com/allan2/dotenvy) from 0.15.1 to 0.15.2.
- [Release notes](https://github.com/allan2/dotenvy/releases)
- [Changelog](https://github.com/allan2/dotenvy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/allan2/dotenvy/commits/v0.15.2)

---
updated-dependencies:
- dependency-name: dotenvy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-23 11:24:42 +00:00
Dom Dwyer 57bbe6b216 feat: sharder API definition
This commit adds a gRPC endpoint for callers to map (table, namespace)
tuples to Sequencer IDs, using the logic internal to the router.

Reference:
    https://github.com/influxdata/influxdb_iox/pull/5447#pullrequestreview-1080574538
2022-08-23 13:21:59 +02:00
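To illustrate the kind of mapping the sharder endpoint in commit 57bbe6b216 exposes, here is a rough sketch that maps a (table, namespace) pair to a sequencer ID. The `HashSharder` type and the hash-modulo strategy are assumptions for illustration only; the commit does not describe the router's internal sharding logic, and the gRPC layer is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sharder: maps a (table, namespace) pair onto one of a fixed
/// set of sequencer IDs. Hash-modulo is an assumption, not the router's
/// actual algorithm.
pub struct HashSharder {
    sequencer_ids: Vec<u32>,
}

impl HashSharder {
    /// Panics if `sequencer_ids` is empty.
    pub fn new(sequencer_ids: Vec<u32>) -> Self {
        assert!(!sequencer_ids.is_empty(), "need at least one sequencer");
        Self { sequencer_ids }
    }

    /// Deterministically pick a sequencer ID for the given pair.
    pub fn shard(&self, table: &str, namespace: &str) -> u32 {
        let mut hasher = DefaultHasher::new();
        (table, namespace).hash(&mut hasher);
        let idx = (hasher.finish() % self.sequencer_ids.len() as u64) as usize;
        self.sequencer_ids[idx]
    }
}
```

The property a caller relies on is determinism: the same (table, namespace) pair always resolves to the same sequencer ID for a given set of sequencers.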
Dom 8cc6724714
Merge pull request #5427 from influxdata/dom/kafka-remove-dml-merging
feat: simple RecordAggregator for write buffer
2022-08-22 15:24:36 +01:00
Dom 12abba5d37
Merge branch 'main' into dom/kafka-remove-dml-merging 2022-08-22 12:04:44 +01:00
Dom Dwyer 8b054c14a8 test: update batching tests for new aggregator
Previously aggregated writes were merged into a single Kafka Record -
this meant that all merged ops would be placed into the same Record, and
therefore receive the same sequence number once published to Kafka.

The new aggregator batches at the Record level, therefore aggregated
writes now get their own distinct sequence number. This commit updates
the batching tests to reflect this new sequence number assignment
behaviour.
2022-08-22 12:59:43 +02:00
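The change in sequence number assignment described in commit 8b054c14a8 can be modelled with a toy partition that hands out one offset per appended record. `FakePartition` and both publish functions are hypothetical names for this sketch and do not reflect the rskafka API.

```rust
/// Toy stand-in for a Kafka partition: each appended record gets the next
/// offset, which serves as the sequence number.
#[derive(Default)]
pub struct FakePartition {
    next_offset: u64,
}

impl FakePartition {
    /// Append one record and return the offset assigned to it.
    pub fn append_record(&mut self) -> u64 {
        let offset = self.next_offset;
        self.next_offset += 1;
        offset
    }
}

/// Old behaviour: all ops are merged into a single record, so every op
/// reports the same sequence number.
pub fn publish_merged(partition: &mut FakePartition, op_count: usize) -> Vec<u64> {
    let sequence_number = partition.append_record();
    vec![sequence_number; op_count]
}

/// New behaviour: each op becomes its own record, so every op gets a
/// distinct sequence number.
pub fn publish_per_record(partition: &mut FakePartition, op_count: usize) -> Vec<u64> {
    (0..op_count).map(|_| partition.append_record()).collect()
}
```

Tests that previously asserted a shared sequence number across a batch need to expect consecutive, distinct numbers instead.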
Dom Dwyer 6d6fc9a08b test: reduce timestamp precision for comparisons
Reduce the precision of timestamps in tests before comparing the DML
metadata objects.

This allows tests to accept different timestamp precisions, such as when
ops pass "through" Kafka vs. files, etc.
2022-08-22 12:58:03 +02:00
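One way to compare timestamps at reduced precision, as commit 6d6fc9a08b describes, is to truncate both sides to a coarser unit before comparing. The helper names and the choice of millisecond granularity are assumptions; the commit does not say which precision the tests use.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Truncate a timestamp to whole milliseconds since the Unix epoch.
/// Millisecond granularity is an assumption made for this sketch.
pub fn truncate_to_millis(ts: SystemTime) -> SystemTime {
    let since_epoch = ts
        .duration_since(UNIX_EPOCH)
        .expect("timestamp is before the Unix epoch");
    let millis = since_epoch.as_millis() as u64;
    UNIX_EPOCH + Duration::from_millis(millis)
}

/// Compare two timestamps after discarding sub-millisecond precision.
pub fn same_at_millis(a: SystemTime, b: SystemTime) -> bool {
    truncate_to_millis(a) == truncate_to_millis(b)
}
```

Two timestamps that differ only below the millisecond compare equal, so a value that lost precision in transit still matches the original.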
Dom Dwyer 312def5acd refactor: assert writes partitioned
The previous aggregator impl would assert that writes had been
partitioned before aggregating them (or rather, that the DML write had a
partition key assigned).

This should be true for all writes passing through the write buffer,
irrespective of which aggregator is used, therefore this assert is moved
"up" into the write buffer itself.
2022-08-22 12:52:37 +02:00
Dom Dwyer a66d16576d refactor: use dyn TimeProvider in RecordAggregator
For ease of integration with the existing tests, use dyn TimeProvider in
the RecordAggregator.
2022-08-22 12:50:50 +02:00
Dom Dwyer 37727105b5 refactor: remove redundant timestamp conversions
Removes the existing, copy-pasted timestamp conversion code to remove
redundant conversions.
2022-08-22 11:06:36 +02:00
Marco Neumann 064606380b
feat: refresh policy for caches (#5431)
* feat: refresh policy for caches

For #5318 we want to have a policy that refreshes keys before they are
too old. I initially tried to fold both TTL and the refresh system into
a single policy but then decided that this would basically be two
policies in one with a harder-to-test interface. Semantically TTL and
refresh are also a bit different (but will usually be used together):

- **TTL:** Prevents a user from getting data that is too old. It is some
  kind of "soft correctness". In some sense this is related to the "remove
  if" policy where some part of the system knows for sure (or with
  reasonable likelihood) that a cache entry is outdated. Note that TTL's
  primary job is NOT to clean up memory from old keys (even though it
  indirectly does that). There is no reason cached entries should be
  removed except for correctness (TTL and remove-if) or resource
  pressure -- and the latter is handled by the LRU policy.
- **Refresh:** While TTL is some kind of deadline, we often have good
  reason to refresh the key before we pull the plug, namely when an
  entry is used and a bit old (but not too old). The concrete mechanism
  to achieve this is flexible. At the moment the policy is rather
  simple -- just start a refresh task if a key is old and we receive a
  GET request -- but can be extended in the future.

This also adds some integration tests for TTL+refresh. There will be
follow-up changes to test the interaction with LRU as well, although I am
pretty certain that there won't be any surprises due to the extensive
testing we have in place for the policy backend itself as well as all
the policies.

This change also does NOT integrate the refresh with the querier for the
sake of keeping the changeset "small" (i.e. it is already rather large).

* docs: improve

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-08-22 08:45:22 +00:00
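The TTL/refresh distinction drawn in commit 064606380b can be sketched as a decision made on each GET request based on the entry's age. Note that the commit keeps TTL and refresh as two separate policies; this sketch collapses them into one function purely for brevity, and `Policy`, `GetAction`, and the field names are hypothetical.

```rust
use std::time::Duration;

/// What a cache might do with an entry when it receives a GET request.
#[derive(Debug, PartialEq, Eq)]
pub enum GetAction {
    /// Entry is fresh: serve it as-is.
    Serve,
    /// Entry is a bit old but still within its TTL: serve it and start a
    /// background refresh.
    ServeAndRefresh,
    /// Entry is past its TTL: it must not be served.
    Expired,
}

/// Hypothetical combined configuration. `refresh_after` is expected to be
/// shorter than `ttl`.
pub struct Policy {
    pub refresh_after: Duration,
    pub ttl: Duration,
}

impl Policy {
    /// Decide how to handle a GET for an entry of the given age.
    pub fn on_get(&self, age: Duration) -> GetAction {
        if age >= self.ttl {
            GetAction::Expired
        } else if age >= self.refresh_after {
            GetAction::ServeAndRefresh
        } else {
            GetAction::Serve
        }
    }
}
```

Eviction under memory pressure is deliberately absent here, matching the commit's point that resource pressure is the LRU policy's job.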
Stuart Carnie 2d795bd604
chore: cargo update (#5439)
* pest 2.2.1 → 2.3.0
* serde 1.0.143 → 1.0.144
* serde-json 1.0.83 → 1.0.85

pest_meta and pest_generator 2.2.1 were yanked
2022-08-22 05:15:48 +00:00
Andrew Lamb 35f99fe940
fix: fix intermittent failures in `data::tests::persist` (#5437)
* fix: fix intermittent failures in data::tests::persist

* fix: tweak comments and message

* fix: space
2022-08-19 21:16:00 +00:00
dependabot[bot] ed38b01e91
chore(deps): Bump sqlparser from 0.20.0 to 0.21.0 (#5429)
Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/sqlparser-rs/sqlparser-rs/releases)
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-19 10:41:38 +00:00
Marco Neumann 5ce0618b8f
chore: `cargo update` (#5432)
```
cpufeatures v0.2.2 -> v0.2.3
plotters v0.3.2 -> v0.3.3
plotters-svg v0.3.2 -> v0.3.3
```

`plotters v0.3.2` was yanked.
2022-08-19 09:56:23 +00:00
Stuart Carnie b4e5895d7a
feat: Add influxdb_influxql_parser crate (#5415)
* feat: Add crate; parse quoted identifiers

* chore: Run cargo hakari tasks

* chore: satisfy linter

* chore: Use `test_helpers::Result`

* feat: Add all InfluxQL keywords

* chore: Update influxdb_influxql_parser/src/lib.rs

Co-authored-by: Marco Neumann <marco@crepererum.net>

* chore: PR feedback

* chore: PR Feedback, remove Result<()>

* chore: Update Cargo.lock

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-08-18 23:09:45 +00:00
Dom Dwyer 59c2d84d1e refactor: use RecordAggregator
Replaces the DmlAggregator with the simpler RecordAggregator.

Metrics gathered as part of #5323 show there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
2022-08-18 17:12:23 +02:00
Dom Dwyer 30e23f6e82 feat: simple RecordAggregator for write buffer
This commit adds a new write buffer aggregator used by rskafka to
increase the size of Kafka messages on the wire. The Kafka write buffer
impl is the only impl to perform aggregation.

This Aggregator impl maps IOx-specific DML operations to rskafka Records
with no additional processing - it can be thought of as an IOx-specific
adaptor over rskafka's RecordAggregator.

By delegating batching of Record instances to rskafka's simple
RecordAggregator, we minimise code complexity / bug surface area / LoC.
2022-08-18 11:42:58 +01:00
Marco Neumann d75df2b610
chore: `cargo update` (#5426)
```
bumpalo v3.10.0 -> v3.11.0
either v1.7.0 -> v1.8.0
iana-time-zone v0.1.45 -> v0.1.46
rustix v0.35.8 -> v0.35.9
```

`rustix` is important because `0.35.8` was yanked.
2022-08-18 08:53:00 +00:00
kodiakhq[bot] 8eb3a79d7f
Merge pull request #5348 from influxdata/cn/upgrade-l0-metrics
feat: Add metrics on the size of files created by ingestion and used for compaction
2022-08-17 16:08:59 +00:00
kodiakhq[bot] 2b3ca54168
Merge branch 'main' into cn/upgrade-l0-metrics 2022-08-17 16:01:42 +00:00
Luke Bond f4443f0b3a
feat: import schema override (#5420)
* chore: struct for overrides of import schema conflicts

* chore: import schema override shouldn't support tags

* feat: import schema merge can take an override schema

* fix: schema override in test had superfluous tag

* chore: test for batch schema merge with override in import schema

* feat: import schema merge now takes override schema
2022-08-17 14:59:50 +00:00
Marco Neumann 3dca9c1b43
feat: async sleep with `TimeProvider` (#5417)
* feat: async sleep with `TimeProvider`

This is helpful to mock "ticked" loops or to control certain async time
periods.

Will be used to test the refresh policy developed in #5318.

* refactor: use a single `TimeProvider::sleep` impl

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 14:50:35 +00:00
Marco Neumann f34f99c5ed
refactor: port LRU cache backend to policy framework (#5406)
* refactor: port LRU cache backend to policy framework

Closes #5320.

* test: extend `test_oversized_entries`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 14:43:24 +00:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Andrew Lamb bd4b708055
chore: Update datafusion pin + other dependencies (#5418)
* chore: Update datafusion pin

* chore: Update other dependencies

* fix: Update for changes in API
2022-08-17 10:42:37 +00:00
dependabot[bot] 2e638fe19c
chore(deps): Bump libc from 0.2.131 to 0.2.132 (#5414)
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.131 to 0.2.132.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.131...0.2.132)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-17 08:40:57 +00:00
dependabot[bot] 78665d3092
chore(deps): Bump once_cell from 1.13.0 to 1.13.1 (#5413)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.13.0 to 1.13.1.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.13.0...v1.13.1)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 08:31:46 +00:00
Marco Neumann 7e1ad40522
refactor: lift `Send` requirement from `ChangeRequest` (#5404)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 08:24:44 +00:00
pierwill 51141f2c78
docs: Edit catalog API docs (#5409)
* docs: Edit Catalog docs string

* docs: Edit top-level catalog module doc

* docs: Mark `sealed` trait w/ `doc(hidden)`

* docs: Edit catalog transaction docs

* docs: Edit Catalog trait docs

* docs: Edit `RepoCollection` docs

Clarify concept of repository.

Add links.

* docs: Add link to `Transaction`

Co-authored-by: pierwill <pierwill@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 21:36:29 +00:00
kodiakhq[bot] a6f28d0709
Merge pull request #5411 from influxdata/dom/table-scoped-errs
feat: table name in schema validation errors
2022-08-16 17:07:31 +00:00
Dom Dwyer 180ff9f681 feat: table name in schema validation errors
Scopes all schema validation errors to include the table name in the
error output.
2022-08-16 19:00:44 +02:00
Marco Neumann 7e97620b37
chore: `cargo-update` (#5405)
```
anyhow v1.0.60 -> v1.0.61
bytemuck v1.11.0 -> v1.12.0
hdrhistogram v7.5.0 -> v7.5.1
iana-time-zone v0.1.44 -> v0.1.45
memmap2 v0.5.5 -> v0.5.7
os_str_bytes v6.2.0 -> v6.3.0
```

`iana-time-zone` is important because `0.1.44` was yanked.
2022-08-16 12:51:46 +00:00
Luke Bond 10fee5535a
feat: import schema updates iox catalog (#5385)
* feat: import schema updates iox catalog

- renamed import/schema module to aggregate_tsm_schema to not conflict
  with schema crate
- fetch schema from iox catalog, and validate/merge/create as needed

chore: add catalog dsn config to import schema command
chore: import schema command connects to catalog
chore: import schema merge validation errors return non-zero code
chore: simplified and tidied import update catalog code

chore: tests and refactoring of import schema catalog update

* chore: require retention on ns creation in import

* chore: fixed bad test in import schema validation

* chore: friendlier errors & more tests in import schema catalog update
2022-08-16 11:05:27 +00:00
Andrew Lamb b60e0beee2
fix: read_buffer benchmarks array type creation (#5401)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 10:28:27 +00:00
Marco Neumann 49ab568ca8
refactor: convert `remove_if` feature to policy framework (#5398)
* refactor: allow `ChangeRequest` to carry a lifetime

Let's not restrict our change functions to `'static` because this would
require us to clone loads of data to achieve predicate-based
`remove_if`.

* refactor: convert `remove_if` feature to policy framework

Decided to drop the "shared" functionality. We only use the small
`remove_if` bit which is way easier to reason about.

For #5320.

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 08:23:27 +00:00
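The motivation in commit 49ab568ca8 for letting `ChangeRequest` carry a lifetime can be sketched as follows: a non-`'static` request may borrow the key and any data the predicate needs, instead of cloning them. The `ChangeRequest` shape and the `HashMap` backend below are hypothetical simplifications, not the actual IOx types.

```rust
use std::collections::HashMap;

/// Hypothetical change request that may borrow from its caller for `'a`
/// instead of requiring `'static` (owned) data.
pub struct ChangeRequest<'a> {
    f: Box<dyn FnOnce(&mut HashMap<String, u64>) + 'a>,
}

impl<'a> ChangeRequest<'a> {
    /// Remove `key` from the backend if `predicate` returns true for its
    /// current value. Both `key` and `predicate` may borrow caller data.
    pub fn remove_if<P>(key: &'a str, predicate: P) -> Self
    where
        P: FnOnce(&u64) -> bool + 'a,
    {
        Self {
            f: Box::new(move |backend: &mut HashMap<String, u64>| {
                let remove = backend.get(key).map_or(false, |v| predicate(v));
                if remove {
                    backend.remove(key);
                }
            }),
        }
    }

    /// Apply the request to the backend, consuming it.
    pub fn apply(self, backend: &mut HashMap<String, u64>) {
        (self.f)(backend)
    }
}
```

Had the boxed closure been bound by `'static`, the caller would need to move or clone the key and everything the predicate reads into the request.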
Andrew Lamb 0a7e6919f2
chore: do not build benchmark binary for lib targets (#5400)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 08:11:23 +00:00
dependabot[bot] 724759862f
chore(deps): Bump pin-project from 1.0.11 to 1.0.12 (#5402)
Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.0.11 to 1.0.12.
- [Release notes](https://github.com/taiki-e/pin-project/releases)
- [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/pin-project/compare/v1.0.11...v1.0.12)

---
updated-dependencies:
- dependency-name: pin-project
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-16 08:04:25 +00:00
Marco Neumann 0ccefa0d0c
refactor: port TTL backend to policy framework (#5396)
* refactor: port TTL backend to policy framework

Note that this is "just" a port, it does NOT change how TTL works. This
will be done in #5318.

Helps with #5320.

* fix: ensure inner backend is empty

* test: add a smoke test
2022-08-15 16:48:16 +00:00
Carol (Nichols || Goulding) ef716a5b90
fix: Remove compaction level attribute from the compaction_input_file_bytes metric 2022-08-15 10:50:04 -04:00
Carol (Nichols || Goulding) a9ed32df89
fix: Remove compaction_counter as it's now redundant with the compaction_input_file_bytes histogram 2022-08-15 10:23:29 -04:00
Carol (Nichols || Goulding) af95ce7ca6
feat: Add a histogram tracking sizes of files used as inputs to compaction
Fixes #5348.
2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding) cd6c809fe0
fix: Change metric tracking sizes of files selected for compaction to a histogram
Connects to #5348.
2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding) ed44817ed1
feat: Add a histogram of ingested (new L0) Parquet file sizes
Connects to #5348.
2022-08-15 10:13:54 -04:00